Probabilistic Inference in PRISM
Taisuke Sato
Tokyo Institute of Technology
Problem
• Statistical machine learning is a labor-intensive process: a trial-and-error cycle of {modeling → learning → evaluation}*
• Deriving and implementing model-specific learning algorithms and model-specific probabilistic inference procedures is painful
[Figure: each model (Model 1, Model 2, …, Model n) requires its own model-specific learning algorithm (EM1, EM2, …, EMn, VB, MCMC, …)]
Our solution
• Develop a high-level modeling language that offers universal
learning and inference methods applicable to every model
[Figure: all models (Model 1, Model 2, …, Model n) are written in one modeling language and share universal learning/inference methods (EM, VB, MCMC, …)]
• The user concentrates on modeling and the rest (learning
and inference) is taken care of by the system
PRISM (http://sato-www.cs.titech.ac.jp/prism/)
• Logic-based high-level modeling language
[Figure: probabilistic models (Bayesian networks, HMMs, PCFGs, new models, …) are written in PRISM; the PRISM system provides the learning methods EM/MAP, VT, VB, VB-VT and MCMC for all of them]
• Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks (see the HMM sketch below)
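For example, a two-state HMM can be written along the following lines (a minimal sketch in the spirit of the standard PRISM HMM example, not a program distributed with the system; the switch names init, tr/1, out/1, the states s0/s1, the alphabet {a,b} and the fixed length 5 are assumptions made here). Calling prob/probf on hmm(Os) then performs, via the explanation graph, the same computation as the forward algorithm:
values(init,[s0,s1]).
values(tr(_),[s0,s1]).
values(out(_),[a,b]).
hmm(Os):- msw(init,S), hmm(1,5,S,Os).
hmm(T,N,_,[]):- T>N, !.
hmm(T,N,S,[O|Os]):-
    msw(out(S),O), msw(tr(S),Next),
    T1 is T+1, hmm(T1,N,Next,Os).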
Basic ideas
• Semantics
• program = Turing machine + probabilistic choice
+ Dirichlet prior
• denotation = a probability measure over possible worlds
• Propositionalized probability computation (PPC)
• programs written at predicate logic level
• probability computation at propositional logic level
• Dynamic programming for PPC
• proof search generates a directed graph (explanation graph)
• Probabilities are computed from bottom to top in the graph
• Discriminative use
• generatively define a model by a PRISM program and
discriminatively use it for better prediction performance
ABO blood type program
values(gene,[a,b,o],[0.5,0.2,0.3]).   % msw(gene,a) is true with prob. 0.5
btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).
pg_table(X,GT):- ( (X=a;X=b),(GT=[X,o];GT=[o,X];GT=[X,X])
                 ; X=o,GT=[o,o]
                 ; X=ab,(GT=[a,b];GT=[b,a]) ).
gtype(Gf,Gm):- msw(gene,Gf),msw(gene,Gm).
[Figure: the probabilistic primitives (msw atoms) simulate gene inheritance from the father (left) and the mother (right), determining the child's blood type (A, B, AB or O)]
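As a quick check, the program can be run generatively with PRISM's forward-sampling built-in sample/1; the answer below is only one possible outcome, drawn according to the declared switch probabilities:
| ?- sample(btype(X))
X = a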
Propositionalized probability computation
btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a) <=> msw(gene,a) & msw(gene,a)
gtype(a,o) <=> msw(gene,a) & msw(gene,o)
gtype(o,a) <=> msw(gene,o) & msw(gene,a)
[Figure: explanation graph for btype(a), showing how btype(a) is proved by the probabilistic choices made by msw atoms; the annotated probabilities are P(gtype(a,a)) = 0.5 × 0.5 = 0.25, P(gtype(a,o)) = 0.5 × 0.3 = 0.15, P(gtype(o,a)) = 0.3 × 0.5 = 0.15, hence P(btype(a)) = 0.25 + 0.15 + 0.15 = 0.55]
• Probabilities are computed by sum-product computation in a bottom-up manner, using the probabilities assigned to msw atoms (a plain-Prolog sketch follows below)
• The explanation graph is acyclic, so dynamic programming (DP) is possible
• PPC+DP subsumes forward-backward, belief propagation and inside-outside computation
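The following is a minimal sketch of this sum-product computation, written in plain Prolog rather than using PRISM's internal machinery; node/2, sw_prob/2 and prob_of/2 are hypothetical names, and a real implementation memoizes (tables) intermediate results so that shared subgoals are computed only once, which is what makes it dynamic programming:
% explanation graph: node(Goal, Explanations), each explanation is a list of conjuncts
node(btype(a), [[gtype(a,a)], [gtype(a,o)], [gtype(o,a)]]).
node(gtype(a,a), [[msw(gene,a), msw(gene,a)]]).
node(gtype(a,o), [[msw(gene,a), msw(gene,o)]]).
node(gtype(o,a), [[msw(gene,o), msw(gene,a)]]).
% probabilities assigned to msw atoms
sw_prob(msw(gene,a), 0.5).
sw_prob(msw(gene,b), 0.2).
sw_prob(msw(gene,o), 0.3).
% prob_of(+Goal,-P): sum over explanations of the product over their conjuncts
prob_of(G, P) :- sw_prob(G, P), !.
prob_of(G, P) :- node(G, Es), expls_prob(Es, P).
expls_prob([], 0.0).
expls_prob([E|Es], P) :- conj_prob(E, P1), expls_prob(Es, P2), P is P1 + P2.
conj_prob([], 1.0).
conj_prob([G|Gs], P) :- prob_of(G, P1), conj_prob(Gs, P2), P is P1 * P2.
% ?- prob_of(btype(a), P).   yields P = 0.55, as in the figure above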
Learning
• A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed
• e.g. P(msw(gene,a),…, btype(a),… | θa,θb,θo) where θa + θb + θo = 1
• Learning θ from observed data y by maximizing (a concrete instance follows this list)
• P(y|θ) → MLE/MAP
• P(x*,y|θ) where x* = argmax_x P(x,y|θ) → VT
• From a Bayesian point of view, a program together with a Dirichlet prior P(θ|α) defines the marginal distribution ∫ P(x,y|θ) P(θ|α) dθ
• We wish to compute
• the predictive distribution ∫ P(x|y,θ) P(θ|y,α) dθ
• the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ) P(θ|α) dθ
• Both need approximation
• Variational Bayes (VB) → VB, VB-VT
• MCMC → Metropolis-Hastings
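As a concrete instance (read off from the explanation graphs of the blood type program above), the observed-data likelihood maximized by MLE for the data D = [btype(a), btype(a), btype(ab), btype(o)] used in the sessions below is
P(D|θ) = P(btype(a)|θ)² · P(btype(ab)|θ) · P(btype(o)|θ) = (θa² + 2·θa·θo)² · (2·θa·θb) · θo²
where the hidden x ranges over the genotype choices (msw atoms) that explain each observation.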
Sample session 1
- Explanation graph and probability computation using built-in predicates
| ?- prism(blood)
loading::blood.psm.out
| ?- show_sw
Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
| ?- probf(btype(a))
btype(a)
<=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a)
<=> msw(gene,a) & msw(gene,a)
gtype(a,o)
<=> msw(gene,a) & msw(gene,o)
gtype(o,a)
<=> msw(gene,o) & msw(gene,a)
| ?- prob(btype(a),P)
P = 0.55
Sample session 2
- MLE and Viterbi inference
| ?- D=[btype(a),btype(a),btype(ab),btype(o)],learn(D)
Exporting switch information to the EM routine ... done
#em-iters: 0(4) (Converged: -4.965121886)
Statistics on learning:
Graph size: 18
Number of switches: 1
Number of switch instances: 3
Number of iterations: 4
Final log likelihood: -4.965121886
| ?- prob(btype(a),P)
P = 0.598211
| ?- viterbif(btype(a))
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
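By default learn/1 performs ML/MAP estimation. The other parameter-learning modes mentioned above (VT, VB, VB-VT) are selected by setting a PRISM flag with set_prism_flag/2 before calling learn/1; the flag name and value below (learn_mode, vb) are quoted from memory and should be checked against the manual of the installed PRISM version:
| ?- set_prism_flag(learn_mode,vb)
| ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)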
Sample session 3
- Bayes inference by MCMC
| ?- D=[btype(a), btype(a), btype(ab), btype(o)],
marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]), marg_exact(D,LogM)
VFE = -5.54836
ELM = -5.48608
LogM = -5.48578
| ?- D=[btype(a), btype(a), btype(ab), btype(o)], predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
print_graph(E,[lr('<=')])
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
Summary
• PRISM = Probabilistic Prolog for statistical machine learning
• Forward sampling
• Exact probability computation
• Parameter learning
• MLE/MAP
• VT
• Bayesian inference
• VB
• VB-VT
• MCMC
• Viterbi inference
• model score (BIC, Cheeseman-Stutz, VFE)
• smoothing
• Current version 2.1