Deductive and Inductive Probabilistic Programming
Fabrizio Riguzzi
Outline
Probabilistic programming
Probabilistic logic programming
Inference
Learning
Applications
Probabilistic Programming
Users specify a probabilistic model in its entirety (e.g., by writing
code that generates a sample from the joint distribution) and
inference follows automatically given the specification.
PP languages provide the full power of modern programming
languages for describing complex distributions
Reuse of libraries of models
Interactive modeling
Abstraction
Probabilistic Programming Languages
Name                | Extends from   | Host language
--------------------+----------------+-------------------
Venture             | Scheme         | C++
Probabilistic-C     | C              | C
Anglican            | Scheme         | Clojure
IBAL                | OCaml          | –
PRISM               | B-Prolog       | B-Prolog
Infer.NET           | .NET Framework | .NET Framework
dimple              | MATLAB, Java   | –
chimple             | MATLAB, Java   | –
BLOG                | Java           | Java
PSQL                | SQL            | –
BUGS                | –              | –
FACTORIE            | Scala          | Scala
PMTK                | MATLAB         | MATLAB
Alchemy             | C++            | C++
Dyna                | Prolog         | –
Figaro              | Scala          | Scala
Church              | Scheme         | JavaScript, Scheme
ProbLog             | Prolog         | Python, Jython
ProBT               | –              | C++, Python
Stan (software)     | –              | C++
Hakaru              | Haskell        | Haskell
BAli-Phy (software) | Haskell        | C++
ProbCog             | –              | Java, Python
Gamble              | –              | Racket
Tuffy               | –              | Java
PyMC                | Python         | Python
Lea                 | Python         | Python
WebPPL              | JavaScript     | JavaScript
Picture             | Julia          | Julia
Turing.jl           | Julia          | Julia
Source: https://en.wikipedia.org/wiki/Probabilistic_programming_language
Probabilistic Programming Languages
Only three of these languages are logic-based (PRISM, ProbLog, Dyna); the others are imperative, functional, or object-oriented
In 2013 DARPA released the funding call “Probabilistic Programming for Advancing Machine Learning” (PPAML)
Aim: develop probabilistic programming languages and accompanying tools to facilitate the construction of new machine learning applications across a wide range of domains
Focus: functional PP
Probabilistic Logic Programming
What are we missing?
Is logic programming to blame?
Thesis
Probabilistic logic programming is alive and kicking!
Strengths
Relationships are first-class citizens
Conceptually easier to lift
Strong semantics
Inductive systems
Weaknesses
Handling non-termination
Continuous variables
Non-termination
Possible when the number of explanations for the query is infinite
Non-termination: Inducing Arithmetic Functions
Church code: http://forestdb.org/models/arithmetic.html

(define (random-arithmetic-fn)
  (if (flip 0.3)
      (random-combination (random-arithmetic-fn)
                          (random-arithmetic-fn))
      (if (flip)
          (lambda (x) x)
          (random-constant-fn))))

(define (random-combination f g)
  (define op (uniform-draw (list + -)))
  (lambda (x) (op (f x) (g x))))

(define (random-constant-fn)
  (define i (sample-integer 10))
  (lambda (x) i))
Non-termination: Inducing Arithmetic Functions
LPAD (cplint) code: http://cplint.lamping.unife.it/example/inference/arithm.pl

eval(X,Y):-
  random_fn(X,0,F),
  Y is F.
op(L,+):0.5;op(L,-):0.5.
random_fn(X,L,F):-
  comb(L),
  random_fn(X,l(L),F1),
  random_fn(X,r(L),F2),
  op(L,Op),
  F=..[Op,F1,F2].
random_fn(X,L,F):-
  \+ comb(L),
  base_random_fn(X,L,F).
comb(_):0.3.
base_random_fn(X,L,X):- identity(L).
base_random_fn(_X,L,C):- \+ identity(L), random_const(L,C).
identity(_):0.5.
random_const(_,C):discrete(C,[0:0.1,1:0.1,2:0.1,3:0.1,4:0.1,
  5:0.1,6:0.1,7:0.1,8:0.1,9:0.1]).
Non-termination: Inducing Arithmetic Functions
Aim: given observed input-output pairs for the random function, predict the output for a new input
Arbitrarily complex functions have a non-zero probability of being selected, so the program has non-terminating executions
Exact inference is infeasible: the query has an infinite number of explanations
Non-termination: Inducing Arithmetic Functions
(define (sample)
  (rejection-query
    (define my-proc (random-arithmetic-fn))
    (my-proc 2)
    (= (my-proc 1) 3)))

(hist (repeat 100 sample))
Solution
Use cyclic explanation graphs (T. Sato and P. Meyer, Infinite probability computation by cyclic explanation graphs, Theor. Pract. Log. Prog. 14, 2014) or probabilistic tabled logic programming (A. Gorlin, C. R. Ramakrishnan, and S. A. Smolka, Model checking with probabilistic tabled logic programming, Theor. Pract. Log. Prog. 12(4-5), 2012)
or resort to sampling: as complexity increases, the probability of a function tends to 0, so the probability of an infinite trace is 0
Metropolis-Hastings: (A. Nampally and C. R. Ramakrishnan, Adaptive MCMC-based inference in probabilistic logic programs, arXiv preprint arXiv:1403.6036, 2014)
Monte Carlo sampling is attractive because it is simple to implement and the estimate improves as more time becomes available
Monte Carlo
The disjunctive clause
  Cr = H1 : α1 ∨ ... ∨ Hn : αn ← L1, ..., Lm.
is transformed into the set of clauses MC(Cr):
  MC(Cr, 1) = H1 ← L1, ..., Lm, sample_head(n, r, VC, NH), NH = 1.
  ...
  MC(Cr, n) = Hn ← L1, ..., Lm, sample_head(n, r, VC, NH), NH = n.
Sample truth value of query Q:
  ...
  (call(Q) ->
    NT1 is NT + 1
  ;
    NT1 = NT),
  ...
Metropolis-Hastings MCMC
A Markov chain is built by taking an initial sample and by
generating successor samples.
The initial sample is built by randomly sampling choices so that
the evidence is true.
A successor sample is obtained by deleting a fixed number of
sampled probabilistic choices.
Then the evidence is queried; if it succeeds, the query is evaluated in the new sample.
The sample is accepted with probability min{1, N0/N1}, where N0 (N1) is the number of choices sampled in the previous (current) sample.
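A minimal sketch of this acceptance test (an illustrative helper of our own, not cplint's code; random/1 is SWI-Prolog's uniform generator):

accept(N0, N1) :-
    P is min(1, N0 / N1),
    random(U),          % U uniform in [0,1)
    U =< P.             % succeeds with probability min(1, N0/N1)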
Solution
In cplint:
?- mc_mh_sample(eval(2,4),eval(1,3),100,100,3,T,F,P).
estimates the probability of eval(2,4) given that eval(1,3) is true:
F = 90,
T = 10,
P = 0.1
T (F) is the number of samples in which the query succeeded (failed); P = T/(T+F)
You can also try rejection sampling (usually slower):
?- mc_rejection_sample(eval(2,4),eval(1,3),100,T,F,P).
Solution
You may be interested in the distribution of the output
In cplint:
?- mc_mh_sample_arg_bar(eval(2,Y),eval(1,3),100,100,3,Y,V).
Solution
You may be interested in the expected value of the output
In cplint:
?- mc_mh_expectation(eval(2,Y),eval(1,3),100,100,3,Y,E).
E = 3.21
Continuous Random Variables
Distributional clauses (B. Gutmann, I. Thon, A. Kimmig, M.
Bruynooghe, and L. De Raedt, “The magic of logical inference in
probabilistic programming,” Theory and Practice of Logic
Programming, 2011)
Gaussian mixture model in cplint:
heads:0.6;tails:0.4.
g(X): gaussian(X,0, 1).
h(X): gaussian(X,5, 2).
mix(X) :- heads, g(X).
mix(X) :- tails, h(X).
Continuous Random Variables
Inference by sampling
Without evidence, or with evidence on discrete random variables only, you can reuse the same methods
Sample the arguments of goals to build a probability density of those arguments
Gaussian Mixture Model
heads:0.6;tails:0.4.
g(X): gaussian(X,0, 1).
h(X): gaussian(X,5, 2).
mix(X) :- heads, g(X).
mix(X) :- tails, h(X).
?- mc_sample_arg(mix(X),10000,X,L0),
histogram(L0,40,Chart).
Evidence on Continuous Random Variables
You cannot use rejection sampling or Metropolis-Hastings, as the
probability of the evidence is 0
You can use likelihood weighting to obtain samples of continuous
arguments of a goal. (Nitti, D., De Laet, T., De Raedt, L.:
Probabilistic logic programming for hybrid relational domains.
Mach. Learn. 103(3), 407-449 2016)
Likelihood Weighting
For each sample to be taken, likelihood weighting samples the query and then assigns a weight to the sample on the basis of the evidence.
The weight is computed by deriving the evidence backward in the same sample of the query, starting with a weight of one.
Each time a choice should be taken or a continuous variable sampled, if the choice/variable has already been taken, the current weight is multiplied by the probability of the choice/by the density value of the continuous variable.
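Schematically, with illustrative helpers of our own (not cplint's internals), the weight accumulates one factor per reused choice or continuous value met while the evidence is derived backward:

% Fold the weight over the reused choices/values met in the derivation.
lw_weight([], W, W).
lw_weight([choice(P)|Rest], W0, W) :-   % reused discrete choice, probability P
    W1 is W0 * P,
    lw_weight(Rest, W1, W).
lw_weight([density(D)|Rest], W0, W) :-  % reused continuous value, density D
    W1 is W0 * D,
    lw_weight(Rest, W1, W).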
Bayesian Estimation
Problem from http://www.robots.ox.ac.uk/~fwood/anglican/examples/viewer/?worksheet=gaussian-posteriors
Estimate the true mean of a Gaussian-distributed random variable, given some observed data.
The variance is known, and we suppose that the mean itself has a Gaussian distribution with mean 1 and variance 5 (the prior on the parameter).
We take different measurements (e.g. at different times), indexed with an integer.
Bayesian Estimation
Anglican code:

(def dataset [9 8])

(defquery gaussian-model [data]
  (let [mu (sample (normal 1 (sqrt 5)))
        sigma (sqrt 2)]
    (doall (map (fn [x] (observe (normal mu sigma) x)) data))
    mu))

(def posterior
  ((conditional gaussian-model :smc :number-of-particles 10) dataset))

(def posterior-samples (repeatedly 20000 #(sample* posterior)))
Bayesian Estimation
cplint code: http://cplint.lamping.unife.it/example/inference/gauss_mean_est.pl

value(I,X) :-
  mean(M),
  value(I,M,X).
mean(M): gaussian(M,1.0,5.0).
value(_,M,X): gaussian(X,M,2.0).

?- mc_sample_arg(value(0,Y),10000,Y,L0),
   mc_lw_sample_arg(value(0,X),(value(1,9),value(2,8)),10000,X,L),
   densities(L0,L,40,Chart).
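For intuition, here is a self-contained sketch of likelihood weighting for this very problem in plain SWI-Prolog (deliberately not using cplint; all predicate names are ours): sample the mean from the prior, weight each sample by the density of the observations 9 and 8 under the likelihood, and average. As a check, the conjugate posterior mean is (1/5 + 17/2)/(1/5 + 1) = 7.25, which the estimate should approach:

:- use_module(library(random)).

% Density of N(Mu,Var) at X.
normal_pdf(X, Mu, Var, D) :-
    D is exp(-((X - Mu)**2) / (2 * Var)) / sqrt(2 * pi * Var).

% Box-Muller sample from N(Mu,Var).
random_normal(Mu, Var, X) :-
    random(U1), random(U2),
    X is Mu + sqrt(Var) * sqrt(-2 * log(U1)) * cos(2 * pi * U2).

% One weighted sample: Mu from the prior N(1,5), weighted by the
% likelihood of observations 9 and 8 under N(Mu,2).
lw_sample(Mu-W) :-
    random_normal(1.0, 5.0, Mu),
    normal_pdf(9.0, Mu, 2.0, D1),
    normal_pdf(8.0, Mu, 2.0, D2),
    W is D1 * D2.

% Weighted average of N samples estimates the posterior mean.
posterior_mean(N, Mean) :-
    length(L, N),
    maplist(lw_sample, L),
    foldl([Mu-W, S0-T0, S1-T1]>>(S1 is S0 + Mu*W, T1 is T0 + W),
          L, 0.0-0.0, S-T),
    Mean is S / T.

Querying ?- posterior_mean(100000, M). yields M ≈ 7.25 up to sampling noise.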
Learning
Parameter learning
Structure learning: more developed for PLP, but see
(Perov, Yura N., and Frank D. Wood. Learning probabilistic programs. arXiv preprint arXiv:1407.2646, 2014)
(Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 350(6266), 2015)
(Gaunt, Alexander L., et al. TerpreT: A probabilistic programming language for program induction. arXiv preprint arXiv:1608.04428, 2016)
Parameter Learning
Problem: given a set of interpretations and a program, find the parameters maximizing the likelihood of the interpretations (or of instances of a target predicate)
Exploit the equivalence with Bayesian networks to use BN learning algorithms
The interpretations record the truth value of ground atoms, not of the choice variables
The choice variables are unseen data, so relative frequency can't be used
Parameter Learning
(Thon et al., ECML 2008) proposed an adaptation of EM for CPT-L, a simplified version of LPADs
The algorithm computes the counts efficiently by repeatedly traversing the BDDs representing the explanations
(Ishihata et al., ILP 2008) independently proposed a similar algorithm
LFI-ProbLog (Gutmann et al., ECML 2011) is the adaptation of EM to ProbLog
EMBLEM (Riguzzi & Bellodi, IDAJ 2013) adapts (Ishihata et al., ILP 2008) to LPADs
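Schematically, all these EM variants iterate the same two steps (notation ours, for a two-valued choice c_i): an E step computing, from the BDDs, the expected counts of each choice given the observed interpretations, and an M step re-estimating the parameters from those counts:

$\pi_i^{(t+1)} = \dfrac{\sum_{e \in E} \mathbf{E}[c_i \mid e]}{\sum_{e \in E} \left( \mathbf{E}[c_i \mid e] + \mathbf{E}[\bar{c}_i \mid e] \right)}$

where E is the set of examples and $\mathbf{E}[c_i \mid e]$ ($\mathbf{E}[\bar{c}_i \mid e]$) is the expected number of times choice $c_i$ is (is not) taken in the derivations of $e$ under the current parameters.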
Structure Learning
Given a trivial LPAD or an empty one, and a set of interpretations (data),
find the model and the parameters that maximize the probability of the data (log-likelihood)
SLIPCOVER: Structure LearnIng of Probabilistic logic program by searching OVER the clause space
1. Beam search in the space of clauses to find the promising ones
2. Greedy search in the space of probabilistic programs, guided by the LL of the data
Parameter learning by means of EMBLEM
Applications
Link prediction: given a (social) network, compute the probability of the existence of a link between two entities (UWCSE)

advisedby(X,Y):0.3 :-
  publication(P,X),
  publication(P,Y),
  student(X).
Applications
Classify web pages on the basis of the link structure (WebKB)
coursePage(Page1): 0.3 :- linkTo(Page2,Page1),coursePage(Page2).
coursePage(Page1): 0.3 :- linkTo(Page2,Page1),facultyPage(Page2).
...
coursePage(Page): 0.3 :- has(’abstract’,Page).
...
Applications
Entity resolution: identify identical entities in text or databases

samebib(A,B):0.3 :- samebib(A,C), samebib(C,B).
sameauthor(A,B):0.3 :- sameauthor(A,C), sameauthor(C,B).
sametitle(A,B):0.3 :- sametitle(A,C), sametitle(C,B).
samevenue(A,B):0.3 :- samevenue(A,C), samevenue(C,B).
samebib(B,C):0.3 :- author(B,D), author(C,E), sameauthor(D,E).
samebib(B,C):0.3 :- title(B,D), title(C,E), sametitle(D,E).
samebib(B,C):0.3 :- venue(B,D), venue(C,E), samevenue(D,E).
samevenue(B,C):0.3 :- haswordvenue(B,word_06), haswordvenue(C,word_06).
...
Applications
Chemistry: given the chemical composition of a substance, predict its mutagenicity or its carcinogenicity

active(A):0.5 :- atm(A,B,c,29,C), gteq(C,-0.003), ring_size_5(A,D).
active(A):0.5 :- lumo(A,B), lteq(B,-2.072).
active(A):0.5 :- bond(A,B,C,2), bond(A,C,D,1), ring_size_5(A,E).
active(A):0.5 :- carbon_6_ring(A,B).
active(A):0.5 :- anthracene(A,B).
...
Applications
Medicine: diagnose diseases on the basis of patient information (Hepatitis), influence of genes on HIV, risk of falling for elderly people (FFRAT)
Experiments - Area Under the PR Curve
System    | HIV         | UW-CSE      | Mondial
----------+-------------+-------------+------------
SLIPCOVER | 0.82 ± 0.05 | 0.11 ± 0.08 | 0.86 ± 0.07
SLIPCASE  | 0.78 ± 0.05 | 0.03 ± 0.01 | 0.65 ± 0.06
LSM       | 0.37 ± 0.03 | 0.07 ± 0.02 | –
ALEPH++   | –           | 0.05 ± 0.01 | 0.87 ± 0.07
RDN-B     | 0.28 ± 0.06 | 0.28 ± 0.06 | 0.77 ± 0.07
MLN-BT    | 0.29 ± 0.04 | 0.18 ± 0.07 | 0.74 ± 0.10
MLN-BC    | 0.51 ± 0.04 | 0.06 ± 0.01 | 0.59 ± 0.09
BUSL      | 0.38 ± 0.03 | 0.01 ± 0.01 | –
Experiments - Area Under the PR Curve
System    | Carcinogenesis | Mutagenesis | Hepatitis
----------+----------------+-------------+------------
SLIPCOVER | 0.60           | 0.95 ± 0.01 | 0.80 ± 0.01
SLIPCASE  | 0.63           | 0.92 ± 0.08 | 0.71 ± 0.05
LSM       | –              | –           | 0.53 ± 0.04
ALEPH++   | 0.74           | 0.95 ± 0.01 | –
RDN-B     | 0.55           | 0.97 ± 0.03 | 0.88 ± 0.01
MLN-BT    | 0.50           | 0.92 ± 0.09 | 0.78 ± 0.02
MLN-BC    | 0.62           | 0.69 ± 0.20 | 0.79 ± 0.02
BUSL      | –              | –           | 0.51 ± 0.03
PLP Online
http://cplint.lamping.unife.it/
Inference (knowledge compilation, Monte Carlo)
Parameter learning (EMBLEM)
Structure learning (SLIPCOVER)
http://www.cs.kuleuven.be/~dtai/problog/
Inference (knowledge compilation, Monte Carlo)
Parameter learning (LFI-ProbLog)
Conclusions
PLP is still a fertile field but...
...we must look at other communities and build bridges and...
...join forces!
Much is left to do:
Tractable sublanguages (see following talk)
Lifted inference
Structure/Parameter learning (also for programs with continuous
variables)