Deductive and Inductive Probabilistic Programming
Fabrizio Riguzzi

Outline
* Probabilistic programming
* Probabilistic logic programming
* Inference
* Learning
* Applications

Probabilistic Programming
* Users specify a probabilistic model in its entirety (e.g., by writing code that generates a sample from the joint distribution) and inference follows automatically from the specification.
* PP languages provide the full power of modern programming languages for describing complex distributions:
  * reuse of libraries of models
  * interactive modeling
  * abstraction

Probabilistic Programming Languages
* Venture, Probabilistic-C, Anglican, IBAL, PRISM, Infer.NET, dimple, chimple, BLOG, PSQL, BUGS, FACTORIE, PMTK, Alchemy, Dyna, Figaro, Church, ProbLog, ProBT, Stan, Hakaru, BAli-Phy, ProbCog, Gamble, Tuffy, PyMC, Lea, WebPPL, Picture, Turing.jl
* Source: https://en.wikipedia.org/wiki/Probabilistic_programming_language (the source table also lists, for each language, the language it extends and its host language)
* Only three are logic languages (the Prolog-based PRISM, ProbLog, and Dyna); the others are imperative, functional, or object oriented.
* In 2013 DARPA released the funding call "Probabilistic Programming for Advancing Machine Learning" (PPAML).
  * Aim: develop probabilistic programming languages and accompanying tools to facilitate the construction of new machine learning applications across a wide range of domains.
  * Focus: functional PP.

Probabilistic Logic Programming
* What are we missing? Is logic programming to blame?
* Thesis: probabilistic logic programming is alive and kicking!

Strengths
* Relationships are first-class citizens
* Conceptually easier to lift
* Strong semantics
* Inductive systems

Weaknesses
* Handling non-termination
* Continuous variables

Non-termination
* Possible when the number of explanations for the query is infinite; a minimal illustration is sketched below.
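To see how an infinite set of explanations arises, here is a minimal sketch (not from the talk; it assumes standard cplint/MCINTYRE syntax, and the predicate names are invented for illustration): first_success(N) holds when N independent coin flips fail before the first success, so every N >= 0 contributes a distinct explanation of non-zero probability.

    :- use_module(library(mcintyre)).
    :- mc.
    :- begin_lpad.

    % flip(N): an independent biased coin for each round N.
    flip(_N):0.5.

    % first_success(N): rounds 0..N-1 fail and round N succeeds.
    first_success(N) :- fs(0,N).
    fs(N,N) :- flip(N).
    fs(N0,N) :-
      \+ flip(N0),
      N1 is N0+1,
      fs(N1,N).

    :- end_lpad.

Exact inference for a query such as first_success(_) would have to enumerate infinitely many explanations, one per round, while a sampling query like ?- mc_sample(first_success(2),1000,P). terminates with probability one (here P should approach 0.5^3 = 0.125).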
Non-termination: Inducing Arithmetic Functions
* Church code from http://forestdb.org/models/arithmetic.html:

    (define (random-arithmetic-fn)
      (if (flip 0.3)
          (random-combination (random-arithmetic-fn)
                              (random-arithmetic-fn))
          (if (flip)
              (lambda (x) x)
              (random-constant-fn))))

    (define (random-combination f g)
      (define op (uniform-draw (list + -)))
      (lambda (x) (op (f x) (g x))))

    (define (random-constant-fn)
      (define i (sample-integer 10))
      (lambda (x) i))

* LPAD (cplint) code from http://cplint.lamping.unife.it/example/inference/arithm.pl:

    eval(X,Y):-
      random_fn(X,0,F),
      Y is F.

    op(L,+):0.5; op(L,-):0.5.

    random_fn(X,L,F):-
      comb(L),
      random_fn(X,l(L),F1),
      random_fn(X,r(L),F2),
      op(L,Op),
      F=..[Op,F1,F2].
    random_fn(X,L,F):-
      \+ comb(L),
      base_random_fn(X,L,F).

    comb(_):0.3.

    base_random_fn(X,L,X):-
      identity(L).
    base_random_fn(_X,L,C):-
      \+ identity(L),
      random_const(L,C).

    identity(_):0.5.

    random_const(_,C):discrete(C,[0:0.1,1:0.1,2:0.1,3:0.1,4:0.1,
                                  5:0.1,6:0.1,7:0.1,8:0.1,9:0.1]).

* Aim: given observations of input-output pairs for the random function, predict the output for a new input.
* Arbitrarily complex functions have a non-zero probability of being selected, so the program has non-terminating executions.
* Exact inference: infinite number of explanations.
* In Church, the model is conditioned by rejection sampling:

    (define (sample)
      (rejection-query
        (define my-proc (random-arithmetic-fn))
        (my-proc 2)
        (= (my-proc 1) 3)))

    (hist (repeat 100 sample))

Solution
* Use cyclic explanation graphs (T. Sato and P. Meyer, Infinite probability computation by cyclic explanation graphs, Theory and Practice of Logic Programming, 14, 2014) or probabilistic tabling (A. Gorlin, C. R. Ramakrishnan, and S. A. Smolka, Model checking with probabilistic tabled logic programming, Theory and Practice of Logic Programming, 12(4-5), 2012),
* or resort to sampling: as complexity increases, the probability of a function tends to 0, so the probability of an infinite trace is 0.
* Metropolis-Hastings sampling: A. Nampally and C. R. Ramakrishnan, Adaptive MCMC-based inference in probabilistic logic programs, arXiv preprint arXiv:1403.6036, 2014.
* Monte Carlo sampling is attractive for the simplicity of its implementation and because the estimate can be improved as more time becomes available.

Monte Carlo
* The disjunctive clause
    Cr = H1:α1 ∨ ... ∨ Hn:αn ← L1,...,Lm
  is transformed into the set of clauses MC(Cr):
    MC(Cr,1) = H1 ← L1,...,Lm, sample_head(n,r,VC,NH), NH=1.
    ...
    MC(Cr,n) = Hn ← L1,...,Lm, sample_head(n,r,VC,NH), NH=n.
* The truth value of a query Q is then sampled with code of the form:

    ...
    ( call(Q) ->
        NT1 is NT+1
    ;   NT1 = NT
    ),
    ...

* A concrete instance of the transformation is sketched below.
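For concreteness, this is what the transformation would produce for the two-headed clause op(L,+):0.5; op(L,-):0.5. of the arithmetic example (a sketch following the scheme above; the rule identifier r_op and the exact bookkeeping of sample_head/4 are assumptions about the runtime, not code from the talk):

    % sample_head(N,R,VC,NH) samples a head index NH in 1..N for rule R
    % with grounding VC according to the annotated probabilities, or
    % retrieves NH if this grounding was already sampled in the current
    % sample.
    op(L,+) :- sample_head(2,r_op,[L],NH), NH = 1.
    op(L,-) :- sample_head(2,r_op,[L],NH), NH = 2.

Because the original clause has an empty body, only the sampling literals remain; memoizing on the grounding [L] ensures that the same operator is reused for the same node L within one sample.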
Metropolis-Hastings MCMC
* A Markov chain is built by taking an initial sample and by generating successor samples.
* The initial sample is built by randomly sampling choices so that the evidence is true.
* A successor sample is obtained by deleting a fixed number of sampled probabilistic choices; then the evidence is queried and, if the query succeeds, the goal is queried.
* The sample is accepted with probability min{1, N0/N1}, where N0 (N1) is the number of choices sampled in the previous (current) sample.

Solution
* In cplint:

    ?- mc_mh_sample(eval(2,4),eval(1,3),100,100,3,T,F,P).

  estimates the probability of eval(2,4) given that eval(1,3) is true:

    F = 90, T = 10, P = 0.1

* You can also try rejection sampling (usually slower):

    ?- mc_rejection_sample(eval(2,4),eval(1,3),100,T,F,P).

* You may be interested in the distribution of the output:

    ?- mc_mh_sample_arg_bar(eval(2,Y),eval(1,3),100,100,3,Y,V).

* You may be interested in the expected value of the output:

    ?- mc_mh_expectation(eval(2,Y),eval(1,3),100,100,3,Y,E).
    E = 3.21

Continuous Random Variables
* Distributional clauses (B. Gutmann, I. Thon, A. Kimmig, M. Bruynooghe, and L. De Raedt, The magic of logical inference in probabilistic programming, Theory and Practice of Logic Programming, 2011).
* Gaussian mixture model in cplint:

    heads:0.6; tails:0.4.
    g(X): gaussian(X,0,1).
    h(X): gaussian(X,5,2).
    mix(X) :- heads, g(X).
    mix(X) :- tails, h(X).

* Inference by sampling: without evidence, or with evidence on discrete random variables only, the same methods can be reused.
* Sampling the arguments of goals builds a probability density of those arguments:

    ?- mc_sample_arg(mix(X),10000,X,L0),
       histogram(L0,40,Chart).

Evidence on Continuous Random Variables
* You cannot use rejection sampling or Metropolis-Hastings, as the probability of the evidence is 0.
* You can use likelihood weighting to obtain samples of continuous arguments of a goal (D. Nitti, T. De Laet, and L. De Raedt, Probabilistic logic programming for hybrid relational domains, Machine Learning, 103(3), 407-449, 2016).

Likelihood Weighting
* For each sample to be taken, likelihood weighting samples the query and then assigns a weight to the sample on the basis of the evidence.
* The weight is computed by deriving the evidence backward in the same sample of the query, starting with a weight of one: each time a choice should be taken or a continuous variable sampled, if the choice/variable has already been taken, the current weight is multiplied by the probability of the choice/by the density value of the continuous variable.

Bayesian Estimation
* Problem from http://www.robots.ox.ac.uk/~fwood/anglican/examples/viewer/?worksheet=gaussian-posteriors
* Estimate the true value of a Gaussian-distributed random variable, given some observed data.
* The variance is known, and we suppose that the mean itself has a Gaussian distribution with mean 1 and variance 5 (the prior on the parameter).
* We take different measurements (e.g., at different times), indexed by an integer.
* Anglican code:

    (def dataset [9 8])

    (defquery gaussian-model [data]
      (let [mu (sample (normal 1 (sqrt 5)))
            sigma (sqrt 2)]
        (doall (map (fn [x] (observe (normal mu sigma) x)) data))
        mu))

    (def posterior
      ((conditional gaussian-model :smc :number-of-particles 10) dataset))

    (def posterior-samples (repeatedly 20000 #(sample* posterior)))

* cplint code from http://cplint.lamping.unife.it/example/inference/gauss_mean_est.pl:

    value(I,X) :-
      mean(M),
      value(I,M,X).

    mean(M): gaussian(M,1.0,5.0).

    value(_,M,X): gaussian(X,M,2.0).

    ?- mc_sample_arg(value(0,Y),10000,Y,L0),
       mc_lw_sample_arg(value(0,X),(value(1,9),value(2,8)),10000,X,L),
       densities(L0,L,40,Chart).

* Since the model is conjugate, the sampled posterior can be checked analytically, as worked out below.
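A sanity check (standard conjugate-Gaussian algebra, not part of the talk): with prior mean $\mu_0 = 1$, prior variance $\sigma_0^2 = 5$, likelihood variance $\sigma^2 = 2$, and the $n = 2$ observations 9 and 8, the posterior of the mean is Gaussian with

    \mu_{post} = \frac{\sigma^2 \mu_0 + \sigma_0^2 \sum_i x_i}{\sigma^2 + n \sigma_0^2}
               = \frac{2 \cdot 1 + 5 \cdot (9 + 8)}{2 + 2 \cdot 5}
               = \frac{87}{12} = 7.25

    \sigma^2_{post} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2}
                    = \frac{2 \cdot 5}{2 + 2 \cdot 5}
                    = \frac{10}{12} \approx 0.83

so the density estimated by mc_lw_sample_arg should concentrate around 7.25, while the prior samples from mc_sample_arg are centered at 1.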
Learning
* Parameter learning
* Structure learning
* Both are more developed for PLP, but see:
  * Y. N. Perov and F. D. Wood, Learning probabilistic programs, arXiv preprint arXiv:1407.2646, 2014.
  * B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science, 350(6266), 2015.
  * A. L. Gaunt et al., TerpreT: A probabilistic programming language for program induction, arXiv preprint arXiv:1608.04428, 2016.

Parameter Learning
* Problem: given a set of interpretations and a program, find the parameters maximizing the likelihood of the interpretations (or of instances of a target predicate).
* Exploit the equivalence with Bayesian networks to use BN learning algorithms.
* The interpretations record the truth values of ground atoms, not of the choice variables.
* Unseen data: relative frequency can't be used.
* (Thon et al., ECML 2008) proposed an adaptation of EM for CPT-L, a simplified version of LPADs; the algorithm computes the counts efficiently by repeatedly traversing the BDDs representing the explanations.
* (Ishihata et al., ILP 2008) independently proposed a similar algorithm.
* LFI-ProbLog (Gutmann et al., ECML 2011) is the adaptation of EM to ProbLog.
* EMBLEM (Riguzzi and Bellodi, IDAJ 2013) adapts (Ishihata et al., ILP 2008) to LPADs.

Structure Learning
* Given a trivial LPAD or an empty one, and a set of interpretations (data), find the model and the parameters that maximize the probability of the data (log-likelihood).
* SLIPCOVER: Structure LearnIng of Probabilistic logic programs by searching OVER the clause space:
  1. beam search in the space of clauses to find the promising ones;
  2. greedy search in the space of probabilistic programs, guided by the log-likelihood of the data.
* Parameter learning by means of EMBLEM.
* A schematic sketch of how learning is invoked in cplint is given after the application examples below.

Applications
* Link prediction: given a (social) network, compute the probability of the existence of a link between two entities (UWCSE):

    advisedby(X,Y):0.3 :-
      publication(P,X),
      publication(P,Y),
      student(X).

* Classifying web pages on the basis of the link structure (WebKB):

    coursePage(Page1):0.3 :- linkTo(Page2,Page1), coursePage(Page2).
    coursePage(Page1):0.3 :- linkTo(Page2,Page1), facultyPage(Page2).
    ...
    coursePage(Page):0.3 :- has('abstract',Page).
    ...

* Entity resolution: identify identical entities in text or databases:

    samebib(A,B):0.3 :- samebib(A,C), samebib(C,B).
    sameauthor(A,B):0.3 :- sameauthor(A,C), sameauthor(C,B).
    sametitle(A,B):0.3 :- sametitle(A,C), sametitle(C,B).
    samevenue(A,B):0.3 :- samevenue(A,C), samevenue(C,B).
    samebib(B,C):0.3 :- author(B,D), author(C,E), sameauthor(D,E).
    samebib(B,C):0.3 :- title(B,D), title(C,E), sametitle(D,E).
    samebib(B,C):0.3 :- venue(B,D), venue(C,E), samevenue(D,E).
    samevenue(B,C):0.3 :- haswordvenue(B,word_06), haswordvenue(C,word_06).
    ...

* Chemistry: given the chemical composition of a substance, predict its mutagenicity or its carcinogenicity:

    active(A):0.5 :- atm(A,B,c,29,C), gteq(C,-0.003), ring_size_5(A,D).
    active(A):0.5 :- lumo(A,B), lteq(B,-2.072).
    active(A):0.5 :- bond(A,B,C,2), bond(A,C,D,1), ring_size_5(A,E).
    active(A):0.5 :- carbon_6_ring(A,B).
    active(A):0.5 :- anthracene(A,B).
    ...

* Medicine: diagnose diseases on the basis of patient information (Hepatitis), influence of genes on HIV, risk of falling of elderly people (FFRAT).
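To make the learning workflow concrete, here is a schematic sketch of parameter learning with cplint's slipcover library, which implements EMBLEM and SLIPCOVER. The directives and induce_par/2 follow the cplint documentation, but the toy interpretation, the bias declarations, and the flag value are illustrative assumptions, not material from the talk:

    :- use_module(library(slipcover)).
    :- sc.                           % initialize the learning library
    :- set_sc(depth_bound,false).    % illustrative flag setting

    bg([]).                          % no background knowledge

    % Initial program whose annotated parameter EMBLEM re-estimates
    % (the UWCSE clause shown above):
    in([(advisedby(X,Y):0.3 :-
           publication(P,X), publication(P,Y), student(X))]).

    % Language bias: target predicate and closed-world input predicates.
    output(advisedby/2).
    input_cw(publication/2).
    input_cw(student/1).

    % One toy interpretation (illustrative data only).
    begin(model(1)).
    student(harry).
    publication(p1,harry).
    publication(p1,albus).
    advisedby(harry,albus).
    end(model(1)).

    fold(train,[1]).

    % ?- induce_par([train],P).  % P: the program with learned parameters
    % ?- induce([train],P).      % SLIPCOVER: also searches the clause space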
Experiments - Area Under the PR Curve

    System       HIV            UW-CSE         Mondial
    SLIPCOVER    0.82 ± 0.05    0.11 ± 0.08    0.86 ± 0.07
    SLIPCASE     0.78 ± 0.05    0.03 ± 0.01    0.65 ± 0.06
    LSM          0.37 ± 0.03    0.07 ± 0.02    -
    ALEPH++      -              0.05 ± 0.01    0.87 ± 0.07
    RDN-B        0.28 ± 0.06    0.28 ± 0.06    0.77 ± 0.07
    MLN-BT       0.29 ± 0.04    0.18 ± 0.07    0.74 ± 0.10
    MLN-BC       0.51 ± 0.04    0.06 ± 0.01    0.59 ± 0.09
    BUSL         0.38 ± 0.03    0.01 ± 0.01    -

    System       Carcinogenesis  Mutagenesis    Hepatitis
    SLIPCOVER    0.60            0.95 ± 0.01    0.80 ± 0.01
    SLIPCASE     0.63            0.92 ± 0.08    0.71 ± 0.05
    LSM          -               -              0.53 ± 0.04
    ALEPH++      0.74            0.95 ± 0.01    -
    RDN-B        0.55            0.97 ± 0.03    0.88 ± 0.01
    MLN-BT       0.50            0.92 ± 0.09    0.78 ± 0.02
    MLN-BC       0.62            0.69 ± 0.20    0.79 ± 0.02
    BUSL         -               -              0.51 ± 0.03

PLP Online
* http://cplint.lamping.unife.it/
  * Inference (knowledge compilation, Monte Carlo)
  * Parameter learning (EMBLEM)
  * Structure learning (SLIPCOVER)
* http://www.cs.kuleuven.be/~dtai/problog/
  * Inference (knowledge compilation, Monte Carlo)
  * Parameter learning (LFI-ProbLog)

Conclusions
* PLP is still a fertile field, but...
* ...we must look at other communities and build bridges and...
* ...join forces!
* Much is left to do:
  * tractable sublanguages (see the following talk)
  * lifted inference
  * structure/parameter learning (also for programs with continuous variables)