Introduction to Probabilistic
Models for Computational Biology
Lecture 2 – Oct 3, 2011
CSE 527 Computational Biology, Fall 2011
Instructor: Su-In Lee
TA: Christopher Miles
Monday & Wednesday 12:00-1:20
Johnson Hall (JHN) 022

Review: Gene Regulation
[Figure: Gene regulation. A transcription factor binding site on the DNA acts as a switch for a gene. DNA is transcribed into RNA, RNA is translated into protein, and RNA is degraded. The figure is labelled "Gene Expression".]
DNA: AGATATGTGGATTGTTAGGATTTATGCGCGTCAGTGACTACGCATGTTACGCACCTACGACTAGGTAATGATTGATC
RNA → Protein: AUGUGGAUUGUU → MWIV; AUGCGCGUC → MRV; AUGUUACGCACCUAC → MLRTY; AUGAUUGAU → MID
Genes regulate each other's expression and activity.
Genetic regulatory network

Review: Variations in the DNA
[Figure: "Single nucleotide polymorphism (SNP)". Single-nucleotide changes are marked along the DNA sequence AGATATGTGGATTGTTAGGATTTATGCGCGTCAGTGACTACGCATGTTACGCACCTACGACTAGGTAATGATTGATC, with the corresponding changes marked in the RNA and protein sequences.]
Sequence variations perturb the regulatory network.
Genetic regulatory network

Outline
Probabilistic models in biology
Model selection problems
Mathematical foundations
Bayesian networks
  Probabilistic Graphical Models: Principles and Techniques, Koller & Friedman, The MIT Press
Learning from data
  Maximum likelihood estimation
  Expectation maximization (EM)

Example 1
How are a nucleotide change in DNA, blood pressure and heart disease related?
There can be several "models"…
[Figure: three alternative models, each a different graph over the nodes "DNA alteration", "Blood pressure" and "Heart disease".]

Example 2
How do genes A, B and C regulate each other's expression levels (mRNA levels)?
There can be several models…
[Figure: alternative regulatory graphs over genes A, B and C.]

[Figure: Model I, Model II and Model III, three alternative graphs over genes A, B and C, shown next to a table of the expression levels of Gene A, Gene B and Gene C in experiments Exp 1, Exp 2, …, Exp N (N instances).]
Probabilistic graphical models
  Graphical representation of statistical dependencies.
  Statistical dependencies between expression levels of genes A, B, C?
Probability that model x is true given the data
  Model selection: argmax_x P(model x is true | Data)

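As a rough illustration of the argmax above, the sketch below picks the candidate with the highest posterior, using P(model | Data) ∝ P(Data | model) P(model). The function name `select_model`, the log-likelihood values and the uniform prior are made-up placeholders, not values from the lecture.

```python
import math

def select_model(log_likelihoods, priors):
    """Return (best_model, posteriors) with P(model | data) ∝ P(data | model) P(model)."""
    # Unnormalized log posterior for each candidate model.
    log_post = {m: log_likelihoods[m] + math.log(priors[m]) for m in log_likelihoods}
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_post.values())
    unnorm = {m: math.exp(v - top) for m, v in log_post.items()}
    total = sum(unnorm.values())
    posteriors = {m: v / total for m, v in unnorm.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical log P(Data | model) values for three candidate structures.
log_likelihoods = {"Model I": -120.4, "Model II": -118.9, "Model III": -125.0}
# Assumed uniform prior over the candidates.
priors = {"Model I": 1 / 3, "Model II": 1 / 3, "Model III": 1 / 3}

best, posteriors = select_model(log_likelihoods, priors)
print(best, posteriors)
```

With a uniform prior the choice reduces to the model with the highest likelihood; a non-uniform prior can change the ranking.
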
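Probability Theory Review
Assume random variables A and B with Val(A) = {a1,a2,a3}, Val(B) = {b1,b2}
Conditional probability
  Definition: P(A|B) = P(A,B) / P(B)
  Chain rule: P(A,B) = P(B) P(A|B)
  Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
Probabilistic independence: A and B are independent if P(A,B) = P(A) P(B)
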
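A small numerical check can make these identities concrete. The sketch below uses a made-up joint table over Val(A) = {a1,a2,a3} and Val(B) = {b1,b2} (the numbers are assumptions, not from the lecture) and verifies the chain rule P(A,B) = P(B) P(A|B) and Bayes' rule P(A|B) = P(B|A) P(A) / P(B), then tests whether P(A,B) = P(A) P(B) holds.

```python
# Hypothetical joint distribution P(A, B); the six entries sum to 1.
joint = {
    ("a1", "b1"): 0.10, ("a1", "b2"): 0.20,
    ("a2", "b1"): 0.25, ("a2", "b2"): 0.05,
    ("a3", "b1"): 0.15, ("a3", "b2"): 0.25,
}

def marginal_a(a):
    """P(A = a), summing B out of the joint."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def marginal_b(b):
    """P(B = b), summing A out of the joint."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def cond_a_given_b(a, b):
    """Definition of conditional probability: P(A | B) = P(A, B) / P(B)."""
    return joint[(a, b)] / marginal_b(b)

def cond_b_given_a(b, a):
    """P(B | A) = P(A, B) / P(A)."""
    return joint[(a, b)] / marginal_a(a)

a, b = "a2", "b1"
# Chain rule: P(A, B) = P(B) P(A | B)
chain = marginal_b(b) * cond_a_given_b(a, b)
# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
bayes = cond_b_given_a(b, a) * marginal_a(a) / marginal_b(b)
# Independence requires P(A, B) = P(A) P(B) for every pair of values.
independent = all(
    abs(joint[(x, y)] - marginal_a(x) * marginal_b(y)) < 1e-12 for (x, y) in joint
)
print(chain, bayes, independent)
```
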
Probabilistic Representation
Joint distribution P over {x1,…,xn}
  xi is binary
  2^n - 1 entries
If x's are independent
  P(x) = p(x1) … p(xn)

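To see how quickly the two representations diverge, the sketch below counts parameters for n binary variables: 2^n - 1 for the full joint table, and n (one number per variable) when the variables are independent. The function names are illustrative only.

```python
def full_joint_params(n):
    """Independent entries in a joint table over n binary variables."""
    return 2 ** n - 1

def independent_params(n):
    """Parameters needed when the n binary variables are mutually independent."""
    return n  # one number p(xi = 1) per variable

for n in (3, 10, 20):
    print(n, full_joint_params(n), independent_params(n))
```
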
Conditional Parameterization
The Diabetes example
  Genetic risk (G), Diabetes (D)
  Val(G) = {g1,g0}, Val(D) = {d1,d0}
  P(G,D) = P(G) P(D|G)
    P(G): Prior distribution
    P(D|G): Conditional probability distribution (CPD)
[Figure: Genetic risk → Diabetes]

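A minimal sketch of this parameterization, with made-up probabilities (the lecture gives no numbers): the joint P(G,D) is built from the prior P(G) and the CPD P(D|G), and the marginal P(D = d1) is then recovered by summing G out.

```python
# Illustrative numbers only; these values are assumptions.
p_g = {"g1": 0.2, "g0": 0.8}                     # prior P(G)
p_d_given_g = {                                  # CPD P(D | G)
    "g1": {"d1": 0.3, "d0": 0.7},
    "g0": {"d1": 0.05, "d0": 0.95},
}

# P(G, D) = P(G) P(D | G)
joint = {(g, d): p_g[g] * p_d_given_g[g][d] for g in p_g for d in ("d1", "d0")}

# Marginal P(D = d1), obtained by summing G out of the joint.
p_d1 = sum(p for (g, d), p in joint.items() if d == "d1")
print(joint, p_d1)
```
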
Naïve Bayes Model - Example
Elaborating the diabetes example,
  Genetic risk (G), Diabetes (D), Hypertension (H)
  Val(G) = {g1,g0}, Val(D) = {d1,d0}, Val(H) = {h1,h0}
  The joint distribution P(G,D,H) has 8 entries
If D and H are independent given G,
  P(G,D,H) = P(G)P(D|G)P(H|G)
  5 independent parameters (1 for P(G), 2 each for P(D|G) and P(H|G)); more compact than the joint
[Figure: Genetic risk → Diabetes, Genetic risk → Hypertension]

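The factorization P(G,D,H) = P(G)P(D|G)P(H|G) can be checked numerically. The sketch below uses made-up CPD values (assumptions, not from the lecture), builds the 8-entry joint, and confirms that D and H are independent once G is given, while they remain dependent when G is not observed.

```python
from itertools import product

# Illustrative CPDs; all numbers are assumptions.
p_g = {"g1": 0.2, "g0": 0.8}
p_d_given_g = {"g1": {"d1": 0.3, "d0": 0.7}, "g0": {"d1": 0.05, "d0": 0.95}}
p_h_given_g = {"g1": {"h1": 0.4, "h0": 0.6}, "g0": {"h1": 0.1, "h0": 0.9}}

# Joint built from the three factors: P(G, D, H) = P(G) P(D | G) P(H | G).
joint = {
    (g, d, h): p_g[g] * p_d_given_g[g][d] * p_h_given_g[g][h]
    for g, d, h in product(p_g, ("d1", "d0"), ("h1", "h0"))
}

def prob(**fixed):
    """Sum the joint over all entries matching the fixed values, e.g. prob(d="d1")."""
    return sum(
        p for (g, d, h), p in joint.items()
        if all({"g": g, "d": d, "h": h}[k] == v for k, v in fixed.items())
    )

# Given G, D and H are independent: P(d1, h1 | g1) = P(d1 | g1) P(h1 | g1).
lhs = prob(g="g1", d="d1", h="h1") / prob(g="g1")
rhs = (prob(g="g1", d="d1") / prob(g="g1")) * (prob(g="g1", h="h1") / prob(g="g1"))

# Without conditioning on G, P(d1, h1) differs from P(d1) P(h1).
marginal_gap = prob(d="d1", h="h1") - prob(d="d1") * prob(h="h1")
print(lhs, rhs, marginal_gap)
```
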
Naïve Bayes Model
A class C where Val(C) = {c1,…,ck}.
Finding variables x1,…,xn
Naïve Bayes assumption
  The findings are conditionally independent given the individual's class.
  The model factorizes as: P(C,x1,…,xn) = P(C) P(x1|C) … P(xn|C)
The Diabetes example
  class: Genetic risk, findings: Diabetes, Hypertension

Naïve Bayes Model - Example
Medical diagnosis system
  Class C: disease
  Findings X: symptoms
Computing the confidence: P(C | x1,…,xn) ∝ P(C) P(x1|C) … P(xn|C)
Drawbacks
  Strong assumptions

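One way to compute such a confidence is the class posterior under the naïve Bayes factorization, P(C | x1,…,xn) ∝ P(C) P(x1|C) … P(xn|C). The sketch below is a minimal version of that calculation; the diseases, symptoms and probabilities are hypothetical, and `naive_bayes_posterior` is an illustrative name.

```python
def naive_bayes_posterior(prior, cpds, findings):
    """P(C | findings) under the naive Bayes factorization.

    prior:    {class: P(class)}
    cpds:     {finding_name: {class: {value: P(value | class)}}}
    findings: {finding_name: observed value}
    """
    unnorm = {}
    for c, p_c in prior.items():
        score = p_c
        # Multiply in P(finding | class) for every observed finding.
        for name, value in findings.items():
            score *= cpds[name][c][value]
        unnorm[c] = score
    total = sum(unnorm.values())
    return {c: s / total for c, s in unnorm.items()}

# Hypothetical diseases, symptoms and probabilities.
prior = {"flu": 0.1, "cold": 0.9}
cpds = {
    "fever": {"flu": {True: 0.8, False: 0.2}, "cold": {True: 0.1, False: 0.9}},
    "cough": {"flu": {True: 0.7, False: 0.3}, "cold": {True: 0.6, False: 0.4}},
}
posterior = naive_bayes_posterior(prior, cpds, {"fever": True, "cough": True})
print(posterior)
```

Because every finding is multiplied in as if it were independent given the class, correlated symptoms are effectively counted more than once, which is the practical cost of the strong assumption.
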
Bayesian Network
Directed acyclic graph (DAG)
  Node: a random variable
  Edge: direct influence of one node on another
The Diabetes example revisited
  Genetic risk (G), Diabetes (D), Hypertension (H)
  Val(G) = {g1,g0}, Val(D) = {d1,d0}, Val(H) = {h1,h0}
[Figure: graph over Genetic risk, Diabetes and Hypertension]

Bayesian Network Semantics
A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1,…,Xn.
  PaXi: parents of Xi in G
  NonDescendantsXi: variables in G that are not descendants of Xi.
G encodes the following set of conditional independence assumptions, called the local Markov assumptions, and denoted by IL(G):
  For each variable Xi: (Xi ⊥ NonDescendantsXi | PaXi)
[Figure: example DAG over the nodes x1,…,x11]

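The local Markov assumptions, (Xi ⊥ NonDescendantsXi | PaXi) for each Xi, can be listed mechanically from a graph. The sketch below does so for a small hypothetical DAG (not the one in the figure), given as a mapping from each node to its parents. Parents are left out of the reported non-descendant set because they already appear on the conditioning side.

```python
def descendants(parents, node):
    """All nodes reachable from `node` by following edges parent -> child."""
    children = {}
    for child, pas in parents.items():
        for pa in pas:
            children.setdefault(pa, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for ch in children.get(stack.pop(), ()):
            if ch not in seen:
                seen.add(ch)
                stack.append(ch)
    return seen

def local_markov(parents):
    """For each Xi, return (non-descendants of Xi other than its parents, Pa(Xi))."""
    nodes = set(parents)
    out = {}
    for x, pas in parents.items():
        nondesc = nodes - descendants(parents, x) - {x} - set(pas)
        out[x] = (nondesc, set(pas))
    return out

# Hypothetical DAG as {node: list of parents}; every node must appear as a key.
parents = {"x1": [], "x2": [], "x3": ["x1", "x2"], "x4": ["x3"], "x5": ["x2"]}
assumptions = local_markov(parents)
for x, (nondesc, pas) in sorted(assumptions.items()):
    print(f"{x} is independent of {sorted(nondesc)} given {sorted(pas)}")
```
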
The Genetics Example
Variables
  B: blood type (a phenotype)
  G: genotype of the gene that encodes a person's blood type; <A,A>, <A,B>, <A,O>, <B,B>, <B,O>, <O,O>

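For these two variables, the CPD P(B | G) is deterministic. The sketch below assumes the standard ABO inheritance rules (alleles A and B are codominant and both dominate O), which the slide does not spell out; the function name `blood_type` is illustrative.

```python
def blood_type(genotype):
    """Map a genotype (pair of alleles) to the blood type it produces."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"   # A and B are codominant
    if "A" in alleles:
        return "A"    # A dominates O
    if "B" in alleles:
        return "B"    # B dominates O
    return "O"

# The six genotypes listed on the slide.
genotypes = [("A", "A"), ("A", "B"), ("A", "O"), ("B", "B"), ("B", "O"), ("O", "O")]
# Deterministic CPD: each genotype maps to one blood type with probability 1.
cpd = {g: blood_type(g) for g in genotypes}
print(cpd)
```
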
Bayesian Network Joint Distribution
Let G be a Bayesian network graph over the variables X1,…,Xn. We say that a distribution P factorizes according to G if P can be expressed as:
  P(X1,…,Xn) = P(X1|PaX1) … P(Xn|PaXn)
A Bayesian network is a pair (G,P) where P factorizes over G, and where P is specified as a set of CPDs associated with G's nodes.

The Student Example
More complex scenario
  Course difficulty (D), quality of the recommendation letter (L), Intelligence (I), SAT (S), Grade (G)
  Val(D) = {easy, hard}, Val(L) = {strong, weak}, Val(I) = {i1,i0}, Val(S) = {s1,s0}, Val(G) = {g1,g2,g3}
  Joint distribution requires 2·2·2·2·3 - 1 = 47 entries

The Student Bayesian network
Joint distribution
  P(I,D,G,S,L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)
[Figure: the Student Bayesian network, from Koller & Friedman]

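To make the factorized joint concrete, the sketch below evaluates P(I,D,G,S,L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G), assuming the Student network structure from Koller & Friedman (I → G ← D, I → S, G → L). The CPD values are illustrative assumptions for this sketch rather than numbers read off the slide, and d0/d1 and l0/l1 stand in for easy/hard and weak/strong.

```python
from itertools import product

# Illustrative CPD values (assumptions for this sketch).
p_i = {"i0": 0.7, "i1": 0.3}
p_d = {"d0": 0.6, "d1": 0.4}
p_g = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.30, "g2": 0.40, "g3": 0.30},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.70},
    ("i1", "d0"): {"g1": 0.90, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.50, "g2": 0.30, "g3": 0.20},
}
p_s = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}  # P(S | I)
p_l = {  # P(L | G)
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}

def joint(i, d, g, s, l):
    """P(I, D, G, S, L) as the product of the five CPD entries."""
    return p_i[i] * p_d[d] * p_g[(i, d)][g] * p_s[i][s] * p_l[g][l]

p = joint("i1", "d0", "g2", "s1", "l0")

# Sanity check: the factorized joint sums to 1 over all 48 assignments.
total = sum(
    joint(i, d, g, s, l)
    for i, d, g, s, l in product(p_i, p_d, ("g1", "g2", "g3"), ("s0", "s1"), ("l0", "l1"))
)

# Independent parameters: P(I) 1, P(D) 1, P(G|I,D) 4*2, P(S|I) 2*1, P(L|G) 3*1.
n_params = 1 + 1 + 4 * 2 + 2 * 1 + 3 * 1
print(p, total, n_params)
```

Under these assumptions the network needs 15 independent parameters, compared with 47 for the unrestricted joint table.
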
Parameter Estimation
Assumptions
  Fixed network structure
  Fully observed instances of the network variables: D = {d[1],…,d[M]}
  For example, {i0,d1,g1,l0,s0}
Maximum likelihood estimation (MLE)!
[Figure: from Koller & Friedman, with the CPDs labelled as the "parameters" of the Bayesian network]

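With a fixed structure, table CPDs and fully observed data, the maximum likelihood estimate of each CPD entry is a normalized count: the number of instances with that child value and parent assignment, divided by the number with that parent assignment. The sketch below illustrates this on a tiny made-up data set over I and S only; `mle_cpd` and the data are hypothetical.

```python
from collections import Counter

def mle_cpd(data, child, parents):
    """MLE of P(child | parents) from fully observed instances (normalized counts)."""
    joint_counts = Counter()
    parent_counts = Counter()
    for inst in data:
        pa = tuple(inst[p] for p in parents)
        joint_counts[(pa, inst[child])] += 1
        parent_counts[pa] += 1
    # Keys are (parent assignment, child value); unseen combinations are absent.
    return {
        (pa, value): n / parent_counts[pa]
        for (pa, value), n in joint_counts.items()
    }

# Hypothetical fully observed data set D = {d[1], ..., d[M]} with M = 8.
data = [
    {"I": "i0", "S": "s0"}, {"I": "i0", "S": "s0"}, {"I": "i0", "S": "s1"},
    {"I": "i1", "S": "s1"}, {"I": "i1", "S": "s1"}, {"I": "i1", "S": "s0"},
    {"I": "i0", "S": "s0"}, {"I": "i1", "S": "s1"},
]
p_s_given_i = mle_cpd(data, "S", ["I"])   # estimate of P(S | I)
p_i = mle_cpd(data, "I", [])              # estimate of P(I); parent assignment is ()
print(p_s_given_i, p_i)
```
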
Acknowledgement

Profs Daphne Koller & Nir Friedman,
“Probabilistic Graphical Models”