Download Approximating the Bacterial Ancestral Recombination Graph

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Approximating the Bacterial
Ancestral Recombination Graph
Xavier Didelot (University of Warwick)
Daniel Lawson (University of Bristol)
Aaron Darling (University of California-Davis)
Peter Green (University of Bristol)
Daniel Falush (University College Cork)
Graphical Models and Genetic Applications
University of Warwick
15-17 April 2009
Approximating the Bacterial Ancestral Recombination Graph – p. 1/2
Bacterial Reproduction
Mutation
•
Bacteria reproduce clonally:
• Single bacteria splits into two identical copies
Approximating the Bacterial Ancestral Recombination Graph – p. 2/2
Bacterial Reproduction
Mutation
Recombination
•
•
Bacteria reproduce clonally:
• Single bacteria splits into two identical copies
But exchange DNA:
• Transformation - from the environment
• Transduction - transmission via bacteriophage
• Conjugation - direct from another bacterium
Approximating the Bacterial Ancestral Recombination Graph – p. 2/2
Forward in time models
Moran model with constant population size N = 6.
Timestep:
1. Each individual gives birth at rate b.
2. A randomly chosen individual is killed for each birth.
Time
Observed Sequences
Approximating the Bacterial Ancestral Recombination Graph – p. 3/2
The coalescent
Pure death process with rate k(k − 1)/2 where k is
the number of lineages alive at a given time.
Coalescent
time
Time
Observed Sequences
Allows us to calculate probability of a given tree.
Approximating the Bacterial Ancestral Recombination Graph – p. 4/2
Recombination model
Allow recombination to occur with probability r for
each birth; a second parent is chosen at random.
Time
Observed Sequences
Approximating the Bacterial Ancestral Recombination Graph – p. 5/2
Ancestral recombination graph
Birth-death process with recombination (birth) at rate
ρk/2 and coalescence (death) at rate k(k − 1)/2.
Allows us to calculate probability of a given graph.
Approximating the Bacterial Ancestral Recombination Graph – p. 6/2
ARG with crossover
When a recombination occurs, a point S is chosen uniformly at random on the
gene. The material on the left of S follows one line of descent and the
material on the right the other.
• Symmetry of the two parents;
• This is the version of the ARG found in most of the literature.
Approximating the Bacterial Ancestral Recombination Graph – p. 7/2
ARG with gene conversion
When a recombination occurs, a small fragment is contributed by the donor
cell and the rest by the recipient.
• Asymmetry of the two parents (recipient/donor);
• This is the version of the ARG which is relevant to bacteria;
• Studied by Wiuf and Hein (2000).
Approximating the Bacterial Ancestral Recombination Graph – p. 8/2
Clonal genealogy
If we always follow the line of the recipient, we get a tree called “clonal
genealogy”.
• Concept unique to ARG with gene conversion;
• The clonal genealogy is distributed as a coalescent tree.
Approximating the Bacterial Ancestral Recombination Graph – p. 9/2
Ancestral material and local trees
For each branch, the material that ends up in the leaves is called “ancestral
material” and shown in bold. If we follow the ancestral material for a given
site, we get the “local tree” of that site.
Local tree for the first DNA base
Local tree for the last DNA base
Approximating the Bacterial Ancestral Recombination Graph – p. 10/2
Full ARG inference
• Probability of the model:
P (Graph|Data) ∝ P (Data|Graph)P (Graph)
• Inference under the full ARG model is difficult:
• The state-space of ARGs is huge;
• The data are uninformative about the actual ARG;
• An ARG is more than the sum of its local trees.
• Importance sampling algorithm of Fearnhead and Donnelly
(2001): a month for 31 sequences of 500bp;
• Can we find a good approximation?
Approximating the Bacterial Ancestral Recombination Graph – p. 11/2
Approximating the Bacterial
Ancestral Recombination Graph
Xavier Didelot (University of Warwick)
Daniel Lawson (University of Bristol)
Aaron Darling (University of California-Davis)
Peter Green (University of Bristol)
Daniel Falush (University College Cork)
Graphical Models and Genetic Applications
University of Warwick
15-17 April 2009
Approximating the Bacterial Ancestral Recombination Graph – p. 12/2
Approximating the ARG
• A good approximation to the ARG should:
• Contain the parts of the ARG that are well informed and
biologically important
• Simplify the rest of the graph to make inference possible
• The clonal genealogy is of central interest in most studies. It is also
likely to be well informed when the data is large and the recombination
rate not too big.
• Thus we use the clonal genealogy as the centre of our approximation,
and simplify the recombination process.
Approximating the Bacterial Ancestral Recombination Graph – p. 13/2
The weak ARG model
• Two differences with full ARG:
• Recombinant edges not allowed to coalesce with each other;
• Recombinant edges not allowed to recombine.
• The weak ARG is a “tree with added edges” rather than a graph
Approximating the Bacterial Ancestral Recombination Graph – p. 14/2
Probability of a wARG
• A weak ARG is made of a clonal genealogy T and a set R of
recombinant edges.
• Each recombinant edge in R is characterized by an origin ai on T , a
destination bi on T , a starting site xi and an ending site yi .
• Probability of a wARG:
P (Graph) = P (T )P (R|T )
with:
N
Y
i
exp −
ti
P (T ) =
2
i=2
P (R|T ) = exp(−ρT /2)
ρT
2
R Y
R
P (xi , yi )T −1 exp(−L(ai , bi ))
i=1
Approximating the Bacterial Ancestral Recombination Graph – p. 15/2
Bayesian Inference
• Inference problem: given a set of observed sequences, can we infer the
ancestry graph?
• Likelihood calculation: as for the full ARG, the wARG defines a local
tree for each site, so we can use Felsenstein’s pruning algorithm.
• Reversible-jump Monte-Carlo Markov Chain
• Moves:
• Subtree pruning and regrafting for T ;
• Add and remove recombinant edges in R (only the local
likelihood needs recalculating);
• Updates for mutation rate θ, recombination rate ρ and mean
import length δ.
Approximating the Bacterial Ancestral Recombination Graph – p. 16/2
Implementation
• Implementation in progress, should be ready this year
• Current status: infer the recombination process given the clonal
genealogy
• In progress: jointly infer the clonal genealogy
• The current implementation is interesting in itself, when the clonal
genealogy is clear (eg. because ρ is low or lots of data) and we want to
characterize patterns of recombination
• Example: Multi-Locus Sequence Typing (MLST) data for 57 isolates
from the Bacillus cereus group (Priest et al., 2004).
• These are very preliminary results!
Approximating the Bacterial Ancestral Recombination Graph – p. 17/2
Bacillus: clonal genealogy
0
1
2
9
54
24
44
42
23
28
4
29
6
5
27
25
35
3
21
16
19
47
11
26
14
49
45
31
50
43
7
22
12
41
15
51
36
37
48
30
56
17
55
8
40
10
13
20
53
52
46
18
32
38
33
34
39
Anthracis
Cereus
Tolworthi
Kurstaki
Thuringiensis
Sotto
Others
0.1
Approximating the Bacterial Ancestral Recombination Graph – p. 18/2
Bacillus: a single iteration
0
1
2
9
54
24
44
42
23
28
4
29
6
5
27
25
35
3
21
16
19
47
11
26
14
49
45
31
50
43
7
22
12
41
15
51
36
37
48
30
56
17
55
8
40
10
13
20
53
52
46
18
32
38
33
34
39
Anthracis
Cereus
Tolworthi
Kurstaki
Thuringiensis
Sotto
Others
0.1
Problem: how can we summarize all iterations?
Approximating the Bacterial Ancestral Recombination Graph – p. 19/2
Bacillus analysis
0
1
2
9
54
24
44
42
23
28
4
29
6
5
27
25
35
3
21
16
19
47
11
26
14
49
45
31
50
43
7
22
12
41
15
51
36
37
48
30
56
17
55
8
40
10
13
20
53
52
46
18
32
38
33
34
39
0.1
Anthracis
Cereus
Tolworthi
Kurstaki
Thuringiensis
Sotto
Others
0.1
Distribution of orgins for the imports on a given branch
Approximating the Bacterial Ancestral Recombination Graph – p. 20/2
Bacillus analysis
0
1
2
9
54
24
44
42
23
28
4
29
6
5
27
25
35
3
21
16
19
47
11
26
14
49
45
31
50
43
7
22
12
41
15
51
36
37
48
30
56
17
55
8
40
10
13
20
53
52
46
18
32
38
33
34
39
1
Anthracis
Cereus
Clade 1
Tolworthi
Kurstaki
Clade 2
Thuringiensis
Sotto
Others
Clade 3
0.1
Distribution of arrivals colour-coded by origin
Approximating the Bacterial Ancestral Recombination Graph – p. 21/2
Bacillus analysis
25.964
0
1
2838
Distribution of recombination along the genes
Approximating the Bacterial Ancestral Recombination Graph – p. 22/2
Conclusions
• The ancestral recombination graph is a powerful model of ancestry, but
is not usable in inference
• Need for efficient approximations
• Ignoring recombination completely (as most methods do) is dangerous
• The weak ARG is based on the existence in bacteria of a clonal
genealogy, and approximates the recombination process by modelling
the origin of each import but not its full ancestry
• Our approximation is simple enough to perform inference
• Our program can reveal patterns of recombination
• It can also be used to infer the clonal genealogy
Approximating the Bacterial Ancestral Recombination Graph – p. 23/2