A Prediction of the Neutral Theory of Molecular Evolution
And Departures from Neutrality
HKA Test (Hudson, Kreitman & Aguadé 1987)
[Figure: variation within species at locus X (y-axis) against divergence between species at locus X (x-axis). Labeled regions: neutral zone; frequency-dependent selection and balancing selection; adaptive divergence and selective sweep.]
            Fixed   Poly
Locus 1     50      5
Locus 2     30      3
Multilocus approach: Jody Hey’s web page
Maximum likelihood approach: Wright & Charlesworth (2004)
Demography affects the entire genome
Selection acts on single (few) loci
The “footprint” of balancing selection at Adh in Drosophila
Kreitman and Hudson (1991) Genetics 127:565-82
[Figure: polymorphism plotted along the Adh locus and Adh-dup, with Fast (F) and Slow (S) haplotypes. Adjacent silent sites are in linkage disequilibrium with the Fast/Slow polymorphism; distant sites are in linkage equilibrium.]
HKA Test (Fixed, Poly)
Locus 1, Adh: 50 16 520
Locus 2, Adh-dup: 50 30 13 3
P < 0.02
The effect of recombination on levels of polymorphism
[Figure: rate of recombination at marker loci against physical position along the 3rd chromosome. Aquadro, Begun & Kindahl, 1994]
[Figure: DNA polymorphism (0.000 to 0.012) and DNA divergence (0.000 to 0.1) against rate of recombination (0 to 0.08), with an HKA test (Fixed vs. Poly, Locus 1 and Locus 2); P < 0.05. Begun & Aquadro 1992]
Reduced polymorphism due to selective sweeps (adaptive mutations and hitchhiking)
* = beneficial mutation
No recombination: polymorphism removed
Free recombination: locally reduced variation
Reduced polymorphism due to background selection eliminating deleterious mutations
X = deleterious mutation
“Mutation-free” chromosomes
No recombination: polymorphism removed
Free recombination: little effect
Testing for selection in mtDNA
The genetic code and DNA “phenotypes”
Protein sequence:     Ala Cys Asp Ser
Nucleotide sequence:  GCA TGC GAC TCA
Replacement sites = "phenotype"
2-fold silent sites = "para-phenotype"
4-fold silent sites: essentially neutral
Polymorphism in mtDNA and the MK test
Polymorphism and divergence of mutations
[Figure, panels A and B: allele frequency over time for nonsynonymous (N) and synonymous (S) mutations; dN/dS ‘within’ a population or family compared with dN/dS ‘between’ sister species or an unrelated strain.]

Type of mutation   Polymorphic within populations/species?   Fixed between?
Neutral            Yes                                       Yes
Beneficial         No                                        Yes
Deleterious        Yes                                       No
Balanced           Yes                                       Yes & No

Neutrality Index (NI) = (dN/dS ‘within’) / (dN/dS ‘between’)
NI < 1.0 implies positive selection
NI > 1.0 implies negative selection
(opposite of simple dN/dS)
Rand & Kann (1996) MBE
Rand (2008) PLoS Biology
McDonald Kreitman Test
[Figure: N and S mutations on a tree joining a population or family to a sister species or unrelated strain.]
MK test
              Fixed   Polymorphic
Replacement   20      2
Silent        100     10
Neutrality Index (NI) = (dN/dS ‘within’) / (dN/dS ‘between’) = 1.0

Silent (F, P)   Replacement (F, P)   NI       Selection
100, 10         20, 10               NI > 1   negative
100, 10         20, 2                NI = 1   neutral
100, 10         50, 2                NI < 1   positive
Polymorphism and Divergence at Silent and Replacement Sites
McDonald-Kreitman Test
              Fixed   Poly
Replacement   20      2
Silent        60      6
Neutrality Index = (PR/FR) / (PS/FS) = 1
Rand and Kann (1996) MBE 13:735-748

"Excess" amino acid polymorphism
              Fixed   Poly
Replacement   20      6
Silent        60      6
N. I. = (6/20) / (6/60) = 3.0 => mildly deleterious

"Deficiency" of amino acid polymorphism
              Fixed   Poly
Replacement   40      2
Silent        60      6
N. I. = (2/40) / (6/60) = 0.5 => advantageous
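The N.I. arithmetic above can be checked with a short script. This is only a sketch: the function names are hypothetical, and the two-sided Fisher's exact test is one common way to test a 2x2 MK table, implemented here with the standard library rather than taken from the slides.

```python
from math import comb


def neutrality_index(fixed_r, poly_r, fixed_s, poly_s):
    """NI = (PR/FR) / (PS/FS), written as a cross-product ratio."""
    return (poly_r * fixed_s) / (fixed_r * poly_s)


def fisher_exact_two_sided(fixed_r, poly_r, fixed_s, poly_s):
    """Two-sided Fisher's exact test on the 2x2 MK table."""
    row_r = fixed_r + poly_r   # all replacement changes
    row_s = fixed_s + poly_s   # all silent changes
    col_p = poly_r + poly_s    # all polymorphic changes
    total = row_r + row_s

    def prob(k):
        # Hypergeometric probability of k polymorphic replacement changes
        return comb(row_r, k) * comb(row_s, col_p - k) / comb(total, col_p)

    lowest = max(0, col_p - row_s)
    highest = min(row_r, col_p)
    p_obs = prob(poly_r)
    # Sum over all tables that are no more probable than the observed one
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Tables from the slides, as (Fixed R, Poly R, Fixed S, Poly S)
tables = {
    "neutral": (20, 2, 60, 6),
    "excess amino acid polymorphism": (20, 6, 60, 6),
    "deficiency of amino acid polymorphism": (40, 2, 60, 6),
}
for label, counts in tables.items():
    ni = neutrality_index(*counts)
    p = fisher_exact_two_sided(*counts)
    print(f"{label}: NI = {ni:.2f}, Fisher P = {p:.3f}")
```

With counts this small the test has little power, so an N.I. far from 1 need not be statistically significant.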
N.I. is very sensitive to selection
Kimura (1983): diffusion approximation of polymorphism and divergence
N.I. = (S sel / k sel) / (S neut / k neut)
Sawyer & Hartl (1992); Akashi (1995); Nachman (1998); Weinreich & Rand (2000)
N = 1000; = 1; h = 1
[Figure: N.I. (log scale, 0.1 to 1000) against NeS (-4 to 10), spanning negative, neutral, and positive selection.]
Nuclear and mtDNA genes have distinct N.I. distributions
Arabidopsis nuclear genes are like mtDNA
Weinreich and Rand (2000)
[Figure: histograms of N.I. (log scale, 0.05 to 50) for mtDNA, Drosophila nuclear, and Arabidopsis data sets. All data sets: P < 0.005. Significant data sets: P < 0.0004. Adaptive evolution: excess amino acid fixed differences. Mildly deleterious evolution: excess amino acid polymorphism.]
Data include:
• 31 mtDNA data sets
• 37 Drosophila nuclear data sets
• 6 Arabidopsis nuclear data sets
• About 1000bp per data set
• About 20 alleles per data set
• About 1.5 million base pairs
• About 1.5 liters of Taq polymerase
Gerber et al. 2001 Ann. Rev. Genet.: 41 animal mtDNAs, same result
Low recombination, Muller’s ratchet effects:
Accumulation of weakly deleterious mutations
Occasional beneficial mutations
cannot out-compete load of deleterious mutations
Selective sweeps reduce effective population size, and weaken selection
15 vs. 3
High recombination, faster evolution:
Chromosome segments can respond to “local” fitness differences
Beneficial mutations fix, deleterious mutations remain at low frequency
2 vs. 1
can go to fixation
[Figure: Tajima’s D, DNA divergence, and DNA polymorphism against recombination rate; 419 genes, 24 alleles per gene.]
Supports hitchhiking model (weakly)
Fay & Wu (2002): A frequency twist to the MK test
The distribution of site frequencies is important
Polymorphism, Divergence and Fitness Effects of Mutations
Columns: Type of Mutation; Fixed Between Species?; Polymorphic Within Species?; Rare?; Intermediate?; Common?
Neutral
Advantageous
Mildly Deleterious
Balanced
Yes
Yes
No
No
Yes
Yes
Yes
No
Yes
More
No
Yes
Yes
Yes
No
No & Yes
yes
likely
No
No
Allele (Site) frequency distribution
[Figure: site frequency spectra showing an excess of low frequency alleles, an excess of intermediate frequency alleles, and the neutral (Ewens) frequency spectrum. Nielsen (2005) Ann. Rev. Genet. 39:197]
Outgroup needed to distinguish “derived” alleles (18,19 vs 1,2)
[Figure: human and chimp sequences with N and S changes.]
Sequence 20 alleles, record frequency of each SNP
419 genes, 24 alleles sequenced/gene, compared to D. simulans
Fixation Index = [(Fixed A) / (Fixed S)] / [(Poly A) / (Poly S)]
NI = [(Poly A) / (Fixed A)] / [(Poly S) / (Fixed S)]
Noncoding DNA is more constrained than silent sites
How do you perform an MK test on non-coding DNA?
Need to define functional classes
http://genome.imim.es/courses/Lisboa01/images/Gene.jpg
Proportion of positively selected fixations = 1 - (Ds•Px) / (Dx•Ps)
where s = silent, x = other class
Smith & Eyre-Walker (2002) Nature 415:1022-24
• Silent sites are neutral
• Assumes polymorphism is neutral
• Excluding singletons increases estimate of positive selection
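As a rough illustration of the estimator 1 - (Ds•Px) / (Dx•Ps), the sketch below evaluates it on illustrative MK-style counts. The function name `alpha` and the counts are assumptions made for this example, not values from Smith & Eyre-Walker (2002).

```python
def alpha(d_s, p_s, d_x, p_x):
    """Proportion of positively selected fixations: 1 - (Ds*Px) / (Dx*Ps).

    d_s, p_s: fixed differences and polymorphisms at silent sites
    d_x, p_x: fixed differences and polymorphisms in the other class
    """
    if d_x == 0 or p_s == 0:
        raise ValueError("Dx and Ps must be non-zero")
    return 1 - (d_s * p_x) / (d_x * p_s)


# Illustrative counts: silent 60 fixed / 6 polymorphic
print(alpha(d_s=60, p_s=6, d_x=40, p_x=2))   # deficiency of polymorphism in class x
print(alpha(d_s=60, p_s=6, d_x=20, p_x=2))   # same ratios in both classes
print(alpha(d_s=60, p_s=6, d_x=20, p_x=6))   # excess polymorphism in class x
```

Because the estimator equals 1 minus the neutrality index, an excess of polymorphism in class x (N.I. > 1) yields a negative value, which is one reason low-frequency, mildly deleterious polymorphisms bias the estimate downward.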
Estimating selection coefficients from MK data
Excluding singletons increases estimate
Mildly deleterious polymorphisms at low frequency
Measuring DNA Evolution
• Align sequences between species
• Determine length of sequences, L
• Count number of differences
• Divergence = proportion of differences
• D = p-distance = (number of differences) / (length of sequence)
• Rate of divergence = (sequence divergence) / (age of common ancestor) = D / time
• Rate of substitution = D / (2 x time)
Example: 5 differences in 100
D = 0.05, t = 6 million years
Rate of divergence = 0.05 / (6 x 10^6)
Rate of divergence = 8.3 x 10^-9 per site per year
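The worked example can be reproduced in a few lines. This is a minimal sketch: the two toy sequences are made up so that they differ at 5 of 100 aligned sites, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def rate_of_divergence(d, years):
    """D / time: divergence accumulated between the two lineages per year."""
    return d / years


def rate_of_substitution(d, years):
    """D / (2 x time): rate along a single lineage."""
    return d / (2 * years)


# Toy alignment: 100 sites, 5 differences (made-up sequences)
seq_a = "A" * 100
seq_b = "G" * 5 + "A" * 95

d = p_distance(seq_a, seq_b)
t = 6e6  # years since the common ancestor
print(f"D = {d}")
print(f"rate of divergence   = {rate_of_divergence(d, t):.2e}")
print(f"rate of substitution = {rate_of_substitution(d, t):.2e}")
```

The factor of 2 in the substitution rate reflects that differences accumulate along both lineages since the common ancestor.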
Jukes Cantor one-parameter model
$\alpha$ = rate of substitution
$P_{AA}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$ = probability that A remains A at time t
$P_{NN}(t) = \frac{1}{4} + \frac{3}{4} e^{-8\alpha t}$ = probability that two sequences have the same nucleotide at site N
$D$ = proportion of different nucleotides $= 1 - P_{NN}$
$\hat{D} = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$
$K = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$
where p = proportion of nucleotide differences (# diffs./total bp)
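A minimal sketch of the Jukes Cantor correction, K = -¾ ln(1 - 4/3 p), is given below. The function name is hypothetical; the guard on p reflects that the logarithm is undefined once p reaches 0.75, the saturation level for four equally frequent nucleotides.

```python
from math import log


def jukes_cantor_distance(p):
    """Jukes Cantor corrected distance K from the observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * log(1 - 4.0 * p / 3.0)


for p in (0.01, 0.05, 0.20, 0.50):
    k = jukes_cantor_distance(p)
    print(f"p = {p:.2f}  K = {k:.4f}  hidden substitutions per site = {k - p:.4f}")
```

The gap between K and p grows with p, which is the multiple-hits correction at work.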
Kimura two-parameter model
$\alpha$ = rate of transition substitution
$\beta$ = rate of transversion substitution
$P_{AA}(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta)t}$ = probability that A remains A at time t
$K = \frac{1}{2} \ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4} \ln\left(\frac{1}{1 - 2Q}\right)$
where P = proportion of transitional differences
Q = proportion of transversional differences
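The Kimura two-parameter distance can be sketched the same way. The function name and the example proportions are assumptions chosen for illustration.

```python
from math import log


def kimura_2p_distance(P, Q):
    """K = 1/2 ln(1/(1 - 2P - Q)) + 1/4 ln(1/(1 - 2Q)).

    P: proportion of transitional differences
    Q: proportion of transversional differences
    """
    term_a = 1 - 2 * P - Q
    term_b = 1 - 2 * Q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("differences too large for the correction to be defined")
    return 0.5 * log(1 / term_a) + 0.25 * log(1 / term_b)


# Made-up example: 5% transitional and 2% transversional differences
P, Q = 0.05, 0.02
print(f"p-distance = {P + Q:.4f}")
print(f"K (K2P)    = {kimura_2p_distance(P, Q):.4f}")
```

When transversional differences are exactly twice as frequent as transitional ones (Q = 2P, as expected when all substitutions are equally likely), the expression reduces to the Jukes Cantor distance with p = P + Q.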
• P-distance
• Jukes Cantor
• Kimura 2-parameter
• Tamura-Nei
• Etc…
Molecular clocks
Approximately constant
Divergence of proteins
$K = \mu f_0$
Rate of substitution = mutation rate x proportion of neutral mutations
“Saturation” due to multiple hits in DNA evolution
100% synonymous site
33% synonymous AND 66% nonsynonymous site: T->C silent; T->A or G nonsynonymous
dN and dS
dS = number of synonymous differences PER
synonymous site (not per all sites)
dN = number of nonsynonymous differences PER
nonsynonymous site (not per all sites)
More nonsynonymous sites than synonymous
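The per-site definitions matter because the two site classes differ in size. The sketch below uses made-up counts for a hypothetical 300-codon gene (the site totals and difference counts are assumptions) and applies no multiple-hits correction.

```python
def per_site_rates(n_diffs, n_sites, s_diffs, s_sites):
    """Return (dN, dS, dN/dS) from raw counts of differences and sites."""
    if n_sites <= 0 or s_sites <= 0:
        raise ValueError("site counts must be positive")
    d_n = n_diffs / n_sites
    d_s = s_diffs / s_sites
    if d_s == 0:
        raise ValueError("dN/dS is undefined when dS is zero")
    return d_n, d_s, d_n / d_s


# Hypothetical gene: 900 sites split into 675 nonsynonymous and 225 synonymous,
# with 9 differences observed in each class.
d_n, d_s, ratio = per_site_rates(n_diffs=9, n_sites=675, s_diffs=9, s_sites=225)
print(f"dN = {d_n:.4f}, dS = {d_s:.4f}, dN/dS = {ratio:.3f}")
```

Equal raw counts of differences therefore do not imply dN = dS: dividing by the larger number of nonsynonymous sites gives a dN/dS well below 1 in this example.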
DNA test of neutrality
• Neutral prediction: amino acid (nonsynonymous) substitution rate (dN) should be lower than silent (synonymous) substitution rate (dS)
– Follows from functional constraint argument
• True for most genes
• Different for Major Histocompatibility Complex (MHC) loci
– Antigen recognition sequence shows dN > dS
– Rest of molecule shows dN < dS, as expected
• Amino acid mutations are favored in antigen recognition region
• Promotes diversity, better recognition of foreign peptides
http://depts.washington.edu/rhwlab/dq/3structure.html
Antigen binding sites: dN/dS > 1, “positive” selection
Rest of molecule: dN/dS < 1, negative (purifying) selection
MK vs HCY tests
McDonald-Kreitman test
              Fixed   Poly
Replacement   FR      PR
Silent        FS      PS
NI = (PR/FR) / (PS/FS)
2x2 G-test or FET
Hasegawa, Cao, Yang test
                 between       within
Nonsynonymous    dN between    dN within
Synonymous       dS between    dS within
w = (dN/dS); NIw = (dN/dS)within / (dN/dS)between
Likelihood of tree with 1 w vs. likelihood of tree with 2 w
[Figure: human and chimp sequences with N and S changes.]
HCY test is more powerful than MK test
w ratio is better than NI
NI vs. w ratio
[Figure: NI and NIw ratio (0 to 7) for each mtDNA gene: nd1, nd2, co1, co2, atp8, atp6, co3, nd3, nd4l, nd4, nd5, nd6, cytb.]
Variation among genes in NeS
NI values larger with HCY
HCY NIw leads to larger estimate of negative selection
HCY becomes more powerful as species divergence increases
Anatomy of a phylogenetic tree
[Figure: tree of Taxon1 to Taxon6 labeling the root, internal nodes, internal branches, external branches, terminal (external) nodes, and a polytomy.]
Terminal (external) nodes = taxa = OTUs = operational taxonomic units
Polytomy = non-dichotomous splitting
Relative rate test
• KAC = KBC
• KOC is shared
• Tajima test
• (m1-m2)^2 / (m1+m2)
• Chi square, df=1
[Figure: tree with Species A, Species B, Species C, and node O; branches labeled m1 and m2.]
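The Tajima test statistic can be computed as sketched below. The reading of m1 and m2 as the numbers of sites where only Species A, or only Species B, differs from the other two sequences is an assumption based on the usual statement of the test; the slide does not define them. The function name and counts are illustrative.

```python
from math import erfc, sqrt


def tajima_relative_rate(m1, m2):
    """Return (chi-square, P) for (m1 - m2)^2 / (m1 + m2) with df = 1."""
    if m1 + m2 == 0:
        raise ValueError("no informative sites: m1 + m2 is zero")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts of lineage-specific differences
for m1, m2 in [(12, 10), (30, 10)]:
    chi2, p = tajima_relative_rate(m1, m2)
    print(f"m1 = {m1}, m2 = {m2}: chi-square = {chi2:.2f}, P = {p:.4f}")
```

A small P rejects equal rates along the two lineages, that is, it rejects KAC = KBC.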