The Pattern of Nucleotide Difference at Individual Codons Among
Mouse, Rat, and Human
Robert Friedman and Austin L. Hughes
Department of Biological Sciences, University of South Carolina
The patterns of nucleotide difference were compared at 3,473,111 codons from 9,390 aligned orthologous genes of mouse
(Mus musculus), rat (Rattus norvegicus), and human (Homo sapiens). The results showed evidence of a higher frequency of
both synonymous and nonsynonymous differences from human in the rat than in the mouse. However, contrary to a previous report, there was no evidence of a greater frequency of codons with multiple nonsynonymous substitutions between
the two rodent species than expected under random substitution.
Introduction
All evolutionary changes initially arise as mutations to
the sequence of DNA, which then become fixed in populations as a result of processes such as genetic drift and natural
selection. Thus, an understanding of the mechanisms by
which sequence differences accumulate between homologous sequences is necessary for an understanding of the evolutionary process (Nei 1987). Protein-coding regions are
particularly interesting for a comparative study of related
sequences. Because of the redundant nature of the genetic
code, the pattern of synonymous and nonsynonymous
(amino acid-altering) nucleotide change in protein-coding
regions is believed to be particularly informative regarding
past natural selection (Hughes 1999), including both purifying selection acting to eliminate selectively disadvantageous
mutants (Kimura 1977) and positive selection favoring advantageous mutants (Hughes and Nei 1988). Nonetheless,
the patterns of coding sequence evolution are complex
and remain poorly understood.
The availability of completely sequenced genomes
from a number of organisms holds promise for increasing
our understanding of sequence evolution because of the statistical power available from exploiting large numbers of
orthologous gene comparisons. In a recent paper exploiting
complete mammalian genomic sequences, Bazykin et al.
(2004) reconstructed the pattern of nonsynonymous (amino
acid-altering) nucleotide change at a large number of individual codons in rat and mouse genes. They examined
28,196 codons at which rat and mouse differed from each
other at two nucleotide sites and 1,982 codons showing three
differences. In codons with two nonsynonymous nucleotide
differences, Bazykin et al. (2004) reported that two nonsynonymous differences occurred in the same lineage 64% of
the time, whereas they argued that this should occur only
50% of the time if mutations were independent. Similarly,
in codons with three nonsynonymous differences, they
found that all three occurred in the same lineage 46% of
the time, whereas they argued that this should occur only
25% of the time if mutations were independent. These
authors argued that the excess of multiple nonsynonymous
mutations in the same lineage is evidence of positive
selection favoring rapid successive amino acid replacements
at the same site.
Key words: nucleotide substitution, positive selection, purifying selection.
E-mail: austin@biol.sc.edu.
Mol. Biol. Evol. 22(5):1285–1289. 2005
doi:10.1093/molbev/msi113
Advance Access publication February 16, 2005
© The Author 2005. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: journals.permissions@oupjournals.org
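
The 50% and 25% expectations attributed to Bazykin et al. (2004) in the Introduction can be reproduced by a simple enumeration, assuming (for illustration only) that each substitution in a codon falls independently and with equal probability in the rat or the mouse lineage. The function name below is hypothetical, and this is a sketch of that assumption rather than a reproduction of those authors' calculation.

```python
from itertools import product

def same_lineage_fraction(k):
    """Fraction of equally likely lineage assignments that place
    all k substitutions in a single lineage (assumed model)."""
    assignments = list(product(("rat", "mouse"), repeat=k))
    same = [a for a in assignments if len(set(a)) == 1]
    return len(same) / len(assignments)

print(same_lineage_fraction(2))  # 2 of the 4 assignments
print(same_lineage_fraction(3))  # 2 of the 8 assignments
```

Under this assumption the remaining three-substitution assignments (6 of 8) split two-to-one between the lineages, which is three times the number placing all three substitutions in one lineage.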
An attractive aspect of the study of Bazykin et al.
(2004) was that the authors did not assume any model of
sequence evolution, but rather based their argument on simple probability. Although numerous recent studies have
relied on complex models of coding sequence evolution,
some recent evidence (e.g., Suzuki and Nei 2004; Zhang
2004) suggests that model-based approaches can be misleading. In the absence of a thorough understanding of
the process of sequence evolution, it is important to use
model-free approaches that can increase our understanding
of sequence evolution without making untested assumptions. As a result of such investigations, it will be possible
in the future to develop more realistic models than those
that are currently available.
However, the probability calculations on which the
argument of Bazykin et al. (2004) is based are questionable. Define P_r(x) as the probability of x substitutions in
the rat lineage and P_m(x) as the probability of x substitutions in the mouse lineage. The probability of one substitution in each lineage is P_r(1)P_m(1). The probability that
there will be two substitutions in one lineage and zero in
the other is P_r(2)P_m(0) + P_r(0)P_m(2). The expectation of
Bazykin et al. (2004) that when there are two nonsynonymous differences between the two species, there will be
one substitution in each lineage in 50% of cases will be
true only if
$$P_r(1)\,P_m(1) = P_r(2)\,P_m(0) + P_r(0)\,P_m(2). \tag{1}$$
But it is unclear how often the latter relationship holds in
practice.
A similar logic applies in the case of three-substitution
codons. The expectation of Bazykin et al. (2004) that, under
random substitution, 25% of such cases will show three
substitutions in one lineage requires that
$$P_r(3)\,P_m(0) + P_m(3)\,P_r(0) = \tfrac{1}{3}\left[P_m(2)\,P_r(1) + P_m(1)\,P_r(2)\right]. \tag{2}$$
Again it is uncertain whether this assumption is met in real
data.
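
To see why the equality between one substitution in each lineage and two in a single lineage need not hold, the following sketch evaluates both probabilities under an assumed Poisson distribution of substitutions per codon in each lineage. The Poisson form and the rate values are illustrative assumptions that are not made anywhere in the present analysis; they serve only to show that the 50% expectation depends on the two lineages having equal rates.

```python
from math import exp, factorial

def poisson(k, rate):
    """Poisson probability of k substitutions (assumed model)."""
    return rate ** k * exp(-rate) / factorial(k)

def fraction_same_lineage(rate_rat, rate_mouse):
    """Among codons with two substitutions in total, the fraction with
    both substitutions in one lineage."""
    one_each = poisson(1, rate_rat) * poisson(1, rate_mouse)
    both_in_one = (poisson(2, rate_rat) * poisson(0, rate_mouse)
                   + poisson(0, rate_rat) * poisson(2, rate_mouse))
    return both_in_one / (one_each + both_in_one)

# Hypothetical per-codon rates, chosen only for illustration.
print(fraction_same_lineage(0.01, 0.01))  # equal rates
print(fraction_same_lineage(0.02, 0.01))  # unequal rates
```

In this sketch the fraction equals one half only when the two rates are equal; any rate difference between the lineages raises it above one half without any dependence between mutations.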
Here we further analyze the pattern of nucleotide difference in coding regions between orthologous genes from
three complete mammalian genomes: mouse, rat, and
human. Using the same data set as Bazykin et al. (2004),
we examine the patterns of difference among species in
order to test whether multiple substitutions occur in the
same codon to a greater extent than expected by chance. We
use simple methods that do not depend on any model of
nucleotide substitution, but rather on comparative analysis
of patterns of nucleotide difference.
Table 1
Mean Proportions of Synonymous (pS) and Nonsynonymous (pN) Substitutions per Site ± Standard Error (SE) at 3,353,142 Homologous Codons in Comparisons Among Mouse, Rat, and Human
Comparison     pS ± SE               pN ± SE
Mouse-rat      0.2220 ± 0.0004       0.0345 ± 0.0001
Mouse-human    0.4684 ± 0.0005       0.0794 ± 0.0001
Rat-human      0.4758 ± 0.0005^a     0.0832 ± 0.0001^a
^a Significantly different from corresponding value for mouse-human comparison by paired t-test (two-tailed P < 0.001).
Methods
Using the data set of Bazykin et al. (2004), we compared 3,473,111 codons from 9,390 aligned orthologous
genes of mouse (Mus musculus), rat (Rattus norvegicus),
and human (Homo sapiens). For each codon, we computed
the numbers of synonymous and nonsynonymous nucleotide differences, the proportion of synonymous differences
per synonymous site (pS), and the proportion of nonsynonymous differences per nonsynonymous site (pN) by the
method of Nei and Gojobori (1986). Because we computed
pS and pN for individual codons, we excluded from calculations of mean pS and pN any codon at which any of the three
species had an ATG or a TGG codon, because these codons
include no synonymous sites and thus pS is undefined. No
correction formula for multiple hits was applied because
such corrections are inapplicable in the case of individual codons.
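
The exclusion of ATG and TGG can be checked directly against the genetic code. The sketch below assumes the standard nuclear genetic code and uses hypothetical helper names; it lists every sense codon that has no synonymous codon one nucleotide change away, which is the situation in which a codon contributes no synonymous sites.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def synonymous_neighbors(codon):
    """Codons one nucleotide change away that encode the same amino acid."""
    found = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                other = codon[:pos] + base + codon[pos + 1:]
                if CODE[other] == CODE[codon]:
                    found.append(other)
    return found

# Sense codons with no synonymous single-nucleotide neighbor.
no_syn = sorted(c for c in CODE
                if CODE[c] != "*" and not synonymous_neighbors(c))
print(no_syn)
```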
In cases where there are two or more nucleotide
differences between homologous codons, the method of
Nei and Gojobori (1986) averages across possible pathways
in counting numbers of synonymous and nonsynonymous
differences. In averaging across pathways, no pathway
including a stop codon is considered. When the estimates
derived from averaging across pathways are integers, it
implies that the possible pathways did not differ regarding
the numbers of synonymous and nonsynonymous differences. Therefore, in counting differences among codons we
used only cases where the numbers of synonymous and nonsynonymous differences were integral.
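
The pathway averaging described above can be sketched as follows. This is a minimal illustration under the standard genetic code, not the code used for the analysis; the function names are hypothetical. For two codons it enumerates every order in which the differing positions could have changed, discards pathways that pass through a stop codon, and averages the synonymous and nonsynonymous step counts over the remaining pathways.

```python
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def count_differences(codon1, codon2):
    """Return (synonymous, nonsynonymous) differences between two sense
    codons, averaged over all stop-free mutational pathways."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in permutations(positions):
        current, syn, nonsyn, valid = codon1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":      # pathway passes through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            paths.append((syn, nonsyn))
    if not paths:
        raise ValueError("no stop-free pathway between codons")
    n = len(paths)
    return sum(s for s, _ in paths) / n, sum(x for _, x in paths) / n

def is_integral(counts):
    """True when both averaged counts are whole numbers."""
    return all(float(v).is_integer() for v in counts)

print(count_differences("TTT", "GTA"), is_integral(count_differences("TTT", "GTA")))
print(count_differences("TTT", "TTC"), is_integral(count_differences("TTT", "TTC")))
```

For TTT versus GTA the two pathways disagree (one synonymous and one nonsynonymous step, or two nonsynonymous steps), so the averaged counts are fractional and such a codon pair would be left out of the counts of differences.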
In comparing synonymous and nonsynonymous differences within codons, we used codon positions at which
mouse, rat, and human all possessed fourfold degenerate
codons. There were 1,351,695 such codons in our data
set, or 38.9% of all codons. At fourfold degenerate codons,
all mutations at the third position are synonymous, while all
mutations at the first or second position are nonsynonymous.
Thus, the opportunities for synonymous and nonsynonymous substitution are the same at each fourfold degenerate
codon.
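
The fourfold degenerate codon sets can be enumerated from the genetic code. The sketch below assumes the standard nuclear code and uses hypothetical names. It also checks that the eight fourfold degenerate families encode eight different amino acids, so that between two codons both drawn from these families, a third-position difference is synonymous and a first- or second-position difference is nonsynonymous.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def fourfold_families():
    """Map each two-base prefix whose four codons all encode the same
    amino acid to that amino acid."""
    families = {}
    for a in BASES:
        for b in BASES:
            encoded = {CODE[a + b + c] for c in BASES}
            if len(encoded) == 1 and "*" not in encoded:
                families[a + b] = encoded.pop()
    return families

families = fourfold_families()
print(sorted(families.items()))
print(len(families) * 4, "fourfold degenerate codons")
```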
Results
The mean values of pS and pN were computed for
3,353,142 homologous codons shared among mouse, rat,
and human (excluding codons at which any of the three species had an ATG or a TGG codon). Both mean pS and
mean pN were significantly higher in the comparison
between rat and human than in the comparison between
mouse and human (table 1). Because human is an out-group
to mouse and rat, this result means that rat has evolved
faster at both synonymous and nonsynonymous sites since
the last common ancestor of mouse and rat.
Table 2
Numbers of Codons Having Different Numbers of Synonymous Differences^a Between Mouse and Human and Between Rat and Human
Number of Synonymous Differences           Mouse-Human          Rat-Human
0                                          2,440,588            2,422,106
0.167                                      1,603                1,802
0.250                                      783                  1,013
0.333                                      1,933                2,235
0.500                                      43,758               47,341
0.667                                      4,450                5,106
0.750                                      1,552                1,838
0.833                                      4,939                5,537
1                                          960,376              972,499
1.167                                      598                  676
1.333                                      103                  114
1.500                                      1,861                2,013
1.750                                      32                   37
2                                          10,535               10,794
Total                                      3,473,111            3,473,111
Integral values (% total)                  3,411,499 (98.2%)    3,405,399 (98.1%)
All nonzero                                1,032,523            1,051,005
Integral nonzero values (% all nonzero)    970,911 (94.0%)      983,293 (93.6%)
^a The method of Nei and Gojobori (1986).
When the estimated numbers of synonymous differences between mouse and human and between rat and human
were tabulated for all codons, similar patterns were observed in the two rodent species (table 2). In each species,
over 98% of differences were integral in value (including
zero) (table 2). Even when codons with no synonymous differences were excluded, about 94% of estimated synonymous differences were integral in both species (table 2).
Similarly, over 98% of nonsynonymous substitutions
between mouse and human and between rat and human
showed integral values (including zero) (table 3). When
codons with no nonsynonymous differences were excluded,
about 87% of estimated nonsynonymous differences were
integral in both species (table 3).
Integral values of the numbers of synonymous and
nonsynonymous differences imply the absence of conflict
among possible mutational pathways. Therefore, we compared the possible combinations of integral synonymous
(table 4) and nonsynonymous (table 5) differences between
mouse and human and between rat and human. Assuming
that mutations are independent in mouse and rat, the
expected numbers of each possible combination can be
calculated by multiplying marginal frequencies of the contingency table (tables 4 and 5).
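
The expected numbers and the test of independence can be recomputed from the observed counts alone. The sketch below uses the observed counts of integral synonymous differences from table 4, arranged with rat-human differences (0, 1, 2) as rows and mouse-human differences (0, 1, 2) as columns; that arrangement is inferred from the marginal totals of the table, and the function names are hypothetical.

```python
# Observed codon counts: rows = rat-human differences 0, 1, 2;
# columns = mouse-human differences 0, 1, 2 (arrangement inferred
# from the marginal totals of table 4).
OBSERVED = [
    [2227369, 183696, 480],
    [197411, 764490, 3147],
    [618, 3278, 6744],
]

def expected_counts(observed):
    """Expected counts under independence: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square statistic for the contingency table."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

for row in expected_counts(OBSERVED):
    print([round(e) for e in row])
print(f"chi-square = {chi_square(OBSERVED):.3g}")
```

The degrees of freedom for a 3 × 3 table are (3 − 1)(3 − 1) = 4, as reported for table 4.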
At both synonymous and nonsynonymous sites, the
results showed a highly significant deviation from expectations (tables 4 and 5). However, this did not occur because
of greater than expected frequencies of codons with two or
three differences in one species and none in the other.
Rather, for both synonymous and nonsynonymous differences, codons with two or more differences in one species
and zero in the other occurred much less frequently than the
random expectation (tables 4 and 5). Codons with two nonsynonymous differences in one species and none in the
other species occurred only about 13% as often as expected
(table 5). Similarly, codons with three nonsynonymous differences in one species and none in the other occurred only
about 26% as often as expected (table 5). On the other hand,
in the cases of both synonymous and nonsynonymous differences, codons with one or more differences in both species occurred much more frequently than expected (tables 4
and 5).
Table 3
Numbers of Codons Having Different Numbers of Nonsynonymous Differences^a Between Mouse and Human and Between Rat and Human
Number of Nonsynonymous Differences        Mouse-Human          Rat-Human
0                                          2,979,314            2,962,139
1                                          362,828              367,973
1.250                                      32                   37
1.500                                      35,953               37,963
1.667                                      103                  114
1.833                                      598                  676
2                                          68,592               74,377
2.167                                      4,939                5,537
2.250                                      1,552                1,838
2.333                                      4,450                5,106
2.500                                      9,666                11,391
2.667                                      1,933                2,235
2.750                                      783                  1,013
2.833                                      1,603                1,802
3                                          765                  910
Total                                      3,473,111            3,473,111
Integral values (% total)                  3,411,499 (98.2%)    3,405,399 (98.1%)
All nonzero                                493,797              510,972
Integral nonzero values (% all nonzero)    432,185 (87.5%)      443,260 (86.7%)
^a The method of Nei and Gojobori (1986).
An obvious explanation for the latter pattern was that
codons in which both mouse and rat showed one or more
differences from human were those at which substitutions
occurred after the primate-rodent divergence but before the
rat-mouse divergence. We tested this interpretation in the
case of nonsynonymous substitutions by computing mean
pS and pN between mouse and rat for codons showing at
least one nonsynonymous difference between human and
mouse for each species (fig. 1). ATG and TGG codons were
excluded because pS is undefined at these codons. There
were highly significant differences in both mean pS and
mean pN between codons with different patterns of nonsynonymous difference between the two rodents and human
(fig. 1). However, it was striking that mean pN between
mouse and rat was near zero for every case where the
two rodents had an equal number of nonsynonymous differences from human (fig. 1). This pattern is easily explained
if in most of these cases the two rodents were identical at
nonsynonymous sites because the nonsynonymous substitutions occurred in the rodent lineage prior to the mouse-rat
divergence.
In order to examine the pattern of substitutions occurring in the rodent lineage after the mouse-rat divergence, we
examined codons at which mouse, rat, or both rodents had
the same codon as human. In addition, in order to ensure
that all codons analyzed were identical as regards the opportunity for synonymous and nonsynonymous substitution,
we examined only codons at which all three species had
codons belonging to fourfold degenerate codon sets (table
6). The results showed that, in both mouse and rat, one or
two nonsynonymous differences from human were
observed more frequently than expected in codons having
a synonymous difference from human (table 6). By contrast, in both mouse and rat, one or two nonsynonymous
differences from human were observed less frequently than
expected in codons lacking a synonymous difference from
human (table 6).
In mouse, there were over four times as many codons
with one synonymous difference and one nonsynonymous
difference (1,738 codons) as there were codons with two
nonsynonymous differences (423 codons) (table 6). Likewise, in rat, there were four times as many codons with
one synonymous difference and one nonsynonymous difference (2,665 codons) as there were codons with two nonsynonymous differences (661 codons) (table 6). Furthermore,
in mouse there were over twice as many codons with one
synonymous and one nonsynonymous difference (1,738)
as there were codons with two nonsynonymous differences
(762), including in the latter total both codons with no synonymous differences and those with one synonymous difference (table 6). A similar pattern was seen in rat, where the
number of codons with one synonymous and one nonsynonymous difference (2,665) was nearly twice that of all codons
with two nonsynonymous differences (1,404) (table 6).
Table 4
Observed (O) and Expected (E) Numbers of Codons with Integral Numbers of Synonymous Differences Between Mouse and Human and Between Rat and Human^a
Rat-Human             Mouse-Human: 0                Mouse-Human: 1              Mouse-Human: 2           Totals (frequency)
0                     O: 2,227,369, E: 1,726,765    O: 183,696, E: 677,396      O: 480, E: 7,384         2,411,545 (0.7120)
1                     O: 197,411, E: 691,014        O: 764,490, E: 271,079      O: 3,147, E: 2,955       965,048 (0.2849)
2                     O: 618, E: 7,619              O: 3,278, E: 2,989          O: 6,744, E: 33          10,640 (0.0031)
Totals (frequency)    2,425,398 (0.7160)            951,464 (0.2809)            10,371 (0.0031)          3,387,233
^a Test of independence: χ² = 3.15 × 10^6; 4 df; P < 0.001.
Table 5
Observed (O) and Expected (E) Numbers of Codons with Integral Numbers of Nonsynonymous Differences Between Mouse and Human and Between Rat and Human^a
Rat-Human             Mouse-Human: 0                Mouse-Human: 1              Mouse-Human: 2            Mouse-Human: 3      Totals (frequency)
0                     O: 2,891,466, E: 2,588,444    O: 57,861, E: 309,306       O: 5,703, E: 56,799       O: 113, E: 597      2,955,143 (0.8724)
1                     O: 64,507, E: 315,634         O: 284,735, E: 37,717       O: 11,062, E: 6,926       O: 48, E: 73        360,352 (0.1064)
2                     O: 10,664, E: 62,100          O: 11,879, E: 7,420         O: 48,288, E: 1,362       O: 67, E: 14        70,898 (0.0209)
3                     O: 277, E: 736                O: 56, E: 88                O: 51, E: 16              O: 456, E: 0        840 (0.0002)
Totals (frequency)    2,966,914 (0.8759)            354,531 (0.1047)            65,104 (0.0192)           684 (0.0002)        3,387,233
^a Test of independence (categories 2 and 3 pooled because of low expectation in category 3): χ² = 1.89 × 10^6; 4 df; P < 0.001.
Thus, in both rodent species, codons with two nonsynonymous differences from human were considerably less frequent than expected under random substitution.
FIG. 1. Mean number of synonymous differences per synonymous
site (pS) and mean number of nonsynonymous differences per nonsynonymous site (pN) between mouse and rat at 347,817 individual codons categorized on the basis of the pattern of nucleotide differences between the
two rodent species and human. The patterns of difference from human are
as follows: 1:1 = one difference in each species; 1:2 = one difference in
one species and two differences in the other; 1:3 = one difference in one
species and three differences in the other; 2:2 = two differences in each
species; 2:3 = two differences in one species and three in the other; 3:3 =
three differences in each species. Any codon at which either species did not
differ from human was not included in the analysis. Both mean pS and
mean pN differed significantly among categories (one-way analysis of variance; P < 0.001 in each case).
Overall, many more synonymous than nonsynonymous differences were observed at the fourfold degenerate
codons where at least one of the two rodent species was identical to human. In codons identical between rat and human,
72,852 synonymous differences were observed between
mouse and human, compared with 11,222 nonsynonymous
differences, for a ratio of 6.5:1. In codons identical between
mouse and human, 78,283 synonymous differences were
observed between rat and human, compared with
14,034 nonsynonymous differences, for a ratio of 5.6:1.
Because each fourfold degenerate codon includes twice as
many nonsynonymous sites as synonymous sites, the
expected ratio in the absence of selection would be 0.5:1.
The observed ratios of synonymous to nonsynonymous differences were thus more than an order of magnitude greater
than expected under random substitution.
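
The totals and ratios quoted for the fourfold degenerate codons follow directly from the observed counts in table 6. The sketch below recomputes them; each key is a (synonymous, nonsynonymous) pair of difference counts and each value is the observed number of codons. The variable and function names are hypothetical.

```python
# Observed codon counts from table 6, keyed by
# (synonymous differences, nonsynonymous differences) from human.
MOUSE_VS_HUMAN = {(0, 0): 744120, (1, 0): 70775, (0, 1): 7960,
                  (1, 1): 1738, (0, 2): 423, (1, 2): 339}
RAT_VS_HUMAN = {(0, 0): 744120, (1, 0): 74875, (0, 1): 8561,
                (1, 1): 2665, (0, 2): 661, (1, 2): 743}

def difference_totals(table):
    """Total synonymous and nonsynonymous differences across all codons."""
    syn = sum(s * n for (s, _), n in table.items())
    nonsyn = sum(x * n for (_, x), n in table.items())
    return syn, nonsyn

# One synonymous site and two nonsynonymous sites per fourfold codon.
EXPECTED_RATIO = 1 / 2

for name, table in (("mouse", MOUSE_VS_HUMAN), ("rat", RAT_VS_HUMAN)):
    syn, nonsyn = difference_totals(table)
    ratio = syn / nonsyn
    print(name, syn, nonsyn, round(ratio, 1), round(ratio / EXPECTED_RATIO, 1))
```

The last printed value is the observed ratio divided by the 0.5:1 expectation, which is the factor by which synonymous differences exceed the proportion expected under random substitution.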
Discussion
A detailed analysis of patterns of nucleotide difference
at individual codons revealed no support for the hypothesis
of Bazykin et al. (2004) that multiple nonsynonymous substitutions per codon have occurred in the rat and mouse lineages to a greater extent than expected by chance. When
expected frequencies were calculated based on the occurrence of patterns of nucleotide differences between each
rodent species and human, codons with multiple nonsynonymous substitutions in one rodent and none in the other
were found to have occurred at a much lower frequency than
expected by chance. Codons with multiple substitutions in
both lineages occurred much more frequently than expected
by chance. This obviously reflects the shared evolutionary
history of rat and mouse; many of these substitutions no
doubt arose after the rodent-primate divergence but before
the mouse-rat divergence.
In order to examine a set of differences that occurred
independently in the rat and mouse lineages after the
mouse-rat divergence, we considered fourfold degenerate
codons at which either mouse or rat or both had the same
codon as human. When one of the two rodent species had
the same codon as human, this codon was highly likely to
have been present in the ancestor of mouse and rat. Thus,
changes to these codons represent changes that occurred in
one of the two rodent lineages after the mouse-rat divergence. In both rodent species, these codons included about
four times as many codons with one synonymous difference
and one nonsynonymous difference as there were codons
with two nonsynonymous differences. This result does
not support the hypothesis of an excess of codons with
two nonsynonymous differences in either of the two rodent
lineages. Rather, because codons with two nonsynonymous
differences occur less frequently than expected under random substitution, these results support the hypothesis that
purifying selection has acted to eliminate many nonsynonymous mutations at codons that had previously undergone a
nonsynonymous substitution in one of the rodent lineages.
The reason for the different conclusion of Bazykin et al.
(2004) was apparently the reasoning those authors used to
calculate the expected frequencies of codons with two or
more nonsynonymous differences in one rodent lineage
and none in the other. When these expected frequencies
are calculated from marginal frequencies of contingency
tables, as was done here, it is clear that there is no excess
of such codons.
Table 6
Observed (O) and Expected (E) Numbers of Synonymous and Nonsynonymous Differences Between Each Rodent Species and Human at Fourfold Degenerate Codons Where the Other Rodent Species Has the Same Codon as Human
                          Nonsynonymous Differences
Synonymous Differences    0                           1                         2
Mouse-human: codons identical between rat and human (N = 825,355)^a
0                         O: 744,120, E: 742,966      O: 7,960, E: 8,842        O: 423, E: 695
1                         O: 70,775, E: 71,929        O: 1,738, E: 856          O: 339, E: 67
Rat-human: codons identical between mouse and human (N = 831,625)^b
0                         O: 744,120, E: 741,901      O: 8,561, E: 10,169       O: 661, E: 1,272
1                         O: 74,875, E: 77,094        O: 2,665, E: 1,057        O: 743, E: 132
^a Test of independence: χ² = 2.22 × 10^4; 2 df; P < 0.001.
^b Test of independence: χ² = 5.89 × 10^4; 2 df; P < 0.001.
Our analyses showed that, in comparisons between the
two rodent species and human, the numbers of nucleotide
differences between homologous codons, as estimated by
the method of Nei and Gojobori (1986), were integral in
a large majority of cases (tables 2 and 3). The method of
Nei and Gojobori (1986) provides integral counts of numbers of synonymous and nonsynonymous differences when
there is no ambiguity among possible evolutionary pathways; fractional values result from averaging among possible pathways. Available methods of estimating synonymous
and nonsynonymous substitution incorporate different
approaches to the problem of multiple possible evolutionary pathways (Nei and Kumar 2000, pp. 51–71). In assessing the potential impact of these different approaches, it is
of interest that codons with multiple possible pathways represented only a small fraction (less than 2%) of codons in
comparisons between orthologous sequences from mammals belonging to different placental orders.
A surprising result of our analyses was the finding of a
consistently higher degree of nucleotide difference between
rat and human than between mouse and human. Many studies have addressed the hypothesis that the rate of molecular
evolution has accelerated in rodents, particularly murid
rodents, in comparison to other placental mammals (Wu
and Li 1985; Gu and Li 1992; Easteal, Collet, and Betty
1995; Li et al. 1996), but so far little attention has been paid
to the possibility of evolutionary rate differences among the
murid rodents themselves. Further analysis of the apparent
evolutionary rate difference between rat and mouse may
help in deciding among proposed mechanisms for evolutionary rate differences among species.
Acknowledgments
This research was supported by grant GM43940 from the National Institutes of Health.
Literature Cited
Bazykin, G. A., F. A. Kondrashov, A. Y. Ogurtsov, S. Sunyaev, and A. S. Kondrashov. 2004. Positive selection at sites of multiple amino acid replacements since rat-mouse divergence. Nature 429:558–562.
Easteal, S., C. C. Collet, and D. J. Betty. 1995. The mammalian molecular clock. Springer and Landes, Austin, Tex.
Gu, X., and W.-H. Li. 1992. Higher rates of amino acid substitution in rodents than in humans. Mol. Phylogenet. Evol. 1:211–214.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at MHC class I loci reveals overdominant selection. Nature 335:167–170.
Kimura, M. 1977. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276.
Li, W.-H., D. L. Ellsworth, J. Krushkal, B. H.-J. Chang, and D. Hewett-Emmett. 1996. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5:182–187.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, New York.
Suzuki, Y., and M. Nei. 2004. False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21:914–921.
Wu, C. I., and W. H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.
Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332–1339.
William Martin, Associate Editor
Accepted February 8, 2005