The Pattern of Nucleotide Difference at Individual Codons Among Mouse, Rat, and Human

Robert Friedman and Austin L. Hughes
Department of Biological Sciences, University of South Carolina

The patterns of nucleotide difference were compared at 3,473,111 codons from 9,390 aligned orthologous genes of mouse (Mus musculus), rat (Rattus norvegicus), and human (Homo sapiens). The results showed evidence of a higher frequency of both synonymous and nonsynonymous differences from human in the rat than in the mouse. However, contrary to a previous report, there was no evidence of a greater frequency of codons with multiple nonsynonymous substitutions between the two rodent species than expected under random substitution.

Key words: nucleotide substitution, positive selection, purifying selection.
E-mail: austin@biol.sc.edu.
Mol. Biol. Evol. 22(5):1285–1289. 2005. doi:10.1093/molbev/msi113. Advance Access publication February 16, 2005.
© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org

Introduction

All evolutionary changes initially arise as mutations to the sequence of DNA, which then become fixed in populations as a result of processes such as genetic drift and natural selection. Thus, an understanding of the mechanisms by which sequence differences accumulate between homologous sequences is necessary for an understanding of the evolutionary process (Nei 1987).

Protein-coding regions are particularly interesting for a comparative study of related sequences. Because of the redundant nature of the genetic code, the pattern of synonymous and nonsynonymous (amino acid-altering) nucleotide change in protein-coding regions is believed to be particularly informative regarding past natural selection (Hughes 1999), including both purifying selection acting to eliminate selectively disadvantageous mutants (Kimura 1977) and positive selection favoring advantageous mutants (Hughes and Nei 1988). Nonetheless, the patterns of coding sequence evolution are complex and remain poorly understood. The availability of completely sequenced genomes from a number of organisms holds promise for increasing our understanding of sequence evolution because of the statistical power available from exploiting large numbers of orthologous gene comparisons.

In a recent paper exploiting complete mammalian genomic sequences, Bazykin et al. (2004) reconstructed the pattern of nonsynonymous (amino acid-altering) nucleotide change at a large number of individual codons in rat and mouse genes. They examined 28,196 codons at which rat and mouse differed from each other at two nucleotide sites and 1,982 codons showing three differences. In codons with two nonsynonymous nucleotide differences, Bazykin et al. (2004) reported that two nonsynonymous differences occurred in the same lineage 64% of the time, whereas they argued that this should occur only 50% of the time if mutations were independent. Similarly, in codons with three nonsynonymous differences, they found that all three occurred in the same lineage 46% of the time, whereas they argued that this should occur only 25% of the time if mutations were independent. These authors argued that the excess of multiple nonsynonymous mutations in the same lineage is evidence of positive selection favoring rapid successive amino acid replacements at the same site.

An attractive aspect of the study of Bazykin et al. (2004) was that the authors did not assume any model of sequence evolution, but rather based their argument on simple probability. Although numerous recent studies have relied on complex models of coding sequence evolution, some recent evidence (e.g., Suzuki and Nei 2004; Zhang 2004) suggests that model-based approaches can be misleading.
In the absence of a thorough understanding of the process of sequence evolution, it is important to use model-free approaches that can increase our understanding of sequence evolution without making untested assumptions. As a result of such investigations, it will be possible in the future to develop more realistic models than those that are currently available.

However, the probability calculations on which the argument of Bazykin et al. (2004) is based are questionable. Define $P_r(x)$ as the probability of x substitutions in the rat lineage and $P_m(x)$ as the probability of x substitutions in the mouse lineage. The probability of one substitution in each lineage is $P_r(1)P_m(1)$. The probability that there will be two substitutions in one lineage and zero in the other is $P_r(2)P_m(0) + P_r(0)P_m(2)$. The expectation of Bazykin et al. (2004) that, when there are two nonsynonymous differences between the two species, there will be one substitution in each lineage in 50% of cases will be true only if

$$P_r(1)P_m(1) = P_r(2)P_m(0) + P_r(0)P_m(2). \qquad (1)$$

But it is unclear how often the latter relationship holds in practice. A similar logic applies in the case of three-substitution codons. The expectation of Bazykin et al. (2004) that, under random substitution, 25% of such cases will show three substitutions in one lineage requires that

$$P_r(3)P_m(0) + P_m(3)P_r(0) = \tfrac{1}{3}\left[P_m(2)P_r(1) + P_m(1)P_r(2)\right]. \qquad (2)$$

Again it is uncertain whether this assumption is met in real data.

Here we further analyze the pattern of nucleotide difference in coding regions between orthologous genes from three complete mammalian genomes, mouse, rat, and human. Using the same data set as Bazykin et al.
(2004), we examine the patterns of difference among species in order to test whether multiple substitutions occur in the same codon to a greater extent than expected by chance. We use simple methods that do not depend on any model of nucleotide substitution, but rather on comparative analysis of patterns of nucleotide difference.

Table 1
Mean Proportions of Synonymous (pS) and Nonsynonymous (pN) Substitutions per Site ± Standard Error (SE) at 3,353,142 Homologous Codons in Comparisons Among Mouse, Rat, and Human

Comparison     pS ± SE               pN ± SE
Mouse-rat      0.2220 ± 0.0004       0.0345 ± 0.0001
Mouse-human    0.4684 ± 0.0005       0.0794 ± 0.0001
Rat-human      0.4758 ± 0.0005^a     0.0832 ± 0.0001^a

^a Significantly different from corresponding value for mouse-human comparison by paired t-test (two-tailed P < 0.001).

Methods

Using the data set of Bazykin et al. (2004), we compared 3,473,111 codons from 9,390 aligned orthologous genes of mouse (Mus musculus), rat (Rattus norvegicus), and human (Homo sapiens). For each codon, we computed the numbers of synonymous and nonsynonymous nucleotide differences, the proportion of synonymous differences per synonymous site (pS), and the proportion of nonsynonymous differences per nonsynonymous site (pN) by the method of Nei and Gojobori (1986). Because we computed pS and pN for individual codons, we excluded from calculations of mean pS and pN any codon at which any of the three species had an ATG or a TGG codon, because these codons include no synonymous sites and thus pS is undefined. No correction formula for multiple hits was applied because such corrections are inapplicable in the case of individual codons.

In cases where there are two or more nucleotide differences between homologous codons, the method of Nei and Gojobori (1986) averages across possible pathways in counting numbers of synonymous and nonsynonymous differences. In averaging across pathways, no pathway including a stop codon is considered.
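The pathway averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard genetic code, assumes both codons are sense codons, and the names `count_differences` and `is_integral` are hypothetical helpers introduced here. It counts synonymous and nonsynonymous differences between two codons by averaging over all orders in which the differing positions could have changed, skipping any pathway that passes through a stop codon.

```python
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard code applies to the nuclear genes discussed in the text).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_differences(codon_a, codon_b):
    """Return (synonymous, nonsynonymous) differences between two sense
    codons, averaged over all stop-free mutational pathways."""
    positions = [i for i in range(3) if codon_a[i] != codon_b[i]]
    pathways = []
    for order in permutations(positions):
        current, syn, nonsyn = codon_a, 0, 0
        for pos in order:
            step = current[:pos] + codon_b[pos] + current[pos + 1:]
            if GENETIC_CODE[step] == "*":
                break  # discard pathways that include a stop codon
            if GENETIC_CODE[step] == GENETIC_CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        else:
            pathways.append((syn, nonsyn))  # pathway completed without a stop
    if not pathways:
        raise ValueError("no stop-free pathway between the two codons")
    n = len(pathways)
    return (sum(s for s, _ in pathways) / n, sum(x for _, x in pathways) / n)


def is_integral(counts):
    """True when the averaged counts are whole numbers, i.e. the pathways agree."""
    return all(float(c).is_integer() for c in counts)


# GCT (Ala) vs GGC (Gly): both pathways give one synonymous, one nonsynonymous step.
print(count_differences("GCT", "GGC"))
# TTT (Phe) vs GTA (Val): the two pathways disagree, so the averages are fractional.
print(count_differences("TTT", "GTA"))
```

In this sketch, fractional results such as 0.5 synonymous and 1.5 nonsynonymous differences arise only when the alternative pathways disagree, which is the situation the integral-value filter in the Methods is designed to exclude.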
When the estimates derived from averaging across pathways are integers, it implies that the possible pathways did not differ regarding the numbers of synonymous and nonsynonymous differences. Therefore, in counting differences among codons we used only cases where the numbers of synonymous and nonsynonymous differences were integral.

In comparing synonymous and nonsynonymous differences within codons, we used codon positions at which mouse, rat, and human all possessed fourfold degenerate codons. There were 1,351,695 such codons in our data set, or 38.9% of all codons. At fourfold degenerate codons, all mutations at the third position are synonymous, while all mutations at the first or second position are nonsynonymous. Thus, the opportunities for synonymous and nonsynonymous substitution are the same at each fourfold degenerate codon.

Table 2
Numbers of Codons Having Different Numbers of Synonymous Differences^a Between Mouse and Human and Between Rat and Human

Number of Synonymous Differences          Mouse-Human          Rat-Human
0                                         2,440,588            2,422,106
0.167                                     1,603                1,802
0.250                                     783                  1,013
0.333                                     1,933                2,235
0.500                                     43,758               47,341
0.667                                     4,450                5,106
0.750                                     1,552                1,838
0.833                                     4,939                5,537
1                                         960,376              972,499
1.167                                     598                  676
1.333                                     103                  114
1.500                                     1,861                2,013
1.750                                     32                   37
2                                         10,535               10,794
Total                                     3,473,111            3,473,111
Integral values (% total)                 3,411,499 (98.2%)    3,405,399 (98.1%)
All nonzero                               1,032,523            1,051,005
Integral nonzero values (% all nonzero)   970,911 (94.0%)      983,293 (93.6%)

^a The method of Nei and Gojobori (1986).

Results

The mean values of pS and pN were computed for 3,353,142 homologous codons shared among mouse, rat, and human (excluding codons at which any of the three species had an ATG or a TGG codon). Both mean pS and mean pN were significantly higher in the comparison between rat and human than in the comparison between mouse and human (table 1).
Because human is an out-group to mouse and rat, this result means that rat has evolved faster at both synonymous and nonsynonymous sites since the last common ancestor of mouse and rat. When the estimated numbers of synonymous differences between mouse and human and between rat and human were tabulated for all codons, similar patterns were observed in the two rodent species (table 2). In each species, over 98% of differences were integral in value (including zero) (table 2). Even when codons with no synonymous differences were excluded, about 94% of estimated synonymous differences were integral in both species (table 2). Similarly, over 98% of nonsynonymous substitutions between mouse and human and between rat and human showed integral values (including zero) (table 3). When codons with no nonsynonymous differences were excluded, about 87% of estimated nonsynonymous differences were integral in both species (table 3). Integral values of the numbers of synonymous and nonsynonymous differences imply the absence of conflict among possible mutational pathways. Therefore, we compared the possible combinations of integral synonymous (table 4) and nonsynonymous (table 5) differences between mouse and human and between rat and human. Assuming that mutations are independent in mouse and rat, the expected numbers of each possible combination can be calculated by multiplying marginal frequencies of the contingency table (tables 4 and 5). At both synonymous and nonsynonymous sites, the results showed a highly significant deviation from expectations (tables 4 and 5). 
However, this did not occur because of greater than expected frequencies of codons with two or three differences in one species and none in the other. Rather, for both synonymous and nonsynonymous differences, codons with two or more differences in one species and zero in the other occurred much less frequently than the random expectation (tables 4 and 5). Codons with two nonsynonymous differences in one species and none in the other species occurred only about 13% as often as expected (table 5). Similarly, codons with three nonsynonymous differences in one species and none in the other occurred only about 26% as often as expected (table 5). On the other hand, in the cases of both synonymous and nonsynonymous differences, codons with one or more differences in both species occurred much more frequently than expected (tables 4 and 5). An obvious explanation for the latter pattern was that codons in which both mouse and rat showed one or more differences from human were those at which substitutions occurred after the primate-rodent divergence but before the rat-mouse divergence.

Table 3
Numbers of Codons Having Different Numbers of Nonsynonymous Differences^a Between Mouse and Human and Between Rat and Human

Number of Nonsynonymous Differences       Mouse-Human          Rat-Human
0                                         2,979,314            2,962,139
1                                         362,828              367,973
1.250                                     32                   37
1.500                                     35,953               37,963
1.667                                     103                  114
1.833                                     598                  676
2                                         68,592               74,377
2.167                                     4,939                5,537
2.250                                     1,552                1,838
2.333                                     4,450                5,106
2.500                                     9,666                11,391
2.667                                     1,933                2,235
2.750                                     783                  1,013
2.833                                     1,603                1,802
3                                         765                  910
Total                                     3,473,111            3,473,111
Integral values (% total)                 3,411,499 (98.2%)    3,405,399 (98.1%)
All nonzero                               493,797              510,972
Integral nonzero values (% all nonzero)   432,185 (87.5%)      443,260 (86.7%)

^a The method of Nei and Gojobori (1986).
We tested this interpretation in the case of nonsynonymous substitutions by computing mean pS and pN between mouse and rat for codons showing at least one nonsynonymous difference from human in each of the two rodent species (fig. 1). ATG and TGG codons were excluded because pS is undefined at these codons. There were highly significant differences in both mean pS and mean pN between codons with different patterns of nonsynonymous difference between the two rodents and human (fig. 1). However, it was striking that mean pN between mouse and rat was near zero for every case where the two rodents had an equal number of nonsynonymous differences from human (fig. 1). This pattern is easily explained if in most of these cases the two rodents were identical at nonsynonymous sites because the nonsynonymous substitutions occurred in the rodent lineage prior to the mouse-rat divergence.

In order to examine the pattern of substitutions occurring in the rodent lineage after the mouse-rat divergence, we examined codons at which mouse, rat, or both rodents had the same codon as human. In addition, in order to ensure that all codons analyzed were identical as regards the opportunity for synonymous and nonsynonymous substitution, we examined only codons at which all three species had codons belonging to fourfold degenerate codon sets (table 6). The results showed that, in both mouse and rat, one or two nonsynonymous differences from human were observed more frequently than expected in codons having a synonymous difference from human (table 6). By contrast, in both mouse and rat, one or two nonsynonymous differences from human were observed less frequently than expected in codons lacking a synonymous difference from human (table 6). In mouse, there were over four times as many codons with one synonymous difference and one nonsynonymous difference (1,738 codons) as there were codons with two nonsynonymous differences (423 codons) (table 6).
Likewise, in rat, there were four times as many codons with one synonymous difference and one nonsynonymous difference (2,665 codons) as there were codons with two nonsynonymous differences (661 codons) (table 6). Furthermore, in mouse there were over twice as many codons with one synonymous and one nonsynonymous difference (1,738) as there were codons with two nonsynonymous differences (762), including in the latter total both codons with no synonymous differences and those with one synonymous difference (table 6). A similar pattern was seen in rat, where the number of codons with one synonymous and one nonsynonymous difference (2,665) was nearly twice that of all codons with two nonsynonymous differences (1,404) (table 6).

Table 4
Observed (O) and Expected (E) Numbers of Codons with Integral Numbers of Synonymous Differences Between Mouse and Human and Between Rat and Human^a

                                          Rat-human
Mouse-human          0                             1                           2                     Totals (frequency)
0                    O: 2,227,369, E: 1,726,765    O: 197,411, E: 691,014      O: 618, E: 7,619      2,425,398 (0.7160)
1                    O: 183,696, E: 677,396        O: 764,490, E: 271,079      O: 3,278, E: 2,989    951,464 (0.2809)
2                    O: 480, E: 7,384              O: 3,147, E: 2,955          O: 6,744, E: 33       10,371 (0.0031)
Totals (frequency)   2,411,545 (0.7120)            965,048 (0.2849)            10,640 (0.0031)       3,387,233

^a Test of independence: χ² = 3.15 × 10^6; 4 df; P < 0.001.
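The expected numbers in table 4 follow from the marginal totals of the observed counts, as described in the Results. The sketch below, with hypothetical function names, recomputes them and the accompanying chi-square statistic from the observed cells only. The arrangement of rows (mouse-human) and columns (rat-human) is an assumption recovered from the marginal totals; the statistic itself does not depend on it.

```python
# Observed counts from table 4. Assumed layout: rows are mouse-human
# synonymous differences (0, 1, 2); columns are rat-human differences (0, 1, 2).
observed = [
    [2_227_369, 197_411, 618],
    [183_696, 764_490, 3_278],
    [480, 3_147, 6_744],
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic for a test of independence."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    print([f"O: {o:,}, E: {e:,.0f}" for o, e in zip(obs_row, exp_row)])
print(f"chi-square = {chi_square(observed):.3g}")
```

The same two functions apply unchanged to the 4 × 4 counts of table 5, although the published test for that table pools categories 2 and 3 first.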
Table 5
Observed (O) and Expected (E) Numbers of Codons with Integral Numbers of Nonsynonymous Differences Between Mouse and Human and Between Rat and Human^a

                                          Rat-human
Mouse-human          0                             1                          2                        3                  Totals (frequency)
0                    O: 2,891,466, E: 2,588,444    O: 64,507, E: 315,634      O: 10,664, E: 62,100     O: 277, E: 736     2,966,914 (0.8759)
1                    O: 57,861, E: 309,306         O: 284,735, E: 37,717      O: 11,879, E: 7,420      O: 56, E: 88       354,531 (0.1047)
2                    O: 5,703, E: 56,799           O: 11,062, E: 6,926        O: 48,288, E: 1,362      O: 51, E: 16       65,104 (0.0192)
3                    O: 113, E: 597                O: 48, E: 73               O: 67, E: 14             O: 456, E: 0       684 (0.0002)
Totals (frequency)   2,955,143 (0.8724)            360,352 (0.1064)           70,898 (0.0209)          840 (0.0002)       3,387,233

^a Test of independence (categories 2 and 3 pooled because of low expectation in category 3): χ² = 1.89 × 10^6; 4 df; P < 0.001.

Thus, in both rodent species, codons with two nonsynonymous differences from human were considerably less frequent than expected under random substitution. Overall, many more synonymous than nonsynonymous differences were observed at the fourfold degenerate codons where at least one of the two rodent species was identical to human. In codons identical between rat and human, 72,852 synonymous differences were observed between mouse and human, compared with 11,222 nonsynonymous differences, for a ratio of 6.5:1. In codons identical between mouse and human, 78,283 synonymous differences were observed between rat and human, compared with 14,034 nonsynonymous differences, for a ratio of 5.6:1. Because each fourfold degenerate codon includes twice as many nonsynonymous sites as synonymous sites, the expected ratio in the absence of selection would be 0.5:1. The observed ratios of synonymous to nonsynonymous differences were thus more than an order of magnitude greater than expected under random substitution.

FIG. 1. Mean number of synonymous differences per synonymous site (pS) and mean number of nonsynonymous differences per nonsynonymous site (pN) between mouse and rat at 347,817 individual codons categorized on the basis of the pattern of nucleotide differences between the two rodent species and human. The patterns of difference from human are as follows: 1:1 = one difference in each species; 1:2 = one difference in one species and two differences in the other; 1:3 = one difference in one species and three differences in the other; 2:2 = two differences in each species; 2:3 = two differences in one species and three in the other; 3:3 = three differences in each species. Any codon at which either species did not differ from human was not included in the analysis. Both mean pS and mean pN differed significantly among categories (one-way analysis of variance; P < 0.001 in each case).

Discussion

A detailed analysis of patterns of nucleotide difference at individual codons revealed no support for the hypothesis of Bazykin et al. (2004) that multiple nonsynonymous substitutions per codon have occurred in the rat and mouse lineages to a greater extent than expected by chance. When expected frequencies were calculated based on the occurrence of patterns of nucleotide differences between each rodent species and human, codons with multiple nonsynonymous substitutions in one rodent and none in the other were found to have occurred at a much lower frequency than expected by chance. Codons with multiple substitutions in both lineages occurred much more frequently than expected by chance. This obviously reflects the shared evolutionary history of rat and mouse; many of these substitutions no doubt arose after the rodent-primate divergence but before the mouse-rat divergence.
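As a hedged illustration of when the 50% and 25% expectations attributed to Bazykin et al. (2004) would hold, suppose that substitutions at a codon accumulate independently in the rat and mouse lineages as Poisson processes with means λ_r and λ_m. This is an assumption made here only for illustration; it is not made in this study, which deliberately avoids substitution models. Conditioning on the total number of substitutions then gives:

```latex
% Illustrative assumption only: P_r(x) and P_m(x) are Poisson probabilities
% with means \lambda_r and \lambda_m. Let
% p = \lambda_r / (\lambda_r + \lambda_m) and q = 1 - p.
\begin{align*}
\Pr(\text{both in one lineage} \mid 2 \text{ substitutions})
  &= \frac{P_r(2)P_m(0) + P_r(0)P_m(2)}
          {P_r(2)P_m(0) + P_r(1)P_m(1) + P_r(0)P_m(2)}
   = p^2 + q^2 = 1 - 2pq, \\
\Pr(\text{all in one lineage} \mid 3 \text{ substitutions})
  &= p^3 + q^3 = 1 - 3pq.
\end{align*}
% With equal means (p = q = 1/2) these equal 1/2 and 1/4 respectively.
% With unequal means, pq < 1/4, so both probabilities exceed those values.
```

Under this sketch the two equalities stated in the Introduction hold only when the two lineages have equal rates. A rate difference alone would raise the same-lineage fraction above 50% or 25%, though only slightly when the rates are similar. The sketch ignores selection and multiple hits at the same site.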
In order to examine a set of differences that occurred independently in the rat and mouse lineages after the mouse-rat divergence, we considered fourfold degenerate codons at which either mouse or rat or both had the same codon as human. When one of the two rodent species had the same codon as human, this codon was highly likely to have been present in the ancestor of mouse and rat. Thus, changes to these codons represent changes that occurred in one of the two rodent lineages after the mouse-rat divergence. In both rodent species, these codons included about four times as many codons with one synonymous difference and one nonsynonymous difference as there were codons with two nonsynonymous differences. This result does not support the hypothesis of an excess of codons with two nonsynonymous differences in either of the two rodent lineages. Rather, because codons with two nonsynonymous differences occur less frequently than expected under random substitution, these results support the hypothesis that purifying selection has acted to eliminate many nonsynonymous mutations at codons that had previously undergone a nonsynonymous substitution in one of the rodent lineages. The reason for the different conclusion of Bazykin et al. 
(2004) was apparently the reasoning those authors used to calculate the expected frequencies of codons with two or more nonsynonymous differences in one rodent lineage and none in the other. When these expected frequencies are calculated from marginal frequencies of contingency tables, as was done here, it is clear that there is no excess of such codons.

Table 6
Observed (O) and Expected (E) Numbers of Synonymous and Nonsynonymous Differences Between Each Rodent Species and Human at Fourfold Degenerate Codons where the Other Rodent Species has the Same Codon as Human

                                          Nonsynonymous Differences
Synonymous Differences    0                          1                        2

Codons identical between rat and human (N = 825,355)^a; mouse-human differences
0                         O: 744,120, E: 742,966     O: 7,960, E: 8,842       O: 423, E: 695
1                         O: 70,775, E: 71,929       O: 1,738, E: 856         O: 339, E: 67

Codons identical between mouse and human (N = 831,625)^b; rat-human differences
0                         O: 744,120, E: 741,901     O: 8,561, E: 10,169      O: 661, E: 1,272
1                         O: 74,875, E: 77,094       O: 2,665, E: 1,057       O: 743, E: 132

^a Test of independence: χ² = 2.22 × 10^4; 2 df; P < 0.001.
^b Test of independence: χ² = 5.89 × 10^4; 2 df; P < 0.001.

Our analyses showed that, in comparisons between the two rodent species and human, the numbers of nucleotide differences between homologous codons, as estimated by the method of Nei and Gojobori (1986), were integral in a large majority of cases (tables 2 and 3). The method of Nei and Gojobori (1986) provides integral counts of numbers of synonymous and nonsynonymous differences when there is no ambiguity among possible evolutionary pathways; fractional values result from averaging among possible pathways. Available methods of estimating synonymous and nonsynonymous substitution incorporate different approaches to the problem of multiple possible evolutionary pathways (Nei and Kumar 2000, pp. 51–71).
In assessing the potential impact of these different approaches, it is of interest that codons with multiple possible pathways represented only a small fraction (less than 2%) of codons in comparisons between orthologous sequences from mammals belonging to different placental orders.

A surprising result of our analyses was the finding of a consistently higher degree of nucleotide difference between rat and human than between mouse and human. Many studies have addressed the hypothesis that the rate of molecular evolution has accelerated in rodents, particularly murid rodents, in comparison to other placental mammals (Wu and Li 1985; Gu and Li 1992; Easteal, Collet, and Betty 1995; Li et al. 1996), but so far little attention has been paid to the possibility of evolutionary rate differences among the murid rodents themselves. Further analysis of the apparent evolutionary rate difference between rat and mouse may help in deciding among proposed mechanisms for evolutionary rate differences among species.

Acknowledgments

This research was supported by grant GM43940 from the National Institutes of Health.

Literature Cited

Bazykin, G. A., F. A. Kondrashov, A. Y. Ogurtsov, S. Sunyaev, and A. S. Kondrashov. 2004. Positive selection at sites of multiple amino acid replacements since rat-mouse divergence. Nature 429:558–562.
Easteal, S., C. C. Collet, and D. J. Betty. 1995. The mammalian molecular clock. Springer and Landes, Austin, Tex.
Gu, X., and W.-H. Li. 1992. Higher rates of amino acid substitution in rodents than in humans. Mol. Phylogenet. Evol. 1:211–214.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at MHC class I loci reveals overdominant selection. Nature 335:167–170.
Kimura, M. 1977. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276.
Li, W.-H., D. L. Ellsworth, J. Krushkal, B. H.-J. Chang, and D. Hewett-Emmett. 1996. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5:182–187.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, New York.
Suzuki, Y., and M. Nei. 2004. False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21:914–921.
Wu, C. I., and W. H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.
Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332–1339.

William Martin, Associate Editor
Accepted February 8, 2005