LOCAL VARIATION IN POLYMORPHISM RATES AND SIGNALS FOR RECENT SELECTION IN NONCODING DNA OF HOMINOIDS

Belinda Giardine, Kuan-Bei Chen, Robert Harris, Aakrosh Ratan, Webb Miller, Stephan C. Schuster, Vanessa Hayes, Francesca Chiaromonte, Ross C. Hardison

Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, Pennsylvania
http://main.genome-browser.bx.psu.edu

The SNP rate varies locally.

[Figure: histogram of SNP density in 10 kb windows. x-axis: SNP count per 10 kb window; y-axis: number of windows.]

[Figure: genome graphs showing SNP density along chr2. Tracks: average, Venter, YH, NA19240. The LCT gene from the example below is marked.]
YH is a male Chinese individual. NA19240 is a Yoruban daughter from the 1000 Genomes Project. One peak is found in all three individuals; another is found in YH and NA19240 but is missing in Venter.

The combined SNP rate explains 83% of the local variability of the SNPs along the genome. This supports the use of the SNP rate from the combined personal genomes.

McDonald-Kreitman test for non-neutral evolution in noncoding DNA, using Ancestral Repeats as a neutral reference (MKAR)

Comparison of interspecies divergence and within-species polymorphism can be used as a test for non-neutral evolution (McDonald-Kreitman test). As input data, we use alignments of the human reference sequence with chimp and orangutan, and SNPs from personal genomes (PGs). Ancestral repeats (AR) are the proxy for neutral DNA.

Divergence and SNP counts after masking coding exons and CpGs:

          Divergence from chimp   Divergence from orangutan   Polymorphic sites in PGs
Total          28,729,623               78,673,940                  8,725,848
AR             12,540,717               35,676,824                  3,895,705
not AR         16,188,906               42,997,116                  4,830,143

The MKAR positive selection hits (purple) are enriched for segmental duplications (blue) (p-value 1.34E-284).
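Both the SNP-density histogram and the MKAR scan work on 10 kb windows. The following is a minimal sketch of per-window SNP counting, not the authors' pipeline. It assumes non-overlapping windows and 0-based SNP coordinates on a single chromosome. The function names and the toy positions are hypothetical.

```python
from collections import Counter

WINDOW = 10_000  # window size in bp, matching the poster's 10 kb windows


def snp_counts_per_window(positions, chrom_length, window=WINDOW):
    """Count SNPs in consecutive, non-overlapping windows (an assumption).

    positions: iterable of 0-based SNP coordinates on one chromosome.
    Returns a list with one count per window, in genomic order.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:  # ignore out-of-range coordinates
            counts[pos // window] += 1
    return counts


def density_histogram(counts):
    """Map 'SNP count per window' to 'number of windows with that count'."""
    return dict(sorted(Counter(counts).items()))


# Toy, made-up SNP positions on a 40 kb chromosome (illustration only).
toy_positions = [5, 120, 9_999, 10_000, 25_000, 25_001, 25_002]
counts = snp_counts_per_window(toy_positions, chrom_length=40_000)
hist = density_histogram(counts)
print(counts)
print(hist)
```

Pooling the SNPs of several personal genomes before counting would give a combined per-window rate. How the poster combines genomes is not specified here.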
[Figure: tree of orangutan, chimp, and human. Labels: 12-16 Myr, 4-6 Myr, Divergence, Polymorphisms.]

From PCA: the correlation between the first principal component and the combined SNP rate is 0.999.

[Figure: schematic of 10 kb sliding windows along the genome. Labels: window, position, divergence, SNP, ancestral repeat (AR), exon (masked).]

This example region has the lactase gene. There are 2 regulatory SNPs (in red) annotated in the ORegAnno track that affect lactase activity, one of which is found in a region we identify as being under positive selection.

Numbers from chr2:136,303,000-136,313,000:

              AR (neutral model)   non-AR
SNP                   18              15
divergence            78             276

p < 0.00001

Hits are defined as windows that remain significant after FDR correction at a false discovery rate of 0.10.

[Figure: genome browser view "Recent selection in non-coding regions", Human Mar. 2006, chr2:136,255,979-136,329,658 (73,680 bp), scale bar 20 kb. Tracks:
RefSeq Genes (UBXN4, LCT, MCM6)
11 Personal Genomes log2 (r_pd) divergence from chimp (range -2.3293 to 1.43962)
11 Personal Genomes -log10 (FDR) divergence from chimp (range 0.00482785 to 0.335579)
11 Personal Genomes log2 (r_pd) divergence from orangutan (range -2.08616 to 1.70857)
11 Personal Genomes -log10 (FDR) divergence from orangutan (range 0.0101947 to 1.32883)
Ancestral repeats used for recent selection (MKAR)
Regulatory elements from ORegAnno (OREG0014998, OREG0008730, OREG0014999, OREG0000096, OREG0000004)
HMR Conserved Transcription Factor Binding Sites
ESPERR Regulatory Potential (7 Species)]
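The per-window MKAR comparison and the FDR step can be sketched as follows. This is an illustration under stated assumptions, not the authors' method. The poster names neither its test statistic nor its FDR procedure, so the sketch uses a chi-square test with 1 degree of freedom on the 2x2 table and the Benjamini-Hochberg procedure. The poster also does not define r_pd. Here it is assumed to be the SNP-to-divergence ratio in non-AR sites divided by the same ratio in AR sites. All function names and window counts are hypothetical.

```python
import math


def mkar_window_test(snp_ar, snp_non_ar, div_ar, div_non_ar):
    """2x2 test of polymorphism vs divergence in AR vs non-AR sites.

    Returns (log2_ratio, p_value). The p-value comes from a chi-square test
    with 1 df and no continuity correction (an assumed choice of test).
    """
    a, b, c, d = snp_ar, snp_non_ar, div_ar, div_non_ar
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if min(margins) == 0:
        return float("nan"), 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    if min(a, b, c, d) == 0:
        return float("nan"), p_value  # ratio undefined with an empty cell
    # Assumed definition of r_pd: (SNP/div in non-AR) / (SNP/div in AR).
    log2_ratio = math.log2((b / d) / (a / c))
    return log2_ratio, p_value


def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans, True where a test is a hit at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank  # largest rank that passes its threshold
    hits = [False] * m
    for rank, i in enumerate(order, start=1):
        hits[i] = rank <= cutoff_rank
    return hits


# Made-up windows: (SNP in AR, SNP in non-AR, divergence in AR, divergence in non-AR).
windows = [
    (20, 20, 80, 80),  # same ratio in AR and non-AR: no signal
    (30, 5, 60, 300),  # few non-AR SNPs relative to non-AR divergence
    (10, 12, 50, 55),  # near-neutral
]
results = [mkar_window_test(*w) for w in windows]
hits = benjamini_hochberg([p for _, p in results], fdr=0.10)
for w, (ratio, p), hit in zip(windows, results, hits):
    print(w, round(ratio, 3), p, hit)
```

Under the assumed definition, a negative log2 ratio means that non-AR sites have more divergence than their polymorphism would predict, which is the pattern read as positive selection. A different test, such as Fisher's exact test, would give different p-values on the same counts.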