LOCAL VARIATION IN POLYMORPHISM RATES AND SIGNALS FOR RECENT SELECTION IN NONCODING DNA OF HOMINOIDS
Belinda Giardine, Kuan-Bei Chen, Robert Harris, Aakrosh Ratan, Webb Miller, Stephan C. Schuster, Vanessa Hayes, Francesca Chiaromonte, Ross C. Hardison
Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, Pennsylvania
http://main.genome-browser.bx.psu.edu
The SNP rate varies locally.
Comparison of interspecies divergence and within-species polymorphism can
be used as a test for non-neutral evolution (McDonald-Kreitman test). As
input data, we use alignments of the human reference sequence with chimp
and orangutan, and SNPs from personal genomes (PGs). Ancestral repeats
(AR) are the proxy for neutral DNA.
Divergence and SNP counts after masking coding exons and CpGs:

                             Total         AR            not AR
Divergence from chimp        28,729,623    12,540,717    16,188,906
Divergence from orangutan    78,673,940    35,676,824    42,997,116
Polymorphic sites in PGs      8,725,848     3,895,705     4,830,143

[Figure: histogram of SNP density in 10 kb windows. Axis labels: SNP count per 10 kb window; number of windows.]
[Figure: genome graphs showing SNP density along chr2.]
YH is a male Chinese individual. NA19240 is a Yoruban daughter from the 1000 Genomes Project.
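The SNP density per window described above can be sketched as follows. This is a minimal illustration, not the poster's pipeline: the function name `snp_counts_per_window`, the 0-based coordinates, and the default of non-overlapping windows are all assumptions made here. Setting `step` smaller than `window` gives sliding windows.

```python
from bisect import bisect_left
from collections import Counter


def snp_counts_per_window(positions, chrom_length, window=10_000, step=10_000):
    """Count SNPs in each window [start, start + window) along one chromosome.

    positions    -- iterable of 0-based SNP coordinates (assumed convention)
    chrom_length -- chromosome length in bp
    window, step -- window size and step in bp (hypothetical parameter names)
    """
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # SNPs with start <= position < end
        counts.append(bisect_left(pos, end) - bisect_left(pos, start))
    return counts


# Toy positions, invented purely for illustration.
toy_positions = [5, 9_999, 10_000, 25_000, 25_001, 25_002]
counts = snp_counts_per_window(toy_positions, chrom_length=40_000)

# Histogram: SNP count per window -> number of windows with that count.
histogram = Counter(counts)
print(counts)
print(sorted(histogram.items()))
```

The `Counter` at the end maps each SNP count to the number of windows with that count, which is the quantity a histogram of SNP density would plot.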
The combined SNP rate explains 83% of the local variability of the SNPs along the genome. This supports the use of the SNP rate from the combined personal genomes.
[Figure legend: average, Venter, YH, NA19240.]
McDonald-Kreitman test for non-neutral evolution in noncoding
DNA, using Ancestral Repeats as a neutral reference (MKAR)
[Figure annotations on the genome graph: the LCT gene from the example below; a peak found in all three individuals and another found in YH and NA19240, but missing in Venter.]
The MKAR positive selection hits (purple) are enriched for segmental
duplications (blue) (p-value 1.34E-284).
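(from PCA; correlation between first principal component and combined SNP rate = 0.999)
[Figure: phylogenetic tree of orangutan, chimp, and human with split times of 12-16 Myr and 4-6 Myr. Divergence is measured between species and polymorphisms within human, in 10 kb sliding windows.]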
This example region contains the lactase gene (LCT). Two regulatory SNPs (in red) annotated in the ORegAnno track affect lactase activity, and one of them lies in a region we identify as being under positive selection.
[Figure: schematic of a window (axis: window position) showing divergence sites and SNPs in ancestral repeats (AR) and in the remaining sequence, with exons masked.]
[Figure: genome browser view, Human Mar. 2006, chr2:136,255,979-136,329,658 (73,680 bp), 20 kb scale bar. RefSeq Genes: LCT, MCM6, UBXN4. Other labels: TFBS Conserved; 7X Reg Potential; ORegAnno elements OREG0014998 and OREG0008730.]

Numbers from chr2:136,303,000-136,313,000:

              AR (neutral model)    non-AR
SNP           18                    15
divergence    78                    276

p < 0.00001
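A McDonald-Kreitman style comparison with ancestral repeats as the neutral class can be sketched on a 2x2 table of counts like the one above. Several things here are assumptions rather than the poster's stated method. The cell layout is read row-wise from the poster (AR: 18 SNPs and 78 divergent sites; non-AR: 15 and 276). The ratio is taken to be polymorphism over divergence in non-AR relative to AR. A Pearson chi-square test stands in for whatever test the authors used, so its p-value need not match the poster's. The name `mk_ar_test` is hypothetical, and zero counts are not handled.

```python
import math


def mk_ar_test(snp_ar, div_ar, snp_non_ar, div_non_ar):
    """Compare polymorphism/divergence in non-AR sequence with the AR reference.

    Returns (log2_ratio, chi2, p_value). All counts must be positive.
    """
    # Assumed definition: (SNP/divergence in non-AR) / (SNP/divergence in AR).
    ratio = (snp_non_ar / div_non_ar) / (snp_ar / div_ar)
    log2_ratio = math.log2(ratio)

    # Pearson chi-square for a 2x2 table, without continuity correction.
    a, b, c, d = snp_ar, div_ar, snp_non_ar, div_non_ar
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return log2_ratio, chi2, p_value


# Counts from the example window, under the assumed cell layout.
log2_ratio, chi2, p_value = mk_ar_test(18, 78, 15, 276)
print(f"log2 ratio = {log2_ratio:.3f}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

Under this reading the log2 ratio is about -2.09, close to the lower bound shown for the orangutan log2 (r_pd) track (-2.08616). That agreement is suggestive only. A negative value means a relative excess of divergence in non-AR sequence, which in the usual McDonald-Kreitman interpretation points toward positive selection.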
Hits are defined as significant after an FDR correction with a false discovery rate of 0.10.
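The hit definition can be illustrated with a false discovery rate adjustment of per-window p-values. The poster does not name the procedure, so the Benjamini-Hochberg method used here is an assumption, as are the function name `benjamini_hochberg` and the toy p-values.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


FDR_THRESHOLD = 0.10  # threshold stated on the poster

# Toy per-window p-values, invented for illustration.
window_pvalues = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(window_pvalues)
hits = [i for i, q in enumerate(fdr) if q <= FDR_THRESHOLD]
neg_log10_fdr = [-math.log10(q) for q in fdr]
print(hits)
```

On a -log10 scale, a threshold of 0.10 corresponds to values of 1 or more.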
Browser tracks shown (display range in brackets where given):
Recent selection in non-coding regions
  11 Personal Genomes log2 (r_pd) divergence from chimp [-2.3293 to 1.43962]
  11 Personal Genomes -log10 (FDR) divergence from chimp [0.00482785 to 0.335579]
  11 Personal Genomes log2 (r_pd) divergence from orangutan [-2.08616 to 1.70857]
  11 Personal Genomes -log10 (FDR) divergence from orangutan [0.0101947 to 1.32883]
  Ancestral repeats used for recent selection (MKAR)
Regulatory elements from ORegAnno: OREG0014999, OREG0000096, OREG0000004
HMR Conserved Transcription Factor Binding Sites
ESPERR Regulatory Potential (7 Species)