Eukaryotic genomes
EMBO Practical Course: Bioinformatics and
Comparative Genome Analysis
Stazione Zoologica Anton Dohrn, Naples, Italy
May 7-19, 2012
Part 1: Structure and diversity of eukaryotic genomes
Part 2: Dynamics and evolution of eukaryotic genomes
The living world
BACTERIA: Bacteroides, Escherichia, Bacillus, Synechococcus, Chloroflexus, Thermotoga
ARCHAEA
Crenarchaeota: Pyrodictium, Thermoproteus
Euryarchaeota: Thermococcus, Methanococcus, Methanobacterium, Methanomicrobium, Halobacterium
EUCARYA: Homo, Caenorhabditis, Arabidopsis, Saccharomyces, Paramecium, Trypanosoma, Vairimorpha
Adapted from C. Woese (1990, 1997)
Endosymbiosis
Membranes
RNA
Prokaryotic cell structure
Genome = chromosome(s) + plasmid(s)
Eukaryotic cells
Multiple membranes
Genome = nuclear chromosomes (+ plasmids) + mitochondrial genome (+ plasmids) + chloroplastic genome
Figure labels: mitochondrial genome, nuclear genome, chloroplastic genome
Origin of eukaryotes: endosymbioses
Timmis et al. (2004) Nature Reviews Genetics 5: 123-135
The central dogma of molecular biology
Prokaryotic cell: genome = chromosome(s) + plasmid(s)
DNA (genome, genes): replication
DNA ---> RNA (intermediary): transcription, transcriptional regulations
RNA ---> proteins: translation (genetic code)
Regulation of gene expression: François Jacob, André Lwoff, Jacques Monod
Genetic code
TTT phe F    TCT ser S    TAT tyr Y    TGT cys C
TTC phe F    TCC ser S    TAC tyr Y    TGC cys C
TTA leu L    TCA ser S    TAA ochre    TGA opal
TTG leu L    TCG ser S    TAG amber    TGG trp W
CTT leu L    CCT pro P    CAT his H    CGT arg R
CTC leu L    CCC pro P    CAC his H    CGC arg R
CTA leu L    CCA pro P    CAA gln Q    CGA arg R
CTG leu L    CCG pro P    CAG gln Q    CGG arg R
ATT ile I    ACT thr T    AAT asn N    AGT ser S
ATC ile I    ACC thr T    AAC asn N    AGC ser S
ATA ile I    ACA thr T    AAA lys K    AGA arg R
ATG met M    ACG thr T    AAG lys K    AGG arg R
GTT val V    GCT ala A    GAT asp D    GGT gly G
GTC val V    GCC ala A    GAC asp D    GGC gly G
GTA val V    GCA ala A    GAA glu E    GGA gly G
GTG val V    GCG ala A    GAG glu E    GGG gly G
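As an illustration of the translation step, the genetic code on this slide can be written as a lookup table. The sketch below is a minimal example, not part of the course material: the names `CODON_TABLE` and `translate` are invented here, and it assumes the standard nuclear code with ochre, amber and opal all read as stop codons.

```python
# Standard genetic code laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# Map each of the 64 codons to a one-letter amino acid ("*" marks a stop codon).
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence from its first base up to the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3].upper()]
        if amino_acid == "*":  # TAA (ochre), TAG (amber) or TGA (opal)
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical 4-codon open reading frame: ATG TTT GGG TAA
    print(translate("ATGTTTGGGTAA"))
```

Organellar genomes and some nuclear genomes use variant codes, so a real analysis would select the table that matches the genome being translated.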
RNA surprises
1970 reverse transcription (chicken virus): RNA ---> cDNA ---> retrogene
D. Baltimore, R. Dulbecco, H. M. Temin
1977 introns (mammalian virus, rabbit globin gene): gene ---> precursor RNA ---> mRNA + intron
Richard J. Roberts, Philip A. Sharp
1983 RNA catalysis (Tetrahymena nuclear intron, E. coli RNase P)
RNA editing (Trypanosoma): RNA ---> novel RNA sequence
1985 retrotransposons (yeast): mobile element ---> RNA ---> cDNA ---> mobile element
early
then
1980
2000
RNA interference (petunias, fungi, Caenorhabditis elegans)
Andrew Z. Fire, Craig C. Mello
2002 micro RNAs (Caenorhabditis elegans)
Sidney Altman, Thomas R. Cech (RNA catalysis)
Jef D. Boeke (retrotransposons)
The central dogma of molecular biology updated
Genome (DNA): replication
DNA ---> RNA (intermediary): transcription, transcriptional regulations
RNA ---> DNA: reverse transcription, exon-shuffling
RNA: post-transcriptional regulations (splicing, editing, non-coding RNAs)
RNA ---> proteins: translation (genetic code)
Eukaryotic gene
DNA (gene): Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, Intron 3, Exon 4
Primary transcript ---> matured RNA (exon junctions) + excised introns (degradation)
Matured RNA: 5' UTR, coding region, 3' UTR (regulation)
Coding region ---> protein
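The relation between primary transcript, matured RNA and excised introns can be sketched in a few lines. This is a toy illustration only: the sequences are invented, exons are written in upper case and introns in lower case, and the helper names `splice` and `introns_between` are assumptions of this sketch rather than anything taken from the slides.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order; introns are left out."""
    return "".join(primary_transcript[start:end] for start, end in exons)


def introns_between(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the intervals lying between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


# Toy transcript (invented): three exons separated by two introns.
exon_seqs = ["AUGGCU", "GAAUUC", "UGGUAA"]
intron_seqs = ["guaaguuacuaacuuuag", "guaugucuaacag"]

primary = exon_seqs[0] + intron_seqs[0] + exon_seqs[1] + intron_seqs[1] + exon_seqs[2]

# Derive exon coordinates from the piece lengths instead of hard-coding them.
exons, cursor = [], 0
for i, exon in enumerate(exon_seqs):
    exons.append((cursor, cursor + len(exon)))
    cursor += len(exon)
    if i < len(intron_seqs):
        cursor += len(intron_seqs[i])

mature = splice(primary, exons)                                  # exon junctions only
excised = [primary[s:e] for s, e in introns_between(exons)]      # removed introns

if __name__ == "__main__":
    print(mature)
    print(excised)
```

Passing a different list of exon intervals to `splice` (for example, skipping the middle exon) gives an alternative mature RNA from the same primary transcript.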
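Introns
1- Spliceosomal introns
Polymerase II transcripts
pre-mRNA: exon 1 - G (donor site) ... A (branch point, 2' OH) ... G (acceptor site) - exon 2
First transesterification: the 2' OH of the branch-point A attacks the donor site, leaving exon 1 with a free 3' OH
Second transesterification: the 3' OH of exon 1 attacks the acceptor site, joining exon 1 to exon 2 and releasing the intron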
pre-mRNA
Alternative splicing
mRNA
Alternatively spliced gene models (Vitis vinifera)
Gene models
novel alternative splice form
original gene model
minor alternative splice form
Introns
2- tRNA introns
Polymerase III transcripts
1 nucleotide after anticodon
Multistep enzymatic reaction
Introns
3- Group I introns: mitochondrial genes, chloroplastic genes, nuclear rDNA (rare)
4- Group II introns: mitochondrial genes, chloroplastic genes
Self-splicing (ribozymes): two successive transesterifications
Often contain genes for specific proteins ---> mosaic genes
homing endonucleases
reverse transcriptases
Mosaic gene: Exon 1 - intron (intronic CDS) - Exon 2
Translation products: exonic translation product, intron translation product (polyprotein)
Retro-transposons
Example: Ty1 element of yeast (Copia family)
Element (6 kb): LTR - gag - pol - LTR
RNA: 5' - AUG ... UAA (gag) ... +1 frameshift ... UAA (pol) - AAAAAAAA 3'
Translation: 99 % protein GAG; 1 % polyprotein GAG-AP-IN-RT-RH (+1 frameshift)
AP: protease
IN: integrase
RT: reverse transcriptase
RH: RNase H
The protease cleaves the polyprotein into GAG, integrase, reverse transcriptase and RNase H
VLP (virus-like particle): RNA ---> cDNA ---> new target
Consequences of RNA activity on eukaryotic genomes
Formation of retrogenes
ancestral gene (Exon 1, Intron, Exon 2) ---> RNA ---> cDNA ---> retrogene
1 % of human genes, plus many processed pseudogenes
Exon shuffling
novel exon or gene fusion
New splicing sites or intron loss
~ 19 % of all eukaryotic exons
Exonization of mobile elements
mobile element ---> novel exon
New splicing sites or intron loss
~ 4 % of novel exons in human genome
Historical example of exon shuffling: jingwei
Long and Langley (1993) Science 260: 91-95; Long et al. (2003) Nature Reviews Genetics 4: 865-875
Genetic acquisitions by transfers
intracellular, interorganellar transfers: mitochondria and chloroplast to nucleus (NUMTs, NUPTs)
horizontal gene transfers: bacteria, yeasts
introgressions of large chromosomal segments
Donor species: e.g. Zygosaccharomyces bailii
Recipient genome: e.g. Saccharomyces cerevisiae
Figure: mt DNA (COX1 fragment) and the intron of DEHA2F06314g (putative 1-3-beta-glucanosyltransferase); labels: 5' SS, BP, 3' SS, intron size 106
GTACGTAGATAGA
GAATCAAGCTCATATAGACAACTAACATATGATTTTAG
Alignment (51 nt):
ACTAAATTAATGAACTCTTTATAAATTACTTATAAAAGTCATTTAAATGAT
||||||||||||.|||||||||||||||.||||||||||.|||||||||||
ACTAAATTAATGGACTCTTTATAAATTATTTATAAAAGTTATTTAAATGAT
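The pairwise alignment shown on this slide can be checked directly. The sketch below copies the two aligned 51-nt sequences from the slide and recomputes the match line and the percent identity; the function names are invented for this illustration, and the sketch assumes an ungapped alignment of equal-length sequences.

```python
# The two aligned sequences, copied from the slide (top and bottom rows).
seq_top = "ACTAAATTAATGAACTCTTTATAAATTACTTATAAAAGTCATTTAAATGAT"
seq_bottom = "ACTAAATTAATGGACTCTTTATAAATTATTTATAAAAGTTATTTAAATGAT"


def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def match_line(a: str, b: str) -> str:
    """Build the '|' (identical) / '.' (different) line drawn between the sequences."""
    return "".join("|" if x == y else "." for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same nucleotide."""
    return 100.0 * (len(a) - len(mismatch_positions(a, b))) / len(a)


if __name__ == "__main__":
    print(seq_top)
    print(match_line(seq_top, seq_bottom))
    print(seq_bottom)
    print(mismatch_positions(seq_top, seq_bottom))
    print(round(percent_identity(seq_top, seq_bottom), 1))
```

Three substitutions over 51 positions give about 94 % identity.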
Elements of eukaryotic genomes (nuclear)
Chromosomes: linear, centromeres, telomeres, origins of replication, replicons
Protein-coding genes and spliceosomal introns
Genes for non-coding RNAs: rRNAs, tRNAs, snoRNAs, snRNAs, microRNAs ...
Mobile genetic elements and their remnants
Pseudogenes and processed pseudogenes
Satellite DNAs: micro-, minisatellites, repeated sequences
Fragments of organellar DNAs: NUMTs and NUPTs
Figure: genome composition of Saccharomyces cerevisiae (5,780 genes) and Homo sapiens (~ 23,000 genes)
Legend: coding exons; introns, UTR, pseudogenes; mobile elements; all others
Roles: cellular functions, regulations, evolution
Duplicated genes and regions, acquired DNA sequences, newly-created genes:
> intense dynamics of genome modification and evolution
The exquisite beauty of a eukaryotic genome:
Saccharomyces cerevisiae
Nucleus: 16 chromosomes (1996)
Chromosome   Size (kb)
XII          1,078 + rDNA t.r.
IV           1,532 + ENA2 t.r.
XV           1,091
VII          1,091
XVI          948
XIII         924
II           813
XIV          784
X            746
XI           666
V            577
VIII         563 + CUP1 t.r.
IX           440
III          317
VI           270
I            230
TOTAL 12,071 kb + tandem repeats (t.r.)
rDNA: 9 kb repeat units, 70-120 tandem copies, total 0.6 - 1.1 Mb
Dispersed repeated sequences: tRNA genes (identical), Ty elements and LTR (similar)
Mitochondria: 85,779 bp (polymorphic introns), ca. 30 copies, total 2.6 Mb
two-micron plasmid (FLP, REP1, REP2, D, STB, IR 1, IR 2): 6,318 bp, ca. 50 copies, total 0.3 Mb
RNA "virus": 3 - 5 kb, 100-1000 copies, total 0.3 - 5 Mb
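The totals on this slide follow from simple arithmetic: chromosome sizes are summed, and each multi-copy element contributes its unit size times its copy number. The sketch below reproduces those numbers as a check. The variable and function names are invented here, and the chromosome sizes are the rounded kb values shown on the slide, so the sum can differ from the stated total by a rounding unit.

```python
# Rounded chromosome sizes in kb, as listed on the slide (order is irrelevant for the sum).
chromosome_sizes_kb = [
    1078, 1532, 1091, 1091, 948, 924, 813, 784,
    746, 666, 577, 563, 440, 317, 270, 230,
]


def total_mb(unit_size_kb: float, copies: int) -> float:
    """DNA (or RNA) contributed by a multi-copy element, in Mb."""
    return unit_size_kb * copies / 1000.0


nuclear_total_kb = sum(chromosome_sizes_kb)  # excludes the tandem repeats

# (unit size in kb, lowest copy number, highest copy number), values from the slide.
multi_copy_elements = {
    "rDNA tandem array": (9.0, 70, 120),
    "mitochondrial genome": (85.779, 30, 30),
    "two-micron plasmid": (6.318, 50, 50),
    'RNA "virus" (3 kb form)': (3.0, 100, 100),
    'RNA "virus" (5 kb form)': (5.0, 1000, 1000),
}

if __name__ == "__main__":
    print(f"16 chromosomes: {nuclear_total_kb} kb (slide total: 12,071 kb)")
    for name, (unit_kb, low, high) in multi_copy_elements.items():
        low_mb, high_mb = total_mb(unit_kb, low), total_mb(unit_kb, high)
        if low == high:
            print(f"{name}: {low_mb:.2f} Mb")
        else:
            print(f"{name}: {low_mb:.2f} - {high_mb:.2f} Mb")
```

The summed rounded sizes come to 12,070 kb, one kb below the slide's 12,071 kb, which is presumably a rounding effect in the per-chromosome values.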
                               Total nucleus   Total mitochondria
Chromosomes                    16              1
Protein-coding genes
  active CDS                   5,769           7
  pseudogenes                  77              -
RNA-coding genes
  transfer RNAs                275             23
  sno RNAs                     77
  sn RNAs                      6
  ribosomal RNAs               3               2
  other RNAs                   >4              1
Mobile genetic elements
  complete (active)            52              -
  incomplete (traces)          220
Introns
  spliceosomal                 273             -
  group I and group II                         1 - 10
Yeast genomes are found in a variety of forms
Haploids, Aneuploids
Homozygous diploids, Heterozygous diploids
Interspecific hybrids, Partial hybrids, mosaics
A resource for functional data in eukaryotes
> 85 % of genes functionally characterized
Species                     Genome size (nucleotides)   Nb of genes (protein-coding)
Amoeba dubia                ~ 670 000 000 000           ?
Psilotum nudum              ~ 250 000 000 000           ?
Fritillaria assyriaca       ~ 100 000 000 000           ?
Necturus lewisi             ~ 100 000 000 000           ?
Homo sapiens                2 900 000 000               23 000
Vitis vinifera              487 000 000                 30 400
Drosophila melanogaster     160 000 000                 14 000
Arabidopsis thaliana        115 000 000                 28 000
Caenorhabditis elegans      98 000 000                  19 400
Saccharomyces cerevisiae    12 500 000                  5 800
Escherichia coli            4 600 000                   4 300
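The figures in this table show that gene number does not scale with genome size. A short sketch makes the point by computing gene density (protein-coding genes per Mb) for the species whose gene counts are given; the values are copied from the slide, while the dictionary and function names are assumptions of this example.

```python
# (genome size in nucleotides, number of protein-coding genes), from the slide.
genomes = {
    "Homo sapiens": (2_900_000_000, 23_000),
    "Vitis vinifera": (487_000_000, 30_400),
    "Drosophila melanogaster": (160_000_000, 14_000),
    "Arabidopsis thaliana": (115_000_000, 28_000),
    "Caenorhabditis elegans": (98_000_000, 19_400),
    "Saccharomyces cerevisiae": (12_500_000, 5_800),
    "Escherichia coli": (4_600_000, 4_300),
}


def genes_per_mb(genome_size_nt: int, gene_count: int) -> float:
    """Protein-coding genes per megabase of genome."""
    return gene_count / (genome_size_nt / 1_000_000)


densities = {name: genes_per_mb(size, genes) for name, (size, genes) in genomes.items()}

if __name__ == "__main__":
    # Sort from the most gene-dense to the least gene-dense genome.
    for name, density in sorted(densities.items(), key=lambda item: -item[1]):
        print(f"{name:26s} {density:8.1f} genes per Mb")
```

With these numbers the human genome is roughly 630 times larger than that of Escherichia coli but carries only about five times as many protein-coding genes.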
Part 1: Structure and diversity of eukaryotic genomes
Part 2: Dynamics and evolution of eukaryotic genomes
Ernst Haeckel, 1866
The eukaryotic world after genomic analyses
Viridiplantae
(P. Keeling, 2005)
Excavata
Rhizaria
Chromalveolata
Unikonts
Genomes of unicellular eukaryotes
Volvox
Chlamydomonas
Ostreococcus
Cyanidioschyzon
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676
The genome of Ostreococcus tauri
Derelle et al., 2006 PNAS 103: 11647-11652
aim: genome of the smallest free living eukaryote
genome size: 12.6 Mb, 20 chromosomes
compositional heterogeneity related to transposons (includes a 146 kb-long segmental duplication)
compact genome: 7892 protein-coding genes, short intergenes, size reduction of multigene families
The genome of Ostreococcus lucimarinus
Palenik et al., 2007 PNAS 104: 7705-7710
                         O. lucimarinus   O. tauri
genome size (Mb)         13.2             12.6
chromosomes              21               20
protein-coding genes     7651             7892
split genes (%)          20               25
Multiple mechanisms contribute to species divergence, act differently on different chromosomes
Horizontal gene transfer altering cell-surface characteristics
Numerous gene fusions 330 (O.t.), 348 (O. l.) of which 137 are common to both species
Numerous (20) genes for selenocysteine-containing proteins (TGA codons)
Genomes of unicellular eukaryotes
Leishmania
Trypanosoma
Giardia
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676
The genome of Giardia lamblia
Morrison et al., 2007 Science 317:1921-1926
human intestinal parasite, flagellated trophozoites attach to epithelial cells
two diploid nuclei, no mitochondria, no peroxisomes
genome ~11.7 Mb, 5 chromosomes, draft sequence 92 scaffolds
6470 annotated CDS, very few introns (4), low degree of heterozygosity (0.01% between the 4 genomes)
simplified molecular machinery, cytoskeletal structure and metabolic pathways ---> either early divergence or regressive evolution
frequent insertion of motifs (up to 101 amino-acids) in conserved proteins
numerous traces of horizontal gene acquisitions
Genomes of unicellular eukaryotes
Plasmodium
Babesia
Paramecium
Phaeodactylum
Ectocarpus
Phytophthora
Guillardia
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676
The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Micronucleus (2n): genome size ca. 100 Mb, > 50 chromosomes
From micronucleus to macronucleus:
Precise elimination of > 10000 short, unique copy elements: reconstruction of functional genes
Imprecise elimination of transposable elements and other repeated sequences: chromosome fragmentation, de novo telomere addition, internal deletions
Amplification ca. 800 times
Macronucleus: genome size ca. 75 Mb
Note: heterogeneity in the sequences abutting imprecisely eliminated regions
697 scaffolds, totalling 72 Mb
The macronuclear genome of Paramecium tetraurelia
Comparison of two scaffolds
originating from a common
ancestor at the recent WGD
The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Ancient WGD
Old WGD
Recent WGD
Intermediary WGD
between paralogous proteins
The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Old WGD
Intermediary WGD
Recent WGD
The genome of Phaeodactylum tricornutum
Bowler et al., 2008, Nature 456: 239-244
                        P. tricornutum       Thalassiosira pseudonana
                        (pennate diatoms)    (centric diatoms)
Genome size (Mb)        27.4                 32.4
Protein-coding genes    10402                11776
Spliceosomal introns    8169                 17880
Numerous genes of bacterial origin involved in carbon and nitrogen utilization (xylanase, glucanase, prismane, carbon-nitrogen hydrolase, amidohydrolase), urea cycle (carbamoyl transferase, carbamate kinase, ornithine cyclodeaminase), cell wall silicification (S-adenosylmethionine-dependent decarboxylases and methyltransferases).
Eukaryotes
Pt - Tp
The genomes of Phytophthora infestans (sojae and ramorum)
Haas et al., 2009 Nature 461: 393-398
                        P. infestans   P. sojae   P. ramorum
Host                    potato         soybean    oak
Genome size (Mb)        240            95         65
Scaffolds               4921           1810       2576
Repeat (%)              74             39         28
Protein-coding genes    17797          16988      14451
Conserved syntenic blocks containing most common genes and few repeated DNA are separated by regions of repeated DNA with low gene density and no conservation of gene order --> dynamic genomes
Rapidly evolving effector genes in non-conserved regions (modular secreted proteins, major types: RXLR and Crinkler, targeted to plant cells and responsible for necrosis, mostly species-specific)
Effector gene family expansion in P. infestans associated with numerous mobile elements (helitron)
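The genome sizes and repeat percentages above can be split into repeated and non-repeated DNA to see how much of the size difference between the three species is repeat-driven. This is a back-of-the-envelope sketch using only the slide's values; the names `repeat_mb` and `non_repeat_mb` are invented for the example.

```python
# (genome size in Mb, repeat content in %), values from the slide.
phytophthora = {
    "P. infestans": (240, 74),
    "P. sojae": (95, 39),
    "P. ramorum": (65, 28),
}


def repeat_mb(genome_mb: float, repeat_pct: float) -> float:
    """Amount of repeated DNA in Mb."""
    return genome_mb * repeat_pct / 100.0


def non_repeat_mb(genome_mb: float, repeat_pct: float) -> float:
    """Amount of non-repeated DNA in Mb."""
    return genome_mb - repeat_mb(genome_mb, repeat_pct)


if __name__ == "__main__":
    for species, (size, pct) in phytophthora.items():
        print(f"{species:13s} repeats {repeat_mb(size, pct):6.1f} Mb, "
              f"non-repeated {non_repeat_mb(size, pct):5.1f} Mb")
```

On these figures the non-repeated fraction stays within a fairly narrow range (about 47 to 62 Mb) while the repeated fraction varies almost tenfold, consistent with repeats accounting for most of the expansion of the P. infestans genome.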
Genomes of unicellular eukaryotes
Tuber
Laccaria
Rhizopus
Encephalitozoon
Monosiga
Dictyostelium
Entamoeba
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676
The genome of the choanoflagellate Monosiga brevicollis
King et al., 2008, Nature 451: 783-788
> 600 Myr
Spliceosomal introns
gain > loss
loss > gain
Micrograph: flagellum, actin-filled microvilli; beta-tubulin (green), actin (red), DNA (blue); scale bar 2 µm
The genome of the choanoflagellate Monosiga brevicollis
King et al., 2008, Nature 451: 783-788
Protein domain fusions
Algal genes in the closest relatives of animals.
Sun et al., 2010, Mol. Biol. Evol. (in press)
Origin of photosynthesis in evolution: important but controversial
Classical arguments: transfer of plastid genes to nucleus and loss of plastids in evolution
Presence of algal genes in aplastidic organisms: are they footprints of photosynthetic ancestors?
Phylogenomic analyses identified over 100 genes of possible algal origin in Monosiga
The vast majority of these algal genes appear to be derived from haptophytes, diatoms, or green plants.
Over 25 percent of these algal genes are ultimately of prokaryotic origin and were spread secondarily to Monosiga.
The presence of algal genes may be expected in many phagotrophs or taxa of phagotrophic ancestry, and therefore does not necessarily represent evidence of plastid losses.
Four membrane plastids (double endosymbiosis)
Bigelowiella natans: green algae endosymbiont
Guillardia theta: red algae endosymbiont
Nucleomorph genomes
Bigelowiella natans (Chlorarachniophytes)
Gilson et al., 2006 PNAS 103: 9566-9571
3 chromosomes (purified by PFGE)
141, 134, 98 kb
inverted repeats at chromosome ends (rDNA)
326 genes (proteins: 284, rRNAs: 18, tRNAs: 20, snRNAs: 4)
+ 5 pseudogenes (plastid-targeted DnaK in terminal inverted repeats)
17 genes encoding plastid proteins
852 pigmy introns (18-21 nt), splicing machinery
Nucleomorph genomes
Guillardia theta (Cryptomonads)
Douglas et al., 2001 Nature 410: 1091-1096
3 chromosomes (purified by PFGE)
196, 181, 174 kb
inverted repeats at chromosome ends (rDNA, ubiquitin-conjugating enzyme gene)
464 protein-coding genes + 47 genes for non-coding RNAs (rRNA, tRNA, snRNA, snoRNA)
17 spliceosomal introns (42-52 nt)
compact genome (very short intergenic regions, partially overlapping genes)
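The compactness of the two nucleomorph genomes can be put in numbers from the chromosome sizes and gene counts on these slides. The sketch below is only an arithmetic illustration: the data structure and function name are invented here, and the gene totals are the sums of the categories listed on the slides (pseudogenes excluded).

```python
# Chromosome sizes (kb) and gene counts as listed on the two nucleomorph slides.
nucleomorphs = {
    "Bigelowiella natans": {
        "chromosomes_kb": [141, 134, 98],
        "genes": 284 + 18 + 20 + 4,  # proteins + rRNAs + tRNAs + snRNAs
        "introns": 852,
    },
    "Guillardia theta": {
        "chromosomes_kb": [196, 181, 174],
        "genes": 464 + 47,           # protein-coding + non-coding RNA genes
        "introns": 17,
    },
}


def kb_per_gene(entry: dict) -> float:
    """Average genome length per gene, in kb."""
    return sum(entry["chromosomes_kb"]) / entry["genes"]


if __name__ == "__main__":
    for species, entry in nucleomorphs.items():
        size_kb = sum(entry["chromosomes_kb"])
        print(f"{species}: {size_kb} kb, {entry['genes']} genes, "
              f"{kb_per_gene(entry):.2f} kb per gene, "
              f"{entry['introns'] / entry['genes']:.2f} introns per gene")
```

Both genomes come out at roughly one gene per kb, while differing sharply in intron number.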
The world of unicellular eukaryotes from genomics
Viridiplantae
  Small, compact genomes
  Transposon clustering --> compositional heterogeneity
  Numerous gene fusions
  Horizontal gene acquisitions
  Multicellularity
Excavata
  Small, compact genomes
  Simplified molecular machinery
  Regressive evolution
  Horizontal gene acquisitions
Rhizaria
  Primary endosymbiosis (chloroplasts) followed by chloroplast loss
  Double endosymbionts (green algae)
  Horizontal gene acquisitions
Chromalveolata
  Whole-genome duplications and gene loss
  Rapidly evolving dynamic genomes
  Transposon clustering
  Horizontal gene acquisitions
  Double endosymbionts (red algae)
  else ?
Unikonts
  Intron gain and loss
  Numerous gene fusions
  Horizontal gene acquisitions
  Multicellularity
  Whole-genome duplications