Eukaryotic genomes
EMBO Practical Course: Bioinformatics and Comparative Genome Analysis
Stazione Zoologica Anton Dohrn,
Naples, Italy, May 7-19, 2012

Part 1: Structure and diversity of eukaryotic genomes
Part 2: Dynamics and evolution of eukaryotic genomes

The living world
BACTERIA: Bacteroides, Escherichia, Bacillus, Synechococcus, Chloroflexus, Thermotoga
ARCHAEA, Crenarchaeota: Pyrodictium, Thermoproteus
ARCHAEA, Euryarchaeota: Thermococcus, Methanococcus, Methanobacterium, Methanomicrobium, Halobacterium
EUCARYA: Homo, Caenorhabditis, Arabidopsis, Saccharomyces, Paramecium, Trypanosoma, Vairimorpha
Adapted from C. Woese (1990, 1997)
Figure labels: endosymbiosis, membranes, RNA

Prokaryotic cell structure
Genome = chromosome(s) + plasmid(s)

Eukaryotic cells
Genome = nuclear chromosomes (+ plasmids) + mitochondrial genome (+ plasmids) + chloroplastic genome
Multiple membranes
Figure labels: mitochondrial genome, nuclear genome, chloroplastic genome

Origin of eukaryotes: endosymbioses
Timmis et al. (2004) Nature Reviews Genetics 5: 123-135

The central dogma of molecular biology
Prokaryotic cell: genome = chromosome(s) + plasmid(s)
DNA (genome, gene): replication
Transcription (transcriptional regulations) --> intermediary RNA
Translation (genetic code) --> proteins
Regulation of gene expression: François Jacob, André Lwoff, Jacques Monod

Genetic code
TTT phe F    TCT ser S    TAT tyr Y    TGT cys C
TTC phe F    TCC ser S    TAC tyr Y    TGC cys C
TTA leu L    TCA ser S    TAA ochre    TGA opal
TTG leu L    TCG ser S    TAG amber    TGG trp W
CTT leu L    CCT pro P    CAT his H    CGT arg R
CTC leu L    CCC pro P    CAC his H    CGC arg R
CTA leu L    CCA pro P    CAA gln Q    CGA arg R
CTG leu L    CCG pro P    CAG gln Q    CGG arg R
ATT ile I    ACT thr T    AAT asn N    AGT ser S
ATC ile I    ACC thr T    AAC asn N    AGC ser S
ATA ile I    ACA thr T    AAA lys K    AGA arg R
ATG met M    ACG thr T    AAG lys K    AGG arg R
GTT val V    GCT ala A    GAT asp D    GGT gly G
GTC val V    GCC ala A    GAC asp D    GGC gly G
GTA val V    GCA ala A    GAA glu E    GGA gly G
GTG val V    GCG ala A    GAG glu E    GGG gly G

RNA surprises
1970: reverse transcription (chicken virus); figure labels: RNA, cDNA, retrogene; D. Baltimore, R. Dulbecco, H. M.
Temin; figure label: gene
1977: introns (mammalian virus, rabbit globin gene); precursor RNA --> mRNA + intron; Richard J. Roberts, Philip A. Sharp
1983: RNA catalysis (Tetrahymena nuclear intron, E. coli RNase P); Sydney Altman, Thomas R. Cech
RNA editing (Trypanosoma): RNA --> novel RNA sequence
1985: retrotransposons (yeast): mobile element --> RNA --> cDNA --> mobile element; Jef D. Boeke
early 1980, then 2000: RNA interference (petunias, fungi, Caenorhabditis elegans); Andrew Z. Fire, Craig C. Mello
2002: micro RNAs (Caenorhabditis elegans)

The central dogma of molecular biology, updated
DNA (genome): replication
Transcription (transcriptional regulations) --> intermediary RNA
Reverse transcription, exon shuffling: RNA --> DNA
Post-transcriptional regulations: splicing, editing, non-coding RNAs
Translation (genetic code) --> proteins

Eukaryotic gene
DNA: gene with Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, Intron 3, Exon 4
Primary transcript
Exon junctions
Matured RNA: 5' UTR, coding region, 3' UTR; figure labels: regulation + degradation, protein
Excised introns

Introns
1- Spliceosomal introns
Polymerase II transcripts
Donor site, branch point, acceptor site
First transesterification: the 2' OH of the branch-point A attacks the donor site
Second transesterification: the 3' OH of exon 1 attacks the acceptor site, joining exon 1 to exon 2

Alternative splicing: pre-mRNA --> mRNA
Alternatively spliced gene models (Vitis vinifera)
Figure labels: gene models; original gene model; novel alternative splice form; minor alternative splice form

Introns
2- tRNA introns
Polymerase III transcripts
1 nucleotide after anticodon
Multistep enzymatic reaction

Introns
3- Group I introns: mitochondrial genes, chloroplastic genes, nuclear rDNA (rare)
4- Group II introns: mitochondrial genes, chloroplastic genes
Self-splicing (ribozymes): two successive transesterifications
Often contain genes for specific proteins ---> mosaic genes (homing endonucleases, reverse transcriptases)
Figure labels: Exon 1, intron, Exon 2; intronic CDS; exonic translation product; polyprotein; intron translation product
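
The exon/intron structure and the genetic code shown on the preceding slides can be combined in a small worked example. The following is only a minimal sketch, not a gene-prediction method: the toy gene (EXON_1, INTRON, EXON_2) is a hypothetical sequence invented for illustration, the intron coordinates are supplied by hand, and the only check made is the canonical GT...AG intron boundary. Sequences are written in the DNA alphabet, as in the codon table.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice(pre_mrna, introns):
    """Remove introns given as half-open (start, end) intervals and join the exons.

    Assumption for this sketch: every intron starts with GT and ends with AG.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GT...AG boundaries")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical two-exon gene, invented for illustration only.
EXON_1 = "ATGTTTCTG"
INTRON = "GTAAGTTACTAACTTTTAG"
EXON_2 = "GAATGGTAA"

gene = EXON_1 + INTRON + EXON_2
introns = [(len(EXON_1), len(EXON_1) + len(INTRON))]

mature = splice(gene, introns)
protein = translate(mature)
print(mature, protein)
```

Keeping splice and translate as separate functions mirrors the slide's own sequence of events: the primary transcript is processed first, and only the matured RNA is read through the genetic code.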
Retrotransposons
Example: Ty1 element of yeast (Copia family), 6 kb
Structure: LTR - gag - pol - LTR
RNA: 5' AUG, +1 frameshift, UAA, UAA, AAAAAAAA 3'
99 %: GAG protein
1 %: GAG-AP-IN-RT-RH polyprotein, cleaved by the protease
AP: protease; IN: integrase; RT: reverse transcriptase; RH: RNase H
Figure labels: VLP; cDNA (reverse transcriptase, RNase H); integrase; new target; GAG, X

Consequences of RNA activity on eukaryotic genomes
Formation of retrogenes: ancestral gene (Exon 1, Intron, Exon 2) --> RNA --> cDNA --> retrogene; 1 % of human genes, plus many processed pseudogenes
Exon shuffling: novel exon or gene fusion; new splicing sites or intron loss; ~ 19 % of all eukaryotic exons
Exonization of mobile elements: mobile element --> novel exon; new splicing sites or intron loss; ~ 4 % of novel exons in human genome

Historical example of exon shuffling: jingwei
Long and Langley (1993) Science 260: 91-95; Long et al. (2003) Nature Reviews Genetics 4: 865-875

Genetic acquisitions by transfers
Intracellular, interorganellar transfers: NUMTs (mitochondria --> nucleus), NUPTs (chloroplast --> nucleus)
Horizontal gene transfers: bacteria --> yeasts
Introgressions of large chromosomal segments
Donor species: e.g. Zygosaccharomyces bailii
Recipient genome: e.g. Saccharomyces cerevisiae
Figure labels: 5' SS, BP, 3' SS; intron size 106; coordinates 51, 53, 102; GTACGTAGATAGA; GAATCAAGCTCATATAGACAACTAACATATGATTTTAG; mt DNA (COX1 fragment); DEHA2F06314g, putative 1-3-beta-glucanosyltransferase
Alignment:
ACTAAATTAATGAACTCTTTATAAATTACTTATAAAAGTCATTTAAATGAT
||||||||||||.|||||||||||||||.||||||||||.|||||||||||
ACTAAATTAATGGACTCTTTATAAATTATTTATAAAAGTTATTTAAATGAT

Elements of eukaryotic genomes (nuclear)
Chromosomes: linear, centromeres, telomeres, origins of replication, replicons
Protein-coding genes and spliceosomal introns
Genes for non-coding RNAs: rRNAs, tRNAs, snoRNAs, snRNAs, microRNAs ...
Mobile genetic elements and their remnants
Pseudogenes and processed pseudogenes
Satellite DNAs: micro-, minisatellites, repeated sequences
Fragments of organellar DNAs: NUMTs and NUPTs

Figure (genome composition): Saccharomyces cerevisiae (5,780) and Homo sapiens (~ 23,000)
Categories: coding exons; introns, UTR, pseudogenes; mobile elements; all others
Annotations: cellular functions, regulations, evolution
Duplicated genes and regions, acquired DNA sequences, newly created genes ---> intense dynamics of genome modification and evolution

The exquisite beauty of a eukaryotic genome: Saccharomyces cerevisiae (1996)
Nucleus: 16 chromosomes (t.r. = tandem repeats)
Chromosome   Size (kb)
XII          1,078 + rDNA t.r.
IV           1,532 + ENA2 t.r.
XV           1,091
VII          1,091
XVI          948
XIII         924
II           813
XIV          784
X            746
XI           666
V            577
VIII         563 + CUP1 t.r.
IX           440
III          317
VI           270
I            230
TOTAL        12,071 kb + tandem repeats
rDNA: 9 kb repeat units, 70-120 tandem copies, total 0.6 - 1.1 Mb
Dispersed repeated sequences: tRNA genes (identical), Ty elements and LTR (similar)
Mitochondria: 85,779 bp (polymorphic introns), ca. 30 copies, total 2.6 Mb
Two-micron plasmid: 6,318 bp, ca. 50 copies, total 0.3 Mb; features: FLP, REP1, REP2, D, STB, IR 1, IR 2
RNA "virus": 3 - 5 kb, 100-1000 copies, total 0.3 - 5 Mb

Gene content (nucleus total; mitochondria total)
Chromosomes: nucleus 16; mitochondria 1
Protein-coding genes, active CDS: nucleus 5 769; mitochondria 7
Protein-coding genes, pseudogenes: nucleus 77; mitochondria -
RNA-coding genes, nucleus: transfer RNAs 275; sno RNAs 77; sn RNAs 6; ribosomal RNAs 3; other RNAs >4
RNA-coding genes, mitochondria: transfer RNAs 23; ribosomal RNAs 2; other RNAs 1
Mobile genetic elements: complete (active) 52; incomplete (traces) 220; mitochondria -
Introns, spliceosomal: nucleus 273; mitochondria -
Introns, group I and group II: mitochondria 1 - 10

Yeast genomes are found in a variety of forms
Haploids, aneuploids
Homozygous diploids, heterozygous diploids
Interspecific hybrids, partial hybrids, mosaics
A resource for functional data in eukaryotes
> 85 % of genes functionally characterized

Species                     Genome size (nucleotides)   Nb of genes (protein-coding)
Amoeba dubia                ~ 670 000 000 000           ?
Psilotum nudum              ~ 250 000 000 000           ?
Fritillaria assyriaca       ~ 100 000 000 000           ?
Necturus lewisi             ~ 100 000 000 000           ?
Homo sapiens                2 900 000 000               23 000
Vitis vinifera              487 000 000                 30 400
Drosophila melanogaster     160 000 000                 14 000
Arabidopsis thaliana        115 000 000                 28 000
Caenorhabditis elegans      98 000 000                  19 400
Saccharomyces cerevisiae    12 500 000                  5 800
Escherichia coli            4 600 000                   4 300

Part 1: Structure and diversity of eukaryotic genomes
Part 2: Dynamics and evolution of eukaryotic genomes

Ernst Haeckel, 1866 (figure)

The eukaryotic world after genomic analyses (P. Keeling, 2005)
Viridiplantae, Excavata, Rhizaria, Chromalveolata, Unikonts

Genomes of unicellular eukaryotes
Volvox, Chlamydomonas, Ostreococcus, Cyanidioschyzon
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676

The genome of Ostreococcus tauri
Derelle et al., 2006 PNAS 103: 11647-11652
aim: genome of the smallest free living eukaryote
genome size: 12.6 Mb, 20 chromosomes
compositional heterogeneity related to transposons (includes a 146 kb-long segmental duplication)
compact genome: 7892 protein-coding genes, short intergenes, size reduction of multigene families

The genome of Ostreococcus lucimarinus
Palenik et al., 2007 PNAS 104: 7705-7710
                        O. lucimarinus   O. tauri
genome size (Mb)        13.2             12.6
chromosomes             21               20
protein-coding genes    7651             7892
split genes (%)         20               25
Multiple mechanisms contribute to species divergence, act differently on different chromosomes
Horizontal gene transfer altering cell-surface characteristics
Numerous gene fusions: 330 (O. t.), 348 (O. l.),
of which 137 are common to both species
Numerous (20) genes for selenocysteine-containing proteins (TGA codons)

Genomes of unicellular eukaryotes
Leishmania, Trypanosoma, Giardia
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676

The genome of Giardia lamblia
Morrison et al., 2007 Science 317: 1921-1926
human intestinal parasite, flagellated trophozoites attach to epithelial cells
two diploid nuclei, no mitochondria, no peroxisomes
genome ~11.7 Mb, 5 chromosomes, draft sequence 92 scaffolds
6470 annotated CDS, very few introns (4), low degree of heterozygosity (0.01% between the 4 genomes)
simplified molecular machinery, cytoskeletal structure and metabolic pathways ---> either early divergence or regressive evolution
frequent insertion of motifs (up to 101 amino acids) in conserved proteins
numerous traces of horizontal gene acquisitions

Genomes of unicellular eukaryotes
Plasmodium, Babesia, Paramecium, Phaeodactylum, Ectocarpus, Phytophthora, Guillardia
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676

The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Micronucleus (2n): genome size ca. 100 Mb, > 50 chromosomes
Precise elimination of > 10000 short, unique copy elements
Reconstruction of functional genes
Amplification ca. 800 times
Imprecise elimination of transposable elements and other repeated sequences
Macronucleus: genome size ca.
75 Mb
Chromosome fragmentation
De novo telomere addition
Internal deletions
Note: heterogeneity in the sequences abutting imprecisely eliminated regions
697 scaffolds, totalling 72 Mb

The macronuclear genome of Paramecium tetraurelia
Comparison of two scaffolds originating from a common ancestor at the recent WGD

The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Figure labels: ancient WGD, old WGD, intermediary WGD, recent WGD; between paralogous proteins

The macronuclear genome of Paramecium tetraurelia
Aury et al., (2006) Nature 444: 171-178
Figure labels: old WGD, intermediary WGD, recent WGD

The genome of Phaeodactylum tricornutum
Bowler et al., 2008, Nature 456: 239-244
                                             Genome size (Mb)   Protein-coding genes   Spliceosomal introns
P. tricornutum (pennate diatoms)             27.4               10402                  8169
Thalassiosira pseudonana (centric diatoms)   32.4               11776                  17880
Numerous genes of bacterial origin involved in carbon and nitrogen utilization (xylanase, glucanase, prismane, carbon-nitrogen hydrolase, amidohydrolase), urea cycle (carbamoyl transferase, carbamate kinase, ornithine cyclodeaminase), cell wall silicification (S-adenosylmethionine-dependent decarboxylases and methyltransferases).
Figure labels: Eukaryotes; Pt - Tp

The genomes of Phytophthora infestans (sojae and ramorum)
Haas et al., 2009 Nature 461: 393-398
               Genome size (Mb)   Scaffolds   Repeat (%)   Protein-coding genes
P. infestans   240                4921        74           17797
P. sojae       95                 1810        39           16988
P. ramorum     65                 2576        28           14451
Conserved syntenic blocks containing most common genes and few repeated DNA are separated by regions of repeated DNA with low gene density and no conservation of gene order --> dynamic genomes
Rapidly evolving effector genes in non-conserved regions (modular secreted proteins, major types: RXLR and Crinkler, targeted to plant cells and responsible for necrosis, mostly species-specific)
Effector gene family expansion in P.
infestans associated with numerous mobile elements (helitron)
Hosts (figure labels): oak, soybean, potato

Genomes of unicellular eukaryotes
Tuber, Laccaria, Rhizopus, Encephalitozoon, Monosiga, Dictyostelium, Entamoeba
Keeling et al., 2005 Trends in Ecology and Evolution, 20: 670-676

The genome of the choanoflagellate Monosiga brevicollis
King et al., 2008, Nature 451: 783-788
> 600 Myr
Spliceosomal introns: gain > loss; loss > gain (figure labels)
Micrograph labels (scale bar 2 µm): flagellum; beta-tubulin (green); actin (red); DNA (blue); actin-filled microvilli

The genome of the choanoflagellate Monosiga brevicollis
King et al., 2008, Nature 451: 783-788
Protein domain fusions

Algal genes in the closest relatives of animals.
Sun et al., 2010, Mol. Biol. Evol. (in press)
Origin of photosynthesis in evolution: important but controversial
Classical arguments: transfer of plastid genes to the nucleus and loss of plastids in evolution
Presence of algal genes in organisms lacking plastids: are they footprints of photosynthetic ancestors?
Phylogenomic analyses identified over 100 genes of possible algal origin in Monosiga.
The vast majority of these algal genes appear to be derived from haptophytes, diatoms, or green plants.
Over 25 percent of these algal genes are ultimately of prokaryotic origin and were spread secondarily to Monosiga.
The presence of algal genes may be expected in many phagotrophs or taxa of phagotrophic ancestry, and therefore does not necessarily represent evidence of plastid losses.
Four membrane plastids (double endosymbiosis)
Bigelowiella natans: green algae endosymbiont
Guillardia theta: red algae endosymbiont

Nucleomorph genomes
Bigelowiella natans (Chlorarachniophytes)
Gilson et al., 2006 PNAS 103: 9566-9571
3 chromosomes (purified by PFGE): 141, 134, 98 kb
inverted repeats at chromosome ends (rDNA)
326 genes (proteins: 284, rRNAs: 18, tRNAs: 20, snRNAs: 4) + 5 pseudogenes (plastid-targeted DnaK in terminal inverted repeats)
17 genes encoding plastid proteins
852 pygmy introns (18-21 nt), splicing machinery

Nucleomorph genomes
Guillardia theta (Cryptomonads)
Douglas et al., 2001 Nature 410: 1091-1096
3 chromosomes (purified by PFGE): 196, 181, 174 kb
inverted repeats at chromosome ends (rDNA, ubiquitin-conjugating enzyme gene)
464 protein-coding genes + 47 genes for non-coding RNAs (rRNA, tRNA, snRNA, snoRNA)
17 spliceosomal introns (42-52 nt)
compact genome (very short intergenic regions, partially overlapping genes)

The world of unicellular eukaryotes from genomics
Viridiplantae: small, compact genomes; transposon clustering --> compositional heterogeneity; numerous gene fusions; horizontal gene acquisitions; multicellularity
Excavata: small, compact genomes; simplified molecular machinery; regressive evolution; horizontal gene acquisitions
Rhizaria: primary endosymbiosis (chloroplasts) followed by chloroplast loss; double endosymbionts (green algae); horizontal gene acquisitions
Chromalveolata: whole-genome duplications and gene loss; rapidly evolving dynamic genomes; transposon clustering; horizontal gene acquisitions; double endosymbionts (red algae); else ?
Unikonts: intron gain and loss; numerous gene fusions; horizontal gene acquisitions; multicellularity; whole-genome duplications
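
The recurring description "small, compact genomes" can be made concrete by dividing the gene counts quoted on the slides by the corresponding genome sizes. The sketch below is a rough illustration only: all figures are copied from the slides, the nucleomorph sizes are taken as the sum of the three chromosome sizes, the gene counts are not all defined the same way (the Giardia figure is annotated CDS), and the names GENOMES, NUCLEOMORPHS and genes_per_mb are hypothetical helpers introduced here.

```python
# (genome size in Mb, protein-coding genes), as quoted on the slides.
GENOMES = {
    "Homo sapiens": (2900.0, 23000),
    "Saccharomyces cerevisiae": (12.5, 5800),
    "Ostreococcus tauri": (12.6, 7892),
    "Ostreococcus lucimarinus": (13.2, 7651),
    "Giardia lamblia": (11.7, 6470),  # annotated CDS
    "Phaeodactylum tricornutum": (27.4, 10402),
    "Phytophthora infestans": (240.0, 17797),
}

# Nucleomorphs: (chromosome sizes in kb, protein-coding genes).
# Assumption: genome size is approximated by the sum of the three chromosomes.
NUCLEOMORPHS = {
    "Guillardia theta nucleomorph": ([196, 181, 174], 464),
    "Bigelowiella natans nucleomorph": ([141, 134, 98], 284),
}


def genes_per_mb(size_mb, gene_count):
    """Protein-coding genes per megabase, a crude measure of genome compactness."""
    return gene_count / size_mb


# Convert nucleomorph chromosome sizes (kb) to a total size in Mb.
for name, (chromosomes_kb, gene_count) in NUCLEOMORPHS.items():
    GENOMES[name] = (sum(chromosomes_kb) / 1000, gene_count)

# Rank genomes from most to least gene-dense.
ranking = sorted(GENOMES, key=lambda name: genes_per_mb(*GENOMES[name]), reverse=True)

for name in ranking:
    size_mb, gene_count = GENOMES[name]
    print(f"{name:33s} {size_mb:9.3f} Mb {genes_per_mb(size_mb, gene_count):8.1f} genes/Mb")
```

Genes per megabase ignores gene length and intron content, so it only indicates compactness; it is used here because it needs nothing beyond the two numbers each slide reports.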