
MEGAN analysis of metagenomic data
Daniel H. Huson, Alexander F. Auch, Ji Qi, et al. Genome Res. 2007

Early metagenomics
• Known phylogenetic markers and subsequent sequencing of clones
• Analysis of paired-end reads
• Complete sequences of environmental fosmid and BAC clones
• Environmental assemblies
• Rough annotation of the metabolic capacity
• Distinguish between discrete species and populations of closely related biotypes
• Problem of using proven phylogenetic markers (ribosomal genes, coding sequences)
  • Slow-evolving genes: suited to distinguishing between species at large evolutionary distances

What is MEGAN?
• Metagenome Analyzer (MEGAN)
• Free software
• Deviates from the analytical pattern of previous approaches
• Built on the statistical analysis of comparing random sequence intervals with unspecified phylogenetic properties against databases
• Provides filters so the level of stringency can be adjusted later to an appropriate level
• Laptop analysis
• Depends on the related sequences in the databases
• Comparison results (BLAST) -> laptop (MEGAN)
• Graphical and statistical output

Pipeline
• Compare against databases: BLAST
• Compute and explore taxonomical content: NCBI taxonomy
• Lowest common ancestor (LCA) algorithm
• Data sets: Sargasso Sea, mammoth bone, short E. coli K12 and B. bacteriovorus HD100 reads

What we can do with MEGAN
• Species and strain identification through species-specific genes
• Searching for species or taxa with the Find tool
• Distribution of strains of a species
• Underlying sequence alignments

Experiment 1: Sargasso Sea
• Data set
  • Sanger sequencing
  • Samples 1-4 from DDBJ/EMBL/GenBank
• BLASTX -> NCBI-NR
  • 10000 reads from sample 1
  • A pooled set of 10000 reads randomly selected from samples 2-4
  • 1% no hits for sample 1, <3% no hits for samples 2-4
• Filters
  • Min-score: bit-score threshold of 100
  • Top-percent: hits whose bit scores lie within 5% of the best score
  • Min-support: isolated assignments (hit by only one read) discarded

Analysis: Sargasso Sea data
• 1.66M reads, avg. 818 bp, by Sanger sequencing
• Species profile of 16 taxonomical groups
• Environmental assemblies
• By analyzing six specific phylogenetic markers: rRNA, RecA/RadA, HSP70, RpoB, EF-Tu, and EF-G

Result
• Sample 1
  • ~83% of reads were assigned to taxa more specific than the kingdom level
  • The majority (8298) were assigned to bacterial groups
• Samples 2-4
  • ~59% of reads were assigned to taxa more specific than the kingdom level
  • The majority (5709) were assigned to bacterial groups
• Alphaproteobacteria and Gammaproteobacteria dominate by a factor of 2-4 over the remaining 14 taxonomic groups
• Eukaryotes and viruses: size filtering
• Archaea: possibly because public databases contain about 10 times as much bacterial sequence information
• MEGAN vs. previous analysis (Venter et al. 2004)
  • Specific assignment information: LCA

Result (cont.)
• Averaged weighted percentage of the six phylogenetic markers for each of the 16 taxonomic groups
• Sampling bias between sample 1 and pooled samples 2-4 is easy to detect

Experiment 2: Mammoth bone
• Data set
  • Roche GS20 sequencing (sequencing-by-synthesis)
  • Sample from 1 g of mammoth bone, 28000 years old
  • ~300,000 reads, 95 bp
• BLASTZ against genome sequences (elephant, human, dog)
  • 45.4% of the reads are mammoth DNA; the others come from environmental organisms (bacteria, fungi, amoeba, nematodes)
• BLASTX -> NCBI-NR for environmental sequences
• Filters: bit-score threshold of 30; isolated assignments discarded (2086 reads filtered)
• Result
  • 19841 reads assigned to Eukaryota, of which 7969 to Gnathostomata
  • 16972 to Bacteria, 761 to Archaea, 152 to Viruses

Experiment 3: Identifying species from various read lengths
• Short E. coli K12 and B. bacteriovorus HD100 simulation
  • 5000 random shotgun reads
  • BLASTX -> NCBI-NR
• Filters
  • Bit-score threshold of 35
  • Top-percent: within 20% of the best hit
  • Isolated assignments discarded
• Result: no false-positive assignments; short reads can be used for metagenomic analysis, albeit at the cost of a high rate of underprediction

Experiment 3 (cont.): Roche GS20 sequencing, E. coli K12
• Data set
  • 2000 reads from random positions in the E. coli K12 genome
  • ~100 bp
  • BLASTX -> NCBI-NR
• Filters
  • Bit-score threshold of 35
  • Top-percent: within 20% of the best hit
  • Isolated assignments discarded
• Result: [figure]

Experiment 3 (cont.): Roche GS20 sequencing, B. bacteriovorus HD100
• Data set
  • 2000 reads from random positions in the B. bacteriovorus HD100 genome
  • ~100 bp
  • BLASTX -> NCBI-NR: panel A in figure
  • BLASTX -> NCBI-NR without B. bacteriovorus HD100: panel B in figure
• Filters
  • Bit-score threshold of 35
  • Top-percent: within 20% of the best hit
  • Isolated assignments discarded
• Result: [figure]

MEGAN 3 (June 2009)
• Suitable for very large datasets
• Interests changed
  • Advances in the throughput and cost-efficiency of sequencing technology
  • From "Which species are present?" to "What's different?"
• Features
  • Visualization technique for multiple databases
  • New statistical method for highlighting the differences in a pairwise comparison

MEGAN 3 (cont.)
• Comparing 6 mouse gut datasets with human gut
• Clickable, collapsible
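
Sketch: filters and LCA assignment
The Min-score, Top-percent and Min-support filters and the LCA step named in the slides can be illustrated with a small Python sketch. This is not MEGAN's code. The toy taxonomy `PARENT`, the function names, the `"Unassigned"` label and the example bit scores are all hypothetical, and Min-support is simplified here to counting the reads assigned directly to each taxon.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for the NCBI taxonomy.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Salmonella enterica": "Gammaproteobacteria",
    "Deltaproteobacteria": "Proteobacteria",
    "Bdellovibrio bacteriovorus": "Deltaproteobacteria",
}


def path_from_root(taxon):
    """Return the lineage of a taxon, starting at the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lca(taxa):
    """Lowest common ancestor: the deepest node shared by all lineages."""
    common = None
    for nodes in zip(*(path_from_root(t) for t in taxa)):
        if len(set(nodes)) != 1:
            break
        common = nodes[0]
    return common


def assign_reads(hits, min_score=35.0, top_percent=20.0, min_support=2):
    """hits maps read id -> list of (taxon, bit_score) pairs from BLAST."""
    assignment = {}
    for read, read_hits in hits.items():
        # Min-score: drop hits below the bit-score threshold.
        kept = [(t, s) for t, s in read_hits if s >= min_score]
        if not kept:
            assignment[read] = "Unassigned"
            continue
        # Top-percent: keep hits within top_percent of the best bit score.
        cutoff = max(s for _, s in kept) * (1 - top_percent / 100)
        assignment[read] = lca({t for t, s in kept if s >= cutoff})
    # Min-support (simplified): discard taxa supported by too few reads.
    counts = Counter(assignment.values())
    for read, taxon in assignment.items():
        if taxon != "Unassigned" and counts[taxon] < min_support:
            assignment[read] = "Unassigned"
    return assignment


# Made-up example hits, for illustration only.
hits = {
    "r1": [("Escherichia coli", 80.0), ("Salmonella enterica", 70.0)],
    "r2": [("Escherichia coli", 90.0), ("Salmonella enterica", 60.0)],
    "r3": [("Escherichia coli", 85.0), ("Salmonella enterica", 75.0)],
    "r4": [("Bdellovibrio bacteriovorus", 50.0)],
    "r5": [("Escherichia coli", 30.0)],
    "r6": [("Escherichia coli", 95.0)],
}
result = assign_reads(hits)
for read in sorted(result):
    print(read, "->", result[read])
```

In this toy run, reads whose good hits span two species are placed on their common ancestor rather than on either species, which is the conservative behaviour the slides attribute to the LCA step: fewer false positives, at the cost of less specific assignments.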
