Meeting the Bioinformatics Challenges of Functional Genomics
VanBUG, 11 September 2003

Acknowledgments <johnq@tigr.org>
TIGR Human/Mouse/Arabidopsis Expression Team: Bryan Frank, Renee Gaspard, Jeremy Hasseman, Lara Linford, Fenglong Liu, Simon Kwong, John Quackenbush, Shuibang Wang, Yonghong Wang
Emeritus: Ivana Yang, Yan Yu, Jennifer Cho (TGI), Ingeborg Holt (TGI), Feng Liang (TGI), Kristie Abernathy (mA), Sonia Dharap (mA), Julie Earle-Hughes (mA), Cheryl Gay (mA), Priti Hegde (mA), Rong Qi (mA), Erik Snesrud (mA), Heenam Kim (mA)
Array Software Hit Team: Nirmal Bhagabati, John Braisted, Tracey Currier, Jerry Li, Wei Liang, John Quackenbush, Alexander I. Saeed, Vasily Sharov, Mathangi Thiagarajan, Joseph White
Assistant: Sue Mineo
H. Lee Moffitt Center/USF: Timothy J. Yeatman, Emily Chen, Greg Bloom
PGA Collaborators: Gary Churchill (TJL), Greg Evans (NHLBI), Harry Gavras (BU), Howard Jacob (MCW), Anne Kwitek (MCW), Allan Pack (Penn), Beverly Paigen (TJL), Luanne Peters (TJL), David Schwartz (Duke)
TIGR PGA Collaborators: Norman Lee, Renae Malek, Hong-Ying Wang, Truong Luu, Bobby Behbahani
The TIGR Gene Index Team: Foo Cheung, Svetlana Karamycheva, Yudan Lee, Babak Parvizi, Geo Pertea, Razvan Sultana, Jennifer Tsai, John Quackenbush, Joseph White
TIGR Faculty, IT Group, and Staff
Funding provided by the Department of Energy and the National Science Foundation.
Funding provided by the National Cancer Institute, the National Heart, Lung, and Blood Institute, and the National Science Foundation.

Acknowledgments <johnq@tigr.org>
Thanks to Syntek, Inc. <http://www.syntek.com> for the GeneShaving MeV module and assistance with MyMADAM.
Thanks to DataNaut, Inc. <http://www.datanaut.com> for the RelNet and Terrain Map modules and assistance with Client/Server MeV.
<tm4@tigr.org>

Science is built with facts as a house is with stones – but a collection of facts is no more a science than a heap of stones is a house. – Jules Henri Poincaré

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion.
It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman, physicist, Nobel laureate (1918-1988)

Microarray Analysis at TIGR
Step 1: Experimental Design
Step 2: Data Collection
Step 3: Data Analysis
Step 4: Consulting with the ArraySW gang in the trailer
Step 5: Sharing data with our collaborators

Steps in the Process
Select array elements and annotate them
Build a database to manage stuff
Print arrays and manage the lab
Hybridize and analyze images; manage data
Analyze hybridization data and get results

Steps in the Process: Select array elements and annotate them

TIGR Gene Indices home page: www.tigr.org/tdb/tgi
~60 species, >16,000,000 sequences
TGICL tools are available, with more coming (Geo Pertea, Razvan Sultana, Valentin Antonescu). Available with source.

Gene Index Assembly Process
Start with ESTs from GenBank (dbEST), Expressed Transcripts (ETs) from GenBank CDS entries, and TIGR ESTs.
Reduce redundancy; remove vector, poly-A, adapter, mitochondrial, and ribosomal sequence.
Build clusters from high-stringency pairwise comparisons.
Assemble each cluster to obtain Tentative Consensus sequences (TCs).
Annotate the TCs and release.

The Mouse Gene Index <http://www.tigr.org/tdb/mgi>
A TC Example
GO Terms and EC Numbers (Babak Parvizi)
The TIGR Gene Indices <http://www.tigr.org/tdb/tgi> (Dan Lee, Ingeborg Holt)
Building TOGs (Tentative Orthologues): Reflexive, Transitive Closure and Paralogues. Thanks to Woytek Makałowski and Mark Boguski.
TOGA: A Sample Alignment: bithoraxoid-like protein

Gene Finding in Humans is easy! (Razvan Sultana)
Gene Finding in Humans is easy? (Razvan Sultana)
Gene Finding in Humans is difficult? (Razvan Sultana)
Gene Finding in Humans is difficult? A genome and its annotation is only a hypothesis that must be tested.
(Razvan Sultana)

RESOURCERER (Jennifer Tsai) <http://pga.tigr.org/tools.shtml>
RESOURCERER: An Example
RESOURCERER: Using Genetic Markers
Just added: Integrated QTLs

Steps in the Process: Build a database to manage stuff

SOPs are available for cDNA/template prep, PCR purification, printing, RNA labeling, and hybridization <http://pga.tigr.org/tools.shtml>
Coming: Data QC SOP

What data should we collect? EVERYTHING
MIAME (Minimum Information About a Microarray Experiment): Nature Genetics 29, December 2001
MAGE-ML: XML-based data exchange format <http://www.mged.org>
MIAME Relational Schema

What's Wrong with MIAME?
MIAME was designed as a model for capturing the information necessary to create public databases.
MIAME-based databases lack LIMS capabilities, which are necessary for large-scale studies.
We do not want to store images in our database, for practical reasons (limited space).
We needed to develop a variety of tools adapted to our existing infrastructure and to our legacy data and databases.

Probes are labeled and applied to the arrays.
An "experiment" is a hybridization.
A "study" is a collection of hybridization experiments.

MAD Microarray Database Schema
Conceptual Schema: MAD (entities: Clone, Slide, Slide_type, Spot, New_plate, Gene, Hyb, Study, Experiment, Expression, Expt_probe, Probe, Probe_source, PCR, Protocol, Primer_pair, Scan, Analysis, Normalize, Primer)

MADAM: Microarray Data Manager
Marie-Michelle Cordonnier-Pratt (UGA) converted MySQL to Oracle and made MADAM work!
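The data model stated above, in which an "experiment" is a single hybridization and a "study" is a collection of hybridization experiments, can be illustrated with a small relational sketch. This is not the MAD schema: only the entity names Study, Experiment, and Probe come from the conceptual schema. Every column, the slide-barcode field, the Cy3-control/Cy5-test dye assignment, and the use of Python's built-in in-memory sqlite3 (rather than MySQL or Oracle) are assumptions made so the example runs on its own.

```python
import sqlite3

# Hypothetical, heavily simplified tables. Only the entity names (Study, Experiment, Probe)
# come from the MAD conceptual schema; every column and constraint is an assumption.
SCHEMA = """
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE probe (
    probe_id    INTEGER PRIMARY KEY,
    sample_name TEXT NOT NULL,
    dye         TEXT NOT NULL CHECK (dye IN ('Cy3', 'Cy5'))
);
-- One row per hybridization: in this model an experiment is a single hybridization.
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    study_id      INTEGER NOT NULL REFERENCES study(study_id),
    slide_barcode TEXT NOT NULL UNIQUE,
    cy3_probe_id  INTEGER NOT NULL REFERENCES probe(probe_id),
    cy5_probe_id  INTEGER NOT NULL REFERENCES probe(probe_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_hybridization(conn: sqlite3.Connection, study_id: int, slide_barcode: str,
                      control_sample: str, test_sample: str) -> int:
    """Record one two-color hybridization (control as Cy3, test as Cy5: an assumed convention)."""
    cur = conn.cursor()
    cur.execute("INSERT INTO probe (sample_name, dye) VALUES (?, 'Cy3')", (control_sample,))
    cy3_id = cur.lastrowid
    cur.execute("INSERT INTO probe (sample_name, dye) VALUES (?, 'Cy5')", (test_sample,))
    cy5_id = cur.lastrowid
    cur.execute(
        "INSERT INTO experiment (study_id, slide_barcode, cy3_probe_id, cy5_probe_id) "
        "VALUES (?, ?, ?, ?)",
        (study_id, slide_barcode, cy3_id, cy5_id),
    )
    return cur.lastrowid


def hybridization_count(conn: sqlite3.Connection, study_id: int) -> int:
    """A study is a collection of hybridization experiments; count them."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM experiment WHERE study_id = ?", (study_id,)
    ).fetchone()
    return count


if __name__ == "__main__":
    db = build_db()
    db.execute("INSERT INTO study (study_id, name) VALUES (1, 'demo study')")
    for rep in range(1, 4):  # e.g. three biological replicates
        add_hybridization(db, 1, f"SLIDE-{rep:03d}", f"control_rep{rep}", f"test_rep{rep}")
    print(hybridization_count(db, 1))  # prints 3
```

Keeping the hybridization as the unit row, with the study as a grouping key, means replicates and dye assignments stay queryable per slide. The foreign keys reject an experiment that points at a study that does not exist.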
MADAM: available with source and MySQL.
ExpDesigner

Steps in the Process: Print arrays and manage the lab

Microarray Overview I (figure): for microbial ORFs, PCR primers are designed; for eukaryotic genes, cDNA clones are selected. The resulting PCR products are arrayed in many different microtiter plates containing different genes, many identical replicas are made of each plate set, and the products are printed onto a microarray slide (with 60,000 or more spotted genes).

Microarray Overview: Selected Genes
PCR Scorer: reads/loads the primer data file into MAD, allows PCR data entry, and translates between 96- and 384-well formats (Alex Saeed, developer and maintainer; enhancements: Wedge Smith).
(Figure: Primer Design, Clone Selection, Primer Synthesis, PCR Amplification, Gel-based Scoring, MAD)

The Beast: Microarray Robot from Intelligent Automation <http://www.ias.com>
Additional Software for Arrays: Scheduler
Microarray Scheduler: allows scheduling of all instruments. Designed and maintained by Jerry Li. Available with source.

Microarray Overview (figure): amplified/purified genes loaded in the arrayer, run parameters set, slides printed.
SliTrack/Controller: takes the slide order and run parameters, generates the spot order and the IAS control file, launches the IAS run software, and loads the database (MAD) (J. Li, developer and maintainer).

Steps in the Process: Hybridize and analyze images; manage data

Microarray Overview II (figure): obtain control and test RNA samples, prepare fluorescently labeled probes, hybridize and wash, measure fluorescence in 2 channels (red/green), and analyze the data to identify patterns of gene expression and differentially expressed genes.

Microarray Overview (figure, sample tracking and data entry):
MADAM: allows data entry (J. Li & J. White, web prototype; A. Saeed, J. White, J. Li, & V. Sharov, developers).
MABCOS: uses bar codes to track samples (J. Li, developer). Available with source.
MADAM + mMAP: allows data entry; moves paired TIFF image files to long-term storage (NetAPP) and renames them (A. Saeed, J. White, J. Li, & V. Sharov, developers).
Spotfinder: provides image analysis and writes data to flat files or directly to the database (V. Sharov, developer and maintainer). Available as an executable for Windows; device-independent C/C++ coming.

The TIGR Array Software System: PCRSCORE, SpotFinder, SLITRACK, MADAM, MAD, McCoder, MABCOS, ExpDesigner, MIDAS, MeV

Data Normalization and Filtering
Lowess Normalization

Why LOWESS? (figure: SD = 0.346)
Observations:
1. Intensity-dependent structure
2. Data not mean centered at log2(ratio) = 0

LOWESS (Cont'd)
Local linear regression model: $y_i = \alpha + \beta x_i$
Tri-cube weight function: $w(x_i) = (1 - |d_i|^3)^3$ for $|d_i| < 1$ and $0$ otherwise, where $d_i$ is the distance from $x_i$ to the fitting point divided by the neighborhood width
Least squares: minimize $\sum_i w(x_i)\,(y_i - \alpha - \beta x_i)^2$; setting the partial derivatives with respect to $\alpha$ and $\beta$ to $0$ gives $(\hat{\alpha}, \hat{\beta})' = (X'WX)^{-1}X'WY$, where $X$ has rows $(1, x_i)$, $W = \mathrm{diag}(w(x_i))$, and $Y = (y_1, \ldots, y_n)'$
(Figure: estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5); SD = 0.346)

LOWESS Results
"Slice Analysis" (intensity-dependent Z-score)

MIDAS: Data Analysis (Wei Liang)
Adding error models, MAANOVA, and automated reporting. Available with source (OSI-approved license).
Microarray Overview: MIDAS performs data normalization and filtering, including, soon, ANOVA.

Steps in the Process: Analyze hybridization data and get results

MeV: Data Mining Tools. Available with source (OSI-approved license). (Alexander Saeed, Alexander Sturn, Nirmal Bhagabati, John Braisted, Syntek Inc., Datanaut, Inc.)
MeV: metabolic pathway analysis is coming (Maria Klapa and Chris Koenig)

Analyses available in MeV...
Hierarchical clustering (HCL)
Bootstrapped/Jackknifed HCL
k-means clustering (KMC)
k-means support (iterative KMC)
Self-Organizing Maps (SOMs)
Cluster Affinity Search Technique (CAST)
Figure of Merit for CAST and KMC (soon SOM)
QT-clust (Heyer Jackknife)
Principal component analysis (PCA)
Gene Shaving
Relevance Networks
Support Vector Machines (SVM)
Self-Organizing Trees
Classification approaches, including Template Matching
t-tests
Significance Analysis of Microarrays (SAM)
ANOVA tools
GO, Metabolic Pathway, and Genome Localization annotation/clustering
Client-server mode with well-defined API

Missing from MeV...
MAGE-ML output for direct submission to databases ... coming in the next MADAM release.
Links to Bioconductor ... are coming.
Array CGH module from Barb Weber and Adam Margolin ... is coming.
EASE module from Doug Hosack ... is coming.
Lots of stuff we are not smart enough to think about.
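The LOWESS normalization described above (a local linear model fitted by tri-cube-weighted least squares, applied to log2(Cy5/Cy3) as a function of log10(Cy3*Cy5)) can be sketched in plain Python. This is a minimal sketch, not the MIDAS implementation. The span of 0.3, the k-nearest-neighbour window, and the function names `tricube`, `local_linear_fit`, and `lowess_normalize` are illustrative assumptions, and Cleveland's robustness iterations are omitted.

```python
import math
from typing import List, Sequence


def tricube(d: float) -> float:
    """Tri-cube weight: (1 - |d|^3)^3 for |d| < 1, and 0 otherwise."""
    d = abs(d)
    return (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0


def local_linear_fit(x0: float, xs: Sequence[float], ys: Sequence[float], span: float) -> float:
    """Weighted least-squares line y = alpha + beta * x around x0 (x0 must be one of xs); value at x0."""
    k = min(len(xs), max(2, math.ceil(span * len(xs))))  # number of points in the local window
    h = sorted(abs(x - x0) for x in xs)[k - 1]             # neighborhood width: distance to k-th nearest point
    h = h if h > 0 else 1e-12                              # guard against many tied x values
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw          # weighted means of x and y
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx < 1e-12:                                        # no spread in the window: use the weighted mean
        return my
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    beta = sxy / sxx                                       # (X'WX)^-1 X'WY written out for one predictor
    alpha = my - beta * mx
    return alpha + beta * x0


def lowess_normalize(cy3: Sequence[float], cy5: Sequence[float], span: float = 0.3) -> List[float]:
    """Subtract the LOWESS trend of log2(Cy5/Cy3) against log10(Cy3*Cy5) from each spot's log-ratio."""
    xs = [math.log10(g * r) for g, r in zip(cy3, cy5)]
    ys = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    return [y - local_linear_fit(x, xs, ys, span) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Synthetic spots whose log-ratio drifts with intensity; after correction the residuals are ~0.
    cy3 = [100.0 * 1.5 ** i for i in range(20)]
    cy5 = [c * 2 ** (0.1 * i) for i, c in enumerate(cy3)]
    print(max(abs(v) for v in lowess_normalize(cy3, cy5)))
```

Removing the fitted trend addresses both observations from the "Why LOWESS?" slide: the intensity-dependent structure and the offset from log2(ratio) = 0. Each evaluation sorts all distances, so this sketch runs in O(n² log n) time and is meant for illustration only.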
Sleep Deprivation Studies in Mouse (figure: timeline with time points 0, 3, 6, 9, and 12)

Experimental Paradigm
Compare gene expression between sleeping and sleep-deprived mice in cortex and hypothalamus.
Perform 3 biological replicates.
Normalize and filter the data, and use data mining techniques to select distinct patterns of gene expression.
Use Gene Ontology (GO) assignments to classify genes by cellular localization, molecular function, and biological process.
Use GO analysis to develop an understanding of the response.

Differential Expression in Cortex: Stress Response; Intermediate Metabolism and Signal Transduction; Energy Metabolism; Transcription; Mitochondrial and Ribosomal Proteins
Differential Expression in Hypothalamus: Sleep signaling

EASE Analysis of GO Terms (Hosack, et al. 2003)
Cortex, up-regulated genes (GO class: GO category, p-value):
GO Cellular Component: endoplasmic reticulum, 6.06×10^-3
GO Molecular Function: heat shock protein activity, 8.78×10^-4
GO Molecular Function: pyruvate dehydrogenase (lipoamide) phosphatase activity, 3.17×10^-3
GO Molecular Function: chaperone activity, 7.38×10^-3
Cortex, down-regulated genes (GO class: GO category, p-value):
GO Biological Process: biosynthesis, 2.85×10^-25
GO Biological Process: protein metabolism, 1.00×10^-11
GO Biological Process: electron transport, 6.04×10^-3
GO Cellular Component: ribosome, 5.95×10^-37
GO Cellular Component: ribonucleoprotein complex, 1.17×10^-32
GO Cellular Component: eukaryotic 48S initiation complex, 9.74×10^-18
GO Cellular Component: eukaryotic 43S pre-initiation complex, 2.68×10^-15
GO Cellular Component: mitochondrial inner membrane, 3.70×10^-3
GO Molecular Function: structural constituent of ribosome, 6.46×10^-39
GO Molecular Function: RNA binding activity, 4.83×10^-21
GO Molecular Function: cytochrome c oxidase activity, 9.79×10^-4
GO Molecular Function: hydrogen ion transporter activity, 1.88×10^-3
Themes: general biological trends based on the representation of functional roles on the array.
Problem: the requirement of functional class assignment limits utility for the discovery of new functional networks.
Thanks to Doug Hosack, NIAID

Now available...
The TGI databases, including RESOURCERER
The TGICL Gene Index Clustering and Assembly Tools
A freely-available MySQL version of our MIAME-supportive database
A freely-available, open source, Java-based set of tools:
  MADAM: Microarray Data Manager
  MIDAS: Microarray Data Analysis System
  MeV: Multiexperiment Viewer
A freely-available image processing software system linked to the database: TIGR Spotfinder

Nobody in the game of football should be called a genius. A genius is somebody like Norman Einstein. - Joe Theismann, former quarterback

A theory has only the possibility of being right or wrong. A model has a third possibility; it may be right but irrelevant. – Manfred Eigen

Unless a reviewer has the courage to give you unqualified praise, I say ignore the bastard. - John Steinbeck