Annotator Interface
Sharon Diskin
GUS 3.0 Workshop
June 18-21, 2002

Outline
Current annotation efforts
Motivation for new annotation tool
Requirements for new annotation tool
Thoughts on design and implementation
Future plans

Current Annotation Efforts

Overview of Current Efforts
Automated annotation has been applied to the DoTS transcripts
  – Predicted gene ownership (clustering of assemblies)
  – BlastX against NR
    • Automated assignment of descriptions based on similarity
  – BlastX against ProDom and RPS-Blast against CDD
    • Predicted GO Functions
  – Framefinder
    • Predicted Protein Sequences
  – Blat alignments
  – EPCR, Index Words, etc.
Manual annotation efforts have focused on
  – validating the automated annotation and
  – adding additional information at the central dogma level
Manual annotation of the gene index utilizes an annotation tool, the GUS Annotator Interface, which directly updates the GUSdev database.

DoTS RNA transcripts
The assembly of sequences generates a consensus sequence or DoTS transcript
Incoming Sequences (EST/mRNA)
  • GenBank, dbEST sequences
  • Make Quality (remove vector, polyA, NNNs)
“Quality” sequences
  • Block with RepeatMasker
Blocked sequences
  • Blastn to cluster sequences
“Unassembled” clusters
  • Assemble sequences with CAP4
CAP4 assemblies (generate consensus sequences)
DoTS consensus sequences
  • BLASTn DoTS consensus sequences (98% identity, 150 bp)
Gene Cluster (RNAs in the Gene)

Current Efforts: Gene Annotation (1)
Gene     RNA     RNAInstance   RNAFeature   Assembly
Gene_A   RNA_1   Instance_1    Feature_1    Assembly_1
         RNA_2   Instance_2    Feature_2    Assembly_2
         RNA_3   Instance_3    Feature_3    Assembly_3
         RNA_4   Instance_4    Feature_4    Assembly_4
         RNA_5   Instance_5    Feature_5    Assembly_5
         …       …             …            …
Task 1: Validation of Gene Membership
Generate DoTS transcripts

Current Efforts: Gene Annotation (2)
Gene     RNA     RNAInstance   RNAFeature   Assembly
Gene_A   RNA_1   Instance_1    Feature_1    Assembly_1
         RNA_2   Instance_2    Feature_2    Assembly_2
         RNA_3   Instance_3    Feature_3    Assembly_3
Gene_B   RNA_4   Instance_4    Feature_4    Assembly_4
         RNA_5   Instance_5    Feature_5    Assembly_5
                 …             …            …
- Removing RNAs from the cluster results in the creation of a new Gene
- An entry is made in the MergeSplit table for tracking purposes
- A similar process is followed when an RNA is added to a Gene
Generate DoTS transcripts

Current Efforts: Gene Annotation (3)
Task 2: Assign Reference RNA
  – will be annotated further
  – RNA table
Task 3: Assign Approved Gene Name/Symbol
  – Gene Table
  – Evidence: Comment (specifies database link)
Task 4: Assign Gene Description
  – Gene Table
  – Evidence: Comment
Task 5: Associate known Gene synonyms
  – GeneSynonym table
  – Evidence: Comment

Current Efforts: RNA Annotation
Annotation of “Reference Sequence”
Task 1: Assign/Confirm Description of assembly
  – RNA table
Task 2: Confirm/Add/Delete GO Functions
  – ProteinGOFunction (in GUSdev; GO tables have been redesigned in GUS 3.0)
  – Evidence: Comments or Similarity (ProDom, CDD-Pfam, CDD-Smart, or NR)

Current Annotator Interface Architecture
[Figure: architecture diagram. Components: Annotator Interface, JavaServlet, “XML” file, AnnotatorInterface Submitter (GA-Plugin), Perl Object Layer, GUSdev. Connection labels: JDBC (Query Only), writes, executes, reads, DBI (Insert/Update/Delete).]

Current Annotator Interface

Current Gene Annotation
Validate Cluster and Assign Reference RNA/Assembly

Current Gene Annotation (cont.)
Assign Gene Name/Symbol
Assign Gene Description
Assign Gene Synonym(s)
Evidence

Current RNA (and Protein) Annotation
RNA Description
GO Functions
Evidence

Allgenes Display of Gene Annotation

Allgenes Display of RNA Annotation
RNA Description
(Confirmed or manually added GO Functions)

Status of Current Annotation (as of June 20, 2002)
1289 manually reviewed genes
  – 1003 with gene name
  – 697 with gene synonyms
  – 1046 with description
6146 manually reviewed RNAs/DoTS assemblies
949 ‘proteins’ with reviewed GO function

Motivation for new tool
Want to annotate using genomic sequence
  • Create “curated” gene models specifying structure
  • Increase structure of annotation in GUS
  • Annotation of proteins
  • Redefinition of annotation tasks
  • Current interface not designed for this purpose

Some Other Annotation Tools
• Artemis
  • Developed and used at Sanger
  • Reads and writes flat files
  • Supports rich set of annotations
    • Save as EMBL format
• Apollo
  • Combined effort including members from Sanger and Berkeley
  • Flat files (CORBA access to ENSEMBL)
  • 2 versions, currently being merged
    • Sanger: annotation viewer
    • Berkeley: focus on editing
No Existing Tool To Meet All of Our Needs

Requirements At a High Level

Requirements: Graphical View
Provide alignment of features on genomic sequence
  – could potentially display any feature type currently stored in GUS 3.0
  – features can be selected and used to generate “curated” features
  – similar to display and functionality in Apollo
Toggle (or configure) the display of each feature type
Zoom to sequence level, with links to functionality relevant to the highlighted feature
Also support creation of features “from scratch”
  – based on literature, etc.
Detail editors provide ability to change endpoints, etc.
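
The graphical view requirement (select existing features, generate a “curated” feature, then adjust endpoints in a detail editor) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Feature` class, `make_curated`, and `set_endpoints` are hypothetical names and are not part of GUS or its object layer, and taking the span of the selected features as the initial endpoints is one possible rule, not a decided design.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    """A feature aligned to genomic sequence (hypothetical model, not a GUS class)."""
    feature_type: str                    # illustrative type name, e.g. "BlatAlignment"
    start: int                           # 1-based inclusive start on the genomic sequence
    end: int                             # 1-based inclusive end
    curated: bool = False
    evidence: Tuple["Feature", ...] = ()  # features used to create a curated feature


def make_curated(selected: Sequence[Feature], feature_type: str) -> Feature:
    """Create a curated feature spanning the selected features.

    Assumption: the initial endpoints are the overall span of the selection;
    the selected features are kept as evidence.
    """
    if not selected:
        raise ValueError("select at least one feature")
    return Feature(
        feature_type=feature_type,
        start=min(f.start for f in selected),
        end=max(f.end for f in selected),
        curated=True,
        evidence=tuple(selected),
    )


def set_endpoints(feature: Feature,
                  start: Optional[int] = None,
                  end: Optional[int] = None) -> Feature:
    """Return a copy with modified endpoints, as a detail editor might do."""
    new_start = feature.start if start is None else start
    new_end = feature.end if end is None else end
    if new_start > new_end:
        raise ValueError("start must not exceed end")
    return replace(feature, start=new_start, end=new_end)
```

For example, selecting two overlapping alignments and calling `make_curated([a, b], "GeneFeature")` yields one curated feature covering both, which `set_endpoints` can then trim or extend without losing the recorded evidence.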
Gene Annotation
Create curated gene model
  – specify gene boundaries
  – specify location of exons (and thus introns)
    • 5' exon boundary (putative transcription start site)
    • 3' exon boundary (include polyadenylation signal)
  – automatic creation of Gene entry
  – merge with existing gene instances through GeneInstance table
  – tables/views affected:
    • GeneFeature
    • ExonFeature
    • GeneInstance
    • Gene
    • MergeSplit
  – evidence: features used to create model, PubMed ID
  – should be as easy as clicking on existing features and choosing “make curated” (endpoints, etc. can then be modified if needed)

Gene Annotation (2)
Assign (HUGO or MGI approved) abbreviated gene name/symbol
  – Gene Table
  – Evidence: ExternalDatabaseLink
Assign full gene name (MGI or HUGO full gene name)
  – Gene Table
  – Evidence: ExternalDatabaseLink
Assign abbreviated gene name/symbol synonyms (non-approved gene symbols)
  – GeneSynonym Table
  – Evidence: ExternalDatabaseLink
Assign full gene name aliases
  – GeneAlias Table
  – Evidence: ExternalDatabaseLink

Gene Annotation (3)
Assign gene category (e.g. non-coding)
  – Gene Table
  – Evidence:
    • ExternalDatabaseLink/Literature Reference
    • Similarity (e.g. to known non-coding RNA)
Confirm/assign gene chromosomal location
  – GeneChromosomalLocation
  – Evidence:
    • ExternalDatabaseLink/Literature Reference
    • RH mapping data
    • Alignments/Features
OMIM Link assignment (verification if computationally determined)
  – ExternalDatabaseLink

RNA Annotation (1)
Create “curated RNAs”
  – Define RNA transcript forms of gene (create RNAs)
  – Using exons defined by curated gene
  – 5' and 3' UTRs
  – Automatic creation of RNA entry
  – Merge existing RNA instances
  – Tables affected:
    • RNAFeature
    • UTRFeature
    • RNAInstance
    • RNA
  – Evidence: Features used to create
Assign RNA categories to created RNAs (e.g. alternative form)
  – RNARNACategory Table

RNA Annotation
Assign (or confirm computed) RNA description
  – RNA table
  – Evidence: Gene from which it is derived
Anatomy expression assignment(s)
  – RNAAnatomy
  – RNAAnatomyLOE
  – Evidence:
    • ExternalDatabaseLink/Literature references
    • Assembly anatomy percent from DoTS
    • RAD experiments
Assign GO terms to curated RNA (non-coding RNAs, e.g. small RNA involved in splicing)
  – GOTermAssociation
  – GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink, Literature References

Requirements: Protein Annotation
Confirm/assign GO Function
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References
Confirm/assign GO Biological Process
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References
Confirm/assign GO Cellular Component
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References
Assign protein name
  – Protein Table
  – Evidence: ExternalDatabaseLink, Literature Ref, Similarities
Assign protein name synonyms
  – Protein Table
  – Evidence: ExternalDatabaseLink, Literature Ref, Similarities

Protein Annotation (2)
Assign protein category (post-translational modifications)
  – ProteinProteinCategory
  – Evidence: ExternalDatabaseLink, Literature References
Protein-protein interactions assigned
  – Interaction
  – InteractionInteractionLOE
  – Evidence: PubMed ID, etc.
Protein pathway assignments
  – PathwayInteraction (for newly created interactions)
  – Still under consideration: What is the best way to link with an existing pathway?
    • for example, a Pathway is represented in DoTS, and we want to say that this curated Protein is really the same as a protein in a pathway.

Next Steps/Open Issues
Completion of Java Object Layer
Decision regarding BioJava wrappers
  – What exactly will this give us to aid in interface development (e.g. FeatureRenderer, etc.)?
Discussion on layout of interface
  – Joan’s input after experimentation with other tools
Depending on the above:
  – Client Side portion which communicates with remote GUS Server
  – Interface Implementation
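
As a closing illustration of the curated gene model requirement, which specifies exon locations “and thus introns”, the derivation of introns from exons might look like the sketch below. The function name `introns_from_exons` and the 1-based inclusive coordinate convention are assumptions made for this example; they are not taken from the GUS schema or the planned Java Object Layer.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def introns_from_exons(exons: Sequence[Interval]) -> List[Interval]:
    """Derive intron intervals as the gaps between consecutive exons.

    Exons may be given in any order; they are sorted by start coordinate.
    Overlapping exons are rejected because a single gene model is assumed
    to have non-overlapping exons.
    """
    for start, end in exons:
        if start > end:
            raise ValueError(f"invalid exon ({start}, {end})")
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap")
        # Directly adjacent exons leave no gap, so no intron is emitted.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns
```

With exons at (100, 200), (300, 400), and (500, 650), this yields introns at (201, 299) and (401, 499). Storing only exons and deriving introns on demand keeps the curated model consistent when an annotator later edits an exon endpoint.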