Annotator Interface
Sharon Diskin
GUS 3.0 Workshop
June 18-21, 2002
Outline
• Current annotation efforts
• Motivation for new annotation tool
• Requirements for new annotation tool
• Thoughts on design and implementation
• Future plans
Current Annotation Efforts
Overview of Current Efforts

• Automated annotation has been applied to the DoTS transcripts
– Predicted gene ownership (clustering of assemblies)
– BlastX against NR
  • Automated assignment of descriptions based on similarity
– BlastX against ProDom and RPS-Blast against CDD
  • Predicted GO Functions
– Framefinder
  • Predicted Protein Sequences
– Blat alignments
– EPCR, Index Words, etc.

• Manual annotation efforts have focused on
– validating the automated annotation and
– adding additional information at the central dogma level

• Manual annotation of the gene index utilizes an annotation tool, the GUS Annotator Interface, which directly updates the GUSdev database.
DoTS RNA transcripts
The assembly of sequences generates a consensus sequence, or DoTS transcript:
• Incoming sequences (EST/mRNA): GenBank and dbEST sequences
• Make Quality (remove vector, polyA, NNNs) → “Quality” sequences
• Block with RepeatMasker → blocked sequences
• BLASTn to cluster sequences → “Unassembled” clusters
• Assemble sequences with CAP4 → CAP4 assemblies (generate consensus sequences) → DoTS consensus sequences
• BLASTn DoTS consensus sequences (98% identity, 150 bps) → gene clusters (the RNAs in the gene)
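The last step can be read as single-linkage clustering: any BLASTn hit between two DoTS consensus sequences that meets both thresholds (98% identity, 150 bp) puts them in the same gene cluster, and membership is transitive. The sketch below assumes exactly that reading; the `Hit` record and `cluster_assemblies` function are hypothetical illustrations, not GUS code, and the real pipeline may apply the thresholds differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTn alignment between two consensus sequences (hypothetical record)."""
    query: str
    subject: str
    percent_identity: float
    length: int  # aligned length in bp


def cluster_assemblies(assemblies, hits, min_identity=98.0, min_length=150):
    """Group assemblies into gene clusters by single linkage over qualifying hits."""
    parent = {a: a for a in assemblies}

    def find(x):
        # Walk to the root, halving the path as we go to keep the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h in hits:
        if h.percent_identity >= min_identity and h.length >= min_length:
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q  # union the two clusters

    clusters = {}
    for a in assemblies:
        clusters.setdefault(find(a), []).append(a)
    # Sort members and clusters so the result does not depend on input order.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    hits = [
        Hit("A1", "A2", 99.0, 300),
        Hit("A2", "A3", 98.5, 160),
        Hit("A3", "A4", 97.0, 500),  # identity below 98%: no link
    ]
    print(cluster_assemblies(["A1", "A2", "A3", "A4"], hits))
```

Here A1 and A3 never align directly but still land in one cluster through A2, which is the transitive behavior that makes manual validation of gene membership (Task 1) necessary.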
Current Efforts: Gene Annotation (1)
Task 1: Validation of Gene Membership
Generate DoTS transcripts:

Gene     RNA     RNAInstance   RNAFeature   Assembly
Gene_A   RNA_1   Instance_1    Feature_1    Assembly_1
Gene_A   RNA_2   Instance_2    Feature_2    Assembly_2
Gene_A   RNA_3   Instance_3    Feature_3    Assembly_3
Gene_A   RNA_4   Instance_4    Feature_4    Assembly_4
Gene_A   RNA_5   Instance_5    Feature_5    Assembly_5
…        …       …             …            …
Current Efforts: Gene Annotation (2)
Generate DoTS transcripts:

Gene     RNA     RNAInstance   RNAFeature   Assembly
Gene_A   RNA_1   Instance_1    Feature_1    Assembly_1
Gene_A   RNA_2   Instance_2    Feature_2    Assembly_2
Gene_A   RNA_3   Instance_3    Feature_3    Assembly_3
Gene_B   RNA_4   Instance_4    Feature_4    Assembly_4
Gene_B   RNA_5   Instance_5    Feature_5    Assembly_5
…        …       …             …            …

- Removing RNAs from the cluster results in the creation of a new Gene
- An entry is made in the MergeSplit table for tracking purposes
- Similar process followed when an RNA is added to a Gene
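The split above (Gene_A keeps RNA_1 to RNA_3, RNA_4 and RNA_5 move to a new Gene_B, and the change is logged) can be sketched as an in-memory model. Everything here is hypothetical: `GeneStore` stands in for the Gene and MergeSplit tables, the `MergeSplitEntry` fields are invented because the slides do not show the real MergeSplit columns, and deleting a gene once its last RNA is moved away is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Gene:
    gene_id: int
    rna_ids: set = field(default_factory=set)


@dataclass(frozen=True)
class MergeSplitEntry:
    """Hypothetical tracking row; not the real GUS MergeSplit schema."""
    old_gene_id: int
    new_gene_id: int
    rna_ids: frozenset
    is_merge: bool


class GeneStore:
    """In-memory stand-in for the Gene and MergeSplit tables (illustrative only)."""

    def __init__(self):
        self.genes = {}
        self.merge_split = []
        self._next_id = count(1)

    def create_gene(self, rna_ids):
        gene = Gene(next(self._next_id), set(rna_ids))
        self.genes[gene.gene_id] = gene
        return gene

    def split(self, gene_id, rna_ids):
        """Remove rna_ids from a gene, place them in a new gene, and log the split."""
        source = self.genes[gene_id]
        moving = set(rna_ids)
        if not moving or not moving < source.rna_ids:
            raise ValueError("must split off a non-empty, proper subset of the gene's RNAs")
        source.rna_ids -= moving
        new_gene = self.create_gene(moving)
        self.merge_split.append(
            MergeSplitEntry(gene_id, new_gene.gene_id, frozenset(moving), is_merge=False))
        return new_gene

    def add_rnas(self, target_id, source_id, rna_ids):
        """Move rna_ids from one gene into another and log the merge."""
        if target_id == source_id:
            raise ValueError("source and target genes must differ")
        source, target = self.genes[source_id], self.genes[target_id]
        moving = set(rna_ids)
        if not moving <= source.rna_ids:
            raise ValueError("RNAs must belong to the source gene")
        source.rna_ids -= moving
        target.rna_ids |= moving
        if not source.rna_ids:
            del self.genes[source_id]  # assumption: an emptied gene is retired
        self.merge_split.append(
            MergeSplitEntry(source_id, target_id, frozenset(moving), is_merge=True))
```

Logging every move, rather than only the final state, is what lets a later reader trace where an RNA's gene assignment came from.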
Current Efforts: Gene Annotation (3)
• Task 2: Assign Reference RNA
– the reference RNA will be annotated further
– RNA table

• Task 3: Assign Approved Gene Name/Symbol
– Gene Table
– Evidence: Comment (specifies database link)

• Task 4: Assign Gene Description
– Gene Table
– Evidence: Comment

• Task 5: Associate known Gene synonyms
– GeneSynonym table
– Evidence: Comment
Current Efforts: RNA Annotation
Annotation of “Reference Sequence”

• Task 1: Assign/Confirm Description of assembly
– RNA table

• Task 2: Confirm/Add/Delete GO Functions
– ProteinGOFunction (in GUSdev; the GO tables have been redesigned in GUS 3.0)
– Evidence: Comments or Similarity (ProDom, CDD-Pfam, CDD-Smart, or NR)
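Current Annotator Interface Architecture
• Annotator Interface (Java servlet) queries GUSdev through JDBC (query only)
• The servlet writes the annotations to an “XML” file
• The AnnotatorInterfaceSubmitter GA-Plugin is executed, reads the “XML” file, and writes to GUSdev through the Perl object layer using DBI (insert/update/delete)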
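The hand-off between the query-only web tier and the database writer can be sketched as a round trip through an XML file. The slides do not describe the file's format, so the `annotations`/`update` element names, the attributes, and both functions below are hypothetical; Python is used only for consistency with the other examples, whereas the actual components are a Java servlet and a Perl GA-Plugin.

```python
import xml.etree.ElementTree as ET


def write_annotation_file(updates):
    """Servlet side (hypothetical format): serialize pending updates; no DB writes here."""
    root = ET.Element("annotations")
    for u in updates:
        el = ET.SubElement(root, "update", table=u["table"],
                           row_id=str(u["row_id"]), column=u["column"])
        ET.SubElement(el, "value").text = u["value"]
        ET.SubElement(el, "evidence").text = u["evidence"]
    return ET.tostring(root, encoding="unicode")


def read_annotation_file(xml_text):
    """Submitter side: parse the file back into updates to apply to the database."""
    root = ET.fromstring(xml_text)
    return [
        {
            "table": el.get("table"),
            "row_id": int(el.get("row_id")),
            "column": el.get("column"),
            "value": el.findtext("value"),
            "evidence": el.findtext("evidence"),
        }
        for el in root.iter("update")
    ]
```

Under this split the web tier only ever needs read access, and every insert, update, or delete passes through the single component that owns database writes.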
Current Annotator Interface
Current Gene Annotation
Validate Cluster and Assign Reference RNA/Assembly
Current Gene Annotation (cont.)
Assign Gene Name/Symbol
Assign Gene Description
Assign Gene Synonym(s)
Evidence
Current RNA (and Protein) Annotation
RNA Description
GO Functions
Evidence
Allgenes Display of Gene Annotation
Allgenes Display of RNA Annotation
RNA Description
(Confirmed or manually added GO Functions)
Status of Current Annotation
(as of June 20, 2002)

• 1289 manually reviewed genes
– 1003 with gene name
– 697 with gene synonyms
– 1046 with description

• 6146 manually reviewed RNAs/DoTS assemblies

• 949 ‘proteins’ with reviewed GO function
Motivation for new tool
• Want to annotate using genomic sequence
• Create “curated” gene models specifying structure
• Increase structure of annotation in GUS
• Annotation of proteins
• Redefinition of annotation tasks
• Current interface not designed for this purpose
Some Other Annotation Tools
• Artemis
– Developed and used at Sanger
– Reads and writes flat files
– Supports a rich set of annotations
  • Saves in EMBL format
• Apollo
– Combined effort including members from Sanger and Berkeley
– Flat files (CORBA access to ENSEMBL)
– 2 versions, currently being merged
  • Sanger: annotation viewer
  • Berkeley: focus on editing
No Existing Tool Meets All of Our Needs
Requirements At a High Level
Requirements: Graphical View

• Provide alignment of features on genomic sequence
– could potentially display any feature type currently stored in GUS 3.0
– features can be selected and used to generate “curated” features
– similar to display and functionality in Apollo
• Toggle (or configure) the display of each feature type
• Zoom to the sequence level, with links to functionality relevant to the highlighted feature
• Also support creation of features “from scratch”
– based on literature, etc.
• Detail editors provide the ability to change endpoints, etc.
Gene Annotation

• Create curated gene model
– specify gene boundaries
– specify location of exons (and thus introns)
  • 5' exon boundary (putative transcription start site)
  • 3' exon boundary (includes polyadenylation signal)
– automatic creation of Gene entry
– merge with existing gene instances through GeneInstance table
– tables/views affected:
  • GeneFeature
  • ExonFeature
  • GeneInstance
  • Gene
  • MergeSplit
– evidence: features used to create model, PubMed ID
– should be as easy as clicking on existing features and selecting “make curated” (endpoints, etc. can then be modified if needed)
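Once the exons of a curated gene model are fixed, the gene boundaries and introns follow from their coordinates, which is why the slide says "and thus introns". A minimal sketch, assuming 1-based inclusive genomic coordinates (the slides do not state GUS's convention); the function names are hypothetical.

```python
def gene_span(exons):
    """Gene boundaries: lowest exon start and highest exon end (genomic coords)."""
    return min(s for s, _ in exons), max(e for _, e in exons)


def introns_from_exons(exons):
    """Derive introns as the gaps between consecutive exons.

    Exons are (start, end) pairs in 1-based, inclusive genomic coordinates.
    """
    ordered = sorted(exons)
    for start, end in ordered:
        if start > end:
            raise ValueError(f"exon start after end: {(start, end)}")
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError(f"overlapping exons at {next_start}")
        if next_start > prev_end + 1:  # abutting exons leave no intron
            introns.append((prev_end + 1, next_start - 1))
    return introns
```

Because introns are computed rather than stored independently, moving an exon endpoint in a detail editor automatically keeps the intron structure consistent.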
Gene Annotation (2)

• Assign (HUGO or MGI approved) abbreviated gene name/symbol
– Gene Table
– Evidence: ExternalDatabaseLink

• Assign full gene name (MGI or HUGO full gene name)
– Gene Table
– Evidence: ExternalDatabaseLink

• Assign abbreviated gene name/symbol synonyms (non-approved gene symbols)
– GeneSynonym Table
– Evidence: ExternalDatabaseLink

• Assign full gene name aliases
– GeneAlias Table
– Evidence: ExternalDatabaseLink
Gene Annotation (3)

• Assign gene category (e.g. non-coding)
– Gene Table
– Evidence:
  • ExternalDatabaseLink/Literature Reference
  • Similarity (e.g. to known non-coding RNA)

• Confirm/assign gene chromosomal location
– GeneChromosomalLocation
– Evidence:
  • ExternalDatabaseLink/Literature Reference
  • RH mapping data
  • Alignments/Features

• OMIM link assignment (verification if computationally determined)
– ExternalDatabaseLink
RNA Annotation (1)

• Create “curated RNAs”
– Define RNA transcript forms of gene (create RNAs)
– Using exons defined by curated gene
– 5' and 3' UTRs
– Automatic creation of RNA entry
– Merge existing RNA instances
– Tables affected:
  • RNAFeature
  • UTRFeature
  • RNAInstance
  • RNA
– Evidence: features used to create the RNA

• Assign RNA categories to created RNAs (e.g. alternative form)
– RNARNACategory Table
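Given a curated RNA's exons and its coding region, the 5' and 3' UTRs are the exonic stretches outside the CDS. A minimal sketch, assuming 1-based inclusive genomic coordinates and a CDS span that lies within the exons; the function name and the strand handling are illustrative, not taken from GUS.

```python
def utrs_from_exons(exons, cds_start, cds_end, strand="+"):
    """Split the exonic sequence outside the CDS into (5' UTR, 3' UTR) segments.

    exons: (start, end) pairs, 1-based inclusive genomic coordinates.
    cds_start/cds_end: genomic span of the coding region (cds_start <= cds_end).
    """
    low_side, high_side = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            low_side.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            high_side.append((max(start, cds_end + 1), end))
    # On the minus strand the low-coordinate side is the transcript's 3' end.
    if strand == "-":
        return high_side, low_side
    return low_side, high_side
```

For exons (100, 200), (301, 400), (501, 650) and a CDS from 150 to 550 on the plus strand, this yields a 5' UTR of (100, 149) and a 3' UTR of (551, 650); a UTR can span several exons, which is why each side is a list of segments.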
RNA Annotation (2)

• Assign (or confirm computed) RNA description
– RNA table
– Evidence: Gene from which it is derived

• Anatomy expression assignment(s)
– RNAAnatomy
– RNAAnatomyLOE
– Evidence:
  • ExternalDatabaseLink/Literature references
  • Assembly anatomy percent from DoTS
  • RAD experiments
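One plausible reading of "assembly anatomy percent" is the share of an assembly's ESTs whose source library comes from each anatomy term. The sketch below assumes that reading and assumes the library-to-anatomy mapping is already available; it is not DoTS's actual computation, and excluding ESTs of unknown anatomy from the denominator is a further assumption.

```python
from collections import Counter


def anatomy_percent(est_anatomies):
    """Percent of an assembly's ESTs whose source library maps to each anatomy term.

    est_anatomies: one anatomy term (or None if unknown) per EST in the assembly.
    ESTs with unknown anatomy are excluded from the denominator (an assumption).
    """
    known = [term for term in est_anatomies if term is not None]
    if not known:
        return {}
    counts = Counter(known)
    return {term: round(100.0 * n / len(known), 1) for term, n in counts.most_common()}
```

For ESTs from liver, liver, brain, an unknown library, and liver, this gives liver 75.0 and brain 25.0; such percentages are suggestive evidence only, since EST library sampling is uneven across tissues.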

• Assign GO terms to curated RNA (non-coding RNAs, e.g. small RNAs involved in splicing)
– GOTermAssociation
– GOTermAssociationEvid
– Evidence: ExternalDatabaseLink, Literature References
Requirements: Protein Annotation

• Confirm/assign GO Function
– GOTermAssociation, GOTermAssociationEvid
– Evidence: ExternalDatabaseLink and/or Literature References

• Confirm/assign GO Biological Process
– GOTermAssociation, GOTermAssociationEvid
– Evidence: ExternalDatabaseLink and/or Literature References

• Confirm/assign GO Cellular Component
– GOTermAssociation, GOTermAssociationEvid
– Evidence: ExternalDatabaseLink and/or Literature References

• Assign protein name
– Protein Table
– Evidence: ExternalDatabaseLink, Literature Ref, Similarities

• Assign protein name synonyms
– Protein Table
– Evidence: ExternalDatabaseLink, Literature Ref, Similarities
Protein Annotation (2)

• Assign protein category (post-translational modifications)
– ProteinProteinCategory
– Evidence: ExternalDatabaseLink, Literature References

• Assign protein-protein interactions
– Interaction
– InteractionInteractionLOE
– Evidence: PubMed ID, etc.

• Protein pathway assignments
– PathwayInteraction (for newly created interactions)
– Still under consideration: what is the best way to link with existing pathways?
  • For example, a pathway is represented in DoTS, and we want to say that this curated Protein is really the same as a protein in that pathway.
Next Steps/Open Issues
• Completion of Java Object Layer
• Decision regarding BioJava wrappers
– What exactly will this give us to aid in interface development (e.g. FeatureRenderer, etc.)?
• Discussion on layout of interface
– Joan’s input after experimentation with other tools
• Depending on the above:
– Client-side portion that communicates with the remote GUS server
– Interface implementation