Download ppt - IPAW
The craft of annotation
Carole Goble
Based on observations of the PRINTS protein fingerprint database

Primary & Secondary databases
Primary: source generated by experimentalists.
Role: standards, quality thresholds, dissemination
• Sequence databases: EMBL, GenBank
• Increasingly other data types: micro-array
Secondary: source derived from repositories, other secondary databases, analysis and expertise.
Role: distilled and accumulated specialist knowledge. Value added commentary.
• Swiss-Prot, PRINTS, CATH, PAX6, Enzyme, dbSNP…
Role: warehouses to support analysis over replicated data
• GIMS, aMAZE, InterPro…

The “Annotation Pipeline”
[Diagram labels: Analysis (x4); EMBL; TrEMBL; SwissProt; PRINTS; Interpro; BLOCKS; GPCRDB]

Annotation Distillation
Expressed Sequence Tags: millions
nrdb: 503,479
TrEMBL: 234,059
Swiss-Prot: 85,661
InterPro: 2990
PRINTS: 1310

PRINTS
PRINTS - a database of protein family “fingerprints”
Fingerprints - groups of motifs excised from alignments
–used to provide diagnostic signatures for protein families
PRINTS forms basis of derived resources
–e.g., blocks, emotif, InterPro
Used in gene family analysis, genome annotation, etc.

Swiss-Prot annotation
ID   PRIO_HUMAN STANDARD; PRT; 253 AA.
AC   P04156;
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=86300093 [NCBI, ExPASy, Israel, Japan]; PubMed=3755672;
RA   Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H., Prusiner S.B., Dearmond S.J.
RT   "Molecular cloning of a human prion protein cDNA.";
RL   DNA 5:315-324(1986).
RN   [6]
RP   STRUCTURE BY NMR OF 23-231.
RX   MEDLINE=97424376 [NCBI, ExPASy, Israel, Japan]; PubMed=9280298;
RA   Riek R., Hornemann S., Wider G., Glockshuber R., Wuethrich K.;
RT   "NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).";
RL   FEBS Lett. 413:282-288(1997).
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST
CC       GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS
CC       INFECTED WITH NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM
CC       ENCEPHALOPATHIES OR PRION DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD),
CC       GERSTMANN-STRAUSSLER SYNDROME (GSS), FATAL FAMILIAL INSOMNIA (FFI) AND
CC       KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE SPONGIFORM
CC       ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC       CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM
CC       ENCEPHALOPATHY (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY(EUE) IN
CC       NYALA AND GREATER KUDU. THE PRION DISEASES ILLUSTRATE THREE
CC       MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2) SPORADIC AND
CC       (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT
CC       TO OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
DR   HSSP; P04925; 1AG2. [HSSP ENTRY / SWISS-3DIMAGE / PDB]
DR   MIM; 176640; -. [NCBI / EBI]
DR   InterPro; IPR000817; -.
DR   Pfam; PF00377; prion; 1.
DR   PRINTS; PR00341; PRION.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
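The Swiss-Prot entry above is a line-oriented record: each line starts with a two-letter line code (ID, AC, DE, CC, DR, KW, ...), and comment topics inside CC lines are introduced by `-!-`. As a minimal sketch of how such a record can be pulled apart before any annotation is harvested from it, the following Python groups lines by code and splits the CC block into topics. The function names and the abridged `SAMPLE` record are illustrative assumptions, not part of Swiss-Prot or PRINTS tooling.

```python
from collections import defaultdict

# Abridged copy of the PRIO_HUMAN entry shown on the slide (illustrative only).
SAMPLE = """\
ID   PRIO_HUMAN STANDARD; PRT; 253 AA.
AC   P04156;
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST
CC       GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
DR   PRINTS; PR00341; PRION.
//
"""


def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2 or line.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        fields[line[:2]].append(line[2:].strip())
    return fields


def comment_topics(cc_lines):
    """Split CC lines into {TOPIC: text}, joining continuation lines."""
    topics = {}
    current = None
    for line in cc_lines:
        if line.startswith("-!-"):
            topic, _, body = line[3:].partition(":")
            current = topic.strip()
            topics[current] = body.strip()
        elif current is not None:
            topics[current] += " " + line
    return topics


if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    for topic, body in comment_topics(entry["CC"]).items():
        print(topic, "->", body)
```

The SIMILARITY, FUNCTION and DISEASE topics recovered this way are the fields that the annotation process described later draws on.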
Nude PRINTS entry
[Slide callouts: "manual annotation"; "SWISS-PROT IDs"]

Line tags, in slide order:
gc; gx; gn; ga; gt; gp; bb; gr; bb; gd; bb; si; si; sd (x7); bb; ci; ci; cr; cd (x9); bb; tp; KA; tp; KA; tp; KA; bb; tt (x7)

SUMMARY INFORMATION
-------------------
37 codes involving 8 elements
 0 codes involving 7 elements
 0 codes involving 6 elements
 0 codes involving 5 elements
 0 codes involving 4 elements
 1 codes involving 3 elements
 0 codes involving 2 elements

COMPOSITE FINGERPRINT INDEX
---------------------------
 8|  37  37  37  37  37  37  37  37
 7|   0   0   0   0   0   0   0   0
 6|   0   0   0   0   0   0   0   0
 5|   0   0   0   0   0   0   0   0
 4|   0   0   0   0   0   0   0   0
 3|   1   0   0   0   1   1   0   0
 2|   0   0   0   0   0   0   0   0
--+--------------------------------
  |   1   2   3   4   5   6   7   8

PRIO_COLGU P40251 M1   PRIO_GORGO P40252 M1   PRIO_SHEEP P23907 M1
PRIO_MACFA P40254 M1   PRIO_PANTR P40253 M1   PRIO_CALJA P40247 M1
PRIO_CEREL P79142 M1   PRIO_HUMAN P04156 M1   PRIO_BOVIN P10279 M1
PRIO_ODOHE P47852 M1   O46648 O46648 M1       PRP2_BOVIN Q01880 M1

PRIO_COLGU  MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - COLOBUS GUEREZA.
PRIO_MACFA  MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - MACACA FASCICULARIS (CRAB EATING MACAQUE)
PRIO_CEREL  MAJOR PRION PROTEIN PRECURSOR (PRP) - CERVUS ELAPHUS (RED DEER).
PRIO_ODOHE  MAJOR PRION PROTEIN PRECURSOR (PRP) - ODOCOILEUS HEMIONUS (MULE DEER) (BLACK-TAILED DEER).
PRIO_GORGO  MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - GORILLA GORILLA GORILLA (LOWLAND GORILLA)
PRIO_PANTR  MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - PAN TROGLODYTES (CHIMPANZEE)
PRIO_HUMAN  MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR) - HOMO SAPIENS (HUMAN).

Low Level Annotation
Prion protein signature
PROSITE; PS00291 PRION_1; PS00706 PRION_2
BLOCKS; BL00291
PFAM; PF00377 prion
INTERPRO; IPR000817
1. STAHL, N. AND PRUSINER, S.B. Prions and prion proteins. FASEB J. 5 2799-2807 (1991).
Annotation: “High-level”
• Semi-structured, text-based annotation, representing the accumulated knowledge of the biological community about the data entry.
• Intellectually formed: the accumulated knowledge of an expert distilling the aggregated information drawn from multiple data sources and analyses, and the annotator's own knowledge.
• Culled from other sources, such as the annotations of other database entries and the literature.
• Intended to be human readable rather than machine processable.

PRINTS Annotation (manual)
gc; PRION
gx; PR00341
gt; Prion protein signature
gp; INTERPRO; IPR000817
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; BLOCKS; BL00291
gp; PFAM; PF00377 prion
bb;
gr; 1. STAHL, N. AND PRUSINER, S.B.
gr; Prions and prion proteins.
gr; FASEB J. 5 2799-2807 (1991).
gr;
gr; 2. BRUNORI, M., CHIARA SILVESTRINI, M. AND POCCHIARI, M.
gr; The scrapie agent and the prion hypothesis.
gr; TRENDS BIOCHEM.SCI. 13 309-313 (1988).
gr;
gr; 3. PRUSINER, S.B.
gr; Scrapie prions.
gr; ANNU.REV.MICROBIOL. 43 345-374 (1989).
bb;
gd; Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected
gd; with certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform
gd; encephalopathy (BSE), and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler
gd; syndrome (GSS). PrP is encoded in the host genome and is expressed both in normal and infected cells.
gd; During infection, however, the PrP molecules become altered and polymerise, yielding fibrils of
gd; modified PrP protein.
gd;
gd; PrP molecules have been found on the outer surface of plasma membranes of nerve cells, to which they
gd; are anchored through a covalent-linked glycolipid, suggesting a role as a membrane receptor. PrP is
gd; also expressed in other tissues, indicating that it may have different functions depending on its
gd; location.
gd;
gd; The primary sequences of PrP's from different sources are highly similar: all bear an N-terminal
gd; domain containing multiple tandem repeats of a Pro/Gly rich octapeptide; sites of Asn-linked
gd; glycosylation; an essential disulphide bond; and 3 hydrophobic segments. These sequences show some
gd; similarity to a chicken glycoprotein, thought to be an acetylcholine receptor-inducing activity (ARIA)
gd; molecule. It has been suggested that changes in the octapeptide repeat region may indicate a
gd; predisposition to disease, but it is not known for certain whether the repeat can meaningfully be used
gd; as a fingerprint to indicate susceptibility.
gd;
gd; PRION is an 8-element fingerprint that provides a signature for the prion proteins. The fingerprint
gd; was derived from an initial alignment of 5 sequences: the motifs were drawn from conserved regions
gd; spanning virtually the full alignment length, including the 3 hydrophobic domains and the octapeptide
gd; repeats (WGQPHGGG). Two iterations on OWL18.0 were required to reach convergence, at which point a true
gd; set comprising 9 sequences was identified. Several partial matches were also found: these include a
gd; fragment (PRIO_RAT) lacking part of the sequence bearing the first motif, and the PrP homologue found
gd; in chicken - this matches well with only 2 of the 3 hydrophobic motifs (1 and 5) and one of the other
gd; conserved regions (6), but has an N-terminal signature based on a sextapeptide repeat (YPHNPG) rather
gd; than the characteristic PrP octapeptide.

High level annotation
Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected with certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE), and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler syndrome (GSS).
PRINTS Annotation Process
[Diagram labels: Fingerprint Process; Blank Annotation; MEDLINE; OMIM; PRINTS; GRAP; Annotation gathering; Editorial culling; Tag Decoration; Filled Annotation; Knowledge; SWISS-PROT; heuristics; mapping rules]

PRINTS Annotation Process
For all matches to a fingerprint, the full SWISS-PROT entry is retrieved:
tp; PRIO_COLGU PRIO_GORGO PRIO_SHEEP PRIO_MACFA
tp; PRIO_PANTR PRIO_CALJA PRIO_CEREL PRIO_ODOHE
tp; PRIO_HUMAN O46648 PRIO_BOVIN PRP2_BOVIN

ID analysis determines whether the entry is a super-family, family or domain.
This is essential, as it influences how the annotation is processed:
tp; URIC_RAT URIC_MOUSE URIC_RABIT URIC_PAPHA
tp; URIC_PIG URIC_DROPS URIC_DROME URIC_DROVI
tp; URIC_SOYBN URIC_EMENI URIC_ASPFL URID_CANLI

tp; MUP5_MOUSE LACB_BOVIN LACB_BUBAR LACB_CAPHI
tp; MUP_RAT RET1_ONCMY RET2_ONCMY PURP_CHICK
tp; RETB_HUMAN ICYA_MANSE ICYB_MANSE CRA2_HOMGA

tp; UROT_HUMAN PLMN_PIG PLMN_HUMAN PLMN_BOVIN
tp; APOA_HUMAN UROK_HUMAN APOA_MACMU UROK_PIG
tp; THRB_BOVIN HGFL_MOUSE THRB_HUMAN HGFL_HUMAN

PRINTS Annotation Process
• ID analysis usually reveals families unambiguously
• The comment field helps to resolve super-families from domains
CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY
CC -!- SIMILARITY: BELONGS TO THE URICASE FAMILY
CC -!- SIMILARITY: BELONGS TO THE LIPOCALIN FAMILY
CC -!- SIMILARITY: CONTAINS 38 KRINGLE REGIONS

Once the entry type is established, an appropriate precis is constructed.
Shared annotation is engineered to provide a report detailing:
• the function & structure of the protein
• the disease(s) with which it is associated
• the family to which it belongs
• a set of literature references
• a list of keywords
• any other remarks
The precis is then fed into a naked pre-PRINTS file. Output is English.
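The two-stage decision described above (ID analysis first, then the SIMILARITY comment to separate super-families from domains) can be sketched in a few lines of Python. This is only one plausible reading of the slides, not the PRINTS implementation: the `FAMILY_SHARE` threshold, the function name and the keyword tests on "BELONGS TO" and "CONTAINS" are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical threshold: the share of matches that must carry the same
# ID prefix (e.g. PRIO_ or URIC_) for ID analysis alone to say "family".
FAMILY_SHARE = 0.75


def classify_entry_type(ids, similarity_comments):
    """Return 'family', 'superfamily', 'domain' or 'unknown'."""
    if not ids:
        raise ValueError("no matched IDs to analyse")
    # Step 1: ID analysis on the part of each Swiss-Prot ID before "_".
    prefixes = Counter(swiss_id.split("_")[0] for swiss_id in ids)
    _, top_count = prefixes.most_common(1)[0]
    if top_count / len(ids) >= FAMILY_SHARE:
        return "family"
    # Step 2: mixed IDs, so let the SIMILARITY comments vote.
    votes = Counter()
    for comment in similarity_comments:
        text = comment.upper()
        if re.search(r"\bCONTAINS\b", text):
            votes["domain"] += 1
        elif re.search(r"\bBELONGS TO\b", text):
            votes["superfamily"] += 1
    return votes.most_common(1)[0][0] if votes else "unknown"


PRION_IDS = ["PRIO_COLGU", "PRIO_GORGO", "PRIO_SHEEP", "PRIO_MACFA",
             "PRIO_PANTR", "PRIO_CALJA", "PRIO_CEREL", "PRIO_ODOHE",
             "PRIO_HUMAN", "O46648", "PRIO_BOVIN", "PRP2_BOVIN"]
LIPOCALIN_IDS = ["MUP5_MOUSE", "LACB_BOVIN", "LACB_BUBAR", "LACB_CAPHI",
                 "MUP_RAT", "RET1_ONCMY", "RET2_ONCMY", "PURP_CHICK",
                 "RETB_HUMAN", "ICYA_MANSE", "ICYB_MANSE", "CRA2_HOMGA"]
KRINGLE_IDS = ["UROT_HUMAN", "PLMN_PIG", "PLMN_HUMAN", "PLMN_BOVIN",
               "APOA_HUMAN", "UROK_HUMAN", "APOA_MACMU", "UROK_PIG",
               "THRB_BOVIN", "HGFL_MOUSE", "THRB_HUMAN", "HGFL_HUMAN"]

if __name__ == "__main__":
    print(classify_entry_type(PRION_IDS, ["BELONGS TO THE PRION FAMILY"]))
    print(classify_entry_type(LIPOCALIN_IDS, ["BELONGS TO THE LIPOCALIN FAMILY"]))
    print(classify_entry_type(KRINGLE_IDS, ["CONTAINS 38 KRINGLE REGIONS"]))
```

With the ID lists from the slide, the prion matches are resolved by ID analysis alone, while the lipocalin and kringle matches fall through to the comment field, which is the division of labour the slide describes.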
Swiss-Prot tag | Heuristics | PRINTS tag
Description | Copy | gt (title)
RAuthor, RTitle, RLocation | Common + filters: top four, date priority; mixed paper subject portfolio | gr (reference)
Database cross-reference fields | Common + filters: preferred links | gp (other databases)
KeyWords | Up to a threshold of common keywords | gd (general annotation)
Comment field:
Function | Majority vote | function
Subcellular location | Majority vote | subcellular location
Disease | Golden vote; sequence provenance | disease
Similarity tag | Cluster on SWISS-PROT codes; majority vote for families; even distribution for superfamilies and domains | family
Subunit | An indication of structure | subunit (structure)
RP Structure | Paper type classification: 1 crystallographic, 1 NMR structure; preferred order |

Swiss-Prot Redundancy
OPSD_SHEEP  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_HUMAN  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_MOUSE  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_SHEEP  VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_HUMAN  VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_MOUSE  VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION

Redundancy elimination
ACM1_HUMAN  Primary transducing effect is pi turnover.
ACM4_HUMAN  Primary transducing effect is inhibition of adenylate cyclase.
ACM2_HUMAN  Primary transducing effect is adenylate cyclase inhibition.

Databases: majority vote
Major prion protein precursor (PRP)
PRINTS; PR00341 PRION
PROSITE; PS00291 PRION_1; PS00706 PRION_2
PFAM; PF00377 prion
INTERPRO; IPR000817
PDB; 1B10; 1AG2

References: date ranking ++
1. CERVENAKOVA, L., [...] Infectious amyloid precursor gene sequences in primates used for experimental transmission of human spongiform encephalopathy. PROC. NATL. ACAD. SCI. USA 91 12159-12162 (1994).
2. LOWENSTEIN, D. H., [...] Three hamster species with different scrapie incubation times and neuropathological features encode distinct prion proteins. MOL. CELL. BIOL. 10 1153-1163 (1990).
3. KALUZ, S., [...]

Disease – Golden Voting
(PRIO_HUMAN; P04156): Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases known as transmissible spongiform encephalopathies or prion diseases [...]
(PRIO_HUMAN; P04156): Kuru is transmitted during ritualistic cannibalism, among natives of the new guinea highlands. [...]
(PRIO_SHEEP; P23907): Polymorphism at position 171 may be related to the alleles of scrapie [...]

PRINTS Annotation Process
[Process diagram repeated: Fingerprint Process; Blank Annotation; MEDLINE; OMIM; PRINTS; GRAP; Annotation gathering; Editorial culling; Tag Decoration; Filled Annotation; Knowledge; SWISS-PROT; heuristics; mapping rules]

PRECIS annotation
gc; PRIO
gx;
gt; Major prion protein precursor (PRP) signature
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; INTERPRO; IPR000817
gp; PFAM; PF00377 prion
gp; PDB; 1B10; 1AG2
gp; SCOP; 1B10; 1AG2
gp; CATH; 1B10; 1AG2
gp; MIM; 176640; 123400; 137440; 245300; 600072
bb;
gr; 1. LOWENSTEIN, D.H., BUTLER, D.A., WESTAWAY, D., MCKINLEY, M.P., DEARMOND, S.J. AND PRUSINER, S.B.
gr; Three hamster species with different scrapie incubation times and neuropathological features
gr; encode distinct prion proteins.
gr; MOL.CELL.BIOL. 10 1153-1163 (1990).
gr;
gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].
gd;
gd; Belongs to the prion family.
gd;
gd; Keywords: GPI-anchor; Repeat; Signal; Prion; Brain; Glycoprotein; Polymorphism; Disease mutation; 3D-structure.
gd;
gd; PRIO is an 8-element fingerprint that provides a signature for the Major prion protein precursor (PRP).
gd; The fingerprint was derived from an initial alignment of 6 sequences: the motifs were drawn from conserved
gd; regions spanning virtually the full alignment length. Two iterations on SPTR37_9f were required to reach
gd; convergence, at which point a true set comprising 37 sequences was identified. A single partial match was
gd; also found: (PRIO_CHICK; P27177).

Human annotation
[The manual PRINTS entry for PRION (PR00341), identical to the entry under "PRINTS Annotation (manual)", is repeated on this slide for comparison with the PRECIS annotation.]

Implications for provenance
• Tools used by the service providers can be sophisticated.
• Provenance information may be recorded in those tools, but it is not passed on into the annotation (e.g. SWISS-PROT and PRINTS).
• Why?

Implications for provenance
Mining, aggregating, distilling, summarising and generating phrases and texts from comment fields.
Distillation to create a compact and comprehensive summary. Urge to be non-redundant.
• How to represent the provenance?
• How does the provenance get aggregated?
• How does it get propagated?
• Degrees of evidence -> degrees of provenance

Implications for provenance
gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].
[Slide callout: Swiss-Prot]
• Inter and intra provenance

Implications for provenance
Inheritance of errors
E.g. SWISS-PROT errors:
gd;
gd; Polymorphism at position 171 may be related to the alleles of scarpie incubation-control (sic) gene in this species.
Poor quality begets poor quality.
E.g. SWISS-PROT annotation poor or inconsistent:
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c.
gd;
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c (by similarity).
• How do we record that it's a copy, but that it's been corrected, and why?

Implications for provenance
Hugely subjective, e.g. if only one annotation claims that the family is implicated in a disease, and that annotation was by a group Terri Attwood respects, then it gets in.
• How to capture that subjectivity and use it when using the annotation?
• The workflow is complex – how to capture this?
• It's more like argumentation than reproducible derivation.

Questions, questions …
Where does provenance come from?
–Incidental vs supplied by the scientist, somehow.
What is provenance used for?
–Reliability & quality
–Justification & audit
–Reusability, reproducibility & repeatability
–Change & evolution
–Ownership, security, credit & copyright
–Identity - LSID
–Immutability
–Migration & storage
–Aggregation
–Versioning
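The "majority vote" and "golden vote" heuristics from the Swiss-Prot to PRINTS mapping table, and the question of how provenance gets aggregated, can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the PRECIS implementation: the function names are hypothetical, "golden vote" is read here as "keep every distinct claim, tagged with the entries that make it", and claims are compared only after a crude text normalisation.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case, collapse whitespace and drop a trailing full stop."""
    return " ".join(text.lower().split()).rstrip(".")


def group_claims(claims):
    """Map each distinct (normalised) claim to the entry IDs that make it."""
    supporters = defaultdict(list)
    for entry_id, text in claims:
        supporters[normalise(text)].append(entry_id)
    return supporters


def majority_vote(claims):
    """Return (winning claim, supporting entry IDs); ties go to the first seen."""
    if not claims:
        raise ValueError("no claims to vote on")
    supporters = group_claims(claims)
    winner = max(supporters, key=lambda text: len(supporters[text]))
    return winner, supporters[winner]


def golden_vote(claims):
    """Keep every distinct claim, each with the entries it came from."""
    return list(group_claims(claims).items())


VISION = "VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION"
OPSD_CLAIMS = [("OPSD_SHEEP", VISION), ("OPSD_HUMAN", VISION), ("OPSD_MOUSE", VISION)]
ACM_CLAIMS = [
    ("ACM1_HUMAN", "Primary transducing effect is pi turnover."),
    ("ACM4_HUMAN", "Primary transducing effect is inhibition of adenylate cyclase."),
    ("ACM2_HUMAN", "Primary transducing effect is adenylate cyclase inhibition."),
]

if __name__ == "__main__":
    print(majority_vote(OPSD_CLAIMS))
    for claim, sources in golden_vote(ACM_CLAIMS):
        print(sources, claim)
```

Returning the supporting entry IDs alongside the winning text is one simple way to carry provenance through the distillation step. The ACM example also shows the limit of string matching: the ACM4_HUMAN and ACM2_HUMAN statements say the same thing in different words, so this sketch treats them as separate claims, which is the kind of redundancy an editor still has to resolve by hand.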