Gene function analysis
Stem Cell Network
Microarray Course, Unit 5
May 2007
Sections
• Introduction to Gene Ontology
• GOstat
• Example
Gene Ontology
Michael Ashburner
Annotate genes or proteins
Started for Drosophila melanogaster (fly).
Now expanded for all taxa
http://www.geneontology.org
Gene Ontology
Biological process
A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.
Molecular function
Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.
Cellular component
The part of a cell of which a gene product is a component; for purpose of GO includes the extracellular environment of cells; a gene product may be a component of one or more parts of a cell; this term includes gene products that are parts of macromolecular complexes, by the definition that all members of a complex normally copurify under all except extreme conditions.
http://www.geneontology.org/GO_nature_genetics_2000.pdf
Gene Ontology
Evidence codes
http://www.geneontology.org/GO.evidence.shtml
IC: Inferred by Curator
IDA: Inferred from Direct Assay
IEA: Inferred from Electronic Annotation
IEP: Inferred from Expression Pattern (2006)
IGC: Inferred from Genomic Context (2007)
IGI: Inferred from Genetic Interaction
IMP: Inferred from Mutant Phenotype
IPI: Inferred from Physical Interaction
ISS: Inferred from Sequence or Structural Similarity
NAS: Non-traceable Author Statement (2006)
ND: No biological Data available
RCA: Inferred from Reviewed Computational Analysis
TAS: Traceable Author Statement
NR: Not Recorded (2006)
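Evidence codes are commonly used to filter annotations before an analysis, for example to set aside IEA annotations, which are assigned electronically. The sketch below stores the codes listed above in a dictionary and filters a small list of annotations; the gene names and the triples are hypothetical and serve only as an illustration.

```python
# Evidence codes exactly as listed on the slide, keyed by abbreviation.
EVIDENCE_CODES = {
    "IC": "Inferred by Curator",
    "IDA": "Inferred from Direct Assay",
    "IEA": "Inferred from Electronic Annotation",
    "IEP": "Inferred from Expression Pattern",
    "IGC": "Inferred from Genomic Context",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "NAS": "Non-traceable Author Statement",
    "ND": "No biological Data available",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NR": "Not Recorded",
}


def drop_codes(annotations, excluded=("IEA",)):
    """Keep only annotations whose evidence code is not in `excluded`."""
    return [a for a in annotations if a[2] not in excluded]


# Hypothetical (gene, GO term, evidence code) triples, for illustration only.
annotations = [
    ("geneA", "calcium ion binding", "IDA"),
    ("geneA", "calcium ion binding", "IEA"),
    ("geneB", "mannosyl-oligosaccharide mannosidase activity", "ISS"),
]

curated = drop_codes(annotations)
for gene, term, code in curated:
    print(gene, term, EVIDENCE_CODES[code])
```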
Gene Ontology
Stats. May 29th 2007.
biological_process: 13,553 terms (10,894 in 2006; 9,277 in 2005)
cellular_component: 1,966 terms (1,815; 1,512)
molecular_function: 7,609 terms (7,927; 6,957)
Total: 23,128 terms (20,636; 17,746)
Gene Ontology
Stats. May 29th 2007.
Mouse Genome Informatics (The Jackson Laboratory, http://www.informatics.jax.org/)
• biological_process: 14,200 genes, 42,675 annotations (3.0 kw/gene) [13,329 genes, 33,783 annotations (2.5 kw/gene) in 2006]
• cellular_component: 14,713 genes, 31,330 annotations (2.1 kw/gene) [13,547 genes, 26,515 annotations (2.0 kw/gene)]
• molecular_function: 15,553 genes, 50,343 annotations (3.2 kw/gene) [14,056 genes, 40,806 annotations (2.9 kw/gene)]
8.3 terms per gene [7.5 in 2006]
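The per-gene figures above can be rechecked from the gene and annotation counts. The short sketch below does so for the 2007 numbers; reading the 8.3 figure as the sum of the three rounded per-ontology ratios is an assumption, not something the slide states.

```python
# 2007 MGI counts from the slide: (genes, annotations) per ontology.
mgi_2007 = {
    "biological_process": (14200, 42675),
    "cellular_component": (14713, 31330),
    "molecular_function": (15553, 50343),
}


def annotations_per_gene(genes, annotations):
    """Average number of annotations attached to each annotated gene."""
    return annotations / genes


ratios = {
    name: round(annotations_per_gene(genes, annots), 1)
    for name, (genes, annots) in mgi_2007.items()
}

# Assumption: the overall figure is the sum of the three rounded ratios.
total = round(sum(ratios.values()), 1)

for name, ratio in ratios.items():
    print(f"{name}: {ratio} per gene")
print(f"sum over the three ontologies: {total}")
```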
Databases using Gene Ontology
NetAffx (Affymetrix probe annotations)
FlyBase (sequences) was the first
SGD (yeast)
MGI (mouse)
InterPro (Protein sequences)
ProDom (Protein domains)
Entrez Gene (gene information)
GOstat
Find statistically overrepresented properties within a group of genes, typically selected by analysis of a DNA microarray experiment
http://gostat.wehi.edu.au/
Beissbarth & Speed (2004) Bioinformatics, 20: 1464-1465.
GOstat
[Figure: five selected genes (A, B, C, D, E); annotation X appears twice and annotation Y appears twice among them]
Total set of genes: 2,000 of 5,000 are X. Not significant.
Total set of genes: 4 of 5,000 are Y. Very significant.
• Do it for all Gene Ontology terms
• Take into account the structure of the ontology
• Sort by p-values
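The X versus Y contrast can be made concrete with a hypergeometric tail probability, which is the one-sided form of Fisher's exact test. The sketch below assumes that two of the five selected genes carry X and two carry Y (a reading of the figure, not a stated fact), computes a p-value per term, and sorts the terms by p-value. It ignores the structure of the ontology and is not GOstat's own code.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(at least k annotated genes among n selected) when K of N genes are annotated."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


N_TOTAL = 5000   # total set of genes (from the slide)
N_SELECTED = 5   # selected genes A to E (from the slide)

# term -> (selected genes with the term, genes with the term in the total set)
# The "2 selected" counts are an assumption read off the figure.
terms = {"X": (2, 2000), "Y": (2, 4)}

p_values = {
    term: hypergeom_tail(k, N_SELECTED, K, N_TOTAL)
    for term, (k, K) in terms.items()
}

# Sort by p-value, most significant first.
ranked = sorted(p_values.items(), key=lambda item: item[1])
for term, p in ranked:
    print(f"{term}: p = {p:.3g}")
```

Seeing two X genes among five is unremarkable when 40% of all genes are X, whereas two Y genes among five is very unlikely by chance when only 4 of 5,000 genes are Y.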
Contingency Table
                                                  genes with GO in group   total genes in group
selected genes (e.g. differentially expressed)              51                     467
reference group (e.g. all genes on array)                  176                   9,180
p-value: 8e-52
Chi-square Test (Fisher's Exact Test for small values)
Probability of obtaining those values from a random distribution.
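As an illustration of the chi-square test on such a table, the sketch below assumes the slide's numbers read as 51 of 467 selected genes and 176 of 9,180 reference genes carrying the GO term, and that the selected genes are a subset of the reference group. It uses the plain 2x2 chi-square statistic with one degree of freedom and no continuity correction. GOstat's exact procedure is not described here, so the result need not match the p-value shown on the slide.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Counts as read from the slide (the row/column reading is an assumption).
sel_with, sel_total = 51, 467
ref_with, ref_total = 176, 9180

# Split the reference group into selected and non-selected genes.
a = sel_with                         # selected, with the GO term
b = sel_total - sel_with             # selected, without the GO term
c = ref_with - sel_with              # not selected, with the GO term
d = (ref_total - sel_total) - c      # not selected, without the GO term

chi2 = chi_square_2x2(a, b, c, d)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
```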
Web tool
Output
Example
We will study the function of a set of genes selected via StemBase.
StemBase: http://www.stembase.ca/ (see corresponding Unit for more info on using StemBase)
GOstat: http://gostat.wehi.edu.au/
1. Select a set of genes
Objective: genes correlated to Lgals3bp (lectin, galactoside-binding, soluble, 3 binding protein)
A galectin, a beta-galactoside-binding protein implicated in modulating cell-cell and cell-matrix interactions
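To make "genes correlated to Lgals3bp" concrete, here is a minimal sketch that ranks genes by Pearson correlation with a target expression profile. How StemBase actually selects correlated genes is not described in this unit, so the choice of Pearson correlation, the 0.9 cutoff, the gene names and every expression value below are assumptions made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression of the target gene across four samples.
target = [1.0, 2.0, 3.0, 4.0]

# Hypothetical profiles of other genes across the same samples.
profiles = {
    "geneA": [2.0, 4.1, 5.9, 8.0],   # rises with the target
    "geneB": [4.0, 3.0, 2.0, 1.0],   # falls as the target rises
    "geneC": [1.0, 3.0, 1.0, 3.0],   # weakly related
}

CUTOFF = 0.9  # arbitrary threshold, for illustration only
correlated = [g for g, p in profiles.items() if pearson(target, p) > CUTOFF]
print(correlated)
```

The resulting gene list is what would then be submitted to GOstat in the next step.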
1. Select a set of genes
2. Run in GOstat
Calcium ion binding
mannosyl-oligosaccharide mannosidase activity
http://www.geneontology.org/amigo
3. Examine expression
MAN2A1 (1448647_at)
MAN1A (1417111_at)
Lgals3bp (1448380_at)
To know more
• Gene Ontology.
http://www.geneontology.org/GO.doc.shtm
• GOstat
http://gostat.wehi.edu.au
Beissbarth & Speed (2004) Bioinformatics, 20: 1464-1465.
• StemBase. http://www.stembase.ca
See corresponding Unit in this course.