Astrobiology + Bioinformatics
R. Eric Collins, 11 August 2010
METABOLISMS
BACTERIA
chemo-litho-autotrophs
chemo-litho-heterotrophs
chemo-organo-autotrophs
chemo-organo-heterotrophs
photo-litho-autotrophs
photo-litho-heterotrophs
photo-organo-autotrophs
photo-organo-heterotrophs
ARCHAEA
chemo-litho-autotrophs
chemo-litho-heterotrophs
chemo-organo-autotrophs
chemo-organo-heterotrophs
photo-litho-heterotrophs
photo-organo-heterotrophs
EUCARYA
chemo-organo-heterotrophs*
photo-organo-autotrophs*
photo-organo-heterotrophs*
*utilizes Bacterial endosymbiont
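Each label above combines three prefixes: energy source (chemo- or photo-), electron donor (litho- or organo-), and carbon source (auto- or hetero-). A minimal Python sketch that enumerates the eight possible combinations; the dictionary and function names are illustrative and not from the talk.

```python
from itertools import product

# One choice from each category makes up a metabolic label.
ENERGY = {"chemo": "chemical compounds", "photo": "light"}
ELECTRON_DONOR = {"litho": "inorganic compounds", "organo": "organic compounds"}
CARBON = {"auto": "inorganic carbon (CO2)", "hetero": "organic carbon"}


def metabolism_names():
    """Return all eight energy/electron-donor/carbon labels."""
    return [
        f"{energy}-{donor}-{carbon}trophs"
        for energy, donor, carbon in product(ENERGY, ELECTRON_DONOR, CARBON)
    ]


for name in metabolism_names():
    print(name)
```

The domain lists above are subsets of these eight labels.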
Bioinformatics
● “The application of statistics and computer science to the field of molecular biology”
● Common applications of Bioinformatics:
  ● Sequence analysis
  ● Genome annotation and comparative genomics
  ● Computational evolutionary biology
  ● Analysis of gene expression and regulation
  ● Prediction of protein structure and protein expression
  ● Modeling complex ecological systems
Central Dogma of Molecular Biology
(for a biologist)
DNA → DNA (replication)
DNA → RNA (transcription)
RNA → protein (translation)
Central Dogma of Molecular Biology
(for a computer scientist)
cp DNA.tar DNA.tar.1
md5 DNA.tar DNA.tar.1
MD5 (DNA.tar) = 483f0777e...
MD5 (DNA.tar.1) = f39e1e9...
DNA.tar
tar -xf DNA.tar
RNA.c
gcc -o protein RNA.c
protein
http://www.youtube.com/watch?v=D3fOXt4MrOM
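The same flow can be sketched directly on sequences rather than by shell analogy. This is a minimal Python sketch, assuming the input is the coding strand of the DNA and that the standard genetic code applies; the function names and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
print(rna)             # AUGGCCUAA
print(translate(rna))  # MA
```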
The Era of Molecular Genetics and Exobiology
● 1924: Alexander Oparin writes “The Origin of Life”
● 1947, 1952: Joshua Lederberg founded modern bacterial genetics and gene manipulation
● 1954: “The Origins of Life” by JBS Haldane, geneticist
● 1960: Lederberg writes “Exobiology: Approaches to Life Beyond Earth”
● 1965: Linus Pauling founded the use of “Molecules as Documents of Evolutionary History”
● 1977, 1990: Carl Woese identified Archaea as the Third Domain of Life
The Rise of Computers
NASA Ames Center for Bioinformatics (1996 to 2001)
NASA Center for Astrobioinformatics (December 2003 to Feb 2004)
NASA Center for Computational Astrobiology (2000 to 2008)
The PCR Revolution: Culture Independence
● gene sequencing (informational & functional)
● identification of cells at microscopic level
● community fingerprinting
● metabolic profiling of DNA, RNA, protein, lipids
Beaufort Sea, Canadian Arctic
Collins et al. 2010
Ribosomal gene sequencing
Shark Bay, Western Australia
Leuko et al. 2006
cc photo by flickr user Koala:Bear
Community fingerprinting
The Sequencing Revolution: Comparative Genomics
● Genomics
  ● Sanger sequencing: $7000/Mb, 96 x 700bp reads
  ● DOE JGI IMG: 1911 Bacteria, 84 Archaea, 76 Eukarya
● Transcriptomics
  ● Microarrays: $100-$1000 per slide, ~10,000 probes
● Proteomics
  ● Mass spectrometer, ~$500 per experiment
Siberian Permafrost
Ayala-del-Río et al. 2010
cc photo by flickr user Магадан
Gene expression by
Psychrobacter arcticus 273-4
Black Sea, Russia
Fuchsman and Rocap 2006
cc photo by flickr user И. Максим
One way of computing
genetic similarity
Protein similarity by
whole genome BLAST (ranks)
[Figure: matrix of BLAST hit ranks, proteins A1 ... An against proteins B1 ... Bn]
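One way to turn BLAST scores into ranks can be sketched as follows. The slide does not say how the ranks were defined, so this sketch assumes rank 1 is the highest-scoring hit for each query protein; the function name and the scores are hypothetical.

```python
def rank_hits(scores):
    """Convert {query: {subject: score}} into {query: {subject: rank}}.

    Assumption: rank 1 is the highest score. Ties keep their input order.
    """
    ranks = {}
    for query, hits in scores.items():
        ordered = sorted(hits, key=lambda subject: hits[subject], reverse=True)
        ranks[query] = {subject: i for i, subject in enumerate(ordered, start=1)}
    return ranks


# Hypothetical bit scores for two proteins of genome A against genome B.
scores = {
    "A1": {"B1": 250.0, "B2": 80.0, "B3": 120.0},
    "A2": {"B1": 60.0, "B2": 310.0, "B3": 45.0},
}
print(rank_hits(scores))
```

Working with ranks rather than raw scores makes rows comparable even when proteins differ in length and therefore in attainable score.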
Reciprocal best BLAST hits
[Figure: matrix of BLAST hit ranks, proteins A1 ... An against proteins B1 ... Bn]
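A reciprocal best hit is a pair (A, B) where B is the top hit of A in the other genome and A is the top hit of B in return. A minimal Python sketch, assuming the BLAST scores have already been parsed into nested dictionaries; the function names and the example scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: A2's best hit is B1, but B1's best hit is A1.
a_vs_b = {"A1": {"B1": 250, "B2": 80}, "A2": {"B1": 200, "B2": 90}}
b_vs_a = {"B1": {"A1": 245, "A2": 190}, "B2": {"A1": 75, "A2": 95}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1')]
```

Because each protein can appear in at most one pair, the number of reciprocal best hits between two genomes cannot exceed the protein count of the smaller genome.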
Whole genome comparisons
all Bacteria vs. all Archaea
with reciprocal best BLAST hits
Limited by genome size of bacterium
Limited by genome size of archaeon
(oxygen-using salt-loving Archaeon)
(oxygen-sensitive high-temperature-loving Archaeon)
Anaerobic/thermophilic Bacteria
are genomically more similar
to Archaea than other Bacteria are
The Sequencing Revolution (2.0): Metagenomics
● Next Generation Sequencing technology
  ● 454: $30/Mb, 1 million x 400bp reads, 12 hours
  ● Illumina: $6/Mb, 15 million x 2 x 100bp reads, 5 days
  ● SOLiD: $3/Mb, 200 million x 2 x 25bp reads, 5 days
  ● PacBio, Ion Torrent, Helicos, ...
● Applications
  ● Metagenomics: whole community sequencing
  ● Deep Sequencing: hypervariable tag sequencing
  ● Transcriptomics: whole transcriptome sequencing
  ● ????
● Essential resources
  ● IMG/m (217 metagenomes), CAMERA
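The per-Mb prices and read counts quoted above and on the earlier Sanger bullet can be combined into a rough output and cost per run. This is a back-of-the-envelope Python sketch under stated assumptions: 1 Mb is taken as 10^6 bp, "x 2" is read as paired-end reads, and the per-Mb price is assumed to apply uniformly to the whole run. The names are illustrative.

```python
# name: (USD per Mb, reads or read pairs per run, read length in bp, reads per pair)
PLATFORMS = {
    "Sanger": (7000, 96, 700, 1),
    "454": (30, 1_000_000, 400, 1),
    "Illumina": (6, 15_000_000, 100, 2),
    "SOLiD": (3, 200_000_000, 25, 2),
}


def run_summary(cost_per_mb, reads, read_length, reads_per_pair):
    """Return (megabases per run, implied cost per run in USD)."""
    megabases = reads * reads_per_pair * read_length / 1e6
    return megabases, megabases * cost_per_mb


for name, spec in PLATFORMS.items():
    megabases, cost = run_summary(*spec)
    print(f"{name}: {megabases:g} Mb per run, about ${cost:,.0f}")
```

Under these assumptions the price per base falls by orders of magnitude while the price of a single run rises, with read length shrinking from 700 bp to 25 bp.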
Diffuse Hydrothermal Vents
Sogin et al. 2006
Short, error-prone sequencing reads
...
x 20,000
(or 20,000,000)
“Rare Biosphere”
World Ocean viromes
Angly et al. 2006
Genome assembly with
short reads is hard
Virus genes are mostly unknown
Cuatro Ciénegas, Mexico
Breitbart et al. 2008
Pilbara craton, Western Australia
Shen et al. 2001
cc photo by flickr user ccferg
Sulfate-reducing Bacteria & Archaea
Fractionation == biological sulfate
reduction (?)
Placing time boundaries on
the evolution of metabolisms
Matching observations to genetics
Clustering proteins by similarity
[Figure: matrix of BLAST hit ranks, proteins A1 ... An against proteins B1 ... Bn]
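The slide does not name a clustering method. As one simple possibility, the sketch below uses single-linkage clustering: any two proteins whose similarity meets a threshold end up in the same cluster, tracked with a union-find structure. The function name, the threshold, and the similarity values are all hypothetical.

```python
def cluster_proteins(pairs, threshold):
    """Single-linkage clustering of (protein_a, protein_b, similarity) triples.

    Proteins linked by any chain of pairs with similarity >= threshold
    share a cluster. Returns a sorted list of sorted member lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, similarity in pairs:
        root_a, root_b = find(a), find(b)  # also registers both proteins
        if similarity >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for protein in parent:
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical similarities on a 0-1 scale.
pairs = [
    ("A1", "B1", 0.9),
    ("B1", "C1", 0.8),
    ("A2", "B2", 0.85),
    ("A1", "B2", 0.2),
]
print(cluster_proteins(pairs, threshold=0.5))
# [['A1', 'B1', 'C1'], ['A2', 'B2']]
```

Single linkage tends to chain loosely related proteins into large clusters, so the threshold matters a great deal; stricter methods would be a reasonable alternative.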
Birth, Death, Innovation
Project ideas
● Noise/error filter for short sequencing reads
● Genome assembly from short error-prone reads
● Mathematical formalization of bacteria vs. archaea genome size relationships
● Better ways of calling outliers e.g. in all vs. all BLAST comparisons
● Digitization of Bergey's manual
● Methods for astronomy analogies in biology
  ● gamma ray bursts? habitable zones? ...