Chapter 1: Bio Primer
1.1 Cell Structure; DNA; RNA; transcription; translation; proteins

Prof. Yechiam Yemini (YY)
Computer Science Department
Columbia University
COMS 4761 --2007

Overview
• Cell structure and mechanisms
• DNA; RNA; Transcription; Regulation
• Translation; protein; sequence & structure

References:
• B. Alberts et al., “Molecular Biology of The Cell”, 4th edition, Garland Science.
• R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
• J.D. Watson et al., “Molecular Biology of The Gene”, 5th edition, Pearson Benjamin Cummings.
• NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
• Animation sites:
  o http://www.johnkyrk.com/
  o http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite

Organisms Are Made of Cells

Prokaryotes & Eukaryotes Have Different Cells
• Prokaryotes: single cell organisms without nucleus
  E.g., Bacteria: E. coli, H. pylori
• Eukaryotes: single/multi-cell organisms with nucleus
  E.g., Yeast, plants, drosophila, humans

Timeline (© Pearson Benjamin Cummings):
  Earth formed               -4.5B yrs
  Prokaryotic bacteria       -3.5B yrs
  Nucleated cells            -1.5B yrs
  Multi-cellular eukaryotes  -0.5B yrs

            Prokaryotes                      Eukaryotes
            Single cell; size 0.2-2 µm       Single or multi cell; cell size 10-100 µm
Structure   No nucleus                       Nucleus
            One membrane at cell boundary    Multiple membranes/compartments
            No organelles                    Organelles: mitochondria, Golgi, chloroplasts
            No cytoskeleton                  Cytoskeleton
DNA         Single circular DNA              Two or more chromosomes
            Genes code proteins              Genes have large non-coding regions (introns)
            90% of DNA encodes proteins      95-97% non-coding DNA
            ~10^5-10^6 base pairs            ~10^7-10^9 base pairs
            DNA is loosely organized         DNA is tightly packed (chromatin + histones)
            Cell division through fission    Mitosis
Proteins    1-2k protein species             5-20k protein species
            ~10^6 proteins per cell          ~10^9 proteins per cell

Cells Are Made of Macromolecules

Small molecules: 3%    Macromolecules: 26%
Sugars                 Polysaccharides
Fatty Acids            Fats, Lipids, Membranes
Amino Acids            Proteins
Nucleotides            Nucleic Acids (DNA, RNA)

Molecules                                               % weight
Water                                                   70%
Inorganic ions                                          1%
Sugars                                                  1%
Amino acids                                             0.4%
Nucleotides                                             0.4%
Fatty acids                                             1%
Other small molecules                                   0.2%
Macromolecules (proteins, DNA, RNA, polysaccharides)    26%

DNA Structure

The Central Dogma of Biology
DNA → (Transcription) → RNA → (Translation) → Protein
• DNA stores hereditary information
• DNA is transcribed into RNA
• RNA is translated into proteins
• Proteins perform the key functions of cells

DNA Consists of Sequences of Nucleotides
• DNA strands are sequences of nucleotides
  [Figure: a nucleotide (sugar, phosphate, base) and the backbone of a strand; example strand A C T T A C G C]
• Bases: Adenine, Guanine, Thymine, Cytosine
• DNA is organized in complementary double strands
• Hydrogen bonds hybridize complementary pairs: AT, CG
  [Figure: complementary double strand with 5’-end and 3’-end labeled; hydrogen bonds join the paired bases]

DNA Forms A Double Helix
• Helix full turn: 10.5bp
• Vertical hydrogen bonds support the structure
• Major and minor grooves provide access by proteins (e.g., transcription factors)

DNA Is Tightly Packed
• DNA is 2 m long; needs to fold into 10^-6 m nucleus
• Chromatin beads fold around 4 histones
• Transcription needs to unpack the DNA to copy it

Sample Bioinformatics Challenges
• Sequencing the genome
• Discovering sequence similarity
• Discovering genes
• Analyzing evolutionary relationships
• Discovering other important structures
• Distinguishing exons from introns
• Regulatory structures: (promoters & transcription factors)
• Regions expressing micro RNA
• ….

Transcription

Schematics
DNA → (Transcription) → mRNA → (Translation) → Protein

Overview
A. Assembling transcription complex
B. Transcribing DNA to mRNA
C. Removing introns

Animation: The Transcription Process

Transcription Details
http://cwx.prenhall.com/horton/medialib/
[Figure: from PDB]

Transcription Factors
• TFs bind to promoter regions and to RNA polymerases
• TFs regulate the rate of transcription (up/down)
• Regulation is yet to be well understood

Transcription Is Regulated
http://cwx.prenhall.com/horton/medialib/

Example: The Lac Operon
• Lac consists of 3 genes; commonly transcribed
• Used by bacteria to transport and metabolize lactose
• cAMP activates transcription to initiate transport & metabolism of lactose

Lac Activation
• Low-level sugar → generate cAMP
• cAMP → binds with CRP; adjusts its alpha helix to fit the DNA grooves and binds with it
• CRP-cAMP → accelerates polymerase binding
http://cwx.prenhall.com/horton/medialib/

Splicing The Introns
http://cwx.prenhall.com/horton/medialib/

From Genes To Networks
• Regulation is organized in networks
• Top: gene network regulating the body development of sea urchin
• Middle: a promoter region
• Bottom: interaction of two modules

Regulatory Networks Can Be Complex
Genetic regulatory network controlling the development of the body plan of the sea urchin embryo
Davidson et al., Science, 295(5560):1669-1678.
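
The base-pairing rule (A with T, C with G) and the three-step transcription overview (assemble the complex, transcribe DNA to mRNA, remove introns) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names, the example sequences and the intron coordinates are made up for this sketch, and a real cell locates introns from sequence signals rather than from coordinates handed to it.

```python
# Toy sketch: complementary strands, transcription, and intron removal.
# All names and sequences here are illustrative assumptions.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """The mRNA carries the coding strand's sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna: str, introns) -> str:
    """Remove introns given as (start, end) half-open index pairs."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


# Example strand taken from the nucleotide slide.
partner = reverse_complement("ACTTACGC")

# Hypothetical gene: exon "ATGGCT", intron "GTAAGTTTCAG", exon "GAATAA".
coding = "ATGGCT" + "GTAAGTTTCAG" + "GAATAA"
pre_mrna = transcribe(coding)
mature_mrna = splice(pre_mrna, [(6, 17)])

print(partner)       # GCGTAAGT
print(mature_mrna)   # AUGGCUGAAUAA
```

The half-open `(start, end)` convention matches Python slicing, so the exon pieces can be cut out without off-by-one adjustments.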
Sample Bioinformatics Challenges
• Discovering and analyzing transcription factors
• Evolutionary analysis; motif finding
• Discovering the structure of regulatory networks
• Analyzing the operations of regulatory networks
• Designing synthetic regulatory networks

Translation

RNA Encodes Protein Sequences
DNA → (Transcription) → RNA → (Translation) → Protein
• Proteins are sequences of amino acids (AA)
• Translation uses the RNA sequence as a template to construct the AA sequence
• The coding problem:
  • Encode 20 amino acids using 4 nucleotides
  • 2 nucleotides can code only 4^2 = 16 amino acids
  • Codon: sequence of 3 nucleotides; encodes an amino acid
• Translation: translate mRNA codons to amino acids
• Start/Stop codons define an open reading frame (ORF)
• Translation requires reading/identifying codons and forming a respective protein sequence

The Genetic Code
(rows: first base; columns: second base)

        U           C           A           G
U   UUU Phe     UCU Ser     UAU Tyr     UGU Cys
    UUC Phe     UCC Ser     UAC Tyr     UGC Cys
    UUA Leu     UCA Ser     UAA Stop    UGA Stop
    UUG Leu     UCG Ser     UAG Stop    UGG Trp
C   CUU Leu     CCU Pro     CAU His     CGU Arg
    CUC Leu     CCC Pro     CAC His     CGC Arg
    CUA Leu     CCA Pro     CAA Gln     CGA Arg
    CUG Leu     CCG Pro     CAG Gln     CGG Arg
A   AUU Ile     ACU Thr     AAU Asn     AGU Ser
    AUC Ile     ACC Thr     AAC Asn     AGC Ser
    AUA Ile     ACA Thr     AAA Lys     AGA Arg
    AUG Met     ACG Thr     AAG Lys     AGG Arg
G   GUU Val     GCU Ala     GAU Asp     GGU Gly
    GUC Val     GCC Ala     GAC Asp     GGC Gly
    GUA Val     GCA Ala     GAA Glu     GGA Gly
    GUG Val     GCG Ala     GAG Glu     GGG Gly

Phe = Phenylalanine, Leu = Leucine, Ser = Serine, Tyr = Tyrosine, Cys = Cysteine, Trp = Tryptophan, Pro = Proline, His = Histidine, Gln = Glutamine, Arg = Arginine, Ile = Isoleucine, Met = Methionine, Thr = Threonine, Asn = Asparagine, Lys = Lysine, Val = Valine, Ala = Alanine, Asp = Aspartate, Glu = Glutamate, Gly = Glycine

tRNA Provides Translation Units
• Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
• It translates GCU to Alanine
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html

Translation Basics
• Initiation:
  • Ribosome binds to mRNA; moves 5’→3’ until it finds the Start codon AUG
• Elongation:
  • Ribosome recruits tRNA to match the next codon
  • tRNA binds its AA into a peptide bond with the protein
  • Ribosome releases tRNA and moves to the next codon
• Termination:
  • Elongation continues until a Stop codon is reached
  • Release factor releases the polypeptide from the ribosome
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html

Animation: Translation of RNA into proteins

Proteins Are Sequences of Amino Acids
• Proteins are constructed through peptide bonds
• Proteins are folded into complex conformations
• Proteins perform functions by binding
  • Transcription factors and polymerase bind to DNA
  • Enzymes bind to molecules to accelerate their reactions
  • Globins bind to oxygen to transport it
  • Antibodies bind to pathogens

Example: Hemoglobin

Sickle-Cell Anemia: A Single Nucleotide Change
[Figure: codon 6 in β-globin; sickle structure]

Evolution of β-Globin
(α-globin cluster is coded by chromosome 16)

The Evolution of α-Globin Across Species

Protein Structures

Protein Structure Is Of Central Importance
• Structure is found through complex crystallography
  • X-ray diffraction; NMR
• The holy grail: compute structure from sequence
  • Ab-initio: compute structure directly from sequence
  • Homology techniques: use similarity to known proteins
• Structure is conserved across wide variations
  • Small number of fold families (α-helix, β-sheets…)
  • There are rules (e.g., hydrophobic AA are packed inside)
• Nature folds proteins very fast
  • So why is it so difficult to predict structure?

SwissProt vs. PDB Statistics
PDB: ~30k structures

Proteins Interact Via Active Sites
• Protein interactions are defined by active sites
  E.g., antibody with pathogen
  E.g., drug design
• Proteins use geometry: ligands latch with holes
• Proteins use physics: electrical fields
• How can protein-protein interactions be computed?
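
The translation steps (scan 5’→3’ for the Start codon AUG, read one codon at a time, stop at a Stop codon) and the coding problem can be sketched in Python. The codon table is the standard genetic code written with one-letter amino acid symbols; the function name `translate`, the compact table construction and the example mRNA are assumptions made for illustration. The example also applies the well-known sickle-cell substitution at codon 6 of β-globin (GAG to GUG, glutamate to valine). The short mRNA covers only the start codon and the first seven codons of the mature protein, and its trailing UAA stop codon is added artificially so the toy sequence terminates.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid symbols; "*" marks a Stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The coding problem: smallest codon length k with 4**k >= 20 amino acids.
codon_length = next(k for k in range(1, 10) if 4 ** k >= 20)


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) a Stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no Start codon, so no open reading frame
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative beta-globin fragment: AUG + codons 1-7 + artificial UAA stop.
normal = "AUGGUGCAUCUGACUCCUGAGGAGUAA"
# Point mutation in codon 6 (index 19): GAG -> GUG.
sickle = normal[:19] + "U" + normal[20:]

print(codon_length)        # 3
print(translate(normal))   # MVHLTPEE
print(translate(sickle))   # MVHLTPVE
```

A single changed base alters exactly one amino acid here (E to V), which is the sense in which sickle-cell anemia is a single nucleotide change.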
Sample Bioinformatics Challenges
• Analyzing protein sequence similarity
• Evolutionary conservation/changes
• Computing structure from sequences
• Analyzing structure homologies
• Analyzing protein-2-protein interactions
• Inferring function from structure

The Cell Cycle

Cells Operate In Cycles
• G0 Phase
  • Cell is at rest
• G1 Phase (4hrs)
  • Cell either progresses into synthesis or
  • leaves cell cycle to differentiate
• S Phase (10hrs)
  • DNA synthesis
  • Checkpoint determines integrity of DNA
• G2 Phase (4hrs)
  • Cell prepares for mitosis
  • Checkpoint determines integrity of DNA
  • DNA is repaired or cell dies (apoptosis)
• Mitosis (2hrs)
  • Chromosomes are separated
  • Cell divides

The Cell Cycle is Regulated
• Transition among phases is controlled by a regulatory network
• Checkpoints are used to assure quality

Evolution

Optimizing Functionality
• DNA is substantially conserved through evolution
• Evolution = mutation + selection
  Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
  Selection = optimize fitness of species
• Examples
  Metabolic nets learn to optimize energy budget (Alon 05)
• Functional similarity → Sequence similarity
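
The link between functional similarity and sequence similarity suggests comparing sequences position by position. The sketch below is a minimal illustration, assuming two sequences that are already aligned and of equal length; the function names and the example sequences are made up for this sketch, and practical comparisons need an alignment step that handles insertions and deletions.

```python
# Toy sketch: position-by-position comparison of two aligned sequences.
# Function names and sequences are illustrative assumptions.


def point_differences(a: str, b: str):
    """List (position, base_in_a, base_in_b) wherever the sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which the two sequences agree."""
    if not a:
        raise ValueError("sequences must be non-empty")
    mismatches = len(point_differences(a, b))
    return 100.0 * (len(a) - mismatches) / len(a)


ancestor = "ACTTACGC"
variant = "ACTTGCGC"   # one single-nucleotide change at position 4

print(point_differences(ancestor, variant))  # [(4, 'A', 'G')]
print(percent_identity(ancestor, variant))   # 87.5
```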