Computational Biology, Part 1
Introduction
Robert F. Murphy
Copyright © 1996, 2000, 2001. All rights reserved.

Course Introduction
- What these courses are about
- What I expect
- What you can expect

What these courses are about
- An overview of ways in which computers are used to solve problems in biology
- Supervised learning of illustrative or frequently used programs
- (03-510) Supervised learning of programming techniques and algorithms selected from these uses

I expect
- Students will have basic knowledge of biology and chemistry (at the level of Modern Biology/Chemistry) and willingness to learn more
- Students will have basic familiarity with the use of computers (e.g., at the level of Computing Skills Workshop) and eagerness to gain new skills
- (03-510) Students have some programming experience and willingness to work to improve
- The class is heterogeneous, so I plan to include refreshers on each new topic
- Students will ask questions in class and via email

You can expect
- Three major course sections:
  - Sequence Analysis (13 classes)
  - Biological Modeling (11 classes)
  - Biological Imaging (4 classes)
- Class sessions: lectures/demonstrations/exercises/quizzes
- Homework assignments:
  - 4 homework assignments for 03-311 (80% of grade)
  - 8 homework assignments for 03-310 (70% of grade)
  - 10 homework assignments for 03-510 (70% of grade)
- Test on March 1 (20% of grade for 03-311, 10% for others)
- Final (20% of grade for 03-310 and 03-510)
- Communication on class matters via email list

Textbooks for first half of course
- For 03-310/311 students, the “required” textbook is Baxevanis & Ouellette
- For 03-510 students, the “recommended” textbook is Durbin et al.
- Additional suggested book: Computational Molecular Biology, Peter Clote & Rolf Backofen (ISBN 0-471-87252-0)
  - Chap. 1 is an excellent introduction to molecular biology for non-Biology majors

Specific sources for CMU computational biology classes
- Web page (http://www.bio.cmu.edu/Courses/03310 or 03311 or 03510)
  - Lecture notes (as PowerPoint files)
  - Homework assignments (as Word files)
  - Additional materials as needed
- FTP server (www.bio.cmu.edu)
  - Files needed for homework assignments
- CompBiol project volume on AFS
  - /afs/andrew.cmu.edu/usr/murphy/CompBiol

Additional classes for 03-510
- We will have one additional class meeting per week for 03-510, for the first half of the semester only
- The purpose is to cover more advanced material and programming assignments

Other relevant courses
- Second-half mini-course “47-863: Topics in Operations Research: Computational Biology” will be taught by Dr. R. Ravi
  - Tuesday-Thursday 1:30-2:50, starting 3/13
  - Recommended for 03-510 students
- A Fall 2001 course on advanced topics in computational molecular biology will be taught by Dr. Dannie Durand
  - Prerequisite: 03-310/311/510

Information flow
- A major task in computational molecular biology is to “decipher” the information contained in biological sequences
- Since the nucleotide sequence of a genome contains all the information necessary to produce a functional organism, we should in theory be able to duplicate this decoding using computers

Review of basic biochemistry
- Central Dogma: DNA makes RNA makes protein
- Sequence determines structure determines function

Structure
- Macromolecular structure is divided into:
  - primary structure (1D sequence)
  - secondary structure (local 2D & 3D)
  - tertiary structure (global 3D)
- DNA is composed of four nucleotides or "bases": A, C, G, T
- RNA is also composed of four: A, C, G, U (T is transcribed as U)
- Proteins are composed of amino acids

DNA properties - base composition
- Some properties of long, naturally occurring DNA molecules can be predicted accurately given only the base composition, usually expressed as either
  - %GC (the percent of all base pairs that are G:C), or
  - GC (the mole fraction of all bases that are either G or C)
- %GC = 100 × GC

DNA properties - melting temperature and buoyant density
- Two such properties are:
  - Tm, the melting temperature, defined as the temperature at which half of the DNA is single-stranded and half is double-stranded:
    Tm (°C) = 69.3 + 41 × GC (for 0.15 M NaCl)
  - ρ0, the buoyant density, defined as the density of a solution in which a DNA molecule will feel no net force when centrifuged (the density at the point in a density gradient at which the DNA stops moving, or “bands”):
    ρ0 (g cm^-3) = 1.660 + 0.098 × GC (for CsCl)

DNA structure - restriction maps
- Restriction enzymes cut DNA at specific sequences.
- A restriction map is a graphical description of the order and lengths of the fragments that would be produced by digesting a DNA molecule with one or more restriction enzymes.

[Figure: restriction map of the circular plasmid pGEM4 digested with one enzyme (AccII)]

[Figure: restriction map of pGEM4 showing all enzymes that cut only once]

Transcription
- Transcription is accomplished by RNA polymerase
- RNA polymerase binds to promoters
- Promoters have distinct regions, "-35" and "-10"
- The efficiency of transcription is controlled by binding and progression rates
- Transcription start and stop are affected by tertiary structure
- Regulatory sequences can be positive or negative

RNA processing
- Eukaryotic genes are interrupted by introns
- These are "spliced" out to yield mRNA
- Splicing is done by the spliceosome
- Splice-site sequences are quite degenerate, and not every potential site is used

Translation
- Conversion from RNA to protein is by codon: 3 bases = 1 amino acid
- Translation is done by the ribosome
- Translation efficiency is controlled by mRNA copy number (turnover) and ribosome binding efficiency
- Translation is affected by mRNA tertiary structure
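The base-composition formulas above translate directly into code. The sketch below is a minimal Python version, assuming a plain DNA string; the function names are hypothetical, and characters other than A, C, G and T are simply left out of the count. The formulas describe long, naturally occurring DNA, so applying them to short oligonucleotides is not meaningful.

```python
def gc_fraction(seq: str) -> float:
    """Mole fraction of bases that are G or C (the GC of the notes)."""
    seq = seq.upper()
    # Only unambiguous bases are counted, so N's or gap characters are ignored.
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def percent_gc(seq: str) -> float:
    """%GC = 100 * GC."""
    return 100.0 * gc_fraction(seq)


def melting_temperature(seq: str) -> float:
    """Tm in degrees C, Tm = 69.3 + 41 * GC (0.15 M NaCl, long DNA only)."""
    return 69.3 + 41.0 * gc_fraction(seq)


def buoyant_density(seq: str) -> float:
    """Buoyant density in CsCl in g/cm^3, rho0 = 1.660 + 0.098 * GC."""
    return 1.660 + 0.098 * gc_fraction(seq)


if __name__ == "__main__":
    genome_fragment = "ATGCGCGATTACGGCC" * 100  # made-up sequence for illustration
    print(f"%GC = {percent_gc(genome_fragment):.1f}")
    print(f"Tm  = {melting_temperature(genome_fragment):.1f} C")
    print(f"rho = {buoyant_density(genome_fragment):.3f} g/cm^3")
```

Because both Tm and ρ0 are linear in GC, a 10-point rise in %GC raises Tm by about 4.1 °C and ρ0 by about 0.0098 g cm^-3.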
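A restriction map starts from the positions of each enzyme's recognition sequence. The following sketch, with hypothetical helper names, searches the top strand only (adequate for palindromic sites such as EcoRI's GAATTC) and assumes each enzyme cuts at a fixed offset into its site; it then lists the fragment lengths for a linear molecule or a circular one such as a plasmid.

```python
def find_sites(seq: str, site: str, circular: bool = False) -> list[int]:
    """Return 0-based start positions of a recognition sequence (top strand only)."""
    seq, site = seq.upper(), site.upper()
    # On a circular molecule, append the first len(site) - 1 bases so that a
    # site spanning the origin of the numbering is still found.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    pos = search.find(site)
    while pos != -1:
        hits.append(pos)
        pos = search.find(site, pos + 1)  # pos + 1 also finds overlapping sites
    return hits


def fragment_lengths(length: int, cuts: list[int], circular: bool) -> list[int]:
    """Fragment lengths, in positional order, for cuts at the given positions."""
    cuts = sorted(set(cuts))
    if not cuts:
        return [length]
    if circular:
        # n cuts in a circle give n fragments; the last one wraps past position 0.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(length - cuts[-1] + cuts[0])
        return sizes
    # n cuts in a linear molecule give n + 1 fragments.
    bounds = [0] + cuts + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def digest(seq: str, site: str, cut_offset: int, circular: bool) -> list[int]:
    """Fragment lengths for one enzyme that cuts `cut_offset` bases into its site."""
    n = len(seq)
    cuts = [p + cut_offset for p in find_sites(seq, site, circular)]
    if circular:
        cuts = [c % n for c in cuts]
    else:
        cuts = [c for c in cuts if 0 < c < n]  # cuts at the very ends make no fragment
    return fragment_lengths(n, cuts, circular)
```

For example, EcoRI cuts G^AATTC, so `digest(plasmid, "GAATTC", 1, circular=True)` gives the fragment sizes for a circular sequence `plasmid`. A circle with n cuts yields n fragments while a linear molecule yields n + 1, which is why the two cases are handled separately.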
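Translation "by codon" can be illustrated with a lookup table. This sketch assumes the standard genetic code (NCBI translation table 1); organisms or organelles that use a variant code would need a different table, and the name `translate` is only illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the first
# base varying slowest, in the order T, C, A, G; '*' marks a stop codon.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq: str, frame: int = 0, to_stop: bool = False) -> str:
    """Translate a DNA or mRNA sequence codon by codon (3 bases = 1 amino acid)."""
    seq = seq.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    # Step through complete codons only; leftover bases at the end are ignored.
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGGCCUAA")` reads AUG, GCC, UAA and returns `"MA*"`.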
Protein localization
- Leader sequences can specify cellular location (e.g., insertion across membranes)
- Leader sequences are usually removed by proteolytic cleavage

Posttranslational processing
- Peptides fold after translation; folding may be assisted or unassisted
- Processing enzymes recognize specific sites (amino acid sequences)
- Protein signals can involve secondary and tertiary structure, not just primary structure

Goals of Sequence Analysis
Assigned Reading: Baxevanis & Ouellette, Chapter 10

- Management of sequence information
- Assembly of sequence fragments into complete units (proteins, genes, chromosomes)
- Confirmation and prediction of restriction enzyme sites (for nucleic acids)
  - can aid sequence determination in areas of uncertainty by permitting testing of specific bases
  - can permit selection of appropriate enzymes for sequence checking
  - can permit selection of appropriate enzymes for subcloning or generation of probes
- Finding open reading frames (ORFs) for cDNAs or genomic DNA from organisms without introns
- Finding protein coding regions in DNAs using codon usage tables
  - not all ORFs are made into proteins
  - redundancy in the genetic code is not fully reflected in the tRNAs made by a particular organism (codon preference)
  - codon preference can be used to identify "real" coding regions (pseudogenes "drift" in their codon usage)
  - expressed sequence tags (ESTs) can also be used
- Finding and using consensus sequences
  - Examples: promoters, transcription initiation sites, transcription termination sites, polyadenylation sites, ribosome binding sites, protein features
  - use sets of sequences identified (by other means) as related
  - use sets of sequences identified by sequence comparison
- Comparison and alignment of sequences
  - compare sequence to database - goal: find related sequences (SIMILARITY)
  - compare sequence to sequence - goal: find matching domains (ALIGNMENT)
  - compare database to database - goal: estimate genetic distance (EVOLUTION)
  - any of these comparisons can be used to determine consensus sequences
  - comparisons can be pairwise or multiple (more than two sequences)
- Translation to protein sequence and prediction of protein properties, using measured propensities of particular amino acids or amino acid stretches
  - predict molecular weight
  - predict isoelectric point (pI)
  - predict extinction coefficient
- Prediction of secondary and tertiary structure
  - RNA - use base-pairing energies
  - protein - use propensities
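Finding ORFs amounts to scanning each reading frame for a start codon followed by an in-frame stop. The sketch below is one simple variant, assuming ATG as the only start codon, TAA/TAG/TGA as stops, and reporting only the first ATG after each stop; the default minimum of 100 codons is an arbitrary placeholder, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF on the given strand.

    An ORF is taken to run from an ATG to the first in-frame stop codon,
    stop included; `end` is exclusive and `min_codons` counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

The reverse strand can be searched by passing `reverse_complement(seq)`; positions are then relative to the reverse-complemented string. As the notes point out, not every ORF found this way is actually translated.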
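Codon preference can be turned into a score for a candidate reading frame. One simple option, shown here as a sketch rather than a method taken from these notes, is the average log-odds of each codon under a reference usage table (built from known genes of the same organism) versus uniform use of all 64 codons; the helper names and the pseudocount for unseen codons are assumptions.

```python
import math
from collections import Counter


def codons_of(seq: str) -> list[str]:
    """Split an in-frame sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in a reference coding sequence."""
    counts = Counter(codons_of(cds))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_preference_score(seq: str, usage: dict[str, float], pseudo: float = 1e-4) -> float:
    """Mean per-codon log-odds of `seq` under `usage` versus uniform codon use.

    Positive values mean the frame uses codons much as the reference genes do;
    values near or below zero suggest it does not. The pseudocount keeps
    unseen codons from giving log(0); probabilities are not renormalized.
    """
    codons = codons_of(seq)
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    background = 1.0 / 64
    return sum(math.log((usage.get(c, 0.0) + pseudo) / background) for c in codons) / len(codons)
```

In practice the reference table would come from many known genes, and the score would be computed in a sliding window over each frame so that coding and non-coding stretches can be told apart.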
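A consensus sequence can be read off a set of already-aligned, related sequences by taking the most common base in each column. This is a minimal majority-rule sketch; the 50% threshold and the use of N for unresolved columns are assumptions, and published consensus sequences often use IUPAC ambiguity codes instead.

```python
from collections import Counter


def consensus(aligned: list[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    A column gets its most common base only if that base occurs in more than
    `min_fraction` of the sequences; otherwise it is written as 'N'.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal length")
    result = []
    for column in zip(*(s.upper() for s in aligned)):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) > min_fraction else "N")
    return "".join(result)
```

For four invented promoter-like sequences, `consensus(["TATAAT", "TATGAT", "TACAAT", "GATAAT"])` returns `"TATAAT"`, even though only one of the inputs matches it exactly.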
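Sequence-to-sequence comparison is typically done by dynamic-programming alignment. The sketch below is the Needleman-Wunsch global alignment algorithm with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, whereas practical tools use substitution matrices and affine gap costs, and database searches (e.g., BLAST or FASTA) rely on faster heuristic local alignment.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

`global_alignment("GATTACA", "GATACA")` scores 4 (six matches and one gap); which of the equally good gap placements is returned depends on the tie-breaking order in the traceback.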
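Two of the protein properties listed above follow directly from amino acid composition. This sketch assumes an unmodified linear polypeptide: the molecular weight is the sum of average residue masses plus one water, and the molar extinction coefficient at 280 nm uses the Pace et al. (1995) values of 5500 per Trp, 1490 per Tyr and 125 per cystine (disulfide-bonded Cys pair). The residue masses are approximate values derived from average atomic weights, the function names are hypothetical, and pI is left out because its prediction depends on which set of pKa values is chosen.

```python
# Approximate average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    # Residues are joined by peptide bonds, so one water is added back for the termini.
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def extinction_coefficient_280(protein: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), Pace et al. values."""
    protein = protein.upper()
    return 5500 * protein.count("W") + 1490 * protein.count("Y") + 125 * cystines
```

Dividing the extinction coefficient by the molecular weight gives the absorbance at 280 nm of a 1 mg/mL solution in a 1 cm cell, which is how these two predictions are usually combined to estimate protein concentration.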
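For RNA, the simplest dynamic-programming relative of energy-based folding is the Nussinov algorithm, which maximizes the number of nested base pairs instead of minimizing free energy. It is shown here only as a simplified stand-in for the base-pairing-energy approach named above, since energy-based methods use measured nearest-neighbor stacking parameters. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 bases are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    Each pair scores 1; the loop closed by any pair must contain at least
    `min_loop` unpaired bases.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    # best[i][j] = most pairs achievable within rna[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # option 1: base j is unpaired
            # Option 2: base j pairs with some k, splitting the interval in two.
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

For example, `nussinov("GGGAAAUCC")` returns 3 (a G-C, G-C, G-U stem closing an AAA loop). Recovering the pairs themselves would take a traceback through `best`, analogous to the one used in sequence alignment.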