Prediction of protein disorder: basic concepts and practical hints
Prediction of protein disorder
Zsuzsanna Dosztányi
MTA-ELTE Momentum Bioinformatics Group
Department of Biochemistry
Eötvös Loránd University, Budapest, Hungary
dosztanyi@ceaser.elte.hu

Protein Structure/Function Paradigm
Dominant view: 3D structure is a prerequisite for protein function

But…
• Heat stability
• Protease sensitivity
• Failed attempts to crystallize
• Lack of NMR signals
• Increased molecular volume
• "Freaky" sequences
• …

IDPs
• Intrinsically disordered proteins/regions (IDPs/IDRs)
• Do not adopt a well-defined structure in isolation under native-like conditions
• Highly flexible ensembles
• Functional proteins

p53 tumor suppressor
[Figure: domain organization of p53: transactivation domain (TAD), DNA-binding domain (DBD), tetramerization domain (TD) and regulatory domain (RD), with the disordered regions marked]
Wells et al. PNAS 2008; 105: 5762

Bioinformatics of protein disorder
Part 1: Prediction of protein disorder
• Databases
• Prediction of protein disorder
Part 2: Biology of disordered proteins
• Prediction of functional regions within IDPs

Datasets
• Ordered proteins in the PDB: over 100,000 structures, a few thousand folds
• Some structures in the PDB classify as disordered! They adopt a well-defined structure only in complex: in crystals, with cofactors, proteins, …

Disorder in the PDB
• Missing electron density regions from the PDB
• NMR structures with large structural variations
• Less than 10% of all positions
• Usually short (<10 residues), often at the termini

DisProt
www.disprot.org
Current release: 6.02
Release date: 05/24/2013
Number of proteins: 694
Number of disordered regions: 1539
Experimentally verified disordered proteins collected from the literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)

Additional databases
• Combining experiments and predictions
• Genome level annotations
MobiDB: http://mobidb.bio.unipd.it
D2P2: http://d2p2.pro
IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL

Sequence properties of disordered proteins
• Amino acid compositional bias
  • High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys)
  • Low proportion of bulky, hydrophobic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr)
• Low sequence complexity
• Signature sequences identifying disordered proteins
Protein disorder is encoded in the amino acid sequence

Amino acid compositions
[Figure: amino acid compositions]
He et al. Cell Res. 2009; 19: 929

Prediction methods for protein disorder
Over 50 methods
• Based on amino acid propensity scales or on simplified biophysical models: GlobPlot, FoldIndex, FoldUnfold, IUPred, UCON
• Machine learning approaches: PONDR VL-XT, VL3, VSL2; Disopred; POODLE S and L; DisEMBL; DisPSSMP; PrDOS, DisPro, OnD-CRF, POODLE-W, RONN

1. Amino acid propensity scale
GlobPlot
Compare the tendency of amino acids:
• to be in coil (irregular) structure
• to be in regular secondary structure elements
Linding (2003) NAR 31, 3701

GlobPlot
From position-specific predictions:
• Where are the ordered domains?
• Longer disordered segments?
• Noise vs. real data

GlobPlot: http://globplot.embl.de/
• Downhill regions correspond to putative domains (GlobDom)
• Uphill regions correspond to predicted protein disorder

Globular proteins
• Large entropy penalty
• Large number of inter-residue contacts

2. Physical principles
IUPred
If a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well-defined structure: it will be disordered.
• Based on an energy estimation method
• Parameters calculated from statistics of globular proteins
• No training on disordered proteins
Dosztanyi (2005) JMB 347, 827

IUPred
The algorithm:
…PSVEPPLSQETFSDL WKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVA PAPAAPTPAA...
Based only on the composition of the environment of a D residue, we try to predict whether it is in a disordered region or not.
• Amino acid composition of the environment: A – 10%, C – 0%, D – 12%, E – 10%, F – 2%, etc.
• Estimate the interaction energy between the residue and its environment
• Decide the probability of the residue being disordered based on this

3. Machine learning approaches
[Diagram: INPUT, a sequence window (. A T V Q L S M I W Q S T R .), is passed to a function F(inp); OUTPUT is a label, D (disordered) or O (ordered)]
DISOPRED2: …..AMDDLMLSPDDIEQWFTED…..
• SVM with linear kernel
• Assign label: D or O
Ward (2004) JMB 337, 635

DISOPRED2
Cutoff value!

PONDR VSL2
Differences in short and long disorder:
• amino acid composition
• methods trained on one type of dataset and tested on the other dataset resulted in lower efficiencies
PONDR VSL2: separate predictors for short and long disorder, combined into length-independent predictions
Peng (2006) BMC Bioinformatics 7, 208

4. Metaservers: Disorder prediction methods
Meta-predictor
[Diagram: Sequence → PONDR VLXT, PONDR VL3, PONDR VSL2, IUPred, FoldIndex, TopIDP → ANN → Prediction]
Xue et al. Biochim Biophys Acta. 2010; 1804: 996

Disordered regions and secondary structure
• Coil is an ordered, irregular structural element
• Disordered proteins usually do not contain stable secondary structural elements (e.g. by CD)
• They can contain transient secondary structure elements (by NMR)
• Pure random coil never occurs
• Use secondary structure prediction methods for disordered proteins with extreme caution
• Long segments without predicted secondary structure may indicate protein disorder (NORSnet)

Accuracy
• True positive: disordered residues predicted as disordered
• False positive: ordered residues predicted as disordered
• True negative: ordered residues predicted as ordered
• False negative: disordered residues predicted as ordered
75-90%

Prediction of protein disorder
• Disordered residues can be predicted from the amino acid sequence (~80% at the residue level)
• Methods can be specific to certain types of disorder; accordingly, accuracies vary depending on datasets
• Predictions are based on binary classification of disorder

Heterogeneity in protein disorder
• Transient structures
• Flexible loop
• RC-like
• Compact

Modularity in proteins
• Many proteins contain multiple domains
• Composed of ordered and disordered segments
• Average length of a PDB chain is < 300
• Average length of a human protein ~ 500
• Average length of cancer-related proteins > 900
• Structural properties of full-length proteins …

Practical
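As a practical illustration of the accuracy terms defined on the Accuracy slide, the sketch below turns per-residue disorder scores into binary D/O calls with a cutoff and counts true/false positives and negatives. It is a minimal sketch under stated assumptions and is not taken from any of the predictors listed: the function names (confusion_counts, summarize), the cutoff of 0.5 and the example scores and labels are all made up for illustration.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count (TP, FP, TN, FN) for per-residue disorder predictions.

    scores: per-residue disorder scores, higher means more disordered
    labels: reference annotation, one 'D' (disordered) or 'O' (ordered) per residue
    cutoff: residues scoring >= cutoff are called disordered (0.5 is an assumption)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_disordered = score >= cutoff
        if label == "D":
            if predicted_disordered:
                tp += 1  # disordered residue predicted as disordered
            else:
                fn += 1  # disordered residue predicted as ordered
        elif label == "O":
            if predicted_disordered:
                fp += 1  # ordered residue predicted as disordered
            else:
                tn += 1  # ordered residue predicted as ordered
        else:
            raise ValueError(f"unexpected label: {label!r}")
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive common per-residue measures from the four counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical scores and reference labels for a 10-residue stretch.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.55, 0.45]
example_labels = "DDDDOOOOOD"
counts = confusion_counts(example_scores, example_labels)
print(counts)              # (4, 1, 4, 1)
print(summarize(*counts))
```

Because disordered residues are usually the minority class, plain accuracy can look good even for a predictor that calls everything ordered; reporting sensitivity and specificity (or their mean) alongside it makes that visible. Changing the cutoff trades one against the other.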