bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
Submitted to Molecular Biology and Evolution as an Article
Best fit: Discoveries section
Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding
Charles W. Carter, Jr.¹ and Peter Wills²
1. Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC
27599-7260
ORCID ID: 0000-0002-2653-4452
2. Department of Physics, University of Auckland, PB 92109, Auckland 1042, New Zealand
ORCID ID: 0000-0002-2670-7624
Corresponding Author: Charles W. Carter, Jr. Email: carter@med.unc.edu
ABSTRACT
Genetic coding is generally thought to have required ribozymes whose functions were taken over by
polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their tRNA substrates
now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma
of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA
partnership, eliminating the epistemological need for a prior RNA world. First, the two aaRS classes likely
arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. Inversion
symmetries in aaRS structural biology arising from genetic complementarity would have stabilized the
initial and subsequent differentiation of coding specificities and hence rapidly promoted diversity in the
proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing
reflexivity in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide
aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict
gradual replacement of ribozymal aaRS by polypeptide aaRS: (i) any set of aaRS must be interdependent;
(ii) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; (iii) takeover
of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; (iv) the Central
Dogma’s emergence is most probable when replication and translation error rates remain comparable.
These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled
gene-replicase-translatase system of genetic coding that would have continuously preserved the
functional meaning of genetically encoded protein genes whose phylogenetic relationships match those
observed today.
Running Title: Evolution of Genetic Coding
Keywords: Aminoacyl-tRNA synthetases, Bootstrapping, Evolution of translation, Molecular Phylogeny
Introduction: whence molecular genetics?
Gene expression consists of interpreting symbolic information stored in nucleic acid sequences. This
irreversible computational process creates intrinsically novel meaning, and is thus fundamentally
different from the physical chemistry underlying other natural processes, distinguishing it even from the
molecular biological processes of replication and transcription. Our goal here is to provide a new
conceptual basis for understanding how informational readout and the synthesis of peptide catalysts
from instructions in genes might first have emerged and then evolved compatibly with inheritance.
A. The Central Dogma and the adaptor hypothesis imply aminoacyl-tRNA synthetases (aaRS)
It is helpful to think about the origin of genetics within the conceptual framework first articulated by
Crick with recent modifications (Fig. 1). Crick recognized that protein synthesis must be directed by
information archived in DNA sequences and that information flow proceeds unidirectionally via an
intermediate RNA “message” to ribosomes. He proposed separately that linking gene sequences to
protein sequences required the intervention of a third RNA component (Crick 1955) to “adapt” individual
amino acids to “codon” units in the message (Fig. 1A), accounting for the initially obscure relationship
between what turned out to be collinear sequences of genes and proteins.
Participation of the adaptor, transfer RNA (tRNA), involves creating a covalent bond between its 3’
terminus and the carboxylate group of an appropriate amino acid. Creation of that bond, in turn, requires
activation of the amino acid’s α-carboxyl group by reaction with ATP. In cells, activation and
aminoacylation require a separate enzyme for each amino acid. These assignment catalysts, called
aminoacyl-tRNA synthetases (aaRS), were first clearly identified by Berg and Ofengand (1958).
Executing genetic coding rules requires that aaRSs recognize both amino acids and tRNAs with high
specificity so that the former can be escorted to the ribosome by the latter for protein synthesis. However,
specific recognition by folded proteins depends on a complex “ecology” based on the chemical behavior
and interactions of individual amino acids (Fig. 1B). That behavior can be accurately parameterized by
two experimental Gibbs phase transfer free energies, from vapor to cyclohexane and from water to
cyclohexane, related to the size and polarity, respectively, of each amino acid’s side chain (Carter and
Wolfenden 2015; Wolfenden, et al. 2015; Carter and Wolfenden 2016). Correlations between these free
energies and tRNA identity elements recognized by aaRS and the distribution of amino acids between
surfaces and cores after protein folding established these parameters as the main axes of a kind of
“periodic table” of amino acids (Carter and Wolfenden 2016) concatenated in chains that fold to generate
proteins of virtually unlimited functional diversity, in analogy to joining atoms to form molecules.
Implementing the Central Dogma (the irreversible attachments of amino acids to codon-specific tRNAs
by aaRSs) thus exploits the ecology of the amino acids within those enzymes. Proteins folded in
accordance with such ecologies that, in turn, execute computationally controlled production from genes
of specialized amino acid ecologies (including their own!) compose a reflexive property known as a
“strange loop” (Fig. 1C; Hofstadter 1979). Recognizing that loop opens fundamentally new ways to think
about what enabled the aaRS to emerge as the only proteins coded by programs written as mRNA that
can, once folded, collectively interpret the programming language in tRNA. We propose that this
reflexivity in functional chemistry and encoded information played a crucial role in creating genetics.
B. The RNA World hypothesis fails to address key questions about gene expression.
The default framework for thinking about how genetics emerged has been a facile solution to the problem
that life simultaneously requires that genetic information be passed from generation to generation,
and that catalysts synchronize rates of chemical reactions underlying the accuracy in gene
replication, expression, and metabolism. Base pairing between complementary nucleic acid strands
answered the former problem immediately and decisively, once the helical structure of double-stranded
DNA was elucidated (Watson and Crick 1953), and pointedly highlighted the second problem.
The crystal structure of tRNAPhe (Kim, et al. 1973) revealed that, unlike DNA, RNA can assume tertiary
structures, consistent with proposals (Woese 1967; Crick 1968; Orgel 1968) that the earliest catalysts
also might have been RNAs that could “do the job of a protein” (Crick 1968). That hypothesis has been
sustained almost exclusively by the observation that, whereas proteins cannot readily store or transmit
digital information, RNA does have rudimentary catalytic properties (Cech 1986; Guerrier-Takada 1989).
The expedient conclusion that RNAs functioned as both genes and catalysts in a life form devoid of
proteins was rapidly embraced as “the RNA World” (Gilbert 1986).
The clarity with which base-pairing solved the inheritance problem and the discovery of, and fascination
with, catalytic RNA short-circuited the quest to understand and answer deeper questions:
Catalytic RNA itself falls far short of fulfilling the tasks now carried out by proteins. The term “catalytic
RNA” overlooks three fundamental problems: (i) it vastly overestimates the potential catalytic
proficiency of ribozymes (Wills 2016); and fails to address either (ii) the computational essence of
translation, or (iii) the requirement that, throughout the evolution of translation and intermediary
metabolism, catalysts not only accelerate, but more importantly, synchronize chemical reactions whose
spontaneous rates differ by more than 10²⁰-fold (Wolfenden and Snider 2001).
The nexus connecting pre-biotic chemistry to biology is not replication but the translation table that maps
amino acid sequences of functional proteins onto nucleotide triplet codons. The quintessential problem
posed by life’s diversity (Carter and Wolfenden 2016; Wills 2016) is how that critical transformation
became embedded, in parallel, into tRNA and gene sequences, together with the ribosomal read-write
mechanism (Bowman, et al. 2015; Petrov and Williams 2015). Spontaneous folding of RNA aptamers and
the dynamics of an RNA world do not require encoding into genetic information and hence fall well short
of what “conversion to a functional molecule” (Horning and Joyce 2016) implies for Darwinian evolution
by selection acting on phenotypes (Wills 2016).
By the time protein folding organizes amino acid side chains into a functional active site, genetic
information has been irreversibly transformed. Any molecular machine charged with reversing translation
by unfolding, then “reading” the sequence of a protein would require shuttling each successive amino
acid through ~20 active sites until one fitted, and then overcoming the redundancy of the genetic code.
By enabling the inheritance of genetically encoded characteristics, this one-way flow of genetic
information enshrined in the Central Dogma (Koonin 2015) ensures that biological evolution transcends
the simple population dynamics of natural selection in any RNA world.
RNA research has never provided, either experimentally or conceptually, even an approximate model for
how a nearly random catalytic network, without encoded proteins, might have progressively bootstrapped
the specificity and selectivity characteristic of enzymic systems (Hordijk, et al. 2014). Thus, evolution of
synchronized catalysis required simultaneous evolution of genetic coding.
C. Many nevertheless embrace the RNA World with little reservation (Wolf and Koonin 2007;
Van Noorden 2009; Yarus 2011b, a; Bernhardt 2012; Breaker 2012; Robertson and Joyce 2012).
It is important, therefore, to assess the experimental data on which the hypothesis rests and to separate
data that genuinely support the hypothesis from those that only appear to do so.
Selecting ever more proficient RNA aptamers from large combinatorial libraries based originally on self-splicing
introns only appears to support the existence of ancestral ribozymal polymerases. Despite the
technical elegance and practical value of SELEX experiments (Tuerk and Gold 1990), even fully developed
aptamer replicases (Wochner, et al. 2011; Attwater, et al. 2013; Sczepanski and Joyce 2014;
Taylor, et al. 2015; Horning and Joyce 2016) would support only a limited version of the RNA World
hypothesis without phylogenetic relationships connecting them to biological ancestry. So far as we know,
all nucleic acids in contemporary biology are synthesized by protein enzymes, much as, reciprocally, the
synthesis of proteins from activated amino acids is catalyzed by an RNA template at the peptidyl
transferase center of the ribosome (Noller, et al. 1992; Noller 2004; Petrov, et al. 2014; Bowman, et al.
2015). Thus no phylogenetic basis exists for ancestral ribozymal polymerases.
The most compelling evidence that proteins were first coded by ribozymes is the extensive phylogenetic
analysis of contemporary protein families. Koonin and colleagues (Aravind, et al. 2002; Leipe, et al. 2002;
Koonin and Novozhilov 2009; Koonin 2011), and others (Caetano-Anollés, et al. 2007; Caetano-Anollés,
et al. 2013; Caetano-Anollés and Caetano-Anollés 2016) argue that protein domains speciated
substantially before the advent of protein-based aminoacyl-tRNA synthetases and translation factors.
Consequently, they argue, a fully developed ribozyme-based version of the contemporary universal
genetic code must have first mapped RNA sequences to the amino acid sequences of peptides. We will
call this full-blown RNA World scenario the “RNA Coding World” (RCW; see also Rodin and Rodin
2006b, a; Rodin and Rodin 2008; Rodin, et al. 2011). The contemporary “Protein Coding World” (PCW),
which uses aaRS enzymes to attach amino acids to cognate tRNAs, is envisaged to have evolved by a series
of “takeovers”, whereby the coding functions of aaRS ribozymes were progressively replaced, without
disruption, by enzymic counterparts. Our analysis articulates the improbability that such a takeover
could ever have taken place.
D. Contemporary aaRSs furnish clues about how they became molecular interpreters.
Understanding the evolutionary basis for the Central Dogma (Fig. 1) requires asking how self-organization
and selection might have produced, from nearly random origins, finely tuned ecological
niches of amino acids arranged to provide the catalytic and pattern-matching capabilities necessary for
the operation of a code using a 20-letter alphabet. We envision that this process began with a reduced
alphabet administered by a small “boot block” that grew by stepwise increases in alphabet size, in which
the information that survived (i.e. was selected) at each stage was such that it could be used by the
existing interpreters to make themselves, in spite of the errors that they made.
Accumulating such errors leads potentially to two types of “catastrophes”. Replicative errors eventually
limit the survival of progressively longer “genes”, and can produce what has been called an “Eigen
catastrophe” (Eigen and Schuster 1977). Similarly, translation errors eventually limit the functional
specificity available to maintain a cell’s biochemical network, and can lead to an “Orgel catastrophe”
(Orgel 1968). Thus, replication and translation errors represent the most significant resistance to the
emergence and gradual enhancement of biological complexity.
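The replicative limit can be illustrated with the standard error-threshold approximation from quasispecies theory, in which the longest sequence that selection can maintain is roughly ln(σ)/(1 − q). The following is a minimal sketch and not a calculation taken from the present work: the function name, the per-site copying fidelity q, and the selective superiority σ = 10 are illustrative assumptions.

```python
import math


def max_maintainable_length(per_site_fidelity: float, superiority: float) -> float:
    """Error-threshold estimate of maintainable sequence length: ln(sigma) / (1 - q).

    per_site_fidelity (q): probability that one position is copied correctly.
    superiority (sigma): selective advantage of the best sequence over its
    competitors. Both inputs are illustrative assumptions.
    """
    if not 0.0 < per_site_fidelity < 1.0:
        raise ValueError("per_site_fidelity must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority must exceed 1 for selection to act")
    return math.log(superiority) / (1.0 - per_site_fidelity)


if __name__ == "__main__":
    # Hypothetical fidelities; higher copying fidelity permits longer "genes".
    for q in (0.95, 0.99, 0.999):
        limit = max_maintainable_length(q, superiority=10.0)
        print(f"fidelity {q}: about {limit:.0f} positions can be maintained")
```

Under these assumptions the maintainable length scales inversely with the per-site error rate, which is why error-prone replication restricts early genes to short sequences.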
Eigen (Eigen 1971; Eigen and Schuster 1977) articulated a strategy for integrated survival of both
information and functional specificity in an error-prone network involving “information carriers” and
“functional catalysts”. He noted that the survival of separate molecular species may be enhanced in
systems containing either codependent or multiply interdependent components, whose cooperation
with other members of the set might assist survival of sets of molecules that would otherwise be
eliminated by competition. He called the symmetrical arrangement of components within these sets
“hypercycles”, a concept that can be generalized to include other interdependent arrangements.
Although early aaRS phylogenies should record the order in which enzymic aaRS appeared, either ab
initio or during their takeover of ribozymal aaRSs, earlier authors (Woese, et al. 2000) doubted that such
information would be relevant to the emergence of genetic coding. §I, however, summarizes a new
interpretation of evidence, from experimental deconstruction of both aaRS classes (Chandrasekaran, et
al. 2013; Carter 2014; Carter, et al. 2014; Carter 2015, 2016, 2017), that all contemporary aaRS
descended in modular fashion from a single bi-directional gene, whose strands coded for functional
ancestors, respectively, of Class I and II synthetases. Products of that gene appear to have been optimally
differentiated and crafted almost ideally to establish hypercycle-like interdependence, implementing a
minimal amino acid alphabet: all characteristics of the “boot block” envisioned to have first enabled
genetic coding. This bi-directional coding ancestry necessarily coupled the evolutionary descent of
contemporary Class I and II aaRS phylogenies (O’Donoghue and Luthey-Schulten 2003; Wolf and Koonin
2007; Caetano-Anollés, et al. 2013) discussed in §B above.
We therefore face a stark choice: either the sequential aaRS decompositions into increasingly conserved
fundamental modules, urzymes (Ur = primitive; Li, et al. 2013; Carter 2014; Martinez, et al. 2015) and
protozymes (Proto = before; Martinez, et al. 2015), and their bi-directional coding ancestry
(Chandrasekaran, et al. 2013; Carter, et al. 2014; Carter 2015), or the relative origins of multiple protein
superfamilies derived from current phylogenetic analyses must be wrong. We outline a resolution in §III.
Phylogenetic and biochemical evidence has been supplemented by developing statistically significant
(Carter and Wolfenden 2015, 2016) relationships between identity elements that dictate recognition of
tRNA by different synthetases and the size and polarity of amino acid side chains. Coding relationships
implemented in tRNA recognition are therefore not arbitrary, but reflect the deeply relevant inner logic
of protein folding rules (Carter and Wolfenden 2015; Wolfenden, et al. 2015; Carter and Wolfenden
2016). We consider this point, reflexivity, and other relevant concepts in greater detail in §II.
Correlations of the tRNA identity elements also revealed that signals for the two physical properties are
distributed differently between the anticodon (specifying polarity) and the acceptor stem (specifying
size). This, in turn, furnished details of an “operational RNA code” in the tRNA acceptor stem (Schimmel,
et al. 1993) that likely preceded development of coding by the tRNA anticodon, perhaps first
implementing only the binary discrimination between large (Class I) and small (Class II) side chains. This
new evidence makes it difficult to imagine how a sophisticated proteome synthesized by an advanced
ribozyme-based translation system could have survived any transition via a protein-based expression
system with fidelity as low as that of primordial binary coding.
RESULTS
I. AARS class dualities would have helped to stabilize quasispecies bifurcations.
Three functionalities give the aaRS their unique status as the earliest enzymes: (i) They accelerate amino
acid activation at the expense of two ATP phosphates ~10¹⁴-fold, resulting in irreversible synthesis of
aminoacyl-5’-AMP. The uncatalyzed rates of all other reactions in protein synthesis are orders of
magnitude faster than that reaction, which thus limits the rate of prebiotic protein synthesis. (ii) The
adenosine in ATP serves as an affinity tag that increases active-site binding 1000-fold, enhancing coding
assignment specificity, especially where editing is required. (iii) They acylate tRNA, covalently linking a
specified amino acid to a tRNA molecule bearing a code-cognate anticodon.
Notably, two distinct sets of homologous aaRS structures, Class I and Class II (Cusack, et al. 1990; Eriani,
et al. 1990; Ruff, et al. 1991), implement these three functions in disparate ways. The two classes activate
symmetrical sets of 10 amino acids. Both classes have one major (A), and two different minor subclasses
(B and C) (Cusack 1994). The common origin of the two aaRS classes on opposite strands of the same
ancestral gene (Rodin and Ohno 1995) remained obscure until quite recently (Martinez, et al. 2015). This
section outlines how consequences of this duality at multiple structural and functional levels may have
served to differentiate and stabilize early stages of genetic coding in the face of high error rates.
An ancestral bi-directional gene produced a Class I ancestor with a modest specificity for larger amino acids,
and a Class II ancestor with similar specificity for smaller amino acids. No useful information can be
encoded using only one kind or equivalent class of activated amino acid. The simplest imaginable code
would have required discriminating between at least two kinds of amino acids. The interesting scenarios
(Wills 2004) thus entail generating the full code from simple 2- or 4-letter alphabets via transitions, each
of which increases the effective size n_eff of the amino acid and codon alphabets. Nested instabilities (Wills
2004) allow for such a series of code-expanding transitions to attractor states with progressively larger
values of n_eff. These transitions connect dynamic states with significant error rates and thus entail broad
distributions of functional protein sequences and their encoding genes that are called “quasispecies”, so
we call the corresponding transitions “quasispecies bifurcations” (Fig. 2).
The TrpRS and HisRS urzymes (Li, et al. 2013) and the designed Class I/II protozyme gene (Martinez, et
al. 2015) furnish substantive experimental representations of the ancestral assignment catalysts
envisioned by Wills (2004). Gene products created from opposite strands utilizing the full genetic code
both accelerate amino acid activation ~10⁶-fold. Both wild-type protozymes exhibit high ATP affinity and
the Class I protozyme possesses a consensus phosphate binding site composed entirely of oriented
backbone NH groups (Hol, et al. 1978). Thus, it seems plausible and of obvious interest that protozymes
coded using fewer than the canonical 20 amino acids might retain substantial catalytic activity.
A coding system assigning dual classes of amino acids {a, b} that are functionally differentiated in a crude
binary fashion to tRNAs with anticodons complementary to codons {A, B} by such species could bifurcate
into two versions to produce four-member codon and amino acid alphabets, {A, B, C, D} and {a, b, c, d},
increasing the coding capacity from 2 letters to 4 letters, and expanding the 2×2 translation table into a
4×4 table. In simulations of such a process (Wills 2004, 2009), the hierarchically nested embedding of
assignment activities in the protein sequence space geometrically mirrored the decomposition of the
alphabets. The system showed stepwise coding self-organization, first from a non-coding state to the
execution of a binary code {A→a, B→b} and then from the binary code to the expanded four-dimensional
code {A→a, B→b, C→c, D→d}, anticipating experimental studies of the two synthetase Classes (Fig. 2).
Ancestral bi-directional coding would have impacted the emergence and evolution of genetic coding in
several important ways. First, the ancestral gene would have partitioned the sequence space decisively,
dividing it between those sequences related most closely to each of the two strands. Second, the
translated products of each strand would have differentiated the functional specificities retained by
sequences surrounding the centroids of the two populations. Third, the bidirectional coding
complementarity constraint steepens the fitness landscape, decisively enforcing coding cooperation
compared to the corresponding possibilities for genes that could mutate independently, thereby
increasing selection pressure for coding. Cooperation is therefore more robust than it would be with
independent genes for the Class I and II urzymes. Finally, the reduced volumes of sequence space and
enhanced functional specialization of the two bi-directionally coded quasispecies suggest that fewer
mutations were necessary for neofunctionalization of subsequent duplications, successively easing
subsequent bifurcations as n_eff increased during the bi-directional coding regime.
A. Experimental deconstructions of Class I and II aaRS reveal parallel structural hierarchies.
A puzzling hierarchy of inversion symmetries in the structural, functional, and evolutionary biology of
contemporary aaRSs now appears to make sense if the aaRSs are remnants of such bifurcations.
Superimposing Class I and II aaRS catalytic domains reveals small invariant cores, distinct from
idiosyncratic elements unique to each amino acid. Like Russian Matryoshka dolls, parallel deconstruction
of both Class I and II aaRS families reveals nested, increasingly conserved modular catalysts of nearly
equal molecular mass (Carter 2014): catalytic domains (200-350 residues), urzymes (120-130 residues;
Pham, et al. 2007; Pham, et al. 2010; Li, et al. 2011; Li, et al. 2013), and protozymes (46 residues;
Martinez, et al. 2015), each retaining conserved portions from its preceding construct.
Urzymes retain all necessary functions of full-length aaRS, albeit with lower proficiency and specificity,
and are analogous to using “molecule” to define the smallest unit of matter that retains all properties of
a chemical substance. Protozymes, on the other hand, approach the smallest polypeptide catalysts and
hence are perhaps more analogous to “atoms”.
Published evidence that experimental urzyme catalytic activities arise neither from tiny amounts of wild-type
enzyme nor from unrelated, but highly active contaminants includes the following (Pham, et al.
2010; Li, et al. 2011): (i) Empty vector controls have no activity. (ii) Protease cleavage of tagged fusion
proteins releases cryptic activity. (iii) Mutations alter activity. (iv) Amino acid KM values differ from WT
values, and, most importantly, (v) single turnover active-site titration experiments show pre-steady-state
burst sizes demonstrating that 35−75% of molecules transiently form tight transition-state
complexes. Experimental assays of protozymes were validated by showing that active-site mutants H18A
(Class I) and R113A (Class II) eliminated activity of the respective catalyst (Martinez, et al. 2015).
Modular accretions in the structurally unrelated Class I and II protein superfamilies exhibit parallel
accelerations of the rate-limiting step of protein synthesis over a 10⁸-fold range. Experimental transition-state
stabilization free energies track linearly with number of residues in deconstructed constructs from
both Classes, reinforcing the justification of these constructs as snapshots in the parallel evolution of both
synthetase classes (Martinez, et al. 2015). Urzymes retain ~60% of the full-length transition-state
stabilization free energy observed in modern synthetases. Protozymes are only 46 amino acids long.
Although they retain only the ATP binding sites, aaRS protozymes from both Class I and II aaRS exhibit
~40% of the full-length transition-state stabilization, but have not yet been shown either to acylate tRNA
or to discriminate significantly between different amino acids.
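The percentages of transition-state stabilization can be checked against the quoted rate accelerations with a short calculation. This is a sketch under stated assumptions: stabilization is taken as −RT ln(rate acceleration), the accelerations are the approximate values given in the text (about 10¹⁴-fold for full-length aaRS, 10⁹-fold for urzymes, 10⁶-fold for protozymes), and the temperature of 298.15 K and all names in the code are illustrative choices, not values from the cited experiments.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal/(mol*K)
TEMPERATURE_K = 298.15        # assumed; the ratios below do not depend on it

# Approximate rate accelerations quoted in the text.
RATE_ACCELERATION = {
    "protozyme": 1e6,
    "urzyme": 1e9,
    "full-length aaRS": 1e14,
}


def stabilization_free_energy(acceleration: float) -> float:
    """Transition-state stabilization in kcal/mol, taken as -RT ln(acceleration)."""
    return -GAS_CONSTANT_KCAL * TEMPERATURE_K * math.log(acceleration)


def fraction_of_full_length(construct: str) -> float:
    """Share of the full-length stabilization retained by a smaller construct."""
    full = stabilization_free_energy(RATE_ACCELERATION["full-length aaRS"])
    return stabilization_free_energy(RATE_ACCELERATION[construct]) / full


if __name__ == "__main__":
    for name in RATE_ACCELERATION:
        dg = stabilization_free_energy(RATE_ACCELERATION[name])
        share = fraction_of_full_length(name)
        print(f"{name}: {dg:.1f} kcal/mol ({share:.0%} of full length)")
```

Because the free energy is logarithmic in the rate, the retained fractions reduce to ratios of exponents, 9/14 and 6/14, which is consistent with the approximate 60% and 40% figures quoted for urzymes and protozymes.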
These accelerations document that multiple families of proteins are capable of synchronizing chemical
reactions over a very broad range from the uncatalyzed rate to that observed in contemporary
organisms. RNA has not been shown capable of parallel rate accelerations over such a dynamic range
either in parallel families or with similar increases in mass, underscoring the superior ability of
polypeptide catalysts to synchronize cellular chemistry.
B. Ancestral bi-directional genetic coding underlies the aaRS class distinction
Rodin and Ohno (1995) aligned coding sequences of the two aaRS Classes in opposite
directions, revealing highly significant bi-directional coding of the class-defining active-site sequence
motifs. Subsequently, it became increasingly apparent that protein-based aaRSs all descended from a
single ancestral gene whose complementary strands encoded precursors to the Class I and Class II aaRS
superfamilies (Carter, et al. 2014; Carter 2015; Martinez, et al. 2015). Bi-directional coding ancestry
implies that protein aaRS gene evolution began with an early stage in which the unique information in
one strand of a gene could be interpreted as a different protein with a similar function on the opposite
strand. Three types of results have confirmed predictions of the Rodin-Ohno hypothesis:
1) The most highly conserved portions of contemporary aaRSs should correspond to modules in the
contemporary enzymes capable of bi-directional alignment, and should retain catalytic activity when
extracted from the full-length genes. Two successive levels of experimental deconstruction confirm this
prediction. Urzymes (Pham, et al. 2007; Pham, et al. 2010; Li, et al. 2011; Li, et al. 2013) have ~120-130
amino acids and retain all of the translation functions of contemporary synthetases and accelerate amino
acid activation by 10⁹-fold, with significant specificity. A designed bi-directional gene encodes ~46 amino
acid Class I and II Protozymes that contain the ATP binding sites of the respective aaRS, bind ATP tightly,
and accelerate amino acid activation 10⁶-fold (Martinez, et al. 2015).
2) Coding sequences should retain a higher frequency of base-pairing between middle codon bases in
antiparallel, in-frame alignments of Class I and II aaRS. This middle-base pairing frequency, ~0.34, is
significantly non-random and increases to 0.42 in comparisons between coding sequences reconstructed
independently for ancestral nodes of both Class I and II aaRS (Chandrasekaran, et al. 2013).
3) It should be possible to re-construct a bona fide bi-directional gene such that each strand codes for a
functional amino acid activating enzyme homologous to one contemporary aaRS class. We configured
Rosetta to both constrain tertiary structures and impose genetic complementarity to give “designed”
Class I and II protozymes (Martinez, et al. 2015). Remarkably, all four wild-type and designed peptides
from Class I and Class II have the same kcat/KM and accelerate amino acid activation by ~10⁶-fold. Wild-type
sequences have 100-fold lower kcat and 100-fold higher KM values than do the designed protozymes
from the complementary gene, in keeping with the possibility that their wild-type sequences may include
amino acid binding determinants lost in the designed protozymes. The protozymes extend a linear
relationship between transition state stabilization free energy and the number of residues of the
constructs. Notably, Class I and II constructs exhibit the same slopes and intercepts relating rate
acceleration to number of residues (Martinez, et al. 2015).
Bi-directional, in-frame coding is a strange idea. Base-pairing is part of an inversion symmetry operator
that generates the sequence and (using helical symmetry operators) the structure of the opposite strand.
Because the opposite strand sequence can be retrieved using this inversion operator, a gene’s unique
information is contained in each strand. That unique information, however, has two different functional
interpretations. Validating (1)-(3) of the Rodin-Ohno hypothesis revealed higher-order symmetries
relating Class I and II gene products (Carter, et al. 2014; Carter 2015), as discussed in §II.C-§II.E.
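The idea that one strand carries the information for two different readings can be illustrated by translating a sequence and its reverse complement in the same frame. This is a minimal sketch: it assumes the standard genetic code, treats the inversion operator as simple reverse complementation, and uses a hypothetical nine-base sequence that is not taken from any aaRS gene.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Apply the inversion operator: complement each base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(cds: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    sense = "ATGGCTAAA"  # hypothetical toy gene
    antisense = reverse_complement(sense)
    print("sense strand:    ", sense, "->", translate(sense))
    print("antisense strand:", antisense, "->", translate(antisense))
```

Either strand can be regenerated from the other, yet the two translations differ, which is the sense in which the same information has two functional interpretations.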
C. An ancient hypercycle-like interdependence relates catalytic residues in each Class.
Active-site amino acids in aaRS occur in three sets of signature sequences (Eriani, et al. 1990; Carter
1993). Class I HIGH and KMSKS sequences and Class II Motifs 1 and 2 are present in the respective
urzymes. The HIGH/Motif 2 signature is present in the protozymes. As these motifs provided the original
evidence for bi-directional coding (Rodin and Ohno 1995), and contain active-site residues, it comes as
no surprise that the respective active sites utilize different catalytic residues. In fact, all residues
contributing to catalysis by Class I active sites must be activated by Class II aaRS, and conversely, residues
needed for Class II activity must be activated by Class I aaRS (Carter, et al. 2014; Carter 2015, 2017). This
functional “anti-homology” dates from the earliest Class I and II catalysts. Interdependence induces a
hypercycle-like coupling between the two bi-directional gene products similar to that proposed by Eigen
(Eigen 1971; Eigen and Schuster 1977) to mitigate competition, induce cooperation and thereby increase
the overall semi-random genetic content that could survive deterioration at any given copy-error rate.
D. Folded Class I and II aaRS tertiary structures are “inside out”.
Binary patterns coding for protein secondary structures (Kamtekar, et al. 1993; Patel, et al. 2009) are
reflected across complementary coding strands. They are determined by positions of hydrophobic
residues (Muñoz and Serrano 1994); the heptapeptide repeat, a-g, with hydrophobic amino acids in
positions a, e, f, is diagnostic for alpha helix. Alternation of hydrophobic side chains, especially when they
include side chains with branched β-carbon atoms, is almost as effective a predictor of β-structure.
Soluble globular proteins have hydrophobic cores and water-soluble surfaces. The distribution of amino
acids in folded proteins between these two extreme environments is spanned by a two-dimensional
“basis set” furnished by the experimental free energies of transfer between vapor and cyclohexane and
between water and cyclohexane (Carter and Wolfenden 2015; Wolfenden, et al. 2015). The
contemporary genetic code respects this dichotomy to an extraordinary degree, as codons for virtually
all core side chains are anticodons for distinctly surface side chains (Zull and Smith 1990).
Complementary codons for proline and glycine, most often associated with turns, mean that such
sequence-directed turn formation also reflects across codes from antiparallel strands. Thus, the folded
products from a bi-directional gene will tend to have comparable secondary structures, with opposite
polarities. By these criteria, Class I and II aaRS urzymes are both antiparallel and “inside out”.
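The codon/anticodon relationship between core and surface side chains can be checked against the standard genetic code. The following Python sketch is illustrative only: the helper names and the handful of example codons are choices made here, not a reproduction of the analysis in the cited papers.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_codon(codon):
    """Triplet read 5'->3' on the opposite strand, in the same frame."""
    return codon.translate(COMPLEMENT)[::-1]

def antisense_residue(codon):
    """Amino acid (one-letter code) specified by the antisense triplet."""
    return CODE[antisense_codon(codon)]

# Example codons for hydrophobic residues and for proline (chosen for illustration).
for codon in ("GUU", "AUC", "UUC", "CUG", "CCC"):
    print(codon, CODE[codon], "<->", antisense_codon(codon), antisense_residue(codon))
```

For these examples, codons for Val, Ile, Phe and Leu pair with codons for Asn, Asp, Glu and Gln, and the proline codon CCC pairs with the glycine codon GGG.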
E. Class I and II aaRS amino acid substrate specificities, especially those from ancestral codes,
are related by inversion with respect to side chain size (Carter, et al. 2014; Carter 2015).
Modern aaRSs prefer their cognate amino acids by ~5.5 kcal/mole, ~80% of which comes from allosteric
influences of more recently acquired modules on the urzyme activities. Lacking insertion- and anticodon-binding domains, Class I LeuRS and Class II HisRS urzymes are relatively non-specific (Carter, et al. 2014;
Carter 2015). Experimental ΔG(kcat/KM) values show that they have similar and complementary specificities.
LeuRS urzyme prefers Class I substrates; HisRS urzyme prefers Class II substrates, both by ~1 kcal/mole.
They are therefore capable of making the correct choice between Class I and II amino acids roughly four
times in five. That fidelity is too promiscuous to support more than “statistical ensembles” of peptides,
as hypothesized by Carl Woese (Woese 1965a, b; Woese, et al. 1966). Thus, urzymes would have been
the predominant assignment catalysts within a much broader population of molecular types, with the
properties of a “quasispecies-like” cloud as defined by Eigen (Eigen and Schuster 1977) that likely
included many species with similar specificity, but lower catalytic efficiency.
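The "four times in five" estimate follows from simple arithmetic. The sketch below assumes a two-way choice between competing substrates present at equal concentration, a temperature of 298.15 K, and a selectivity ratio of exp(ΔΔG/RT); these assumptions and the function name are introduced here for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_correct(ddg_kcal, temperature_k=298.15):
    """Fraction of correct choices for a preference of ddg_kcal (kcal/mol).

    Assumes two competing substrates at equal concentration, so that the
    ratio of correct to incorrect products is exp(ddg / RT).
    """
    ratio = math.exp(ddg_kcal / (R_KCAL * temperature_k))
    return ratio / (1.0 + ratio)

urzyme = fraction_correct(1.0)   # ~1 kcal/mole preference of the urzymes
modern = fraction_correct(5.5)   # ~5.5 kcal/mole preference of modern aaRS
print(f"urzyme: {urzyme:.2f}, modern aaRS: {modern:.4f}")
```

With these assumptions a 1 kcal/mole preference gives a selectivity ratio of about 5, that is, a correct choice a little more than four times in five.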
The only statistically significant distinction between amino acids activated by Class I and Class II aaRS is
their sizes (Carter and Wolfenden 2015; Wolfenden, et al. 2015): Class I amino acids are uniformly larger
than those from Class II. Accounting for the solvent exposure of amino acids in folded proteins entails
both size and polarity and is therefore two-dimensional (Carter and Wolfenden 2015, 2016). Class II
amino acids migrate significantly toward water interfaces during protein folding, whereas Class I amino
acids migrate toward cores. This differential localization in folded proteins gains significance in the
context of the distribution of identity elements in tRNA (§II.F).
F. tRNA acceptor stem identity elements represent a code for amino acid side-chain size and
other descriptors including side chain carboxylation and β-branching.
Evidence that the much smaller aaRS urzymic cores could have accelerated the rates of tRNA
aminoacylation (Li, et al. 2013) now makes it increasingly likely that an early “operational genetic code”
(Schimmel, et al. 1993; Schimmel 1996) functioned entirely on the basis of acceptor stem bases that
specified the most significant difference between Class I and Class II amino acids. Ancestral tRNAs may
have been only about half the size and consisted of only the acceptor and TΨC loops of modern tRNAs.
Doubling of this ancestral structure has been proposed to have created the anticodon and dihydrouridine
loops with the anticodon initially serving as a proxy for the identity elements in the acceptor stem (Di
Giulio 1992; Rodin, et al. 1996; Di Giulio 2004, 2008; Rodin and Rodin 2008). Any successful model for
the emergence of genetic coding from an RNA-based system of molecular information processing should
thus be consistent with these two observations as well as with the phylogenies of the two aaRS Classes.
G. Class I, II genes, gene products, mechanisms, and specificities are maximally differentiated.
An important barrier to the emergence of diversity from quasi-random reproductive processes is the
strong tendency of mutant daughter species to regress to the centroid of the distributions from which
they originate (Eigen, et al. 1988). The centroids behave as “strong attractors”. Inversion symmetries
relating Class I and II aaRS, described in §II.B-§II.E, suggest that their genes, gene products, functions, and
substrates are inherently differentiated to survive successive quasispecies bifurcations necessary for
enhanced genetic coding to emerge from populations of low sequence identity and modest specificity:
1) Bi-directional coding complementarity means that individual ancestral Class I and II gene
sequences are as difficult as possible to interconvert from one to the other by serial mutation.
2) Descent of the Class I and II aaRS from a bi-directional gene stabilizes two quasispecies that can
presumably begin to interpret binary sequence patterns, decisively overcoming the barrier posed
by the strong attraction of a single quasispecies.
3) Reduced population size and enhanced functional specialization of the two bi-directionally coded
quasispecies suggest that fewer mutations are necessary for neofunctionalization, successively
easing subsequent bifurcations during the bi-directional coding regime.
4) Distinct properties of protozymes and urzymes point to successive emergence during the bi-directional coding era of their ATP-, amino-acid- and pyrophosphate-binding sites, consistent
with modular construction of basic aaRS functions.
5) Inverted folding instructions give rise to “inside out” Class I and II tertiary structures that are as
different as possible from one another, and thus minimally vulnerable to mutations that might
fuse the two quasispecies by regression to the common centroid.
6) Catalytic residues in Class I and II aaRS are entirely segregated. Thus, throughout their early
evolution, the two Classes formed a hypercycle-like network (Fig. 3). By arguments from Eigen
and Schuster (Eigen, et al. 1988) and Wills (Wills 2009), their interdependence defended them
against corruption by molecular parasites during growth of catalytic networks.
7) Class I and II amino acids are themselves optimally separated on the basis of (i) size, (ii) polarity,
and hence (iii) their ultimate destination in folded proteins.
II. Bi-directionality furnishes four properties indispensable for self-organization of coding
Avoiding multiple stop codons on both strands of a bi-directionally coded ancestral gene would mandate
that each of the four bases have a functionally coded meaning when it occurs as an (internal) codon
middle base (see for example (Delarue 2007)). This would imply a (possibly redundant) alphabet of four
letters. Such a reduced repertoire is consistent with that expected for an ancestral tRNA acceptor stem,
in keeping with the fact that the contemporary acceptor stem code distinguishes best between (i) large
and small, (ii) β-branched vs unbranched, and (iii) carboxylate vs non-carboxylate side chains (Carter
and Wolfenden 2015). Presumably, selection subsequently drove both the code and primordial coding
sequences to capture and employ additional symbolic information for precisely those chemical
properties—size and polarity—that determine how the 20 amino acids direct proteins into unique
configurations (Carter and Wolfenden 2016).
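The constraint that stop codons be avoided on both strands can be made concrete with a short Python sketch. It uses the modern standard code as a stand-in, which is an assumption (the ancestral code is unknown), and the helper names are hypothetical.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_codon(codon):
    """Triplet read 5'->3' on the opposite strand, in the same frame."""
    return codon.translate(COMPLEMENT)[::-1]

STOPS = {c for c, aa in CODE.items() if aa == "*"}
# Sense codons that would place a stop codon on the opposite strand.
ANTISENSE_STOPS = {antisense_codon(c) for c in STOPS}

def middle_bases_pair(codon):
    """True if the codon's middle base pairs with the antisense middle base."""
    return antisense_codon(codon)[1] == codon[1].translate(COMPLEMENT)

def open_on_both_strands(codon_list):
    """True if no triplet is a stop on either strand."""
    return all(c not in STOPS and c not in ANTISENSE_STOPS for c in codon_list)

print(sorted(STOPS), sorted(ANTISENSE_STOPS))
```

Because the middle base of every codon pairs with the middle base of its antisense codon, the middle position is the one most directly shared between the two reading frames.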
Bi-directional coding of enzymic aaRS impacts four properties that favor much more rapid and efficient
evolution of gene expression than would have been possible for ribozymal aaRS. These properties are
developed more extensively and with greater mathematical rigor in a separate paper (Wills and Carter
2017).
A. Any set of aaRSs forms an interdependent catalytic network
Because contemporary aaRS are proteins, their own functional structures all depend intimately on all
aaRS functionalities and so form hypercycle-like networks. Interdependence implies further that both
the programming language in tRNA and their mRNA programs co-evolved from simpler ancestors with fewer
distinctions between them, and hence less complex interdependencies. Therefore, they are expected to
have something approaching discrete ancestries, and hence successively simpler levels of
interdependence as we approach the root. Structural variants in any functional aaRS population must
have responded coordinately throughout their evolutionary history to two different chemical signals:
amino acid and tRNA. Bi-directional coding ancestry deeply anchors such interdependence in the earliest
ancestral quasispecies, as active-site catalytic residues in Class I and II aaRS must be activated by the
opposite class (Carter, et al. 2014; Carter 2015, 2017).
B. Reflexivity of protein-based assignment catalysis offers superior paths to code bootstrapping
and optimal gene sequences.
The aaRS molecular biological interpreters are the first, and probably the only, products of mRNA
blueprints that can implement the translation table embodied in tRNA. Accumulating reflexive genetic
information (genes whose expression by rules can, in turn, execute those expression rules) is an
intrinsic architectural feature of the PCW that is absent from any RCW. Rapid self-organization of coding
in the PCW is driven by reflexive, in-parallel sensing (Fig. 3) of the amino acid phase transfer equilibria
that drive folding and thus enable aaRS to recognize both the symbolic information in tRNA (i.e., the
syntax) and the chemistry of enzymes (i.e., the semantics) built as interpretations of mRNA sequence
information written in the coding language (Fig. 1C).
The universal genetic code is a nearly unique selection from an inconceivably large number of possible codes
and must have been discovered by bootstrapping. It efficiently maps the chemical properties of amino
acids onto the sequence space of triplet codons (Carter and Wolfenden 2016) and is almost ideally robust
to mutation. Bi-directional ancestry restricted the tiny fraction of the possible codes that share this
optimality (Freeland and Hurst 1998; Koonin and Novozhilov 2009) to an even smaller subset by
requiring anti-correlated coding of amino acid physical properties (Zull and Smith 1990;
Chandrasekaran, et al. 2013). Discovery of such a rare and highly optimized code through random-
sampling natural selection has a vanishingly small probability reminiscent of Levinthal’s paradox about
protein folding (Dill and Chan 1997). Far more likely to produce such a result is a series of feedback-accelerated symmetry-breaking processes, like phase transitions, that could bootstrap the earliest
prebiotic translation system into existence from some previous, less well-organized chemistry.
The mechanistic implementation (Fig. 3) of reflexivity makes it clear that the requisites for accelerating
a bootstrapped discovery of coding are built into the PCW, but absent in the RCW. We envision a minimal,
low fidelity instruction set or “boot block” whose realization has been substantially demonstrated
((Martinez, et al. 2015); Fig. 3), and whose feedback-sensitivity could improve itself by elaborating its
own resources, much like installing an operating system in a computer at startup. Increasingly specific
coding assignments during successive transition steps could take hold only by conferring new selective
advantage(s) to the evolving genes, i.e. mRNA sequences, in which they became encoded. This way, such
a system could express new meaning in a kind of snowball effect beyond the specific level of fidelity and
complexity already achieved.
The bootstrapping metaphor creates substantially new perspectives on the emergence of genetic coding by
integrating local environmental sensing into generating function (Fig. 4). Coding rules ultimately result
from the RNA and/or protein folding rules that generate functional assignment catalysts from sequence.
Ribozymal and enzymatic functions are coupled to quite different nano-environmental effects.
Specifically, RNA folding depends largely on base pairing because the four nucleotide bases are otherwise
almost undifferentiated, having only two sizes and solvent phase transfer equilibria that differ by at most
–3.7 kcal/mole in their transfer free energies from chloroform to water (Cullis and Wolfenden 1981).
The corresponding phase transfer equilibria of the 20 canonical amino acids (Radzicka and Wolfenden
1988) exhibit approximately five-fold greater variations in polarity and 26-fold greater variation in size.
These differences together with the dominance of backbone-backbone hydrogen bonding result in
profoundly different protein folding rules.
The differential equations governing expression dynamics (Fig. 4A; (Wills and Carter 2017)) augment
the transcendent difference between coding rules derived in an RCW and in the PCW because the
synthesis of protein translatases is autocatalytic (horizontal arrows) in the PCW, but not in an RCW.
Coding rules in an RCW must be executed by ribozymes (Fig. 4B). An RCW thus cannot provide the
intrinsic self-organization necessary to rapidly refine nanospace protein engineering, the essential
advantage to be gained from enhanced coding specificity.
Emergence of higher-functioning encoded proteins cannot be triggered by reflexive feedback in an RCW
whose expression system itself contains no proteins. Moreover, coding rules based on RNA folding rules
are intrinsically insensitive to protein folding rules and/or functionality. Bootstrapping of coded proteins
in an RCW would require selecting ever-better ribozymal aaRSs through a slow, indirect Darwinian
evolution process that could discover protein folding rules only from non-aaRS protein performance.
Moreover, ribozymal aaRS variants capable of improved assignments would have to be selected for
robustness against mutation in protein-coding genes. The extrinsic self-organization resulting from
mutation and higher-level selection in an RCW provides no direct feedback procedure for discovering a
translation table that embodies an ordered symbolic encoding of amino acid side chain chemistry in
folded proteins. Such orderliness would have to progressively prove its advantage for the relevant unit
of selection, presumably a protocell.
In the PCW (Fig. 4C), coding rules are determined by the catalytic properties of the extant population of
protein aaRSs, whose sequences are, themselves, produced by rules acting on the set of genetic blueprints
(mRNAs) that encode the aaRS rule executors. This reflexivity in the PCW enables stages of self-organization in genetic coding to occur rapidly, essentially as dynamic phase transitions, because nano-environmental sensing (Fig. 3) can couple the coding rules naturally to protein folding rules. aaRS
tertiary structures (positioning distant amino acids in primary structure close to one another in space),
as determined by amino acid phase transfer equilibria (Fig. 3), furnish the aaRS specificity required to
determine the coding rules. Sensitivity of the code to the phase transfer equilibria of amino acid side
chains allows those equilibria to feed directly back onto protein aaRS folding and function, naturally
producing a refined map of the phase equilibria that govern protein folding and function in the existing
code, via the tRNA identity elements (Wolfenden, et al. 1979; Radzicka and Wolfenden 1988; Wolfenden,
et al. 2015; Carter and Wolfenden 2016). Thus, in the PCW the mechanism for nanoscale control of
chemistry, i.e. coding, is determined directly by its outcome.
A PCW also coordinates and optimizes discovery of gene sequences by placing amino acids with different
properties in different positions in accordance with their effects on a folded protein. To consider
aminoacylation functionalities as “assignment catalysis” relevant to coding, the specificity for the
relevant amino acid must also have become associated with a parallel specificity in choosing primitive
“codons” in precursor mRNA. Enhancements that incorporated new amino acids into the programming
language had to co-evolve with messages able to exploit them.
This special relationship between amino acid side chain energetics, the coding rules, aaRS sequences, and
their genes, establishes reflexivity at an even more fundamental level, acquiring additional bootstrapping
from the chemical effects of amino acids occurring in particular relationships to one another in the three-dimensional
architecture of folded synthetases (Fig. 4C). In other words, a PCW automatically pressures
an evolving code to discover and refine the partition between amino acids that gives the genetic
representation of functional properties best adapted for survival: an error-minimized code in which
amino acids with similar chemical properties are assigned to similar codons. This argument extends to
every stage of code expansion. Thus, code evolution in a PCW will inevitably target both near-optimal
folded protein functionality and an encoding that represents survival fitness as precisely as possible.
For these reasons de novo emergence of genetic coding into a peptide/RNA world appears to have
introduced such overwhelming influence on the choice of codons best able to represent the effect of an
amino acid entering the developing ecology inside a folding protein that it must be seen as enormously
more probable than coding emerging in an RNA World.
C. Fidelity: Any simple PCW taking over a more sophisticated ribozymal coding will increase the
overall error rate, degrade fitness, and hence be eliminated by purifying selection.
The PCW is rooted in phylogenetically-based ancestors capable only of the simplest coding
assignments (perhaps one or at most two bits) and consequently also in a coding system necessarily
operating at high error rates. Reducing error rates in both replication and translation must certainly have
required larger alphabets. To be selected, the functionality of such primordial coding must already have
exceeded that of whatever preceded it. Its simplicity appears to rule out scenarios involving proteins
“taking over” catalytic functions from any pre-existing world of sophisticated RNA catalysts. For clarity,
we henceforth refer to executors of assignment catalysis as RNA or protein “translatases”, to distinguish
them from contemporary aaRS.
Any coding system depends on the maintenance of a population of templates that either specify the
sequences of ribozymal aaRSs or encode the sequences of protein aaRSs. In an RCW all such templates
are required somehow to survive, essentially as parasites, in a world of RNA replicators. A ribozymal
coding system, consisting of only ribozymal translatase species, could be functionally autonomous.
However, the attractor state of a hybrid ribozymal/protein aaRS system is one in which the protein
population also contributes to the overall rate of translation of any genetic template, and more
importantly, to its overall error rate.
A separate paper (Wills and Carter 2017) treats this problem in an extension of earlier mathematical
models of coding self-organization (Bedian 1982; Wills 1993; Wills 1994; Bedian 2001; Wills 2004) by
considering the dynamic stability of co-existing ribozyme- and protein-operated assignment catalysts.
We confirm analytically the intuitive conclusion that translation errors would inevitably be higher for
any hybrid coding situation driven simultaneously by separate ribozymal and protein translatases than
they would be for an optimized system with only one type of aaRS. If both types of translatases effect
codon-to-amino acid assignments at different characteristic rates and accuracies, the hybrid system will
necessarily operate at intermediate error rates. As Eq. (25) of (Wills and Carter 2017) makes abundantly
clear, introducing any significant population of protein translatases intrinsically less accurate than an
extant ribozymal coding apparatus will undermine the role of the ribozymal translatases, possibly
threatening the protein domain with extinction as the selective advantage of ribozymal translatases,
indirectly conferred by protein functionality, is diminished. The problem will be extreme in the presence
of rudimentary ancestral protein aaRSs that operate a low dimensional translation table. The accuracy
of the protein translatases would then necessarily be very much less than that of the extant ribozymal
population, making survival of proteins dependent on the elimination of the protein translatases.
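The intuition that a hybrid system operates at an intermediate error rate can be illustrated with a deliberately simple calculation. This is not Eq. (25) of Wills and Carter (2017); it is a flux-weighted average introduced here as an assumption, and all rates and error values are hypothetical.

```python
def hybrid_error_rate(rate_ribozyme, err_ribozyme, rate_protein, err_protein):
    """Overall assignment error rate when two translatase types act in parallel.

    Assumes each type contributes errors in proportion to its share of the
    total assignment flux (an illustrative simplification).
    """
    total = rate_ribozyme + rate_protein
    return (rate_ribozyme * err_ribozyme + rate_protein * err_protein) / total

# Hypothetical values: an accurate ribozymal set and a crude protein set.
ribozyme_only = hybrid_error_rate(1.0, 0.05, 0.0, 0.40)
hybrid = hybrid_error_rate(1.0, 0.05, 1.0, 0.40)
print(ribozyme_only, hybrid)
```

In this toy model any contribution from the less accurate translatases raises the overall error rate above that of the more accurate population acting alone.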
Either way, the only path to current molecular biology thus appears to require protein aaRS genes to
emerge in concert with other essential encoded protein genes. That requirement highlights the problems
arising from coordinating inheritance with gene expression. We therefore turn our attention to the
dynamics of template replication and its effect on the evolution of translation.
Mixed ribozymal and enzymatic protein replicases. Copying of genetic information lies at the heart of
Darwinian evolution. Consider the advent of a protein replicase in a functional RCW in which relatively
sophisticated and accurate information copying has evolved through selection of a general ribozymal
replicase. Introducing a protein replicase in an RCW generates the same problem just described for the
advent of protein translatases. Any protein replicase less accurate than the ribozymal replicase, which is
to be expected for the first such proteins to emerge in an RCW, would diminish the probability of correctly
copying all genes, including that coding for the ribozymal replicase. Since the system evolution has been
optimized under the constraint of the ribozymal replicase’s performance, the system will be at risk of an
error catastrophe unless selection purges it of the emergent protein replicase.
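The effect of a less accurate replicase on copying fidelity can be sketched numerically. The model below assumes independent per-base errors, so that a gene of length L is copied without error with probability (1 - e)^L, and assumes the effective error rate is a simple mixture of the two replicases; the gene length and error rates are hypothetical.

```python
def error_free_copy_probability(per_base_error, length):
    """Probability of copying `length` bases with no error (independent errors)."""
    return (1.0 - per_base_error) ** length

def mixed_error(frac_protein, err_ribozyme, err_protein):
    """Mean per-base error when a fraction of copies is made by the protein replicase."""
    return (1.0 - frac_protein) * err_ribozyme + frac_protein * err_protein

GENE_LENGTH = 200                        # hypothetical gene length in nucleotides
ERR_RIBOZYME, ERR_PROTEIN = 0.001, 0.02  # hypothetical per-base error rates

for frac in (0.0, 0.25, 0.5):
    e = mixed_error(frac, ERR_RIBOZYME, ERR_PROTEIN)
    print(frac, round(error_free_copy_probability(e, GENE_LENGTH), 3))
```

With these illustrative numbers the probability of an error-free copy falls steeply as the cruder replicase takes a larger share of the copying.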
The evident simplicity of the earliest coding apparatus in the PCW poses an insuperable barrier to takeover
of any more sophisticated coding apparatus in an RCW. Newly emerging protein-based assignment
catalysts must have been far less specific than the pre-existing ribozymal assignment catalysts
envisioned, for example, by Wolf and Koonin (2007), and cannot have been selected within an advanced
RCW because their very rudimentary functionality would corrupt any pre-existing ribozymal translation
system of higher specificity and diversity. No hybrid set of protein and ribozymal aaRS and/or replicases
can have superior fitness to those of a pre-existing RCW, as reinforced by the following considerations:
(i) The more sophisticated the pre-existing RCW, the harder it would have been for early stages of PCW
code development to compete. Conversely, the detailed inversion symmetries arising from bi-directionally coded genes (§I) all point to their key role in enforcing differentiation early in the evolution
of the genetic code, when it was most vulnerable to parasites with incorrect specificities.
(ii) The dramatic rate acceleration by aaRS protozymes, on the other hand, represents a decisive selective
advantage in a peptide•RNA world, first by harnessing the chemical free energy transfer of NTP utilization
and then by providing a flow of activated amino acids.
(iii) RNA sequences destined to evolve into genes once an accurate translation system had evolved
would have had no obvious selective advantage unless the emergent PCW code was practically identical
to that operating in the RCW.
Thus, even were an RCW to have existed, it would be irrelevant to contemporary biology if the PCW had
to recapitulate the entire genesis of the code. Nor, of course, does any evidence remain of such ribozymal
amino acid activating catalysts, or, indeed, of ribozymal polymerases. Finally, if the branching phylogenies
of protein aaRS provided opportunity for self-organising quasispecies bifurcations, and their evident
reflexivity greatly accelerated the search for an optimal code, then, an extensive phase of ribozymal
protein synthesis no longer fills any theoretical deficiency in accounting for the genetic code. Thus, it is
our view that nature did not re-invent its “operating system” (Bowman, et al. 2015).
D. Efficiency: avoiding dissipations leading to Eigen and Orgel error catastrophes.
The bootstrapping requirement (§II.B) and the instability of hybrid coding assignment systems with
substantially different error rates (§II.C) may reflect inherently complementary arguments for efficient
coupling between self-organization of information storage (replication) and readout (translation).
Progressive mutational loss of reflexivity leads to progressive increases in the coding error rate (Wills
1994), resulting in the dissipation of free energy flows and ultimately in what have been called “error
catastrophes”. Error rates impede self-organization at multiple levels. We examine here the possible
coupling between error generation during replication and translation.
Studies of gene-replicase-translatase (GRT) systems reveal gene replication and coded expression are
interdependent. Are both self-organizational processes so strongly coupled that they emerged
simultaneously? Such coupling is not only possible (Smith, et al. 2014) but it occurs spontaneously
(Füchslin and McCaskill 2001; Markowitz, et al. 2006; Wills, et al. 2015). GRT systems are intrinsically
spatially self-organizing, and unlike the hypothetical RCW no extrinsic, higher level units of selection
(i.e. compartmentation) are required to assure their survival.
Living systems now produce proteins from information encoded in genes using aaRS translatases whose
genes are copied using protein polymerases. The dynamics of the RNA domains of the PCW and RCW
(Fig. 4A; (Wills and Carter 2017)) make it evident that gene and protein production in the PCW are tightly
coupled through the population variables that represent the genes and replicase enzyme. Furthermore,
translatases in the protein domain are cooperatively autocatalytic.
In contrast, events in the RCW protein domain have no effect on the value of any RNA domain variable,
so the dynamics of replication and catalyzed coding assignments are completely autonomous in the RNA
domain of the RCW. Moreover, the protein domain is utterly dependent on the RNA domain through the
variables that represent the populations of encoding genes and the accuracy of the ribozymal translatase
population. It is hard to envisage how any selection pressure that proteins might exert on the sequences
of nucleic acids in an RNA World could mold a refined, chemically ordered system of genetic coding.
On the other hand, overcoming the double risk of Eigen- and Orgel-like error catastrophes in information
storage and readout implicit in highly coupled molecular biological systems seems equally impossible,
until one takes account of the fact that natural selection is a self-organizing force that staves off the
potential error catastrophe that threatens information storage (Eigen 1971). Likewise, coding self-organization (Wills 1993) staves off the potential error catastrophe in translation (Orgel 1963). Neither
system can be expected to operate unless each limits deleterious effects of the error rate of the other.
Just as power transfer in dissipative electronic structures is optimal if input and output impedances
match, so molecular biological organization observed in life’s informational systems may have evolved
most efficiently by matching improvements in error rates for nucleic acid replication and protein
synthesis at successive development stages. If so, gene expression and replication by functional protein
replicases could not have emerged efficiently from a world in which either function was already
performed at a higher level by ribozymes.
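The electronic analogy invoked here rests on the maximum power transfer result for a purely resistive circuit: a source of voltage V and internal resistance R_S delivers power P = V^2 R_L / (R_S + R_L)^2 to a load R_L, which is largest when R_L = R_S. The Python sketch below only illustrates that physical statement; the component values are hypothetical, and no mapping onto biological error rates is implied by the code.

```python
def load_power(v_source, r_source, r_load):
    """Power dissipated in a resistive load fed by a source with internal resistance."""
    current = v_source / (r_source + r_load)
    return current ** 2 * r_load

R_SOURCE = 50.0                          # hypothetical source resistance, ohms
loads = [5.0 * k for k in range(1, 41)]  # candidate loads from 5 to 200 ohms

# The load that receives the most power is the one matching the source.
best = max(loads, key=lambda r: load_power(1.0, R_SOURCE, r))
print(best, load_power(1.0, R_SOURCE, best))
```

Loads either smaller or larger than the source resistance receive less power, which is the sense of "matching" borrowed by the argument.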
Direct bootstrapping of genetic information and encoded functional proteins is far more plausible than
any scenario in which there was an initial RNA World by three criteria: reflexive feedback, degraded
specificity in hybrid systems, and the need to match the complexity of coding to that of protein function.
Significant traces in the structural biology of the contemporary aminoacyl-tRNA synthetases therefore
suggest that the evolution of genetic coding proceeded by precisely such a sequence of phase transitions,
each entailing bifurcations of the information-processing characteristic of the previous stage.
Impedance matching argues for coevolution of replication and translation. Fig. 5 illustrates a mechanism
coupling biological information storage and readout (dashed lines in Fig. 4A; (Wills and Carter 2017)). We
conjecture that progressive increases in the dimension of the codon table, n_eff, enhance coding evolution
efficiency by matching noise in genetic information maintenance (replication errors and quasi-neutral
drift in sequence space) to that from the translation error rate. To paraphrase from a recent definition of
“information impedance matching” of information sources to receivers in a different context (Martin
2005), reading out genetic information with as little dissipation as possible requires readout machinery
with approximately the level of noise present in the information sources. If errors in either process are
either too high or too low, the system will dissipate energy unnecessarily, reducing the readout efficiency.
In other words, at any evolutionary stage of developments in molecular biology, the selective effect of
the “replicases” and the fidelity of the “translatases” (and any associated accessories) need to limit noise
to comparable levels in order to optimize the efficiency of information transfer at that stage.
The notion of impedance-matching is well-established physics. Our heuristic reference to it here is
supported by the following observations. Error rates appear to be a valid metric for emerging biological
complexity over quite large timescales (Lewis, et al. 2016). Specific aspects of that work support this
view: (i) Michaelis-Menten parameters for the LeuRS and HisRS2 urzymes (Carter, et al. 2014; Carter
2015) suggest that, whereas they are quite impressive catalysts, their specificities for cognate amino
acids are well below those necessary to stabilize populations of full-length aaRS, which have much higher
fidelities. (ii) Structural studies of the TrpRS urzyme show that its high rate acceleration arises from what
appears to be a molten globular ensemble (Sapienza, et al. 2016). In other words, it is a less complex
molecule (in a higher entropy state) than a properly folded protein. (iii) The million-fold rate
accelerations of both wild type and designed Class I and II protozymes (Martinez, et al. 2015) suggest
that the manifold of catalytically competent polypeptides is far larger than previously thought possible.
(iv) Presumptive error rates for the aaRS constructs therefore exhibit a monotonic decline with
increasing mass, and by implication, increasing complexity.
Moreover, errors quite literally (Gladstone 2016) slow the accumulation of information and hence the
growth of complexity in many situations. Thus, although the notion of impedance-matching requires
further development in evolutionary theory, including incorporating metrics of “efficiency”, it appears
that natural selection and self-organization provide efficient coupling between information storage
(replication) and information readout (translation), as if the two processes were impedance matched.
III. Scenarios for early aaRS speciation and co-evolution of replication and readout
Phylogenetic ancestries of contemporary Class I and II aaRS project convincingly back to a single gene.
The simplicity of such a gene furnishes a conceptually consistent “boot block” (Fig. 3) substantially
reducing the challenge of understanding how genetic coding might have emerged from a peptide/RNA
partnership. Moreover, the detailed inversion symmetries help to explain how such a gene would enforce
the initial differentiation necessary to break the powerful forces that make quasispecies centroids strong
attractors, substantially strengthening arguments that no genetic code could have preceded the earliest
coded protein aaRS. Dual-coding genetic quasispecies exemplified experimentally by the protozyme gene
described by Martinez, et al. (2015) and the urzyme gene proposed by Pham, et al. (2007) are thus
presumptive ancestors to both Class I and II aaRS superfamilies and the universal genetic code itself.
A. Why do established protein phylogenies suggest late aaRS speciation?
The strongest argument that an RCW preceded the emergence of proteins is that multiple sequence
alignments of contemporary protein families (Aravind, et al. 2002; Leipe, et al. 2002; Koonin and
Novozhilov 2009; Koonin 2011) suggest that aaRS diverged late in the succession of protein functions.
However, takeover of a ribozyme-based computational translation process must lead in a plausible way
to the observed phylogeny of contemporary aaRS superfamilies.
We believe the conclusion that aaRSs developed after the advent of fully functional proteins based on an
alphabet of 20 amino acids rests on two questionable phylogenetic assumptions: (i) that domains (~250
amino acids) are the basic unit of remote protein evolutionary history, and (ii) that the evolution of Class
I and II aaRS proceeded from independent ancestries. The former assumption fails to account
appropriately for the highly mosaic nature of contemporary proteins (Pham, et al. 2010; Li, et al. 2011).
The latter ignores the bi-directional coding ancestry of Class I and II aaRS urzymes and protozymes, for
which experimental evidence is now exceptionally strong (Pham, et al. 2010; Li, et al. 2011; Li, et al. 2013;
Carter 2014; Carter, et al. 2014; Carter 2015; Martinez, et al. 2015; Carter 2016, 2017).
The low fidelity of aaRS urzymes implies that they represent an important, but early stage in the
evolution of complexity and hence that deep phylogenies based on aligning intact contemporary aaRS
sequences (Aravind, et al. 1998; Wolf, et al. 1999; Leipe, et al. 2002; Wolf and Koonin 2007) are probably
misleading, especially in the case of the pre-LUCA heritage of the aaRSs themselves (Wolf, et al. 1999;
Wolf and Koonin 2007). Notably, neither domain database (SCOP (Murzin, et al. 1995; Andreeva, et al.
2008); CATH (Pearl, et al. 2003)) has been compiled at sufficiently high resolution to identify the Class I
and II urzymes as ancestral forms. Large insertions within the catalytic domains of aaRS were likely
accumulated segmentally, from exogenous genetic modules with their own previous ancestry (Pham, et
al. 2007), subsequent to their initial evolutionary speciation. Such mosaicity in the multiple sequence
alignments, akin to horizontal gene transfer (Leipe, et al. 2004; Soucy, et al. 2015) albeit in shorter
segments than those considered by Wolf, et al. (1999), could obscure deeper ancestral evolutionary
trajectories involving the urzymes.
Fig. 6 illustrates an alternative phylogeny of Class I aaRS that resolves what we feel to be a mistaken
conclusion that the Class I aaRS diverged late in the evolution of the proteome. The scheme traces Class
I and II aaRS ancestries from a single gene by two distinct processes: speciation of the bi-directional
gene (I) and strand specialization to transcend its limitations (II). It furnishes a satisfactory account of
the increase in structural multiplexing and independent parallel evolution of insertion elements and
anticodon-binding domains during a period in which protein synthesis operated with a gradually
increasing alphabet size that ultimately required development of editing domains (III) to achieve the
requisite fidelity of the contemporary proteome.
B. A plausible scenario for co-evolution of information storage and gene expression.
We highlight how conclusions from §II change how we think translation might have emerged, and outline
a plausible scenario for the co-emergence of information storage and readout. Our scenario is compatible
with a rough “impedance matching” in which high noise initially permits co-option of quite unrefined
functionalities that comprise groupings of related effects averaged over large but separate regions of
sequence space. Noise is gradually brought under control by refining functionalities with distinguishable
specificities and selection of genes encoding them, enabling structural diversity and complexity to
develop simultaneously with increases in the dimension of the codon table (Fig. 5). Although aspects of
this scenario resemble previously outlined marginal scenarios (Martin 2005), its scope, continuity, and
its logical, experimental, and phylogenetic support are assembled here for the first time.
The origin of contemporary translation was most likely an intimate co-evolutionary process involving
both polymer classes (Carter and Kraut 1974a; Carter 1975). The chief arguments expressed in the
following two sections must remain hypotheses until experimental investigation, perhaps guided by
ideas we have expressed, can convincingly establish or rule them out. Our recent use of protein design
and modular engineering in the experimental colonization of the void that previously existed between
pre-biotic organic chemistry (Patel, et al. 2015; Sutherland 2015) and the Last Universal Common
Ancestor (Forterre, et al. 2005; Wong 2005; Xue, et al. 2005; Fournier, et al. 2011; Fournier and Alm
2015; Wong, et al. 2016) argues that such experimentation can now be fruitful on a larger scale.
Arguments developed in §II.D imply that replication and translation are necessarily more tightly coupled
than is envisioned in the RNA World hypothesis by the need for informational impedance-matching. The
overriding challenge associated with the emergence of the genetic code is to develop a scenario in which
prebiotic chemistry produces biology through cooperation between nucleic acids and proteins (or their
precursors), reflexively, in improving both inheritance and function. Details of such a scenario lie beyond
current resources; it is nevertheless appropriate to outline aspects that point toward further work.
Soon after discovery of catalytic RNA, structural complementarities were identified between extended
polypeptide secondary structures and nucleic acids (Carter and Kraut 1974b; Carter 1975; Church, et al.
1977; Warrant and Kim 1978). The short lengths of polymers required to form such complexes, 6-8
amino acids and less than half a turn of RNA double helix, suggested they might have been more stable if
their polypeptide and polynucleotide components formed hairpins (Berezovsky, et al. 2000). Their
stability as complexes appeared to depend largely on their complementary van der Waals surfaces.
Stereochemically templated cross-catalysis plausibly accounted for the simultaneous appearance of coding
and catalysis. Helix radii of RNA and double-stranded extended peptides produced optimal van der Waals
contacts between the two components at precisely the integral, indefinitely repeating stoichiometry of
two amino acids per base (Carter and Kraut 1974b; Carter 1975). This coincidental, integral
stoichiometry enabled a putative rudimentary stereochemical coding. Moreover, specific hydrogen
bonding between carboxyl groups of antiparallel β-polypeptide double helices and RNA 2’OH groups
oriented the 3’OH group as a likely nucleophile, suggesting that, in addition, the polar interactions might
exhibit templated cross-catalysis, one polymer accelerating the elongation of the other.
Successive inverted repeats of complementary polypeptide•polynucleotide complexes increased their
lengths from ~12 to ~23 to ~46 amino acids and from ~3 to ~6 to ~12 base pairs. Peptide sequences capable
of amino acid activation would have been a plausible consequence, for which we have only circumstantial
evidence. Peptides of at least 46 amino acids produced by stereochemical coding based on complementary
van der Waals surfaces of peptide and RNA backbones might already have begun to exhibit ATP-dependent
carboxyl group activation, potentiating assembly of peptides. Ligation might then have
assembled the first protogenes and a proto-ribosome. Partial complementarity of the 5’ and 3’-terminal
halves of the Class I protozyme gene (Carter 2015) suggests coding of the bi-directional protozyme gene
by an ancestral RNA hairpin. This reasoning suggests that polypeptide catalytic activities could have
preceded even rudimentary genetic coding (Kamtekar, et al. 1993; Moffet, et al. 2003; Patel, et al. 2009).
The wobble effect (Crick 1966) implies that bi-directional coding would have required a triplet code to
provide more than 4 codons. We encounter here a substantive broken symmetry. The protozyme gene
(138 bases) is 6-fold longer than whatever putative RNA hairpin might have been associated with the
earliest ~46-residue peptides arising via stereochemical coding. Assuming that such a system could have
sustained reproduction nevertheless leaves us with a six-fold gap between the relative stoichiometries
of templated cross-catalysis and the first true gene expression. Transitions from an initial state in which
protein synthesis is initiated without information-bearing genetic templates are envisioned in the theory
of coding self-organization and GRT systems (Bedian 1982; Eigen, et al. 1988; Wills 1993). What sort of
continuity might have connected an earlier direct, stereochemical coding to indirect, symbolic coding by
introducing messenger RNA and the use of adaptors to give the messages meaning?
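The six-fold gap follows from the two stoichiometries quoted in the text. A minimal Python sketch of that arithmetic is given here, using only the figures stated above (a ~46-residue peptide, two amino acids per base for direct stereochemical coding, three bases per amino acid for a triplet code). The variable names are illustrative, and taking the hairpin length as residues divided by amino acids per base is an assumption about how the quoted figures relate.

```python
# Illustrative arithmetic only; all figures are those quoted in the text.
PEPTIDE_RESIDUES = 46        # earliest ~46-residue peptides
AA_PER_BASE_DIRECT = 2       # direct stereochemical coding: two amino acids per base
BASES_PER_AA_TRIPLET = 3     # symbolic triplet coding: three bases per amino acid

# Assumption: template length under direct coding = residues / (amino acids per base).
hairpin_bases = PEPTIDE_RESIDUES / AA_PER_BASE_DIRECT
# Gene length needed to encode the same peptide with a triplet code.
gene_bases = PEPTIDE_RESIDUES * BASES_PER_AA_TRIPLET
# Ratio of the two template lengths.
fold_gap = gene_bases / hairpin_bases

print(f"hairpin: {hairpin_bases:.0f} bases, gene: {gene_bases} bases, gap: {fold_gap:.0f}-fold")
```

Under these assumptions the ratio is simply the product of the two stoichiometries (2 × 3), so it does not depend on the peptide length chosen.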
The earliest indirect coding emulated the direct stereochemical coding arising from complementary van der
Waals surfaces of peptide and RNA backbones. Analysis of aaRS recognition elements in tRNA acceptor
stem and anticodon bases highlighted the capacity of tRNA acceptor stems to encode the size and
β-carbon branching, but not the hydrophobicity, of amino acids (Carter and Wolfenden 2015, 2016). These
properties are necessary and sufficient to encode peptides with the most important characteristics
(β-branched side chains favoring extended β-structure and alternating small/large side chains allowing van
der Waals access on one face) for assuming structures complementary to the RNA minor groove (Carter
and Kraut 1974b; Carter 1975). Symbolic coding by the tRNA acceptor stem could therefore have
implemented precisely those features necessary to preserve molecular mechanisms that sustained direct
stereochemical coding. A selective advantage of that symbolic representation in the proto-tRNA acceptor
stem would have been that it smoothed the transition between different stoichiometries (two amino
acids per base vs. three bases per amino acid) necessary to implement symbolic coding.
The ancestral bi-directional gene produced two amino acid activating enzymes, Class I with a modest
specificity for larger amino acids, Class II with a similar specificity for smaller amino acids, in keeping with
the contemporary specificities of Class I and II aaRS and urzymes. It is obviously of interest to determine
how limited an amino acid alphabet is consistent with catalytic activity of such protozyme genes. Extant
experimental results, however, show only that by utilizing the full genetic code the two gene products
created from opposite strands can both accelerate amino acid activation ~10^6-fold. The Class I protozyme
possesses a consensus phosphate binding site (Hol, et al. 1978), suggesting that its catalytic activity
may arise from backbone configurations, and not depend entirely on “catalytic residues”.
The earliest catalysts of aminoacylation may have combined ancestral aaRS with ribozymes (Turk, et al.
2010; Turk, et al. 2011). It has not been established whether or not the protozymes might also have
accelerated tRNA charging with the activated amino acid products. tRNA acceptor stem ID elements likely
composed the earliest connection between aminoacylated RNAs and a gene sequence (Schimmel, et al.
1993; Rodin, et al. 1996; Henderson and Schimmel 1997; Rodin and Rodin 2008; Rodin, et al. 2009;
Rodin, et al. 2011). Dependence of aaRS tRNA affinity on acquiring an additional, anticodon-binding
domain suggests that, in contrast to amino acid activation, aminoacylation may have originated in
polypeptide•RNA collaboration and was later taken over by urzymes with the rudimentary capability to
recognize tRNA acceptor stems (Li, et al. 2013).
CONCLUSIONS
The continuing search for ever better RNA replicases (Wochner, et al. 2011; Attwater, et al. 2013;
Sczepanski and Joyce 2014; Taylor, et al. 2015; Horning and Joyce 2016) has been no mean achievement.
However, we argue here for a more holistic and ambitious set of goals than those fueling that search,
anticipating that data and theory in §I, §II, and (Wills and Carter 2017) will stimulate discussion and
further research on questions relevant to the origins of gene expression, biology’s readout mechanism.
A high degree of coherence connects theories of self-organization to the experimental, structural, and
phylogenetic aspects of the evolution of the aaRS enzymes that implement gene expression today.
1) Pronounced inversion symmetries in the amino acid substrates, catalytic residues, tertiary, and
secondary structures are evident in phylogenetic, structural, and biochemical data for
contemporary Class I and II aaRS and arise from their bi-directionally coding ancestry.
2) Inversion symmetries assure maximal structural and functional differentiation between the two
classes, a necessary precondition for their survival in competition with parasitic molecular forms.
3) tRNA identity elements that implement coding efficiently capture the amino acid phase equilibria
that drive protein folding and are optimal for bi-directional coding (Zull and Smith 1990).
4) Bi-directional coding and the uniqueness of the coding table create a reflexive feed-back cycle to
guide rapid evolutionary emergence of protein aaRS genes by bootstrapping rapidly to an optimal
coding table and mRNA sequences. Ribozymal assignment catalysts lack such reflexivity.
5) Hybrid system expression dynamics show that any emerging PCW with a lower-dimensional
coding table than that of a pre-existing RCW would necessarily have been eliminated by purifying
selection before it had sufficient time to expand the dimension of its coding table.
6) Coupling of dynamic equations for gene-replicase-translatase (GRT) systems suggests that
matching of error rates maximized the probability of launching replication and translation.
7) Points (1)-(6) imply that replication and readout emerged simultaneously from a peptide•RNA
partnership. We outline a more probable scenario than an RNA world for the origin of biology.
8) Molecular constructs (§I) enhance the ability to test specific elements of proposed scenarios.
Acknowledgments
This work was supported by the National Institute of General Medical Sciences (grant numbers
R01-78227 & R01-90406 to C.W.C., Jr.). This publication was made possible also through the support of a grant
from the John Templeton Foundation. The opinions expressed in this publication are those of the
author(s) and do not necessarily reflect the views of the John Templeton Foundation. H. Fried (Cursor
Scientific Editing and Writing, LLC) made many useful suggestions on an earlier draft.
REFERENCES
Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. 2008. Data
growth and its impact on the SCOP database: new developments. Nucl. Acids Res. 36:D419-D425.
Aravind L, Anantharaman V, Koonin EV. 2002. Monophyly of Class I Aminoacyl tRNA Synthetase,
USPA, ETFP, Photolyase, and PP-ATPase Nucleotide-Binding Domains: Implication for Protein
Evolution in the RNA World. PROTEINS: Structure, Function, and Genetics 48:1–14.
Aravind L, Leipe DD, Koonin EV. 1998. Toprim--a conserved catalytic domain in type IA and II
topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids
Research 26:4205–4213.
Attwater J, Wochner A, Holliger P. 2013. In-ice evolution of RNA polymerase ribozyme activity.
Nature Chemistry 5:1011-1018.
Bedian V. 1982. The possible role of assignment catalysts in the origin of the genetic code. Orig.
Life 12:181–204.
Bedian V. 2001. Self-description and the origin of the genetic code. Biosystems 60:39–47.
Berezovsky IN, Grosberg AY, Trifonov EN. 2000. Closed loops of nearly standard size: common
basic element of protein structure. FEBS Letters 466:283-286.
Berg P, Ofengand EJ. 1958. An Enzymatic Mechanism for Linking Amino Acids to RNA.
Proc Nat Acad Sci USA 44:78-85.
Bernhardt HS. 2012. The RNA world hypothesis: the worst theory of the early evolution of life
(except for all the others). Biology Direct 7:23.
Bowman JC, Hud NV, Williams LD. 2015. The Ribosome Challenge to the RNA World. Journal of
Molecular Evolution 80:143-161.
Breaker RR. 2012. Riboswitches and the RNA World. Cold Spring Harb Perspect Biol 4:a003566.
Caetano-Anollés D, Caetano-Anollés G. 2016. Piecemeal Buildup of the Genetic Code, Ribosomes,
and Genomes from Primordial tRNA Building Blocks. Life 6:43.
Caetano-Anolles G, Kim HS, Mittenthal JE. 2007. The origin of modern metabolic networks inferred
from phylogenomic analysis of protein architecture. Proceedings of the National Academy of
Sciences, USA 104:9358–9363.
Caetano-Anollés G, Wang M, Caetano-Anollés D. 2013. Structural Phylogenomics Retrodicts the
Origin of the Genetic Code and Uncovers the Evolutionary Impact of Protein Flexibility. PLoS ONE
8:e72225.
Carter CW, Jr. 2015. What RNA World? Why a Peptide/RNA Partnership Merits Renewed
Experimental Attention. Life 5:294-320.
Carter CW, Jr., Kraut J. 1974a. A Proposed Model for Interaction of Polypeptides with RNA.
Proc Nat Acad Sci USA 71:283-287.
Carter CW, Jr. 2016. An Alternative to the RNA World. Natural History 125:28-33.
Carter CW, Jr. 2017. Coding of Class I and II aminoacyl-tRNA synthetases. Protein Reviews In Press.
Carter CW, Jr. 1993. Cognition Mechanism and Evolutionary Relationships in Aminoacyl-tRNA
Synthetases. Annual Review of Biochemistry 62:715-748.
Carter CW, Jr. 1975. Cradles for Molecular Evolution. New Scientist March 27:784-787.
Carter CW, Jr. 2014. Urzymology: Experimental Access to a Key Transition in the Appearance of
Enzymes. J. Biol. Chem. 289:30213–30220.
Carter CW, Jr., Li L, Weinreb V, Collier M, Gonzales-Rivera K, Jimenez-Rodriguez M, Erdogan O,
Chandrasekharan SN. 2014. The Rodin-Ohno Hypothesis That Two Enzyme Superfamilies
Descended from One Ancestral Gene: An Unlikely Scenario for the Origins of Translation That Will
Not Be Dismissed. Biology Direct 9:11.
Carter CW, Jr., Wolfenden R. 2016. Acceptor-stem and anticodon bases embed amino acid chemistry
into tRNA. RNA Biology 13:145–151.
Carter CW, Jr., Wolfenden R. 2015. tRNA Acceptor-Stem and Anticodon Bases Form Independent
Codes Related to Protein Folding. Proc. Nat. Acad. Sci. USA 112:7489-7494.
Carter CW, Jr., Kraut J. 1974b. A Proposed Model for Interaction of Polypeptides with RNA.
Proceedings of the National Academy of Sciences, USA 71:283-287.
Cech TR. 1986. The Intervening Sequence RNA of Tetrahymena is an Enzyme. Scientific American
255:64-75.
Chandrasekaran SN, Yardimci G, Erdogan O, Roach JM, Carter CW, Jr. 2013. Statistical Evaluation of
the Rodin-Ohno Hypothesis: Sense/Antisense Coding of Ancestral Class I and II Aminoacyl-tRNA
Synthetases. Molecular Biology and Evolution 30:1588-1604.
Church GM, Sussman JL, Kim SH. 1977. Secondary structural complementarity between DNA and
proteins. Proceedings of the National Academy of Sciences, USA 74:1458-1462.
Crick FHC. 1966. Codon-Anticodon Pairing: The Wobble Hypothesis. Journal of Molecular Biology
19:548-555.
Crick FHC. 1955. On Degenerate Templates and the Adaptor Hypothesis. Unpublished;
https://profiles.nlm.nih.gov/ps/retrieve/Narrative/SC/p-nid/153.
Crick FHC. 1968. The Origin of the Genetic Code. Journal of Molecular Biology 38:367-379.
Cullis PM, Wolfenden R. 1981. Affinities of Nucleic Acid Bases for Solvent Water. Biochemistry
20:3024-3028.
Cusack S. 1994. Evolutionary Implications. Nature Structural and Molecular Biology 1:760.
Cusack S, Berthet-Colominas C, Härtlein M, Nassar N, Leberman R. 1990. A second class of
synthetase structure revealed by X-ray analysis of Escherichia coli seryl-tRNA synthetase at 2.5 Å.
Nature 347:249-255.
Delarue M. 2007. An asymmetric underlying rule in the assignment of codons: Possible clue to a
quick early evolution of the genetic code via successive binary choices. RNA 13:1-9.
Di Giulio M. 1992. On the Origin of the Transfer RNA Molecule. J. Theor. Biol. 159:199-214.
Di Giulio M. 2004. The origin of the tRNA molecule: implications for the origin of protein synthesis.
Journal of Theoretical Biology 226:89–93.
Di Giulio M. 2008. Transfer RNA genes in pieces are an ancestral character. EMBO Reports 9:820.
Dill K, Chan HS. 1997. From Levinthal to pathways to funnels. Nature Structural Biology 4:10-19.
Eigen M. 1971. Selforganization of Matter and the Evolution of Biological Macromolecules.
Naturwissenschaften 58:465-523.
Eigen M, McCaskill JS, Schuster P. 1988. Molecular Quasi-Species. J. Phys. Chem. 92:6881-6891.
Eigen M, Schuster P. 1977. The Hypercycle: A Principle of Natural Self-Organization Part A:
Emergence of the Hypercycle. Naturwissenschaften 64:541-565.
Eriani G, Delarue M, Poch O, Gangloff J, Moras D. 1990. Partition of tRNA Synthetases into Two
Classes Based on Mutually Exclusive Sets of Sequence Motifs. Nature 347:203-206.
Forterre P, Gribaldo S, Brochier C. 2005. Luca: à la recherche du plus proche ancêtre commun
universel. MEDECINE/SCIENCES 21:860-865.
Fournier GP, Alm EJ. 2015. Ancestral Reconstruction of a Pre-LUCA Aminoacyl-tRNA Synthetase
Ancestor Supports the Late Addition of Trp to the Genetic Code. Journal of Molecular Evolution
80:171-185.
Fournier GP, Andam CP, Alm EJ, Gogarten JP. 2011. Molecular Evolution of Aminoacyl tRNA
Synthetase Proteins in the Early History of Life. Orig Life Evol Biosph 41:621–632.
Freeland SJ, Hurst LD. 1998. The Genetic Code is One in a Million. Journal of Molecular Evolution
47:238-248.
Füchslin RM, McCaskill JS. 2001. Evolutionary self-organization of cell-free genetic coding. Proc
Natl Acad Sci USA 98:9185–9190.
Gilbert W. 1986. The RNA World. Nature 319:618.
Gladstone E. 2016. Error in Information Diffusion Processes. [Ithaca, NY]: Cornell University.
Guerrier-Takada L, N., and Altman, S. 1989. Specific Interactions in RNA Enzymes-Substrate
Complexes. Science 246:1578-1584.
Henderson BS, Schimmel P. 1997. RNA-RNA Interactions Between Oligonucleotide Substrates for
Aminoacylation. Bioorganic & Medicinal Chemistry 5:1071-1079.
Hofstadter DR. 1979. Gödel, Escher, Bach: an eternal golden braid. New York: Basic Books, Inc.
Hol WJG, van Duijnen PT, Berensen HJC. 1978. The α-helix dipole and the properties of proteins.
Nature 273:443-446.
Hordijk W, Wills PR, Steel M. 2014. Autocatalytic Sets and Biological Specificity. Bull Math Biol
76:201–224.
Horning DP, Joyce GF. 2016. Amplification of RNA by an RNA polymerase ribozyme.
Proc Nat Acad Sci USA 113:9786–9791.
Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH. 1993. Protein Design by Binary Patterning of
Polar and Non-polar Amino Acids. Science 262:1680-1685.
Kim S-H, Quigley GJ, Suddath FL, McPherson A, Sneden D, Kim JJ, Weinzierl J, Rich A. 1973.
Three-dimensional structure of yeast phenylalanine transfer RNA: Folding of the polynucleotide chain.
Science 179:285-288.
Koonin EV. 2011. The Logic of Chance: The Nature and Origin of Biological Evolution. Upper Saddle
River, NJ: Pearson Education; FT Press Science.
Koonin EV. 2015. Why the Central Dogma: on the nature of the great biological exclusion principle.
Biology Direct 10:52.
Koonin EV, Novozhilov AS. 2009. Origin and Evolution of the Genetic Code: The Universal Enigma.
IUBMB Life 61:99–111.
Leipe DD, Koonin EV, Aravind L. 2004. STAND, a Class of P-Loop NTPases Including Animal and
Plant Regulators of Programmed Cell Death: Multiple, Complex Domain Architectures, Unusual
Phyletic Patterns, and Evolution by Horizontal Gene Transfer. J. Mol. Biol. 343:1–28.
Leipe DD, Wolf YI, Koonin EV, Aravind L. 2002. Classification and Evolution of P-loop GTPases and
Related ATPases. J. Mol. Biol. 317:41-72.
Lewis CA, Jr., Crayle J, Zhou S, Swanstrom R, Wolfenden R. 2016. Cytosine deamination and the
precipitous decline of spontaneous mutation during Earth’s history. Proc Nat Acad Sci USA
113:8194-8199.
Li L, Francklyn C, Carter CW, Jr. 2013. Aminoacylating Urzymes Challenge the RNA World
Hypothesis. J. Biol. Chem. 288:26856-26863.
Li L, Weinreb V, Francklyn C, Carter CW, Jr. 2011. Histidyl-tRNA Synthetase Urzymes: Class I and II
Aminoacyl-tRNA Synthetase Urzymes have Comparable Catalytic Activities for Cognate Amino Acid
Activation. J. Biol. Chem. 286:10387-10395.
Markowitz S, Drummond A, Nieselt K, Wills PR. 2006. Simulation model of prebiotic evolution of
genetic coding. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A,
editors. Artificial Life. Cambridge, MA: MIT Press. p. 152-157.
Martin P. 2005. Spatial Interpolation in Other Dimensions. Oregon State University.
Martinez L, Jimenez-Rodriguez M, Gonzalez-Rivera K, Williams T, Li L, Weinreb V, Niranj
Chandrasekaran S, Collier M, Ambroggio X, Kuhlman B, et al. 2015. Functional Class I and II Amino
Acid Activating Enzymes Can Be Coded by Opposite Strands of the Same Gene. J. Biol. Chem.
290:19710–19725.
Moffet DA, Foley J, Hecht MH. 2003. Midpoint reduction potentials and heme binding
stoichiometries of de novo proteins from designed combinatorial libraries. Biophysical Chemistry
105:231–239.
Muñoz V, Serrano L. 1994. Intrinsic Secondary Structure Propensities of the Amino Acids, Using
Statistical φ-ψ Matrices: Comparison with Experimental Scales. PROTEINS: Structure, Function, and
Genetics 20:301-311.
Murzin AG, Brenner SE, Hubbard TJP, Chothia C. 1995. SCOP: a structural classification of proteins
database for the investigation of sequences and structures. J. Mol. Biol. 247:536-540.
Noller H. 2004. The driving force for molecular evolution of translation. RNA 10:1833-1837.
Noller HF, Hoffarth V, Zimniak L. 1992. Unusual Resistance of Peptidyl Transferase to Protein
Extraction Procedures. Science 256:1416-1419.
O’Donoghue P, Luthey-Schulten Z. 2003. On the Evolution of Structure in Aminoacyl-tRNA
Synthetases. Microbiology and Molecular Biology Reviews 67:550–573.
Orgel LE. 1968. Evolution of the Genetic Apparatus. J. Mol. Biol. 38:381-393.
Orgel LE. 1963. The maintenance of the accuracy of protein synthesis and its relevance to ageing.
Proc Nat. Acad. Sci. USA 49:517-521.
Patel BH, Percivalle C, Ritson DJ, Duffy CD, Sutherland JD. 2015. Common origins of RNA, protein
and lipid precursors in a cyanosulfidic protometabolism. Nature Chemistry 7:301-307.
Patel SC, Bradley LH, Jinadasa SP, Hecht MH. 2009. Cofactor binding and enzymatic activity in an
unevolved superfamily of de novo designed 4-helix bundle proteins. Protein Science 18:1388-1400.
Pearl FM, Bennett CF, Bray JE, Harrison AP, Martin N, Shepherd A, Sillitoe I, Thornton J, Orengo CA.
2003. The CATH database: an extended protein family resource for structural and functional
genomics. Nucleic Acids Research 31:452-455.
Petrov AS, Bernier CR, Hsiao C, Norris AM, Kovacs NA, Waterbury CC, Stepanov VG, Harvey SC, Fox
GE, Wartell RM, et al. 2014. Evolution of the Ribosome at Atomic Resolution. PNAS 111:10251–
10256.
Petrov AS, Williams LD. 2015. The Ancient Heart of the Ribosomal Large Subunit: A Response to
Caetano-Anolles. J Mol Evol 80:166-170.
Pham Y, Kuhlman B, Butterfoss GL, Hu H, Weinreb V, Carter CW, Jr. 2010. Tryptophanyl-tRNA
synthetase Urzyme: a model to recapitulate molecular evolution and investigate intramolecular
complementation. J. Biol. Chem. 285:38590-38601.
Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss G, Kuhlman B, Carter CW, Jr. 2007. A Minimal
TrpRS Catalytic Domain Supports Sense/Antisense Ancestry of Class I and II Aminoacyl-tRNA
Synthetases. Mol Cell 25:851-862.
Radzicka A, Wolfenden R. 1988. Comparing the Polarities of the Amino Acids: Side-Chain
Distribution Coefficients between the Vapor Phase, Cyclohexane, 1-Octanol, and Neutral Aqueous
Solution. Biochemistry 27:1664-1670.
Robertson MP, Joyce GF. 2012. The Origins of the RNA World. Cold Spring Harb Perspect Biol
4:a003608.
Rodin A, Rodin SN, Carter CW, Jr. 2009. On Primordial Sense-Antisense Coding. Journal of Molecular
Evolution 69:555-567.
Rodin AS, Szathmáry E, Rodin SN. 2011. On origin of genetic code and tRNA before translation.
Biology Direct 6:14.
Rodin SN, Ohno S. 1995. Two Types of Aminoacyl-tRNA Synthetases Could be Originally Encoded by
Complementary Strands of the Same Nucleic Acid. Origins of Life and Evolution of the Biosphere
25:565-589.
Rodin SN, Rodin A. 2006a. Origin of the Genetic Code: First Aminoacyl-tRNA Synthetases Could
Replace Isofunctional Ribozymes When Only the Second Base of Codons Was Established. DNA and
Cell Biology 25:365-375.
Rodin SN, Rodin A. 2006b. Partitioning of Aminoacyl-tRNA Synthetases in Two Classes Could Have
Been Encoded in a Strand-Symmetric RNA World. DNA and Cell Biology 25:617-626.
Rodin SN, Rodin A, Ohno S. 1996. The presence of codon-anticodon pairs in the acceptor stem of
tRNAs. Proc. Nat. Acad. Sci. USA 93:4537-4542.
Rodin SN, Rodin AS. 2008. On the origin of the genetic code: Signatures of its primordial
complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100:341–355.
Ruff M, Krishnaswamy S, Boeglin M, Poterszman A, Mitschler A, Podjarny A, Rees B, Thierry JC,
Moras D. 1991. Class II Aminoacyl Transfer RNA Synthetases: Crystal Structure of Yeast
Aspartyl-tRNA Synthetase Complexed with tRNAAsp. Science 252:1682-1689.
Sapienza PJ, Li L, Williams T, Lee AL, Carter CW, Jr. 2016. An Ancestral Tryptophanyl-tRNA
Synthetase Precursor Achieves High Catalytic Rate Enhancement without Ordered Ground-State
Tertiary Structures. ACS Chemical Biology 11:1661–1668.
Schimmel P. 1996. Origin of genetic code: A needle in the haystack of tRNA sequences. Proc. Nat.
Acad. Sci. USA 93:4521-4522.
Schimmel P, Giegé R, Moras D, Yokoyama S. 1993. An operational RNA code for amino acids and
possible relationship to genetic code. Proceedings of the National Academy of Sciences, USA
90:8763-8768.
Schneider TD. 2010. A brief review of molecular information theory. Nano Communication
Networks 1:173–180.
Sczepanski JT, Joyce GF. 2014. A cross-chiral RNA polymerase ribozyme. Nature 515:440-442.
Smith JI, Steel M, Hordijk W. 2014. Autocatalytic sets in a partitioned biochemical network. Journal
of Systems Chemistry 5:2.
Soucy SM, Huang J, Gogarten JP. 2015. Horizontal gene transfer: building the web of life. Nature
Reviews Genetics 16:472.
Sutherland JD. 2015. The Origin of Life--Out of the Blue. Angew. Chem. Int. Ed. 54:In Press.
Taylor AI, Pinheiro VB, Smola MJ, Morgunov AS, Peak-Chew S, Cozens C, Weeks KM, Herdewijn P,
Holliger P. 2015. Catalysts from synthetic genetic polymers. Nature 518:427-430.
Tuerk C, Gold L. 1990. Systematic evolution of ligands by exponential enrichment: RNA ligands to
bacteriophage T4 DNA polymerase. Science 249:505-510.
Turk RM, Chumachenko NV, Yarus M. 2010. Multiple translational products from a five-nucleotide
ribozyme. Proceedings of the National Academy of Sciences USA 107:4585–4589.
Turk RM, Illangasekare M, Yarus M. 2011. Catalyzed and Spontaneous Reactions on Ribozyme
Ribose. J. Am. Chem. Soc. 133:6044–6050.
Van Noorden R. 2009. RNA World Easier to Make. Nature published online.
Warrant RW, Kim S-H. 1978. α-Helix-double helix interaction shown in the structure of a
protamine-transfer RNA complex and a nucleoprotamine model. Nature 271:130-135.
Watson JD, Crick FHC. 1953. A Structure for Deoxyribose Nucleic Acid. Nature 171:737-738.
Wills PR. 1994. Does information acquire meaning naturally? Berichte der Bunsengesellschaft für
Physikalische Chemie 98:1129–1134.
Wills PR. 2016. The generation of meaningful information in molecular systems. Phil. Trans. R. Soc.
A 374:20150016.
WillsPR.2009.InformedGeneration:Physicaloriginandbiologicalevolutionofgeneticcodescript
interpreters.JournalofTheoreticalBiology257:345–358.
WillsPR.1993.Self-organizationofgeneticcoding.J.Theor.Biol.162:267-287.
WillsPR.2004.Stepwiseevolutionofmolecularbiologicalcoding.In:PollackJ,BedauM,Husbands
P,IkegamiT,WatsonRA,editors.ArtificiallifeIXCambridge::MITPress.p.51–56.
WillsPR,CarterCW,Jr.2017.Insuperableproblemsofaninitialgeneticcodeemergingfroman
RNAWorld.BiosystemsInPreparation.
WillsPR,NieseltK,McCaskillJS.2015.EmergenceofCodinganditsSpecificityasaPhysicoInformaticProblem.OrigLifeEvolBiosphpublishedonline;paginationnotyetavailable.
WochnerA,AttwaterJ,CoulsonA,HolligerP.2011.Ribozyme-CatalyzedTranscriptionofanActive
Ribozyme.Science332209-212.
Woese C. 1967. The Genetic Code. New York: Harper & Row.
Woese CR. 1965a. On the Origin of the Genetic Code. Proceedings of the National Academy of Sciences USA 54:1546-1552.
Woese CR. 1965b. Order in the Origin of the Genetic Code. Proceedings of the National Academy of Sciences USA 54:71-75.
Woese CR, Dugre DH, Saxinger WC, Dugre SA. 1966. The molecular basis for the genetic code. Proc. Natl. Acad. Sci. USA 55:966–974.
Woese CR, Olsen GJ, Ibba M, Soll D. 2000. Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process. Microbiology and Molecular Biology Reviews 64:202–236.
Wolf YI, Aravind L, Grishin NV, Koonin EV. 1999. Evolution of Aminoacyl-tRNA Synthetases: Analysis of Unique Domain Architectures and Phylogenetic Trees Reveals a Complex History of Horizontal Gene Transfer Events. Genome Research 9:689–710.
Wolf YI, Koonin EV. 2007. On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization. Biology Direct 2:14.
Wolfenden R, Cullis PM, Southgate CCF. 1979. Water, Protein Folding, and the Genetic Code. Science 206:575-577.
Wolfenden R, Lewis CA, Yuan Y, Carter CW, Jr. 2015. Temperature dependence of amino acid hydrophobicities. Proc. Nat. Acad. Sci. USA 112:7484-7488.
Wolfenden R, Snider MJ. 2001. The Depth of Chemical Time and the Power of Enzymes as Catalysts. Accounts of Chemical Research 34:938-945.
Wong JT-F. 2005. Coevolution theory of the genetic code at age thirty. BioEssays 27:416–425.
Wong JT-F, Ng S-K, Mat W-K, Hu T, Xue H. 2016. Coevolution Theory of the Genetic Code at Age Forty: Pathway to Translation and Synthetic Life. Life 6:12.
Xue H, Ng S-K, Tong K-L, Wong JT-F. 2005. Congruence of evidence for a Methanopyrus-proximal root of life based on transfer RNA and aminoacyl-tRNA synthetase genes. Gene 360:120–130.
Yarus M. 2011a. Getting Past the RNA World: The Initial Darwinian Ancestor. Cold Spring Harb Perspect Biol 3:a003590.
Yarus M. 2011b. Life from an RNA World: The ancestor within. Cambridge, MA: Harvard University Press.
Zull JE, Smith SK. 1990. Is genetic code redundancy related to retention of structural information in both DNA strands? Trends in Biochemical Sciences 15:257-261.
Figure Legends
Figure 1. Information flow in molecular biology. A. The Central Dogma is supplemented by the “adaptor” hypothesis. The dashed triangle represents the crucial elements of Crick’s original insight, which necessarily implicates both tRNA and aaRS. B. The physico-chemical properties of the amino acids define the nano-scale “ecologies” within folded proteins, creating the intersection between genome and proteome. These ecologies also drove the selection of tRNA identity elements, analogous to a programming language, as well as protein folding. As a consequence, they also drive the selection of amino acid sequences in mRNA gene sequences (mRNA), analogous to computer programs. C. Network analysis of the Central Dogma consists of the nodes of a tetrahedron. Embedding the triangle from A into the ecology in B reveals a uni-directional feedback cycle or self-referential element as generator of complexity in the spirit of Gödel’s incompleteness theorem (Hofstadter 1979). Genetic instructions assemble amino acids according to their physical properties in ways that, when translated according to the programming language in tRNA, yield functional proteins (enzymes, switches, regulators). aaRS with cognate tRNAs furnish reflexive elements (orange arrow) connecting their gene sequences, via their folded structures, to the enzymes that enforce the coding rules in the codon table. Physical properties of amino acids and the codon assignment table are “fixed” because they are governed by chemical equilibria. The genome and proteome are dynamically-determined biological processes that form the basis for the evolution of diversity through self-organization and natural selection of phenotypes.
Figure 2. Quasispecies bifurcations in aaRS gene or protein sequence space. A. A single quasispecies of undifferentiated function making random assignments X → y of codons (X) to amino acids (y) cannot transmit genetic information. Nor can it easily bifurcate to a pair of narrower quasispecies. Bi-directional coding ancestry of the contemporary aaRS created suitable quasispecies de novo {I, II; red and blue; bold italics explicitly indicating Class I, II aaRS}, each separately supporting binary coding assignments I → i and II → ii of specific subsets of codons {I, II} to corresponding subsets of amino acids {i, ii}. That double-helical gene with dual single-strand interpretations overcame the initial and most substantial barrier to the emergence of genetic coding by partitioning protein sequence space decisively into two functionally distinct populations. The plane between the I and II quasispecies is a local representation of the inversion operator that transforms a sequence into its complement read in the reverse direction. B. Daughter population distributions derived from nearly simultaneous bifurcation of the two ancestral binary coding quasispecies (expanded view) into smaller separate sub-populations of genes and assignment catalysts operating a 4-letter code {Ia → ia, Ib → ib, IIa → iia, IIb → iib}. Genetic coding bi-directionality is preserved through complementary gene pairs Ia ⇔ IIa and Ib ⇔ IIb. Recapitulation of the bifurcation process would further specialize related species, each step being progressively easier owing to the increased coding specificity, but eventually losing the ability to use information in both strands of genes.
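The inversion operator named in the Figure 2 legend (a sequence mapped to its complement read in the reverse direction) can be illustrated with a minimal Python sketch. The function name `invert`, the RNA alphabet, and the example strand are assumptions made here for illustration only; they are not taken from the paper.

```python
# Minimal sketch of the "inversion operator": complement, then reverse.
# Assumption: sequences are RNA strings over the alphabet A, C, G, U.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def invert(seq: str) -> str:
    """Return the complement of `seq` read in the reverse direction."""
    if set(seq) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G, U")
    return seq.translate(COMPLEMENT)[::-1]


strand = "AUGGCC"           # arbitrary illustrative strand
opposite = invert(strand)   # the sequence carried by the opposite strand
print(opposite)             # GGCCAU
print(invert(opposite))     # AUGGCC (applying the operator twice restores the input)
```

Applying the operator twice returns the original sequence, which is why it can relate the two members of a complementary gene pair in either direction.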
Figure 3. Reflexivity is an exclusive property of protein aaRS that arises from nano-environmental sensing. A. The putative ancestral amino acid activating protozyme, substantiated experimentally in (Martinez, et al. 2015), furnishes two assignment catalysts, each executing a complementary assignment, one for large, the other for small amino acid side chains. Each also contributes to the translation of the other. B. As the assignment catalysts in A are proteins, their folding reactions are governed by the phase transfer equilibria of the amino acids, sensing the nano-environment in a necessary prelude to function.
Figure 4. Feedback in ribozymal (RCW) and protein (PCW) GRT networks. A. Coupled Replicase and Translatase production. Differential equations for gene expression in PCW and RCW are compared for RNAs and Proteins (Wills and Carter 2017). Solid lines indicate autocatalytic acceleration. Dashed arrows form a (hyper)cycle coupling production dynamics of Replicase and Translatase in the PCW, but not in any RCW. B. In an RCW, coding rules [C] are implemented by ribozymal assignment catalysts {T[C]} that cannot sense the phase transfer equilibria accessible to protein assignment catalysts. Thus, natural selection is the only feedback cycle. Non-aaRS functional proteins {Pi} furnish the only source of selective advantage, and have no direct influence on the coding rules. C. In the PCW, coding rules are executed by proteins that must first fold. A tighter feedback loop (green arrows) is a structural feature of the reaction network. Protein folding rules determine the function of the assignment catalysts and therefore also the eventual choice of codon assignments, substantially enhancing natural selection.
Figure 5. Impedance-matching eases elaboration of coding from a 2-letter amino acid alphabet to a full 20-letter alphabet. Noise, N, in the genetic signal, S, on the y axis, serves as the primary obstacle opposing information transfer in translation. Increased complexity, on the x axis, must be accompanied by reduced error rates. The error tolerance curve is a hyperbola in which the product of error frequency by complexity, (N/S)·Ψ, equals the minimum cost of an error, εmin, as estimated by Schneider (Schneider 2010). By analogy to the gears on a bicycle’s derailleur, enlarging the alphabet size increases coding capacity, providing a series of matches with the hyperbolic bounding error tolerance curve (dark blue), easing the path to increased fidelity by enabling stepped increases in coding capacity and complexity.
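The hyperbolic error tolerance curve described in the Figure 5 legend can be sketched numerically. The Python fragment below is only an illustration of the stated relation, error frequency times complexity equals εmin. The function name, the complexity values, and the value assigned to `EPS_MIN` are placeholders assumed here; they are not Schneider's estimate or values from the paper.

```python
def tolerated_error_frequency(complexity: float, eps_min: float) -> float:
    """Error frequency N/S on the curve (N/S) * complexity == eps_min."""
    if complexity <= 0 or eps_min <= 0:
        raise ValueError("complexity and eps_min must be positive")
    return eps_min / complexity


EPS_MIN = 1.0  # placeholder in arbitrary units, NOT a published estimate

# Arbitrary, illustrative complexity values (the x axis of Figure 5).
for psi in (1.0, 2.0, 4.0, 8.0):
    print(psi, tolerated_error_frequency(psi, EPS_MIN))
```

Under these assumptions, each doubling of complexity halves the tolerated error frequency, which is the sense in which increased complexity must be accompanied by reduced error rates.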
Figure 6. Alternative evolution of Class I aaRS catalytic domains consistent both with phylogenetic analysis and the more ancient ancestry of the most highly conserved modules in the two aaRS Classes. This scenario re-defines the Class I CP1 insertion between the N- and C-terminal modules, Ncore (blue) and Ccore (red), of the Class I Urzymes, both of which are portrayed as ancestral to all Rossmannoid superfamilies (adapted from Fig. 4 of (Aravind, et al. 2002)). The initial CP1 insertion (white) is the origin of most subsequent elaborations of the Class I catalytic domains that appear to have provided the requisite increases in specific amino acid recognition (Carter 2015). Idiosyncratic Class I anticodon-binding domains are not considered here. We distinguish three phases of aaRS evolution: I. Bi-directionally coded with Class II; limited diversity; II. CP1 enforces strand specialization; III. Hydrolytic editing enhances specificity.
Fig.1
Fig.2
Fig.3
Fig.4
Fig.5
Fig.6