Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty

Geeta N. Eick,1,2 Jamie T. Bridgham,1 Douglas P. Anderson,1,3 Michael J. Harms,1,3 and Joseph W. Thornton*,4

1 Institute of Ecology & Evolutionary Biology, University of Oregon, Eugene, OR
2 Department of Anthropology, University of Oregon, Eugene, OR
3 Institute of Molecular Biology, University of Oregon, Eugene, OR
4 Department of Ecology & Evolution and Department of Human Genetics, University of Chicago, Chicago, IL
*Corresponding author: E-mail: [email protected]
Associate editor: Michael S. Rosenberg

Abstract

Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR): phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated.
There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity.

Key words: ancestral protein reconstruction, ancestral sequence reconstruction, protein evolution.

© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, [email protected]
Mol. Biol. Evol. 34(2):247–261 doi:10.1093/molbev/msw223 Advance Access publication October 30, 2016

Introduction

Ancestral protein reconstruction (APR), in which ancient protein sequences are inferred phylogenetically and then synthesized, expressed, and experimentally characterized, has become a widely used strategy to experimentally test hypotheses about the functional and biochemical properties of ancient proteins (Jermann et al. 1995; Chandrasekharan et al. 1996; Chang et al. 2002; Thornton et al. 2003; Thomson et al. 2005; Gaucher et al. 2008; Hobbs et al. 2012; Akanuma et al. 2013; Bar-Rogovsky et al. 2013; Risso et al. 2013; Williams et al. 2013; Boucher et al. 2014; Akanuma et al. 2015; Bickelmann et al. 2015; Carrigan et al. 2015; Clifton and Jackson 2016; Devamani et al. 2016; Steindel et al. 2016). APR has also been used to experimentally determine the effects of specific historical changes in protein sequence on the properties of ancient proteins by introducing mutations that recapitulate ancient sequence substitutions into reconstructed ancestral proteins (Zhang and Rosenberg 2002; Bridgham et al. 2006; Kaiser et al. 2007; Ortlund et al. 2007; Yokoyama et al. 2008; Lynch et al. 2011; Finnigan et al. 2012; Harms et al. 2013; Smith et al. 2013; Wilson et al. 2015). By combining explicit reconstruction of history with experimental analysis of molecular properties, APR can bring the decisive inference style of molecular biology to questions about the mechanisms and dynamics by which proteins evolved.

The advantages of APR, however, depend upon the reliability of the inferred ancient sequences. Most APR studies have reconstructed ancestral sequences using the maximum likelihood (ML) approach, which yields a single best estimate of the ancestral sequence (Yang et al. 1995; Pupko et al. 2000). Beginning with an alignment of present-day protein sequences and a phylogeny of those sequences, ML-APR uses a probabilistic Markov model of the process of sequence evolution to calculate the likelihood of every possible ancestral state at every site in the protein sequence for any internal node of interest on the phylogeny. The likelihood of an ancestral amino acid state at some site is defined as the probability that all the states at this site found in present-day proteins at the tips of the tree would have evolved given that ancestral state, the phylogeny, and the model. The posterior probability (PP) of that state can then be expressed as the likelihood of that state (weighted by its equilibrium frequency, which serves as the prior) divided by the sum of the prior-weighted likelihoods of all 20 possible states at that site. The maximum likelihood (ML) estimate of the ancestral state (or, strictly speaking, the maximum a posteriori estimate) is the one with the highest prior-weighted likelihood. The ML estimate of the ancestral sequence is the string of ML states at all sites; it represents the ancestral sequence that maximizes the conditional probability that all observed extant sequences in the alignment would have evolved.

The ML sequence is the best point estimate of the true ancestral sequence, but it is seldom inferred with certainty. In virtually all real-world cases, the reconstructed ML sequence contains some ambiguously inferred sites. At such sites, the PP of the ML state is less than 1, and less likely but still plausible alternative states exist. Uncertainty typically arises when the pattern of amino acids across the tips of the phylogeny requires numerous independent changes of state given any of the plausible ancestral amino acids and these scenarios have probabilities that are not dramatically different from each other given the tree and its branch lengths.

This uncertainty implies that although the ML sequence is the most accurate estimate of the true ancestral sequence given the data, model, and tree, it is likely to contain some erroneously reconstructed states. Consider a 200 amino acid ancestral protein containing 180 unambiguously reconstructed sites (PP = 1.0) and 20 sites at which two states are plausible with PPs of 0.8 and 0.2. The probability that the ML sequence is correct at every single site is 1^180 × 0.8^20 = 1.2%, and the expected number of erroneous residues in the sequence is 4. The ML reconstruction sits at the center of a cloud of plausible alternative sequences, each of which contains alternative states at some of the ambiguously reconstructed sites; with increasing distance from the ML sequence, expected accuracy declines. The ML sequence's plausible immediate neighbors (the reconstructions containing the plausible alternative state at one site) each have only a 0.3% chance of being precisely correct and 4.6 expected errors. In the set of sequences containing two plausible alternative states, each sequence has a 0.07% chance of being correct and 5.2 expected errors, and so on. At the far edge of the cloud of plausible sequences, the sequence containing the alternate state at all 20 ambiguously reconstructed sites has only a 10^-14 probability of being correct and is expected to contain 16 errors.

The goal of APR is to determine the ancestral protein's functional characteristics, not its precise sequence. Nevertheless, experiments to determine function depend upon inference of the sequence. Thus, a crucial question in any study using APR is the extent to which the properties of reconstructed ancestral proteins are robust to statistical uncertainty about their primary sequence. Most studies to date that have addressed this question have done so by generating variants of the ML ancestral sequence, each of which contains a plausible alternate amino acid at one of the ambiguously reconstructed sites (typically defined as amino acids with a posterior probability above some arbitrary but reasonable cutoff, such as 0.2). The experimental characterization is then repeated for each of these single-residue neighbors of the ML sequence (Yokoyama and Radlwimmer 2001; Zhang and Rosenberg 2002; Ugalde et al. 2004; Thomson et al. 2005; Bridgham et al. 2006; Goldschmidt et al. 2008; Carroll et al. 2011; Eick et al. 2012; Finnigan et al. 2012; Bickelmann et al. 2015). This approach is sufficient to determine the impact of each plausible alternate amino acid (and thus of uncertainty at each ambiguous site) in isolation on inferences about the ancestral protein's function. If ambiguously reconstructed sites interact epistatically, however, it is possible that the true ancestral sequence may have functions different from those of the ML ancestor and its immediate single-variant neighbors.

A second strategy, used in only a few cases, introduces all of the plausible alternate states into a single protein and then functionally characterizes this "worst plausible case" protein, which we refer to as the AltAll reconstruction (Akanuma et al. 2013; Bridgham et al. 2014; McKeown et al. 2014; Anderson et al. 2016). The AltAll sequence contains more errors than any other plausible reconstruction and is typically much more different from the ML protein than the true ancestral protein is expected to be. Its characterization therefore represents a fairly conservative test of functional robustness to sequence uncertainty: if the ML and AltAll ancestors both lead to the same inference of the ancestral protein's function, it is assumed that the correct ancestral sequence (which most likely lies between these sequences, but much closer to the ML sequence) is likely to share the inferred function as well. This approach addresses potential epistatic interactions among plausible alternative states. It is also practicable in cases in which the number of ambiguously reconstructed residues is so large that the strategy of creating all single-mutant neighbors is impractical.

A third strategy is to use Bayesian sampling to construct a set of sequences by choosing an amino acid state from the posterior probability distribution of ancestral states at each site. Several such sampled proteins are then constructed and assayed to provide some indication of the distribution of functions associated with the posterior probability distribution of sequences (Williams et al. 2006; Pollock and Chang 2007; Gaucher et al. 2008; Hobbs et al. 2012; Hart et al. 2014; Howard et al. 2014; Risso et al. 2014; Bickelmann et al. 2015). This strategy calculates the posterior probability distribution of ancestral sequence states given model parameters estimated from the data by maximum likelihood, so it represents an empirical Bayesian rather than fully Bayesian technique. The strategy is appealing in principle, but a concern is the possible production of nonfunctional proteins: the ensemble of all possible sequences contains far more very low-probability than high-probability sequences, particularly when ambiguity in the reconstruction is high, and if reconstruction errors are more likely to be functionally deleterious than beneficial, there may be a bias towards non-functional or poorly functioning proteins. A variation of this approach is to generate a large number of ancestral sequences by sampling from the posterior distribution, and then experimentally characterize reconstructions from this ensemble that have a high likelihood of being correct (Eick et al. 2012). Such high-probability sampled proteins also typically combine large numbers of plausible alternate states, which helps to address the issue of epistasis, but not as many as the highly conservative AltAll ancestors.

The AltAll and Bayesian approaches have been used in only a handful of cases, so it remains unclear how generally robust functional inferences using APR are to simultaneous incorporation of multiple alternate states. To better understand this issue, we applied the AltAll and Bayesian sampling strategies to reconstructed ancestral proteins in three different families of protein domains: guanylate kinase enzyme/protein interaction domains, steroid hormone receptor DNA-binding domains, and steroid hormone receptor ligand-binding/transactivation domains. These families have different kinds of functions and architectures, and they vary in the amount of ambiguity in their ancestral reconstructions. Each has been studied in previously published papers using ML-APR to characterize the functions of reconstructed ancestral proteins and to identify key historical mutations that, when introduced into the reconstructed ancestral sequence, recapitulate major shifts in protein function. For each family, we compared the functions of the AltAll and Bayesian ancestors with those of the ML ancestor to determine whether the published inferences are robust to incorporation of uncertainty at multiple sites using these strategies. We also compared the effects of introducing the key historical substitutions into the ML and AltAll ancestors to characterize the robustness of these inferences to uncertainty about the ancestral sequence.

Results

Exemplar Protein Families

We focused on three families of protein domains that have been the subject of previous work using ML-APR. The first, the GK domain family, contains two structurally similar but functionally distinct groups: the guanylate kinase (gk) enzymes, which catalyze the transfer of a phosphoryl group from ATP to GMP, and the GKPID domains, which mediate protein–protein interactions (Anderson et al. 2016). A biological function of GKPIDs is to bind a protein called Pins, an association that is crucial for orientation of the mitotic spindle in animal cells. All forms of life contain gk enzymes; a duplication of the gk gene and subsequent sequence divergence produced the GKPID, which is found only in animals and closely related unicellular protists. A recent study (Anderson et al. 2016) reconstructed ancestral GK proteins and found that the last common ancestor of gk enzymes and GKPIDs (called Anc-gkdup, which existed just before the gene duplication that produced the two separate proteins) was an effective guanylate kinase with no Pins-binding or spindle-orienting activity. Its daughter node on the tree, the ancestor of all GKPIDs, bound Pins with moderate affinity and could mediate spindle orientation in cultured cells in which endogenous GKPID had been disabled. A single substitution (s36P) that occurred on the branch between these two ancestral proteins was sufficient to confer on Anc-gkdup both Pins-binding and spindle-orienting functions and to abolish the ancestral guanylate kinase enzyme activity.

The second family of protein domains we evaluated was the DNA-binding domain (DBD) of steroid hormone receptors (SRs). SRs are ligand-activated transcription factors, the DBDs of which bind as dimers to palindromic response elements composed of inverted repeats of a six-base half-site. There are two phylogenetic classes of SRs in vertebrates, which differ in their DNA specificity at the two central bases of the half-site: estrogen receptors (ERs) bind preferentially to estrogen response elements (EREs), which are palindromes of AGGTCA; the other clade, which contains receptors for progestagens, androgens, and gluco- and mineralocorticoids (PR, AR, GR, and MR), binds to steroid response elements (SREs), the prototype of which is a palindrome of AGAACA. A recent publication reconstructed the last common ancestor of the two clades (AncSR1DBD) and found that it bound and activated transcription specifically on EREs, with no activation and much lower affinity for SREs. SRE recognition emerged on the daughter branch leading to the ancestor of the PR, AR, GR, and MR (AncSR2DBD), which activated solely on SREs with no activation from and lower affinity for EREs. A set of 14 historical substitutions, when introduced into the AncSR1DBD, was sufficient to recapitulate the switch in specificity (McKeown et al. 2014; Anderson et al. 2015).

The third domain family is the ligand-binding domain (LBD) of the SRs. SR LBDs specifically bind steroid hormones and regulate transcription of target genes near the response element to which their DBD is anchored. The two phylogenetic classes of vertebrate SRs also differ in their ligand specificity. ERs are specifically activated by estrogens, which have an aromatized A-ring (where the four rings on the steroid backbone are denoted with the letters A-D). PR, AR, GR, and MR, in contrast, all bind steroids with a non-aromatized A-ring. A recent pair of publications reconstructed the LBD of AncSR1 and found that it had ER-like specificity for aromatized estrogens, whereas the LBD of AncSR2, like PR, AR, GR, and MR, specifically activated transcription in the presence of non-aromatized steroids but not estrogens (Eick et al. 2012; Harms et al. 2013). Two substitutions that occurred on the branch between AncSR1LBD and AncSR2LBD were sufficient to switch ligand preference, restoring estrogen-specific activation when the ancestral states were introduced into AncSR2LBD and recapitulating activation by non-aromatized steroids when the derived states were introduced into AncSR1LBD (Harms et al. 2013).

These studies all used the ML sequence as the basis for their published inferences about functions and mutational effects. The degree of uncertainty varied among the ancestral proteins studied, with Anc-gkdup having relatively high confidence (mean PP over sites = 0.94, 11% of sites ambiguously reconstructed), AncSR1DBD having medium confidence (mean PP = 0.87, 15% ambiguous sites), and AncSR1LBD having very high uncertainty (mean PP = 0.70, 26% ambiguous sites). Ambiguously reconstructed sites were defined as those at which more than one state had posterior probability >0.20 (supplementary table S2, Supplementary Material online). Some but not all of the strategies we assess for characterizing robustness have been applied previously to these protein families (Eick et al. 2012; McKeown et al. 2014; Anderson et al. 2016); here we use both new and already published data to systematically evaluate these approaches.

Table 1. Summary Statistics for Ancestral Reconstructions in this Study.

Ancestor      Ln Posterior       Ln PP Units      Average   # Expected   # Sites Different
              Probability (PP)   Worse Than ML    PP        Errors       From ML Ancestor
Anc-gkdup
  ML          -14.6              –                0.94      12           0/187
  AltAll      -27.8              13.2             0.91      18           20/187
  Bayes1      -25.4              10.8             0.92      15           7/187
  Bayes2      -28.1              13.5             0.92      15           5/187
  Bayes3      -29.9              15.3             0.91      17           11/187
  Bayes4      -34.9              20.4             0.91      17           13/187
  Bayes5      -36.7              22.1             0.91      17           10/187
AncSR1DBD
  ML          -15.6              –                0.87      11           0/82
  AltAll      -20.1              4.5              0.85      12           12/82
  Bayes1      -23.0              7.4              0.85      12           7/82
  Bayes2      -29.4              13.8             0.82      15           12/82
  Bayes3      -30.0              14.4             0.84      13           9/82
  Bayes4      -39.5              23.9             0.82      15           10/82
  Bayes5      -40.0              24.4             0.82      16           12/82
AncSR1LBD
  ML          -123.0             –                0.69      76           0/249
  AltAll      -153.6             30.6             0.64      88           65/249
  Best1       -176.6             53.6             0.64      89           57/249
  Best2       -177.3             54.3             0.65      88           55/249
  Best3       -177.5             54.5             0.64      90           57/249
  Best4       -177.8             54.8             0.65      87           55/249
  Best5       -177.5             54.5             0.65      87           59/249
  Bayes1      -228.9             105.9            0.60      99           75/249
  Bayes2      -233.2             110.2            0.61      97           77/249
  Bayes3      -229.5             106.5            0.61      96           76/249
  Bayes4      -236.3             113.3            0.59      101          86/249
  Bayes5      -227.1             104.1            0.61      96           76/249
AncSR2LBD
  ML          -23.0              –                0.93      17           0/249
  AltAll      -41.7              18.7             0.90      25           26/249

NOTE. For each ancestral protein sequence, columns show the natural logarithm of the posterior probability (Ln PP), the difference in Ln PP compared to the maximum likelihood (ML) reconstruction, the mean PP over sites, the number of expected errors in the sequence given its PP, the number of sites that are different from the ML reconstruction, and the total number of residues in the protein.

ML-APR Functional Inferences are Qualitatively Robust to Sequence Uncertainty

We first evaluated the AltAll strategy by introducing all plausible alternate amino acids into the ML ancestor. The AltAll sequence contains the ML state at all unambiguously reconstructed sites and the plausible alternate state (the state with second-highest posterior probability, if >0.2) at all ambiguous sites. We expressed each AltAll protein, characterized its functions using the same methods previously used to characterize the ML ancestral protein, and compared the results.

For all three protein domains, the AltAll reconstructed sequences were very different from their corresponding ML ancestors, with differences at 12, 20, and 65 sites in the DBD, GK, and LBD ancestors, respectively. In the most extreme case (AncSR1LBD), the AltAll protein was different from the ML ancestor at almost 30% of residues (table 1). All AltAll sequences were far less likely to be precisely correct than the ML ancestor, with total posterior probability lower than that of the ML ancestor by factors ranging from 10^-2 to 10^-13 (table 1). All three AltAll sequences contained more expected errors than their corresponding ML ancestors.

We experimentally characterized the AltAll ancestors and found that all three had functions qualitatively similar to those of the ML ancestors, indicating general robustness to uncertainty. For Anc-gkdup, both the AltAll and ML reconstructions were active guanylate kinases with kinetic parameters comparable to those of extant gk enzymes, with no detectable Pins binding activity (fig. 1A and supplementary fig. S1, Supplementary Material online; Anderson et al. 2016). In the AncSR1DBD, the AltAll and ML reconstructions both activated luciferase reporter transcription robustly from ERE but not SRE (fig. 1B; McKeown et al. 2014). And, in the SR ligand-binding domain, both versions of AncSR1LBD were highly sensitive to estradiol and insensitive to a non-aromatized steroid, whereas both versions of AncSR2LBD were very sensitive to non-aromatized steroids and unresponsive to estrogens (fig. 1C; supplementary table S1 and fig. S2, Supplementary Material online). These experiments indicate that in all cases studied, the central inferences about ancestral proteins' functions are robust to statistical uncertainty about their precise sequence.
FIG. 1. Functional inferences based on ML-APR are robust to incorporation of uncertainty. For each protein domain family tested, the functional properties of the ML and AltAll ancestral reconstructions are shown (left panels), with a reduced phylogeny (center) and cartoon representation of the functions being assayed. (A) ML and AltAll versions of the ancestral guanylate kinase (Anc-gkdup) both function as enzymes (orange), in contrast to the ancestor of GK protein interaction domains (AncGK1PID, black). The left and right graphs show guanylate kinase activity and peptide binding affinity, respectively. Error bars indicate SEM of three independent experiments. (B) Both the ML and AltAll AncSR1DBDs activate preferentially on the estrogen response element (ERE, green) rather than the steroid response element (SRE, blue) in a luciferase reporter assay. ML and AltAll versions of AncSR2 preferentially activate from SREs. Column heights and error bars indicate mean and SEM from three experiments with three replicates each. (C) Both ML and AltAll AncSR1LBDs (purple square and star, respectively) are activated by estradiol (an aromatized steroid) but not 11-deoxycorticosterone (a non-aromatized steroid) in a luciferase reporter assay. In contrast, ML and AltAll AncSR2LBDs (blue square and star, respectively) are preferentially activated by the non-aromatized steroid. Single-substitution neighbors of ML AncSR2LBD containing a single plausible alternative state are indicated by open gray circles. Each point shows the concentration of hormones at which half-maximal activation is achieved (EC50); error bars show 95% confidence intervals. Some results in this figure were previously reported in Eick et al. (2012), McKeown et al. (2014), and Anderson et al. (2016).

For AncSR2LBD, we also experimentally characterized alternative reconstructions, each containing just one of the 26 plausible alternate states (Eick et al. 2012). These data provided useful context for interpreting the observed functions of the ML and AltAll reconstructions. We found that these single-variant plausible ancestors occupied a relatively tight cloud of phenotypes defined by their sensitivity to non-aromatized steroids and insensitivity to estrogens; the ML and AltAll ancestral forms were both contained within this cloud (fig. 1C). These data indicate directly that in this protein, there is little epistasis among plausible alternative states with respect to the form of ligand specificity studied.

Although the qualitative phenotypes were all consistent between AltAll and ML reconstructions, there were nontrivial quantitative differences in the measured functional parameters. Anc-gkdup AltAll displayed a turnover rate three times faster than that of the ML reconstruction (fig. 1A), but both are within the normal range for extant members of the family (supplementary fig. S1, Supplementary Material online). Both the ML and AltAll versions of AncSR1DBD activated transcription robustly from ERE and not at all from SRE, but the magnitude of activation on ERE ranged from 9-fold above control for ML to 16-fold for AltAll (fig. 1B). Similarly, the EC50s for activation of AncSR1LBD and AncSR2LBD varied by a factor of three between ML and AltAll reconstructions, a distinct but relatively small difference compared with the orders-of-magnitude preference each protein displays between classes of hormones (fig. 1C; supplementary table S1, Supplementary Material online).

We conclude that the functional inferences based on ML reconstructions for these proteins are robust to incorporation of large amounts of uncertainty. Qualitative measures of function (such as the presence/absence of a biochemical activity or the relative preference between distinct classes of binding partners) were identical between ML and AltAll versions of the protein, even when these sequences differed at a very large number of sites. Precise quantitative estimates of functional parameters, however, were less robust, with the true value presumably lying somewhere in or near a range defined by these observations.

Functional Effects of Historical Substitutions are Robust to Incorporating Uncertainty

We next evaluated the extent to which inferences about the functional effects of historical substitutions are robust to statistical uncertainty about the ancestral sequences into which they are introduced. If epistasis exists between the ambiguously reconstructed sites and the sites that putatively changed the protein's function, then inferences about the effects of substitution in the latter class of sites could depend strongly on the ancestral sequence chosen as the genetic background for the experiments. To address this issue, we assayed the functional effects of key historical substitutions on the AltAll ancestral reconstructions and compared them to their effects on the ML reconstructions.

We first examined the historical ser36Pro substitution in Anc-gkdup. When introduced into the ML protein, this mutation was previously found to abolish the protein's enzyme activity and confer the acquisition of binding to the Pins protein (Anderson et al. 2016). We found that the mutation has the same qualitative effect when introduced into the AltAll version of the ancestral protein, abolishing catalytic activity and conferring Pins binding with similar affinity (fig. 2A).

We next examined the set of 14 large-effect historical mutations in AncSR1DBD. Introducing this group of substitutions into the ML reconstruction was previously shown to recapitulate the historical change in DNA recognition, abolishing luciferase activation from ERE and establishing activation on SRE (McKeown et al. 2014). We introduced them into the AltAll version of AncSR1DBD and found that they conferred a qualitatively and quantitatively similar shift in DNA specificity (fig. 2B).

Finally, we studied the effect on the ancestral SR ligand-binding domain of the derived amino acids Gln41 and Met75, which were previously shown to dramatically decrease sensitivity to estrogens and increase sensitivity to non-aromatized steroids when introduced into the ML reconstruction of AncSR1LBD. We found that introducing them into the AltAll version had a similar effect (fig. 2C). Conversely, reverting these residues to the ancestral states (glu41 and leu75) in both the ML and the AltAll reconstructions of AncSR2LBD restores the ancestral ligand preference, switching the protein's preferred ligands from non-aromatized steroids to estrogens (fig. 2D).

We again observed some quantitative differences (typically by less than a factor of three) between the effects of the mutations in the ML and AltAll backgrounds, but the qualitative effects were unchanged (fig. 2). Taken together, these results indicate that the effects of historical substitutions are robust to simultaneous incorporation of all plausible alternate states, even when large degrees of uncertainty are present in the ML reconstruction.

Bayesian Sampling from the Posterior Probability Distribution

The AltAll strategy incorporates all non-ML states that are plausible, defined as having a posterior probability above some defined cutoff, and it excludes any that do not meet this criterion. A Bayesian sampling approach, in contrast, will typically produce ancestral sequences that exclude most plausible alternate states (because their posterior probabilities are by definition <0.50) in favor of the ML state, but it will include some implausible states with very low probabilities and exclude some ML states that have very high posterior probabilities.

We characterized the utility of the Bayesian sampling method and the robustness of inferred ancestral functions to uncertainty incorporated using this approach by applying it to the three protein domain families. For each ancestral protein, we computationally generated at least one million Bayesian reconstructions by sampling an amino acid from the posterior probability distribution at each site in the protein. The variation in the amount of uncertainty among the three proteins led to ensembles of Bayesian sequences with different characteristics (fig. 3). For Anc-gkdup and AncSR1DBD, most of the generated sequences differed from the ML sequence at fewer than 20 sites and were up to 40 log-units less likely. In the far more uncertain AncSR1LBD, the Bayesian sequences differed from the ML sequence by 50–100 residues and 60–170 ln-likelihood units. The Bayesian sequences typically had lower posterior probabilities than the AltAll reconstructions but differed from the ML sequences at a similar number of sites.

For experimental analysis of each domain family, we randomly chose five sequences from the generated ensemble of Bayesian sequences (fig. 3 and supplementary table S2, Supplementary Material online). In all cases, the sampled Bayesian ancestors were many log units less likely to be correct and contained more expected errors than the ML reconstructions.
of the 15 sampled Bayesian ancestors had lower posterior probability (by factors ranging from 10^-4 to 10^-49), and most contained more expected errors (table 1). The various reconstructions covered wide regions of sequence space, differing from each other by up to 104 residues in the case of AncSR1LBD (table 2 and fig. 3).

FIG. 2. Functional effects of historical substitutions are robust to incorporation of uncertainty. (A) The s36P substitution (upper case, derived; lower case, ancestral) changes the function of Anc-gkdup from a nucleotide kinase (orange squares, plotted using the y-axis at left) to a protein binding domain (grey diamonds, plotted using the right y-axis), in both the ML and AltAll backgrounds. Error bars indicate SEM from three independent experiments. Colored lines show the effect of the mutation on each function. (B) Function-switching mutations (+F) switch the response element specificity of AncSR1DBD from ERE (green squares) to SRE (purple circles) in both the ML and AltAll ancestral backgrounds. (C) Introducing two major-effect historical substitutions into either the ML or AltAll reconstruction of AncSR1LBD decreases preference for an aromatized steroid (magenta, with Lewis structure shown) and increases sensitivity to a non-aromatized steroid (blue). Points and error bars show mean and SEM of EC50s from three independent dose-response experiments. (D) Reversal of the two key historical substitutions to their ancestral states (upper case, derived; lower case, ancestral) switches the preference of the ML and AltAll AncSR2LBDs from non-aromatized (blue) to aromatized (pink) steroids. Some data in this figure were previously reported in Anderson et al. 2016, McKeown et al. 2014, Eick et al. 2012, and Harms et al. 2013.

We synthesized and experimentally characterized the sampled sequences and found that functional inferences were generally robust to incorporating uncertainty using the Bayesian approach. For Anc-gkdup, all five Bayesian sequences were, like the ML and AltAll proteins, effective guanylate kinases with similar catalytic performance (fig. 4A). For AncSR1DBD, all five Bayesian reconstructions activated from ERE but not from SRE, just as both the ML and AltAll sequences did, and the degree of activation they elicited was similar in all cases (fig. 4B).

For the AncSR1LBD, however, the five Bayesian ancestors were apparently nonfunctional, failing to activate transcription in the presence of any ligand tested in either major class; ligand-independent constitutive activity was also lacking (fig. 4C and supplementary fig. S2, Supplementary Material online). It is very unlikely that the true ancestral protein lacked transcriptional activity, because descendant proteins from a wide variety of species have been tested, and virtually all of them function as transcriptional activators; for the Bayesian sequences to represent the true ancestral function, a very large number of independent gains of transcriptional activity in various SR and outgroup lineages would be required, an unlikely and non-parsimonious scenario. We therefore conclude that the Bayesian versions of AncSR1LBD are likely to be
artifactually non-functional, presumably because of the greater uncertainty of this ancestral protein and the resulting incorporation of more errors into Bayesian sequences. The AncSR1LBD Bayesian samples are about 100 ln-units less likely than the ML ancestor, differ from it by 75–86 amino acids, and contain 20–25 more expected errors; they are far more uncertain by every measure than is the case for the other protein domain families studied (fig. 3A and table 1).

FIG. 3. Probability distributions of ancestral sequences generated by Bayesian sampling from the posterior probability distribution of ancestral amino acid states at each site. Each point represents one protein sequence, plotted by the number of amino acid differences and the difference in posterior probability relative to the ML reconstruction. (A) Ensemble of Bayesian sampled reconstructions of Anc-gkdup (yellow), AncSR1DBD (green), and AncSR1LBD (pink). (B) Distribution of 1 million Anc-gkdup sequences. (C) Distribution of 1 million AncSR1DBD sequences. (D) Distribution of 10 million AncSR1LBD sequences. ML (open square), AltAll (star), five Bayesian sampled sequences chosen at random for experimental characterization (light colored circles), and five sampled sequences with the highest lnPP (Best, open circles) are indicated.

Causes of Error in Bayesian-Sampled Ancestors

In contrast to the Bayesian reconstructions of AncSR1LBD, the AltAll version was transcriptionally active and had specificity similar to that of the ML ancestor, despite differing from it at 65 sites, almost as many differences as the Bayesian versions had. We hypothesized that the non-functionality of the Bayesian reconstructions of AncSR1LBD might be due to sampling of some very low-probability states, which have a high probability of being incorrect, and the consequent failure to include states reconstructed with very high probability, which are almost certainly correct and are typically strongly conserved among extant proteins.

To gain further insight into this possibility, we specifically evaluated additional sequences generated from the AncSR1LBD posterior probability distribution that contained fewer low-probability states. To accomplish this, we chose from the ensemble the five sequences with the highest total posterior probability (Best1-5; fig. 3D; reported in Eick et al. 2012). These Best sampled sequences covered a wide range of space, differing substantially from the ML ancestor (by 55–59 residues), from the AltAll version (74–79 differences), and from each other (63–82 differences). The Best sequences contained between 87 and 90 expected errors, about the same as AltAll and slightly fewer than those of the Bayesian ancestors (tables 1 and 2). However, the Best sampled ancestors contained only about half as many very low-probability states (PP < 0.1) as the Bayesian samples, and AltAll contained zero. The Bayesian samples also contained seven times more sites at which very high-probability states (PP ≥ 0.9) were excluded than did the Best sampled ancestors, and the AltAll and ML sequences by definition contained zero (fig. 5).

When experimentally characterized (Eick et al. 2012), all five Best sampled ancestors were estrogen-specific transcriptional activators, activating in response to estradiol at concentrations ranging from 10 to 1000 nM (fig. 4D; see Fig. S5B in Eick et al. 2012) but displaying no response to non-aromatized hormones (supplementary fig. S2, Supplementary Material online; see Fig. S5B in Eick et al. 2012). Thus, whereas all the Bayesian reconstructions were nonfunctional, all seven non-Bayesian reconstructions, spanning a very large region of sequence space, have similar functional phenotypes, indicating that the inference of ancestral function is highly robust to incorporation of large amounts of statistical uncertainty but not, in this case, when the Bayesian sampling method is used. Notably, however, the Bayesian samples did not suggest a different ancestral function but instead failed to function at all.

The biochemical properties of the low-probability states incorporated into the Bayesian ancestors reinforce the view that they are a likely cause of the non-functionality of these sequences. Compared with the Best sampled ancestors, the Bayesian sequences incorporated about 3 times more low-probability states at buried sites in the core of the protein, which are generally subject to greater constraints for proper folding than surface sites (fig. 5B). Further, the Bayesian sequences contained more non-conservative amino acid differences from the ML ancestor than did the Best sampled or AltAll sequences (fig. 5D and supplementary table S3, Supplementary Material online). Thus, the Bayesian reconstructions of AncSR1LBD were more likely to incorporate erroneous amino acids that disrupt protein structure and function.

Table 2. Number of Amino Acid Differences among Alternative Ancestral Reconstructions for Three Protein Domains.

Anc-gkdup   ML  AltAll  Bayes1  Bayes2  Bayes3  Bayes4  Bayes5
ML           *
AltAll      20     *
Bayes1       7    21      *
Bayes2       5    23     11       *
Bayes3      11    21     14      13       *
Bayes4      13    14     18      17      20       *
Bayes5      10    19     17      15      19      17       *

AncSR1DBD   ML  AltAll  Bayes1  Bayes2  Bayes3  Bayes4  Bayes5
ML           *
AltAll      12     *
Bayes1       7    11      *
Bayes2      12    15     12       *
Bayes3       9    11     11      14       *
Bayes4      10    14     12      10      10       *
Bayes5      12    15     14      20      15      16       *

AncSR1LBD   ML  AltAll  Bayes1  Bayes2  Bayes3  Bayes4  Bayes5  Best1  Best2  Best3  Best4  Best5
ML           *
AltAll      65     *
Bayes1      75    89      *
Bayes2      77    94    101       *
Bayes3      76    89     93     101       *
Bayes4      86    99    104     100      97       *
Bayes5      76    83     98      94      86      97       *
Best1       57    74     96      83     101      89      89      *
Best2       55    79     96      90      90      85      84     79      *
Best3       57    74     94      79      87     100      86     77     82      *
Best4       55    78     91      91      89      89      87     71     73     76      *
Best5       59    75     89      88      86      93      78     63     79     76     79      *

NOTE: Anc-gkdup, AncSR1DBD, and AncSR1LBD domains are 187, 82, and 249 amino acids long, respectively.

FIG. 4. Functions of ancestral sequences generated by Bayesian sampling. (A) Five randomly sampled Bayesian versions of Anc-gkdup had similar catalytic performance (kcat/KM) as the ML ancestors. Column height and error bars indicate mean and SEM of three independent experiments. (B) All five random Bayes AncSR1DBD ancestors activated from ERE (green) but not SRE (blue), like the ML ancestor. Column height and error bars indicate mean and SEM from three experiments with three replicates each. (C) The randomly sampled AncSR1LBD Bayes ancestors were not activated by physiological and greater concentrations (>1 µM) of estradiol, unlike the ML ancestor. Error bars indicate SEM of three technical replicates. (D) The five "Best" AncSR1LBD Bayes ancestors were all activated by estradiol (previously reported in Eick et al. 2012). Error bars indicate SEM of three technical replicates. Some data in this figure were previously published in Anderson et al. 2016, McKeown et al. 2014, and Eick et al. 2012.

Discussion

Robustness of Functional Inferences

Our findings have both practical and conceptual implications. Practically, they suggest that qualitative functional inferences (the presence/absence of a molecular function or relative preference/specificity among ligands) about reconstructed ancestral proteins are often robust to stochastic uncertainty about the precise ancestral sequences, even when a very large amount of uncertainty is present. We found that all the functions we examined are qualitatively unchanged across plausible sequence reconstructions in all three protein domain families, even when very large numbers of alternate plausible amino acids are incorporated into the protein. However, precise quantitative estimates of function (such as EC50, dissociation constant, or enzyme kinetic parameters) did vary among alternative reconstructions. This variation was relatively small (a factor of two or three, in most cases) and was far less than the differences between paralogs or ancestral proteins with biologically distinct functions. Thus, inferences about the precise quantitative parameters of molecular function should be made with caution, and experimental characterization of the range of plausible values for such parameters is particularly important. Our results are broadly consistent with a variety of case studies of other proteins, which have found that qualitative inferences about the functions of ancestral proteins are generally robust to uncertainty about the ancestral sequence (Thomson et al. 2005; Ortlund et al. 2007; Finnigan et al. 2012; Boucher et al. 2014; Howard et al. 2014). Similarly, a recent study generated a small library of variant proteins containing shuffled combinations of possible ancestral states (not weighted by their posterior probabilities) and found a general pattern of qualitative robustness among the variants, although differences were again observed in quantitative measures of function (Bar-Rogovsky et al. 2015).

The impacts of large-effect historical mutations on qualitative measures of function also appear to be generally robust to uncertainty about the ancestral sequences. This result suggests that the genetic causes for the evolution of ancient protein functions can be identified with some confidence using APR. The consistency of these substitutions' effects across reconstructions indicates that epistasis is limited between large-effect historical mutations and plausible alternative sequence states in the reconstruction.

Our study was not comprehensive. We specifically examined the robustness of inferences about protein function to sequence uncertainty, particularly functions that changed among paralogs over evolutionary time. There could be other biochemical properties, such as stability or protein dynamics, that are less robust to uncertainty. In cases in which uncertainty has been assessed, however, estimates of ancestral protein stability have generally proven to be fairly robust to uncertainty about the ancestral sequence (see refs. Akanuma et al. 2013, 2015; Wheeler et al. 2016).

We examined only one source of uncertainty in ancestral proteins: statistical ambiguity about ancestral sequences given the data, phylogeny, and evolutionary model. Previous theoretical and simulation studies showed that incorporating stochastic uncertainty about the underlying phylogeny does not strongly affect ancestral sequence reconstruction (Hanson-Smith et al. 2010). Consistent with this finding, several empirical case studies have found that assuming different plausible phylogenies has only weak effects on inferences of ancestral protein functions (Gaucher et al. 2003; Akanuma et al. 2015; Clifton and Jackson 2016; Steindel et al. 2016). Systematic error in the tree, however, could still affect ancestral sequences and their functions (Groussin et al. 2015). Further, comprehensive work has not been conducted on the effects of the assumed model on ancestral reconstructions; however, research to date suggests that functional inferences are generally robust to sequence uncertainty associated with using different models and methods (Ugalde et al. 2004; Thomson et al. 2005; Chang et al. 2007; Devamani et al. 2016; Steindel et al. 2016). Further research will be required to thoroughly assess the effect of these sources of uncertainty and potential error on functional inferences about ancestral proteins.

Mutation, Epistasis, and the Robustness of Ancestral Functions

The extraordinary functional robustness of the reconstructed ancestral sequences we studied may seem surprising, given the high degree of statistical uncertainty and the fact that some of the alternate sequences differed from each other and from the ML ancestors at scores of amino acid sites, up to a third of the entire length of the protein. Most random amino acid replacements in proteins are deleterious (Guo et al. 2004), so how can ancestral reconstructions be so robust to sequence change?

Plausible alternate states are not random substitutions; rather, they are drawn from the much smaller set of states that are found in present-day members of the family, particularly those descending on branches near the ancestral node of interest. Unambiguous reconstructions occur when the ancestral state is conserved in large numbers of extant sequences descending from the node of interest and its close outgroups, reflecting strong functional constraints on that site. In contrast, ambiguous reconstructions occur when information about the ancestral state is lost in most descendant taxa, due to multiple exchanges between amino acid states on branches near the ancestral node of interest, a situation that leads to relatively small differences in the likelihoods of these states at the node. This kind of evolutionary dynamic occurs when functional constraints discriminate weakly or not at all between these states, which occurs only when they are all compatible with the protein's functions. Thus, uncertainty about the ancestral state tends to occur when multiple states are compatible with functional constraints, and the protein's function in turn is robust to which of these states is incorporated. The distribution of uncertainty and