Ancient Origins of Vertebrate-Specific Innate Antiviral Immunity Krishanu Mukherjee,1 Bryan Korithoski,1 and Bryan Kolaczkowski*,1 1DepartmentofMicrobiologyandCellScience,UniversityofFlorida *Correspondingauthor:E-mail:bryank@ufl.edu. Associateeditor:NaokoTakezaki Abstract Animalsdeployvariousmolecularsensorstodetectpathogeninfections.RIG-likereceptor(RLR)proteinsidentifyviral RNAs and initiate innate immune responses. The three human RLRs recognize different types of RNA molecules and protectagainstdifferentviralpathogens.TheRLRproteinfamilyiswidelythoughttohaveoriginatedshortlybeforethe emergence of vertebrates and rapidly diversified through a complex process of domain grafting. Contrary to these D o w findings, here we show that full-length RLRs and their downstream signaling molecules were present in the earliest n lo animals,suggestingthattheRLR-basedimmunesystemarosewiththeemergenceofmulticellularity.Functionaldiffer- a d entiation of RLRs occurred early in animal evolution via simple gene duplication followed by modifications of the ed RNA-binding pocket, many of which may have been adaptively driven. Functional analysis of human and ancestral fro m RLRsrevealedthattheancestralRLRdisplayedRIG-1-likeRNA-binding.MDA5-likebindingarosethroughchangesinthe h RNA-bindingpocketfollowingtheduplicationoftheancestralRLR,whichmayhaveoccurredeitherearlyinBilateriaor ttp s later, after deuterostomes split from protostomes. The sensitivity and specificity with which RLRs bind different RNA ://a c structureshasrepeatedlyadaptedthroughoutmammalianevolution,suggestingalong-termevolutionaryarmsracewith a d viralRNAorothermolecules. em ic Keywords:RIG-likereceptor,antiviralimmunity,RNA-bindingprotein,proteinfamilyevolution. .o u p .c o m Introduction et al. 2009; Li, Ranjith-Kumar, et al. 2009; Pippig et al. 2009; /m b DetectingandrespondingtoforeignRNAinthecytoplasmis Shigemotoetal.2009;Luetal.2010;Wangetal.2010;Jiang e/a a major component of innate antiviral immunity. RIG-like etal.2011),fewstudieshavedirectlycomparedRNAbinding rtic receptors (RLRs) are related proteins that identify viral by different RLRs under the same experimental conditions, le-a RNAs and initiate downstream immune responses through makingitdifficulttosynthesizeexistingresultstoproducea bs directinteractionswiththemitochondrialantiviralsignaling clearpictureofthefunctionaldifferencesamongRLRs. tra c protein(MAVS)(fig.1A).Humansandothermammalshave The three RLRs present in humans and other mammals t/3 1 threeRLRs,whichcollectivelyrecognizeawidevarietyofviral do appear to differ in their RNA-binding sensitivity and /1 /1 A infections (Yoneyama et al. 2005; Kato et al. 2006). specificity. RIG-1 (DDx58) is a high-affinity receptor for 40 r Polymorphisms in RLRs have been linked to autoimmune short,blunt-endeddouble-strandedRNA(dsRNA)and50tri- /10 t 4 i diseases (Smyth et al. 2006; Colli et al. 2010; Aida et al. phosphate(50ppp)RNA(Hornungetal.2006).MDA5(IFIH1) 84 c 1 2011),andtherapeutic activation of RLRsisbeingevaluated appears to recognize long blunt-ended dsRNA, although its 7 le asapotentialantiviralandanticancerapproach(Poecketal. nativeligandremainscontroversial(Li,Lu,etal.2009).LGP2 by g 2008). Given the biological importance of RLRs and their (DHx58)seemstobindbothblunt-endedand50pppdsRNA ue s therapeuticpotential,RLRfunctionhasreceivedconsiderable withhighaffinity(Pippigetal.2009)andregulatesRIG-1and t o n recentattention. MDA5throughanunknownmechanism(Muralietal.2008; 0 4 RLRs recognize viral RNAs and activate downstream Satohetal.2010). A p immune responses using a modular domain architecture Structuralandmutationalstudiessuggestthatdifferences ril 2 (fig. 1B). A C-terminal RNA recognition domain (RD) binds in RNA binding among human RLRs primarily result from 0 1 viralRNA(Li,Ranjith-Kumar,etal.2009;Takahasietal.2009), differences in the overall shape and distribution of electro- 9 inducing a protein conformational shift, which allows twin staticforcesacrosstheRNA-bindingpocketintheC-terminal N-terminal caspase activation and recruitment domains RDaswellasdifferencesinspecificRNA-contactingresidues (CARDs) to interact with the signal-transducing protein liningthepocket.RIG-1andLGP2shareanopenRNA-bind- MAVS and ultimately activate cellular immune responses ing pocket with a large positively charged binding surface, (Kawaietal.2005; Meylan etal.2005;Sethetal.2005; Jiang although the shape of the pocket and the RNA-contacting etal.2011;Luoetal.2011). residuesdifferbetweenthetwomolecules(Luetal.2011).By AlthoughtheRNA-bindingpropertiesofhumanRLRshave contrast, the binding pocket of MDA5 appears to have a receivedconsiderable attention (Pichlmairetal.2006;Li,Lu, more compact RNA-binding surface (Li, Lu, et al. 2009). In (cid:2)TheAuthor2013.PublishedbyOxfordUniversityPressonbehalfoftheSocietyforMolecularBiologyandEvolution. ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionNon-CommercialLicense(http:// creativecommons.org/licenses/by-nc/3.0/),whichpermitsnon-commercialre-use,distribution,andreproductioninanymedium, Open Access providedtheoriginalworkisproperlycited.Forcommercialre-use,[email protected] 140 Mol.Biol.Evol.31(1):140–153 doi:10.1093/molbev/mst184 AdvanceAccesspublicationOctober8,2013 MBE Vertebrate-SpecificInnateAntiviralImmunity . doi:10.1093/molbev/mst184 A RLR Nucleus Mitochondrion viral MAVS Cytokines NFkB RNA TRAF3 RLR FADD Caspase 8/10 Type I IFNs IRFs IRF3 IKKi B CARD1 CARD2 DEAD Helicase RD RIG-1 MDA5 D o LGP2 w n C lo a d e d MAVS, CSAiSmPil9a ra tnod CASP2 DICERS DimEAilaDr /tHoelicase UnRiqLuRes to from CARD Domains h ttp eFvIGo.lu1t.ioRnLaRryssroecuorcgensi.z(eAv)irSaiml RpNlifiAedanddiagarcatmivaoteftdhoewmnostlerecaumlarsiingtnearlainctgiomnsoilencvuollevsedthinroRuLgRh-baamsedodimulamrudnoemsiaginnaalirncgh.iRteLcRtsurbeinodrivgiirnaaltRinNgAfsroamndminuteltriapclet s://a c withMAVSthroughCARDs(gray).MAVSformsacomplexwithTRAF3andFADD,whichinducesapoptoticcelldeaththroughCaspase8/10activation ad e aswellastype1IFNandproinflammatorycytokineproductionthroughactivationofIRF3/4andNFkB,respectively(seeLooandGale2011fora m review).(B)WeplotthecharacteristicdomainarchitectureofhumanRIG-1,MDA5,andLGP2.TheC-terminalRNARDbindsviralRNA(Li,Lu,etal. ic.o u 2009;Takahasietal.2009),whileN-terminalCARDsactivateMAVSsignaling,ultimatelygeneratingtheimmuneresponse(Kawaietal.2005;Meylan p etal.2005;Sethetal.2005).LGP2lacksCARDsandhasbeenrecentlyshowntoregulatebothRIG-1andMDA5,althoughtheregulatorymechanismis .co m unclear(Satohetal.2010).(C)Weidentifiedtheclosestnon-RLRhomologsofeachfunctionaldomainviasequencesimilarity;RDhomologywasnot /m detectableoutsideouridentifiedRLRproteinfamily. b e /a rtic le allthreeproteins,bindingofdifferentRNAmoleculesisfacil- RLRs by altering key contact residues within the C-terminal -a b itated by key contact residues, many of which are located RD. Finally, we find strong evidence that RLRs have been stra withintheRNA-bindingloopbetweenb5andb6(Cuietal. involved in a long-term evolutionary arms race with viral c t/3 2008;Li,Ranjith-Kumar,etal.2009;Pippigetal.2009;Takahasi RNA molecules, suggesting that the structure of viral RNA 1 /1 et al. 2009). Differences in contact residues among RIG-1, mayhaveshapedtheevolutionofanimalinnateimmunity. /1 4 MDA5, and LGP2 are thought to confer differential RNA- 0 /1 bindingproperties(Li,Lu,etal.2009). Results and Discussion 04 8 Evolutionary studies have painted a complex picture of 4 1 Origins of RLR-Based Innate Immunity 7 how RLRs arose and functionally diversified. Recent studies b suggest that full-length RLRs are a vertebrate-specific evolu- The availability of complete genome sequences from many y g u tionary novelty, although the building blocks of RLRs may eukaryotes provides an opportunity tosystematically exam- es have been present in closely related prevertebrate animals inetheoriginsofproteinfamiliescurrentlythoughttohave t o n (Sarkar et al. 2008; Zou et al. 2009). These studies differ in limitedtaxonomicdistributions.Usinghomology-basedgene 04 the ordering of gene-duplication events giving rise to the predictionbasedonconfirmedRLRsfromhumans,weiden- Ap threeRLRspresentinmammals,butbothstudiesultimately tified full-length RLRs in early branching animal genomes, ril 2 concludethatRLRevolutionwasdrivenbyacomplexseriesof including Porifera and Cnidaria (supplementary fig. S1, 01 9 domain-grafting and gene-fusion events. However, these Supplementary Material online). RLRs are absent from studiesreliedonphylogeneticmethodsknowntobesensitive nonmetazoan eukaryotes, including fungi and choanoflagel- to long-branch attraction artifacts, potentially undermining lates, arguing that RLRs are an animal-specific evolutionary theirreliability(Swoffordetal.2001;Suskoetal.2004). noveltyarisingwiththeoriginofmulticellularity.Althoughwe Here,weexaminethefunctionalevolutionofRLRsusinga identifiedRLRsfrommultiplelophotrocozoansandecdysozo- combinationofgenemodeling,phylogenetics,structuralanal- ans, we were unable torecover any RLR genes from arthro- ysis,populationgenetics,andmolecular-functionalcharacter- pods,despitetheavailability of multiplegenomesequences, izationofancestralandextantRLRs.Incontrasttoprevious suggestingthatRLRswerelostearlyinarthropodevolution. studies, we find that the RLR-based immune system is not Multipleindependentlinesofevidencesuggestthatnewly vertebratespecificbutoriginatedintheearliestmulticellular identified RLRs are functional. First, gene models were pre- animals.RLRsfunctionallydiversifiedthroughaseriesofgene dicted with high confidence, arguing against gene modeling duplication events followed by protein-coding changes, errorasanexplanationforourresults(supplementarytable which modulated the RNA-binding properties of different S1, Supplementary Material online). Second, inferred RLR 141 MBE Mukherjeeetal. . doi:10.1093/molbev/mst184 proteinsequenceshadthesameglobaldomainarchitecture duplicated in Bilateria to give rise to RIG-1 and asknownRLRs,andmultiplesequencealignmentconfirmed MDA5/LGP2lineages,followedbyamorerecentduplication the presence of conserved functional motifs, arguing that oftheMDA5/LGP2ancestor,givingrisetoMDA5andLGP2 newlyidentifiedgenesareinfactRLRsandnotdistantlyre- lineages in jawed vertebrates after their split from jawless lated homologs (supplementary table S2 and alignments, vertebrates. This later MDA5/LGP2 duplication event could Supplementary Material online). Finally, we found many berelatedtotheproposedwhole-genomeduplicationsearly near-exact matches to new RLRs in expressed sequence tag inthevertebratelineage(DehalandBoore2005). (EST) databases, suggesting that full-length RLRs are ex- Accordingtotheconsensusphylogeny—inwhichthean- pressed in early animal lineages, including Porifera and cestral RLR duplicated early in Bilateria—both RIG-1 and Cnidaria (supplementary table S3, Supplementary Material MDA5/LGP2 were lost from arthropods, and the MDA5/ online).Together,theseresultsstronglysupporttheconclu- LGP2lineagewasalsolostfromotherprotostomes,including sion that functional RLRs are present throughout nonverte- nematodes.However,weremaincautiousinourinterpreta- brateanimals. tion of this finding, as long-branch attraction between fast- D Weusedsensitivesequence-similaritysearchestoidentify evolvingprotostomesequencesanddeuterostomeRIG-1isa ow n theclosestnon-RLRhomologsofeachRLRfunctionaldomain potentialexplanationforthisresult.Analternativeinterpre- lo a (Boratynetal.2012).WefoundthattheRLRDEAD/Helicase tation suggested by the taxonomic distribution of RLR d e d domainwascloselyrelatedtoDICER,whereastheRLRCARD sequencesplacesthefirstduplicationeventearlyinthedeu- fro domains were distantly related to CARDs from MAVS and terostomelineageafteritsdivergencefromprotostomes.This m caspases 9 and 2 (fig. 1C). The RLR RD did not exhibit interpretation is more parsimonious in terms of gene loss http sseuqgugeesntcinegsitmhailtariittyisteoitahneyr aprnootevienlsfuonucttsiiodneatlhdeoRmLaRinfaomrislyo, eavrtehnrtosp,oredqsutiorinegxpolaninlytahesionbgsleervloesdssoefqutheencaendceissttrriabluRtiLoRn.in s://ac a greatly diverged from its homologs as to be undetectable Weobservedvariouslineage-specificlossesofdifferentRLR de m by sequence similarity. These results suggest that the genes,suggestingthatRLRlossmaybecommon.Forexample, ic RLR domain architecture probably arose through a fusion although MDA5 and LGP2 homologs were found in many .ou p of a DICER-like DEAD/Helicase with CARD domains from a teleostfishes,clearRIG-1homologshaveonlybeenidentified .c o MAVS-orcaspase-likeprogenitor. from salmon and carp, suggesting that RIG-1 was lost from m /m RLRs do not function in isolation; they initiate cellular somefishspecies(Biacchesietal.2009).RIG-1isalsomissing b e immune responses through interactions with downstream from the chicken genome, although MDA5 and LGP2 are /a signaling molecules (Yoneyama et al. 2004). We found that both present. The sea squirt (Ciona intestinalis) exhibited rtic le manyoftheproteinsinvolvedinRLR-basedimmunity—such theoppositepattern,encodingtwoRIG-1homologsbutlack- -a b asthesignalingproteinsTRAF3andFADD—arealsopresent ingMDA5andLGP2,eventhoughcloselyrelatedspecieshave stra in nonvertebrate animals, including Porifera and Cnidaria multipleRIG-1andMDA5/LGP2genes(fig.2).Previousstud- ct/3 (supplementary table S4, Supplementary Material online). ies have noted the small size of the sea squirt genome, 1 /1 Although the immediate target of RLR signaling, MAVS, is with many protein families appearing extremely simplified /1 4 too diverged to detect using sequence-homology-based compared with other species (Dehal et al. 2002). Together, 0/1 methods, this gene has been identified in the sea urchin thesefindingssuggestthatthespecificcontentofRLRgenes 04 8 (Strongylocentrotus purpuratus), suggesting that it predates may be fairly labile across evolutionary distances, perhaps 41 7 vertebrates (Hibino et al. 2006). A recent study has also driven in part by complex interactions with adaptive b y shown that RLRs can activate cellular immune responses immune systems arising very early in vertebrate evolution g u independentofMAVSsignaling(Poecketal.2010),suggesting (FlajnikandKasahara2010). es that functional RLR-based immune pathways may exist in Ourconsensusphylogenywasrobusttoalignmentambi- t on organisms that lack the MAVS gene. We did not find any guityanduseofalternativeevolutionarymodels(supplemen- 04 evidence for canonical RLR-pathway effector molecules tarytableS5,SupplementaryMaterialonline),withthesame Ap (type 1 interferons or proinflammatory cytokines) outside phylogeny being recovered using techniques robust to ril 2 0 of vertebrates, suggesting thatthispart of the RLR pathway common evolutionary model violations (Lartillot and 1 9 was a recent innovation, possibly facilitating interactions Philippe 2004; Kolaczkowski and Thornton 2008). Removal with the emerging adaptive immune system. However, our of fast-evolving protostome sequences did not alter the findingsdosuggestthatmanyofthenecessarycomponents remaining tree topology (supplementary fig. S2A, for a simplified form of RLR-based immunity were present Supplementary Material online). Although removal of these intheearliestmulticellularanimalsandnotalater,vertebrate- sequenceseliminatesanyinformationthatcouldconfidently specificnovelty. datethefirstRLRduplicationevent,itdoesindicatethatthe firstduplication intoRIG-1 andMDA5/LGP2lineagesisnot Evolutionary History of RLRs affectedbyremovaloffast-evolvinglineages,suggestingthat Figure2showstheRLRconsensusphylogenyobtainedusinga this grouping is probably notcausedby long-branch attrac- variety of alignment strategies and phylogenetic inference tion.Together,theseresultsargueagainststrongphylogenetic methods. This tree suggests that two primary gene duplica- biasesasamajorexplanationforourresults,assucherrorsare tion events gave rise to the three classes of RLRs found in expectedtoaffectdifferentevolutionarymodelstodifferent mammals. First, the ancestral RLR (RIG-1/MDA5/LGP2anc) degrees (Sullivan and Swofford 2001; Huelsenbeck and 142 MBE Vertebrate-SpecificInnateAntiviralImmunity . doi:10.1093/molbev/mst184 Macaca mulatta MDA5 Rannala 2004) and should be sensitive to removal of fast- Tetrapod Homo sapiens Teleost Mus musculus LGP2 evolvingtaxa(Abereretal.2013). Agnathan Canis familiaris We rooted our RLR phylogeny using the closest-related Invertebrate Monodelphis domestica Deuterostome Taeniopygia guttata protein family (supplementary fig. S3, Supplementary Protostome 0.99 Anolis carolinensis LGP2 Material online). Although the outgroup branch was long, 1.00 Xenopus laevis 0.94 Tetraodon nigroviridis supportfortheconsensusrootingincreasedwithincreasing Takifugu rubripes alignment quality and evolutionary model complexity, Oryzias latipes Gasterosteus aculeatus arguing against phylogenetic bias, which is expected to LGP2 anc Ctenopharyngodon idella affect simpler models more strongly than complex ones Danio rerio (supplementary table S6, Supplementary Material online). Macaca mulatta 0.94 Homo sapiens The same rooting was also recovered using a gene-species 1.00 Mus musculus 0.92 Canis familiaris tree reconciliation method that does not rely on outgroup Ornithorhynchus anatinus sequences, again arguing against long-branch attraction D ManDcA25/LGP2 00..9939 XGTeaanellounpsiou gpsay tglrluoiasp gicuatltiasta MDA5 (suAppltlheomuegnhtamryanfiyg.rSe4a,soSunpabplleemaleignntmaryenMtamteertihaloodnslainree).avail- ownlo 0.91 Tetraodon nigroviridis a Takifugu rubripes able, the choice of how to align sequences and how to de MDA5 anc GOasrytezriaoss tleautisp easculeatus postprocessalignmentdatatoremovepotentiallyunreliable d fro 000...888592 BBrraaDCnnPcacteehhntniriioooooss mprtteohoyrmamzioroaayn n ff lglmooorraididdroiaanneeu isdella patahpsoipssciratroineoaanctsshho,ecna,cnwwhhhoeaiiccvaheessoaefsinlsawecrdogheripccilhomardaepetvaeocsdltuupotipnuoonnrrecatseruyrutlstamiiinnnoggtdycaella“adtcboeoonsuuusstepenp.sotFuhroste”r, m https://ac a MDA5/LGP2 Saccoglossus kowalevskii sequence alignment, alignment processing and evolutionary de anc1 SSaacccocogglolosssusus sk okowwaalelevsvksikiii modelbycalculatingmaximumlikelihood(ML)andBayesian mic Branchiostoma floridae support measures across a variety of different alignments .ou p Macaca mulatta RIG-1 and evolutionary models and then summarizing support .c Homo sapiens o Canis familiaris on theconsensustreeinfigure2.Supportvaluescalculated m/m Mus musculus fromeachalignmentareprovidedinsupplementarytableS7 b Ornithorhynchus anatinus e Taeniopygia guttata (SupplementaryMaterialonline). /a Anolis carolinensis Consensus support for many of the key nodes on the rtic Xenopus tropicalis le Cyprinidae sp. phylogeny was high (fig. 2). The jawed-vertebrate-specific -a RIG-1/ 000...789470 BBrraanncchhiioosstStoomtmroaan fgflloyoPrlroiieddctaaereoenmtroyztuosn pmuarpriunruastus MS(ShHDim-Aliok5deaaanirLdaR-THL)GassPue2gpapcwloaardt-eliusknewdaeeprreparnroeMxciomLveafrrtaeemdliewkweiltoihhrko0o(.90d3.9r9aantaidnod0te.19s.09t bstract/31 MDA5/ Strongylocentrotus purpuratus /1 LGP2 Strongylocentrotus purpuratus Bayesianposteriorprobability,respectively).Thecladeuniting /1 anc StronSgtryoloncgeynlotcroetnutsro ptuusrp puurarptuusratus MDA5 and LGP2, along with an RLR from the sea lamprey 40/1 Strongylocentrotus purpuratus (Petromynusmarinus)wasalsorecoveredwithstrongsupport 04 Lottia gigantea (0.94SH-likeaLRT,1.0Bayesianposteriorprobability),aswas 84 Ciona intestinalis 17 RIG-1 Ciona intestinalis themucholderRIG-1clade(0.86SH-likeaLRT,0.91Bayesian b anc LLoottttiiaa ggiiggaanntteeaa posteriorprobability). y gu 0.88 Aplysia californica However, support for some of the oldest nodes on the es 0.94 Helobdella robusta phylogeny was relatively low. The clade uniting vertebrate t o 0.83 Helobdella robusta n 000...897619 CapiteCllaa Cpteiatleepltilatae ltleaCl etaetealentoarhabditis briggsae MRitLyD)R,sAah5nadadn0dt.h8L5eGSnPHo2-dliwkeeituhaLnRciteTinp(gh0a.8lao9llcBhMaoyrDedsaAita5en,apLnoGdsPte2hr,eiomarnipcdhrooRbrdaIGba-til1e- 04 April 2 Caenorhabditis elegans 0 sequenceswasrecoveredwith0.74SH-likeaLRT(0.87poste- 1 0.99 Caenorhabditis remanei 9 1.00 Loa loa rior probability). Lower support at these nodes is not 1.00 Brugia malayi unexpected, as reconstructing deep branching patterns Ascaris suum Caenorhabditis briggsae Caenorhabditis elegans Caenorhabditis remanei Loa loa FIG.2.Continued Nematostella vectensis Cnidaria methods.Statisticalsupportisshownforkeynodesonthephylogeny, Nematostella vectensis Amphimedon queenslandica calculatedusingML(SH-likeaLRT[Anisimovaetal.2011])(top)and Porifera Amphimedon queenslandica Bayesian posterior probabilities using standard evolutionary models DICER (middle) and the CAT model (Lartillot and Philippe 2004) (bottom). other DEAD/Helicases 0.5 substitutions/site ThetreeisrootedusingcloselyrelatedDEAD/Helicasedomains(Sarkar FIG.2. RLRsevolvedthroughgeneduplicationsinBilateriaandverte- et al. 2008). Ancestral protein sequences were reconstructed for the brates.WeshowthestronglysupportedRLRconsensustreecalculated nodesindicatedonthetree.Speciesnamesarecoloredbytaxonomic across a variety of alignment strategies and phylogenetic inference group. 143 MBE Mukherjeeetal. . doi:10.1093/molbev/mst184 usinglimiteddatafromfast-evolvingimmunereceptorsisa of the RLR protein family within a larger context of challenging phylogenetic problem. Nonetheless, we wanted DEAD/Helicase-containing proteins suggests that the first to examine support at these key nodes in greater detail, to RLRgeneduplicationmayhaveoccurredafterthedivergence assesspossiblephylogeneticerror. of protostomes from deuterostomes (supplementary fig.S3, Althoughconsensussupportforsomeoftheoldestnodes Supplementary Material online). We therefore remain cau- on the phylogeny was low, support for all nodes increased tiousinourinterpretationoftheprecisetimingofthegene with increasing alignment and evolutionary-model quality duplication giving rise to RIG-1 and MDA5/LGP2, which (supplementary table S7, Supplementary Material online). If may have occurred early in Bilateria or after the proto- the phylogeny was erroneous, we would expect support to stome/deuterostomedivergence. decrease when potentially unreliable alignment positions Previous studies have suggested that the evolutionary wereremovedand/orparameterswereaddedtotheevolu- history of RLRs is complex, with multiple gene-fusion and tionarymodeltobetterfitbiologicalreality.Thatweobserved domain-grafting events (Sarkar et al. 2008; Zou et al. 2009). the opposite effect—increasing support with increasingly Contrary to these findings, we observed no evidence for D reliable data and increasingly realistic evolutionary phylogenetic incongruence among individual functional ow n models—argues against phylogenetic bias or stochastic domains (supplementary table S8, Supplementary Material lo a errorasamajorexplanationforourresults. online). We suspect that previous results may have been d e d Furthermore,nearlyallthephylogeneticambiguityinour compromisedbyincompletetaxonsamplingandless-reliable fro tree ultimately resulted from difficulty resolving short inter- phylogeneticmethods.Ourresultssuggestthat,afterthefull m nodesassociatedwithfast-evolvingnonvertebratesequences. RLR domain-architecture was assembled very early in the http R10e0m%ovsuaplopfotrhteusnedseeqruanenycmesecthoomdp,lseutgeglyersetisnoglvtehdatthaemtpreleepwhityh- atinoinmaalldliinveeraggeen,caesimthproleugmhodperolotefinge-cnoeddinugplcichaatniognesanedxpfulanincs- s://ac a logenetic signal exists to reconstruct the underlying core RLRevolution. de m branching pattern, even though the precise placement of ic some fast-evolving sequences is ambiguous (supplementary Functional Evolution of RLRs .ou p fig.S2B,SupplementaryMaterialonline). ExtantRLRsrecognizedifferentvirusesanddifferintheirsen- .c o We additionally reconstructed the full RLR phylogeny sitivityandspecificityforvariousRNAligands(Hornungetal. m /m using a concatenation of DNA and protein sequence align- 2006;Li,Lu,etal.2009;Pippigetal.2009;LooandGale2011). b e ments (Agosti et al. 1996). Both ML and Bayesian support These differences in RNA binding are primarily determined /a for all key nodes on the phylogeny increased dramatically bydifferencesintheC-terminalRNA-RD(Pippigetal.2009). rtic le using this approach (supplementary fig. S5, Supplementary To better understand how the RNA-binding properties of -a b Materialonline).Allbuttwoofthenodesonthephylogeny RLRs functionally diversified, we reconstructed ancestral stra had maximal support across all methods. The node unit- RLR proteins, inferred the 3D structures of ancestral RDs, ct/3 ing protostome and deuterostome RIG-1 sequences had and compared ancestral residues with known functional 1 /1 maximal Bayesian posterior probability and between 0.97 residuesinextantproteinstopredictthelikelybindingprop- /1 4 and 1.0 ML bootstrap support, depending on whether the ertiesofancientRLRs.Thesepredictionswerethenexamined 0/1 alignment was processed to remove potentially unreliable experimentally by measuring RNA binding by ancestral and 04 8 positions. The remaining node separating RIG-1, MDA5, humanRDsinvitro.BecauseRDsfromdifferenthumanRLRs 4 1 and LGP2 sequences from Cnidarian and Poriferan RLRs are known to bind blunt ended and 50ppp dsRNA with 7 b y also had maximal posterior probability and 0.87–0.92 different affinities, and crystal structures are available for g u bootstrap. some of these interactions, we chose to focus our analyses es Althoughourconsensustreeissupportedbythebulkof onacomparisonofthesetwoRNAtypes. t o n phylogenetic analyses and appears robust to common WefoundthattheancestralRLRexhibitedRIG-1-likeRNA 04 sourcesoferror,highlevelsofsequencedivergenceraisethe binding. Not only is the shape andelectrostatic distribution Ap real possibility that unaccounted-for factors may generate of the ancestral RNA-binding pocket very similar to that ril 2 0 spurious phylogenetic groupings, particularly for deep of human RIG-1 (fig. 3), but also nearly all the key contact 1 9 nodes on the tree. Specifically, the timing of the first RLR residuesinhumanRIG-1areconservedinboththeancestral duplicationevent—givingrisetoRIG-1andMDA5/LGP2lin- RIG-1 and ancestral RLR, but not in other RLRs, including eages—remains weakly supported in many analyses. humanMDA5,humanLGP2,andtheirancestors(fig.4;sup- Although Bayesian posterior probabilities supporting this plementary fig. S6 and table S9, Supplementary Material node are typically high, concerns have been raised that online). The poriferan RLR has an RNA-binding pocket Bayesian approaches may overestimate clade support thatisverysimilarinbothshapeandelectrostaticdistribution (Suzuki et al. 2002; Misawa and Nei 2003; Simmons et al. tothatofhumanRIG-1,theancestralRIG-1andtheancestral 2004). Bootstrap support for this node was >0.8 in our RLR, further suggesting that the ancestral RLR bound blunt concatenated analysis, which is typically considered accept- ended and 50ppp RNA with high affinity, similar to human able,butconcernshavebeenraisedthatthis“ruleofthumb” RIG-1(Luetal.2011). maynotcorrespondtotraditionalexpectationsofstatistical Consistent with our structural predictions, we observed confidence (Efron et al. 1996), and other estimates of ML that both human RIG-1 and the ancestral RLR bound support are low in some analyses. Finally, reconstruction blunt ended and 50ppp dsRNA with equal affinities (fig. 3; 144 MBE Vertebrate-SpecificInnateAntiviralImmunity . doi:10.1093/molbev/mst184 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a FIG.3. PrimaryfunctionaldifferencesinRNAbindingwereestablishedearlyinRLRevolution.WeplottheRNA-bindingpocketshapeandelectrostatic bs potential(kT/e)forhumanRIG-1,MDA5,andLGP2aswellasforreconstructedancestralproteins(seefig.2forancestralnodesonthephylogeny). tra c DottedyellowlineindicatesRNA-bindingloop.BargraphsdisplayRLRconcentrationsatwhichhalf-maximalblunt-endedand50pppdsRNAbinding t/3 occurs(steady-statebindingaffinity[Kd]andinitialRNA-bindingrate[Km],measuredusinganinvitrokineticsassay).NotethatlowerKdandKm 1/1 indicatetighterbinding.Barsindicatestandarderrors.Seesupplementaryfig.S7(SupplementaryMaterialonline)forcompletebindingcurves. /1 4 0 /1 0 4 8 supplementary fig. S7, Supplementary Material online). Thesestructuralfeaturesareretainedin humanMDA5,but 4 1 7 Human RIG-1 exhibited equivalent steady-state binding of not in human RIG-1 or LGP2, suggesting that the MDA5/ b y bluntendedand50pppRNA(Kd=0.46and0.49mM,respec- LGP2ancestorlikelyexhibitedMDA5-likeRNAbinding. g u tively; two-sample t, P=0.38), as did the ancestral RLR Indeed, functional analysis of the MDA5/LGP2 ancestral es (Kd=0.54and0.50mM,respectively;P=0.34). RDrevealedastrongpreferenceforblunt-endedover50ppp t o n These steady-state binding results were corroborated by dsRNAcomparedwiththeancestralRLR,characteristicsthat 04 ouranalysisofinitialRNA-bindingrates,inwhichbluntended areretainedinhumanMDA5butnothumanRIG-1orLGP2 Ap and 50ppp dsRNA were bound with equal rates by both (fig.3;supplementaryfig.S7,SupplementaryMaterialonline). ril 2 0 human RIG-1 (Km=0.45 and 0.44mM, respectively; TheMDA5/LGP2ancestorhadslightlystrongersteady-state 1 9 P=0.48) and the ancestral RLR (Km=0.49mM for both affinityandfasterinitialbindingratesforblunt-endeddsRNA, RNAtypes;P=0.49).Remarkably,boththesteady-stateaffin- compared with the ancestral RLR (two-sample t, P=0.006 ities and initial RNA-binding rates of the ancestral RLR and and0.038,respectively).Atthesametime,theMDA5/LGP2 human RIG-1 were the same across these two RNA types ancestorweakenedits50pppRNA-bindingaffinitycompared (two-sample t, P>0.25 across all comparisons), suggesting with the ancestral RLR by nearly an order of magnitude an amazing degree of molecular-functional conservation (P<0.000002). These binding properties of the MDA5/ overmillionsofyearsofanimalevolution. LGP2 ancestor are retained in human MDA5, which binds After duplication of the ancestral RLR, a number of sub- blunt-endeddsRNAwithabouttenfoldstrongeraffinityand stitutionsoccurredintheMDA5/LGP2ancestortocreatean higher initial rate, compared with 50ppp dsRNA (P<0.01). MDA5-like protein. Specifically, the MDA5/LGP2 ancestor Theseresultsareconsistentwithatleastonepreviousstudy hadamuchmoreacidicRNA-bindingloop,comparedwith ofMDA5RNAbinding(Li,Lu,etal.2009). theancestralRLRandRIG-1,andthedistributionofpositive The physiological ligand of human MDA5 has remained charge across the RNA-binding pocket has shifted (fig. 3). mysterious, with recent studies suggesting that MDA5 may 145 MBE Mukherjeeetal. . doi:10.1093/molbev/mst184 A B 5’ppp dsRNA blunt-end dsRNA Zn Lys907 Zn Arg811 Lys907 Lys888 Trp908 Lys861 D o w His830 His830 n lo a d His847 e d fro m Phe853 Lys858 Phe853 Lys858 h Lys849 Lys849 ttp s ://a c RIG-1/MDA5/LGP2 ancestor a d Human RIG-1 e m ic FIG.4. TheancestralRLRbinds50pppandblunt-endeddsRNA.Weusedhomology-basedmoleculardockingtofit50pppdsRNA(A)andblunt-ended .ou dsRNA(B)intothebindingpocketoftheRIG-1/MDA5/LGP2ancestralRLR(fig.2).Contactresiduesareshownassticks,withpolarcontactsindicated p.c byblackdottedlines.RNAcontactresiduesfoundinhumanRIG-1areshowningreenandlabeled(Luetal.2010,2011). om /m b e /a cooperatively bind long, blunt-ended dsRNAs to produce (fig. 3; supplementary fig. S7, Supplementary Material rtic le immune-signaling “filaments” (Berke et al. 2012; Wu et al. online;two-samplet,P=0.01),oppositeofwhatweobserved -a b 2013). Although our experiments using short dsRNAs do for both human MDA5 and the MDA5/LGP2 ancestor. stra notdirectlyaddresscooperativeMDA5binding,ourfinding However, this difference in binding affinity is small, and no ct/3 that the human MDA5 RD—as well as the MDA5/LGP2 significantdifferenceininitialRNA-bindingratewasobserved 1 /1 ancestral RD—prefer blunt-ended over 50ppp dsRNA is (P=0.13). Nonetheless, our results suggest that human /1 4 consistentwithpreviousresults. LGP2’sRNA-bindingproperties—particularlyitstightbinding 0/1 Although we do not know whether differences in blunt of50pppdsRNA—aremuchmoresimilartothoseofhuman 04 8 endedversus50pppdsRNAbindingaretheonly—oreventhe RIG-1thanMDA5,whichisconsistentwithpreviousfindings 4 1 7 primary—functional differences separating human MDA5 (Li,Lu,etal.2009). b y and the MDA5/LGP2 ancestor from human RIG-1 and the LikehumanMDA5andtheMDA5/LGP2ancestor,human g u RLR ancestor, our results do suggest that the RNA-binding LGP2hasanegativelychargedRNA-bindingloop,whichcan es sensitivityandspecificityoftheMDA5/LGP2ancestorshifted alsobeobservedintheLGP2ancestorandtheMDA5ances- t o n immediatelyfollowingitsduplicationfromtheancestralRLR, tor,butnotinanyoftheRIG-1sortheancestralRLR(fig.3). 04 andthisfunctionalshiftwasretainedinhumanMDA5. ThepositivelychargedRNA-bindingpocketofhumanLGP2 Ap Although it may be tempting to dismiss relatively small alsoappearsstronglyshiftedcomparedwiththeotherhuman ril 2 0 changes in RNA-binding affinities as potentially “biologically and ancestral RLRs (fig. 3). These observations suggest 1 9 irrelevant,”thatthesechangesareassociatedwithlarge-scale thatLGP2bindsitsRNAligandsinadifferentconformation structuraldifferencesconservedovermillionsofyearsofevo- compared with RIG-1 and MDA5, which is consistent with lution argues against such an interpretation. Furthermore, structuralcomparisonsofhuman LGP2andRIG-1(Luetal. smalldifferencesinanorganism’sabilitytoquicklyrecognize 2011).TheseresultssuggestthatthesimilaritiesinRNA-bind- and respond to viral infections may translate into relatively ingsensitivityandspecificitybetweenhumanLGP2andRIG-1 large differences in pathogen sensitivity and, ultimately, are the result of convergent evolution, in which LGP2 fitness. reevolvedRIG-1-likeRNAbindingfromanMDA5-likeances- LGP2arosefromajawed-vertebrate-specificgeneduplica- tor,butusingdifferentstructuralmechanisms. tion of the MDA5/LGP2 ancestor and immediately lost its Together, our results suggest that the major functional twinN-terminalCARDsignalingdomains(supplementaryfig. differences in RNA binding among extant RLRs were estab- S8 and table S2, Supplementary Material online). Human lished early, following gene-duplication events. Structural LGP2 also appears to have evolved a slightly stronger modeling and functional characterization of ancestral and steady-state affinity for 50ppp over blunt-ended dsRNA human RDs confirmed that protein-coding changes and 146 MBE Vertebrate-SpecificInnateAntiviralImmunity . doi:10.1093/molbev/mst184 structural differentiation were associated with functional the RNA-binding loop, and none occurred in functional shifts in the RDs’ sensitivity and specificity for blunt-ended domains other than the RD, suggesting that adaptation has versus 50ppp dsRNA, suggesting that functional differentia- specifically targeted the RNA-binding properties of RIG-1 tionofRNAbindingoccurredthroughchangesintheshape duringmammalianevolution. andelectrostaticdistributionofRLRRDs. The functional consequences of many of these substitu- Becausesomeoftheoldestnodesonourphylogenywere tionsare unknown, but mutationalstudiessuggestthat the notresolvedwithveryhighcladesupportunderallanalyses, Tyr853PhesubstitutionobservedinprimateRIG-1islikelyto we wanted to make sure our ancestral reconstructions did affectRNAbinding(Takahasietal.2009).Asecondadaptive not depend on the tree topology. We therefore used an Gly811Arg substitution in the human lineage is likely to empirical Bayesian ancestral-reconstruction approach that increase affinity for blunt-ended dsRNA but not 50ppp integratedovertopologicaluncertainty.Evenwhenaccount- dsRNA, because Arg811 forms a hydrogen bond with the ing for ambiguity in the underlying phylogeny, all ancestral backbone of blunt-ended dsRNA but does not contribute protein sequences were reconstructed with very high confi- to stabilizing bound 50ppp dsRNA (Lu et al. 2011). The D dence, and alternative reconstructions were not supported Gly811 residue found in mouse and macaque does not ow n (supplementary fig. S9, Supplementary Material online). form this stabilizing hydrogen bond (fig. 5), suggesting lo a These findings suggest that our results are not strongly eitherastrongevolutionarypreferenceforincreasingspecific de d affectedbyambiguityineitherthephylogenetictreeorthere- affinity for blunt-ended dsRNA in the human lineage or fro constructed ancestral sequences. Our structural inferences a compensating change due to substitutions at other posi- m h also do not depend on the particular template used to tions in the molecule that may destabilize blunt-ended ttp buildstructuralmodelsofancestralproteins,asweobtained dsRNAbinding. s://a equivalent results using a range of available structural tem- Population-genomicanalysisofresequencingdata(Durbin c a plates (supplementary fig. S10, Supplementary Material et al. 2010) also suggests that RIG-1 may have experienced de m online). a recent selective sweep in humans (coalescent P=0.04, ic .o fig. 6A). A reduction in polymorphism and negative skew u p Recent Adaptation of RLRs in Mammals in Tajima’s D around the RIG-1 gene further supports this .co m Although major functional differentiation of RLRs occurred conclusion(Tajima1989)(fig.6B),asdoesanalysisofsingle- /m early,wefoundevidencethatRLRshavecontinuedtoadapt nucleotidepolymorphisms(SNPs)identifiedbytheHapMap be to viral pathogens by altering their RNA-binding properties project (Altshuler et al. 2010) (supplementary fig. S12, /artic throughout mammalian evolution (supplementary fig. S11 SupplementaryMaterialonline). le andtableS10,SupplementaryMaterialonline).Forexample, Althoughthesupportinfavorofaselectivesweepisweak, -ab s phylogenetic analysis of nonsynonymous/synonymous andwearecautiousregardingtheabilityofthesemethodsto tra c substitution rates identified a cluster of adaptive substitu- pinpoint the precise locations of selective sweeps (Nielsen t/3 1 tions within the RNA-binding loop of mammalian RIG-1 et al. 2005), it is intriguing to note that the strongest signal /1 (fig. 5). Few adaptive substitutions were identified outside foraselectivesweepoccursintheregionofthegenecoding /14 0 /1 0 4 8 4 1 7 b y g u e s t o n 0 4 A p ril 2 0 1 9 FIG.5. AdaptivesubstitutionsinmammalianRIG-1affectRNAbinding.WealignedtheC-terminalRNA-RDofmammalianRIG-1s(human,rhesus,and mouse shown); zinc-finger motifs (Zn) and the RNA-binding loop are indicated along the bottom of the alignment. Red stars indicate strongly supportedadaptiveaminoacidsubstitutions.Contactresiduesbindingblunt-endeddsRNA(lightgray)and50pppdsRNA(darkgray)arealsoindicated, withboldoutlinesindicatingresiduesrequiredforRNAbinding.Contactresidueswereidentifiedfromstructuralandmutationalanalysesofhuman RIG-1(Luetal.2010,2011).InsetshowsthestructuralconsequencesoftheadaptiveGly811Argsubstitutionobservedinthehumanlineage.Panelatleft showsthecrystallizedhumanRIG-1boundtoblunt-endeddsRNA(Luetal.2011).Polarcontactsareindicatedwithdottedblacklines.Rightpanel showsthecorrespondingregionofmacaque/mouseRIG-1.Numberingisbasedonthefull-lengthhumanRIG-1proteinsequence. 147 MBE Mukherjeeetal. . doi:10.1093/molbev/mst184 A Helicase RD Genes Sweep log-Likelihood1024680 32,717,478 RIG-1 32,263,354 Sweep log-Likelihood1024680 B16 Chromosome 9 RIG-1 Coding Sequence 12 C D π 8 ENKKLLCRKCKALACYTADVRVIEECHYTVLGDAFKECFVSR 4 803 0 P G TM QK S PHPKPKQFSSFEKRAKIFCARQNCSHDWGIHVKYKTFEIPVI a's D01 84T6G G Y T m-1 KIESFVVEDIATGVQTLYSKWKDFHFEKIPFDPAEMS D Taji-2 888 924 ow -3 RNA contact residues nlo a FIG.6. Population-genomicanalysissuggestsarecentselectivesweepinhumanRIG-1.(A)Weusedaspatiallyexplicitcomposite-likelihoodmodelto de d calculatesupportforaselectivesweepintheregionofhumanchromosome9surroundingRIG-1.Weplotthelog-likelihoodsupportforaselective fro sweepacrossthisregion,withthedottedlineindicatingthe5%significancecutoff,calculatedusingcoalescentsimulations.NotethatRIG-1iscoded m onthenegativeDNAstrand;wehaveorganizedthefiguresothatthenegativestrandreadsfromlefttoright.Thecorrespondingproteinfunctional http domainscanthenbereadfromN-terminal(left)toC-terminal(right).Graybarsindicateothergenesintheregion.Wealsoshowaclose-upofthe s RIG-1codingsequence,withtheexonscodingfortheRNA-RDandtheC-terminalendoftheHelicasedomainindicated.Grayexonsindicatethe ://a c 30-untranslatedregion.(B)Wedisplaysliding-windowcalculationsofpolymorphism((cid:2))andTajima’sDacrossthesamegenomicregionasin(A). ad e Dotted line indicates lower bound of the 95% confidence interval for mean Tajima’s D across the region. (C) We plot identified protein-coding m polymorphismsacrosstheRIG-1RD(Sherryetal.2001);RNAcontactresiduesareidentifiedbygraycircles(Luetal.2010,2011). ic.o u p .c o for the C-terminal RD, suggesting that adaptation of RIG-1 inhumans(supplementaryfig.S14A,SupplementaryMaterial m /m RNAbindingmayhaveoccurredrecentlyinhumans. online). b e Although there is no reason to presume that selection A recent study of protein-coding polymorphisms in /a must be acting on coding variants, it is interesting to note human RLRs also identified a number of adaptive changes rtic le thatanumberofprotein-codingpolymorphismshavebeen invariouspopulations(Vasseuretal.2011),furthersupport- -a b identified in the human RIG-1 RD (fig. 6C). Many of these ing the conclusion that RLRs have experienced recent stra polymorphisms induce radical changes in amino acid adaptationandpossiblechangesinproteinfunction. ct/3 biochemical properties either at or nearby critical RNA- 1 /1 contacting residues, suggesting that the RNA-binding Conclusion /1 4 properties of RIG-1 may vary markedly across the human Theavailabilityoffullysequencedgenomesfrommanyspe- 0/1 population. For example, recent exome resequencing has ciesprovidesanunprecedentedopportunitytosystematically 048 identified a low-frequency polymorphism (rs142758049, 4 evaluate the origins and evolution of protein families and 1 7 MAF=0.023) in African Americans (Tennessen et al. 2012) biochemicalsystems,sheddingnewlightontheoldquestion by thatinducesaLys861ThrchangeintheRD,disruptingakey g of how organismal complexity arose. Here, we have shown u ipnotleyrmacotriopnhisstmab(irlisz1in47g35903p2p3p9)dsoRnNtAhe(fisagm.4e).AAfrsiceacnonAdmceordicinang howanintegrativeanalysiscombiningtechniquesfromgene est o prediction, phylogenetics, structural modeling, molecular n halonaoppRloNttyAhpr-eocoucnghhtaancetgleerscetsAriodrsugt8ea5,t9iicttdoinoteGesrlyas.cttAaiboltinlhiszoeuwgtihhtehARrGNgl8uA58-95b7iinsdaninnodgt bifnuiofnoccrhtmieomantaiioslltnyry,adbiavoneudrtsifipheoodpw,ulrtaehtvieoeanRliLngRgenpnerteoiwctesinicnaffnoamrmdielayrtiivaoernospearbeaconiusdet 04 April 20 Glu883 (supplementary fig. S13, Supplementary Material 1 theoriginsandevolutionofinnateantiviralimmunity. 9 online); disruption of these stabilizing interactions is likely Our analysis provides a broad outline of RLR protein tostronglyimpactRDfunction. evolution while highlighting some of the key events in the IncontrasttoRIG-1,MDA5andLGP2RDsappeartohave evolution of this gene family. Specifically, we found the experiencedlessprotein-codingadaptationacrossthemam- following: malian phylogeny (supplementary fig. S11, Supplementary Material online), and support for adaptive sweeps within 1) RLR-mediated viral-RNA sensing and response path- humanMDA5andLGP2codingsequenceswaslow(supple- ways—a major component of innate antiviral immu- mentary fig. S14, Supplementary Material online). However, nity—arose at the origin of multicellular animals and we did observe a strong adaptive-sweep signature just arenotvertebratespecific,aslongbelieved. upstream of human MDA5. Analyses performed by the 2) The ancestral RLR exhibited RIG-1-like RNA-binding ENCODEproject have identifieda numberof transcription- properties, with binding affinities for both blunt ended factorbindingsiteswithinthisregion(Dunhametal.2012), and50pppdsRNAremarkablysimilartothoseofhuman suggestingthatMDA5expressionmayhaveadaptedrecently RIG-1. 148 MBE Vertebrate-SpecificInnateAntiviralImmunity . doi:10.1093/molbev/mst184 3) When the ancestral RLR duplicated to produce the RLRs, calculated using a sequence search of the Pfam data- RIG-1 and MDA5/LGP2 lineages, protein-coding base with an e-value cutoff of 10(cid:2)5 (Finn et al. 2010). changes in the MDA5/LGP2 ancestor reduced its affin- Finally, transcription of predicted genes was confirmed ity for 50ppp dsRNA and increased its affinity for using BLAST searches of species-specific EST databases, blunt-ended dsRNA, relative to the ancestral RLR. with an e-value cutoff of 10(cid:2)10 (Boguski et al. 1993). These RNA-binding properties are retained in human MDA5 but not in human RIG-1 and LGP2, suggesting Sequence Alignment and Phylogenetic Inference that LGP2 reevolved RIG-1-like RNA binding from an Sequence alignments were produced using MUSCLE v3.8.31 MDA5-like ancestor. (Edgar2004)andMAFFTv6.850b(KatohandToh2008).In 4) RLRs appear to have accumulated adaptive changes addition to full-length sequence alignments, we also used a alteringtheirRNA-bindingpropertiesthroughoutmam- domain-based alignment strategy. For this approach, each malianevolution,includingveryrecentlyinhumanevo- sequence wasusedto search the Pfam database (Finn etal. lution.HumanpolymorphismsinRIG-1andotherRLRs D maycontributetodifferencesinviralsusceptibilityand/ 2010) for matching functional domains, which were then ow aligned individually. Alignments were processed using n orriskofautoimmunediseases. lo Gblocksv0.91batvariousstringencysettingstoremovepo- ad Our work provides a general context for understanding tentially unreliable regions (Castresana 2000; Talavera and ed RLR functional evolution across the metazoan phylogeny Castresana2007). fro m and a framework for elucidating the detailed mechanisms For each alignment, the best-fit evolutionary model was h by which RLRs have functionally diversified. Although the identified using the AIC criterion in ProtTest v2.4 (Abascal ttps biologicalramificationsofobservedRNA-bindingdifferences etal.2005).MLphylogenieswerereconstructedusingPhyML ://a c for RLR-based immune signaling must remain highly specu- v3.0 (Guindon et al. 2010). We additionally reconstructed ad e lativeatthispoint,ouranalysissuggestsamodelinwhicha maximum-likelihood trees using a mixed branch length m ic “generalist” antiviral RNA-binding RLR ancestor arose very model, which incorporates evolutionary heterogeneity by .o u early in multicellular animals and functionally diversified by calculating the probability of the alignment data, given a p.c geneduplicationstoproducefirstamorespecializedMDA5- specified number of independently optimized branch om like RLR and then a possibly regulatory LGP2-like molecule length classes with corresponding mixing proportions /m b lackingdirectCARD-signalingcapabilities.Giventhisdiversity (Kolaczkowski and Thornton 2008). The tree topology and e/a ofRLRfunctions,wesuspectthatathoroughunderstanding allmodelparameterswereoptimizedusingsimulatedanneal- rtic ofRLR-basedantiviralimmunitywillrequirenotonlyfurther ing,andthebest-fitnumberofbranchlengthclassesforeach le-a refinement of our understanding of specific RLR functions alignmentwasinferredusingAIC. bs but also elucidation of how interactions among various Bayesian analyses were performed using MrBayes v3.1.2 trac RLRsmayaffectimmuneprocesses. (Ronquist and Huelsenbeck 2003), integrating over all avail- t/31 /1 Materials and Methods able evolutionary models. We assumed gamma-distributed /1 4 among-site rate variation and an estimated proportion of 0 /1 Gene Prediction invariant sites. MrBayes analyses were conducted using de- 0 4 8 WepredictedRLRgenesusingFGENESH+ v2.6,whichusesa fault parameters, with runs terminating when the average 4 1 homology-based hidden Markov model (HMM) framework standard deviation in clade posterior probabilities between 7 b y topredictgenestructuresfromgenomicDNA(Salamovand twoindependentrunsfellbelow0.01. g u Solovyev2000).Regionsofeachgenomepotentiallyharboring We also conducted Bayesian analyses using the CAT es RLR genes were identified using TBLASTN (Altschul et al. model, implemented in PhyloBayes v3.3b (Lartillot and t o n 1997) with human RLR proteins as query sequences. Gene Philippe 2004; Lartillot et al. 2009). Conceptually, the CAT 04 predictions were then made from the identified genomic model incorporates site-specific heterogeneity in qualitative Ap region ±3kb. Support for predicted genes was calculated evolutionaryconstraintsusingamixtureof aminoacidpro- ril 2 usinglog-oddsratios,whichreportthelog oftheprobability files.Recentanalysessuggestthatthisapproachmayimprove 01 2 9 of the genomic DNAsequence, given theHMM, dividedby tree reconstruction accuracy compared with standard ho- the probability of the genomic DNA under a random null mogenousmodels(Lartillotetal.2007). model. WeconstructedaconsensustreeusingPhySIC_ISTv1.0.1 Predicted gene sequences were confirmed using four (Scornavacca et al. 2008), which identifies consistently complimentary approaches. First, predicted genes were re- supported clades across a set of input phylogenies, tak- jected if a PSI-BLAST search against experimentally verified ing into account support values estimated on each input gene structures in the NCBI database failed to produce a tree. Weakly supported clades (<0.8 SH-like aLRT verified RLR gene as the top hit. Second, predicted genes [Anisimova et al. 2011] or Bayesian posterior probability) were rejected if the rate of nonsynonymous/synonymous were collapsed on each input tree. The same consensus substitutions (dN/dS) was outside the range of dN/dS treewasobtainedusingamodifiedminimum-cutapproach, values for confirmed vertebrate RLRs. Third, predicted which has been shown to be among the best-performing genes were rejected if their protein sequences did not supertree methods (Cotton and Wilkinson 2007; Buerki have the same complete domain architecture as confirmed et al. 2011). Consensus support for individual clades was 149
Description: