RESEARCH ARTICLE

Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

Robin van der Lee1*, Qian Feng2☯¤a, Martijn A. Langereis2☯, Rob ter Horst1, Radek Szklarczyk1¤b, Mihai G. Netea3, Arno C. Andeweg4, Frank J. M. van Kuppeveld2, Martijn A. Huynen1*

1 Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical center, Nijmegen, The Netherlands, 2 Virology Division, Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, University of Utrecht, Utrecht, The Netherlands, 3 Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud university medical center, Nijmegen, The Netherlands, 4 Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands

☯ These authors contributed equally to this work.
¤a Current address: Institute of Biochemistry, ETH Zurich, Zurich, Switzerland
¤b Current address: Department of Clinical Genetics, Unit Clinical Genomics, Maastricht University Medical Centre, Maastricht, The Netherlands
* [email protected] (RvdL); [email protected] (MAH)

OPEN ACCESS

Citation: van der Lee R, Feng Q, Langereis MA, ter Horst R, Szklarczyk R, Netea MG, et al. (2015) Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response. PLoS Comput Biol 11(10): e1004553. doi:10.1371/journal.pcbi.1004553

Editor: Bjoern Peters, La Jolla Institute for Allergy and Immunology, UNITED STATES

Received: September 3, 2015

Accepted: September 12, 2015

Published: October 20, 2015

Copyright: © 2015 van der Lee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data supporting the results of this paper are available in the Supplemental Tables and at http://rlr.cmbi.umcn.nl/.

Funding: RvdL, QF, ACA and MAH were supported by the Virgo consortium, funded by the Dutch government (FES0908), and by the Netherlands Genomics Initiative (050-060-452). QF and MAL were funded by personal grants from the Netherlands Organization for Scientific Research (NWO-017.006.043 and NWO-863.13.008, respectively). RS is supported by the Metakids Foundation. MGN was supported by an ERC Consolidator Grant (#310372). FJMvK was supported by an ECHO grant from the Netherlands Organization for Scientific Research (NWO-CW-700.59.007). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

PLOS Computational Biology | DOI:10.1371/journal.pcbi.1004553 October 20, 2015

Abstract

The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of known RLR pathway components that collectively predict novel members. We demonstrate that RLR pathway genes, among others, tend to evolve rapidly, interact with viral proteins, contain a limited set of protein domains, are regulated by specific transcription factors, and form a tightly connected interaction network. Using a Bayesian approach to integrate these signatures, we propose likely novel RLR regulators. RNAi knockdown experiments revealed a high prediction accuracy, identifying 94 genes among 187 candidates tested (~50%) that affected viral RNA-induced production of IFNβ. The discovered antiviral regulators may participate in a wide range of processes that highlight the complexity of antiviral defense (e.g. MAP3K11, CDK11B, PSMA3, TRIM14, HSPA9B, CDC37, NUP98, G3BP1), and include uncharacterized factors (DDX17, C6orf58, C16orf57, PKN2, SNW1). Our validated RLR pathway list (http://rlr.cmbi.umcn.nl/), obtained using a combination of integrative genomics and experiments, is a new resource for innate antiviral immunity research.

Author Summary

Viruses pose a continuous threat to human health, even though our immune systems have evolved to neutralize invading viruses. As part of the innate immune system, the RIG-I-like receptors (RLRs) are essential for detecting viruses during infection. Recognition of viral RNA by the RLRs triggers an antiviral response that inhibits viral replication, protects uninfected cells, and attracts specialized immune cells. Better understanding of the innate antiviral response may reveal novel targets for antiviral therapeutics and vaccine development. However, that requires knowledge about which genes and proteins are involved. In the present study, we systematically investigated the wealth of available genomics data (including gene expression, protein interactions, transcription regulation and genome sequences) and discovered no less than 10 distinctive properties of genes known to be part of the antiviral RLR pathway. By combining these properties in a statistical framework, we predicted 187 novel RLR pathway components. Our validation experiments showed that ~50% of the predicted candidate genes have a significant effect on antiviral signaling. These results, together with independent computational and literature-based confirmation, demonstrated the validity of our combined bioinformatics and experimental approach. Our study expands the collection of known antiviral genes, opening up new avenues for research into innate antiviral immunity.

Introduction

Viruses are a major cause of human disease, as highlighted by the pandemics of influenza viruses, HIV-1, and the current outbreak of the Ebola virus. Pattern recognition receptors (PRR) are among the first molecules that detect viruses during infection. The RIG-I-like receptors (RLRs, one class of PRRs) are part of the RLR pathway, which forms a crucial innate antiviral defense system [1,2]. Two RLRs, RIG-I and MDA5, reside in the cytosol where they recognize non-self 5'-triphosphate RNA molecules with short double-stranded regions and long double-stranded RNAs (dsRNA), respectively [3]. Activation of the receptors triggers a complex signaling network, key steps of which are the activation of the mitochondrial adapter MAVS, subsequent recruitment of the TBK1 and IKK complexes, phosphorylation/activation of IRF3 and NFκB, and translocation of these transcription factors to the nucleus. These steps ultimately lead to the production of type I interferons (IFNα/β) and proinflammatory cytokines, which are crucial for establishing an antiviral state in infected as well as neighboring cells, and also modulate the adaptive immune response [4].

The importance of the RLR system is further demonstrated by the observation that viruses of all types employ strategies to interfere with its activation, often at multiple steps [5,6]. Better understanding of viral interaction with the pathway has resulted in novel targets for the development of antiviral therapeutics and attenuated live vaccines, for example viruses lacking functional RLR antagonists [7]. Furthermore, mutations in RIG-I, MDA5, MAVS and other RLR pathway components are associated not only with strong susceptibility to infections, but also with IFN-associated autoimmune disorders [8–10].

Previous studies into virus-host interactions and the innate antiviral pathways have used genomics approaches, often generating large data sets describing physical or genetic interactions [11–14]. Other publications have taken a comparative approach based on model organisms [15] or used over-expression screening systems [16,17]. Together, these studies have identified numerous genes with antiviral activity, including members of the RLR pathway.
However, it remains important to systematically assess the quality of individual data sets as such screens report distinct sets of genes, often with limited overlap between them. Combining the many available genomics data sets in a statistical framework potentially allows for a more systematic discovery and categorization of genes involved in the RLR pathway. Indeed, Bayesian integration of large-scale data that includes weighing individual data sets for their predictive potential has been successful in other cellular systems, for example identifying novel protein interactions [18], mitochondrial disease genes [19], and small RNA pathway genes [20].

In this work we systematically exploit the wealth of available (gen)omics data, including transcriptomics and proteomics data, genome sequences, protein domain information, and functional genomics, to discover descriptive molecular signatures of the RLR pathway system. Bayesian integration of these data, together with comprehensive computational and experimental validation, confidently identifies novel genes involved in antiviral RIG-I signaling.

Results

The RIG-I-like receptor (RLR) pathway is a highly interconnected and diverse molecular system. We investigated whether available genomics data contain sufficient signal to accurately describe RLR pathway components, and whether such data could be used to prioritize novel genes for a possible role in RLR signaling.

Ten molecular signatures of RLR pathway components in genomics data

To discover molecular signatures that distinguish RLR pathway components from other genes, we explored a wide variety of genome-scale data describing different aspects of the virology and biology of the pathway. Some of these data we used directly, while other data were used as the basis for further calculations (Table 1). We quantitatively assessed the predictive power of each data set using a literature-curated standard of 49 known RLR pathway components from InnateDB [21] ('RLR genes', S1 Fig) and a set of 5,818 'non-RLR genes' that are unlikely to be part of the pathway (i.e. genes with known functions not directly related to the innate antiviral response, such as development, housekeeping and neurological processes, see Methods). Below we describe 10 signatures for predicting novel RLR pathway components. The first five signatures are based on the relationship of RLR genes with viruses, whereas the second set of five signatures is based on properties of the RLR pathway itself.

Virus-based signatures

Positive selection in primates. Viruses evade recognition or interfere with the immune systems of their hosts to achieve successful infection [7]. This involves interactions between viral and host proteins, the interfaces of which are under constant pressure to change [22]. We collected data on recurrent positive selection in the primate lineage, based on maximum likelihood analysis of sequence alignments [23]. RLR pathway components, e.g. the mitochondrial signaling adapter MAVS [24] and transcription factor IRF7 [25], are enriched for rapidly evolving genes (9% of 49 RLR genes) compared to genes that are unlikely to be part of the pathway (5% of 5,818 non-RLR genes, 1.7-fold enrichment, non-significant P = 0.38, one-tailed Fisher's exact test, Fig 1A, Tables 1 and S1).

Protein-protein interactions (PPI) with viruses.
The next signature is based on the physical interactions between host RLR pathway components and viruses. Viral proteins often interact with many host proteins during their infection cycle, including those involved in antiviral defense [26,27]. Extraction of virus-human PPIs from specialized databases [28] revealed ~2,600 human proteins that are reported to interact with at least one viral protein (virus-interacting human proteins, S2 Fig). These virus-interacting human proteins include the majority of RLR pathway components (35/49 = 71%), while they include a significantly smaller fraction of non-RLR genes (1,000/5,818 = 17%, 4.2-fold enrichment, P = 1.4 × 10^-16, one-tailed Fisher's exact test, Fig 1A and Table 1). Among the RLR genes, TRAF2 (4 PPIs), DDX3, MAPK9, and the NFκB subunit RELA (3 PPIs each) have reported interactions with the largest number of distinct virus species.

Table 1. Ten molecular signatures from genomics data used for predicting novel RLR pathway components.

Group | Molecular signature | Data set description | Type (a) | References | Number of genes (b) | Likelihood ratio score (b,c)
Virus-based | Positive selection in primates | Rapidly evolving genes in the primate lineage, detected using maximum likelihood analysis of nucleotide alignments | d | [23] | 926 | 1.7
Virus-based | PPI with viruses | Virus-interacting human proteins extracted from PPI databases | c | [28] | 2,587 | 4.2
Virus-based | Viral miRNA target | Likelihood scores of targeting of human transcripts by viral miRNAs, based on predicted target sites | c | [30] | 6,761 | 1.3
Virus-based | Differential expression upon infection | Genes showing differential expression in lung epithelial cells infected with four respiratory viruses | c | see Methods | 1,680 | 3.5
Virus-based | Antiviral host factor | Meta-analysis of genes with antiviral activity from seven RNAi screens studying a variety of viruses | c | see S3 Table | 173 | 2.1
Pathway-based | Co-expression with RLR pathway | Weighted co-expression with known RLR genes across >450 human gene expression studies | c | [32] | 4,149 | 2.4
Pathway-based | RLR pathway protein domain | Proteins containing one of the 25 domains that are significantly enriched in known RLR proteins | c | [66] | 711 | 8.9
Pathway-based | Innate antiviral TF binding motifs | Genes with IRF, AP-1, NFκB, or STAT TF binding motifs in their promoters, based on conservation across 29 mammals | c | [33] | 4,508 | 2.3
Pathway-based | NFκB activation mediator | Hits from a genome-wide siRNA screen of Epstein-Barr virus-induced NFκB activation | d | [34] | 154 | 19.8
Pathway-based | RLR pathway PPI | Proteins that interact with known RLR proteins, calculated from PPI data | c | [35] | 1,750 | 4.3
Integration | RLR score | Bayesian integration of the 10 molecular signatures to predict novel RLR pathway components

(a) Data used directly (d) or as basis for further calculation (c)
(b) Combination of all bins with positive likelihood ratio scores per feature, derived from Fig 1A
(c) RLR genes versus non-RLR genes: P(D_i | RLR genes) / P(D_i | non-RLR genes), see Methods.
Note that, to avoid circularity, the predictive ability of the co-expression, protein domain and RLR pathway PPI data sets was assessed using the set of TLR, CLR, NLR, cytDNA genes instead of the RLR genes (see Methods).
See also Fig 1A and S1 Table.
doi:10.1371/journal.pcbi.1004553.t001

Viral miRNA target. Another mechanism that viruses use to interfere with the antiviral activity of human cells is down-regulation of gene expression by miRNAs [29]. We collected 128 miRNAs encoded by nine, mainly herpes DNA viruses (S2 Table) [30], most of which have confirmed physiological relevance. Predicted target sites of these miRNAs to the 3'UTR of human transcripts were then used to calculate for each gene a score representing the likelihood that viruses affect its expression (viral miRNA targeting score). For example, our method assigned IKBKE (IKKε) a relatively strong viral miRNA targeting score of 2.7. Indeed, Kaposi's sarcoma-associated herpesvirus miR-K12-11 has been shown to inhibit translation of IKKε transcripts, leading to suppression of interferon signaling [31]. Analysis of the viral miRNA targeting scores revealed that RLR genes tend to have stronger scores than non-RLR genes (P = 0.03, one-tailed Mann-Whitney U test). Although the statistical significance of this trend is only marginal, we included it as a molecular signature of RLR genes as it still provides a moderate enrichment over non-RLR genes (1.3-fold enrichment among genes with the strongest viral miRNA targeting scores, Table 1), and even weak features can substantially improve the predictions for novel RLR genes.

Fig 1. Bayesian integration of ten molecular signatures of RLR pathway components from genomics data. (A) Distributions of the 49 known RLR pathway components (RLR genes, green) and 5,818 genes unlikely to be part of the pathway (non-RLR genes, red) across the 10 molecular signature data sets we identified as predictive of the RLR system (see also Table 1). Data sets were binned into discrete intervals and fractions of (non-)RLR genes add up to one. Arrows indicate the behavior of RIG-I across the data. The top five signatures describe the relationship of RLR signaling with viruses; the bottom five describe properties of the pathway itself. (B) Boxplots of the genome-wide integrated RLR score (Bayesian posterior probability score). Genes were grouped into one of five classes: known RLR genes (green, see [A]), components of other PRR signaling pathways ('TLR, CLR, NLR, cytDNA'; purple), genes functioning in other aspects of the innate immune response ('innate immunity'; blue), and non-RLR genes (red, see [A]). The remaining genes are classified as 'other' (gray). (C) The 50 genes with the highest RLR scores. Representative RLR and other innate antiviral response genes are indicated. The pie chart shows the occurrences of the different gene classes in the top 354 RLR ranks. (D) Receiver operating characteristic (ROC) curve illustrating the performance of the integrated RLR score (solid black line) and the individual molecular signatures (black dots) for predicting known RLR versus non-RLR genes. Sensitivity and specificity were calculated at various score thresholds (for the RLR score), or at specific thresholds that include all bins with positive likelihood ratio scores (for the individual data sets; see (A)). The asterisk denotes the sensitivity and specificity corresponding to a false discovery rate (FDR) of 57% (top 354 genes). Note that, to avoid circularity, the predictive ability of the co-expression, protein domain and RLR pathway PPI data sets in (A) and (D) was assessed using the set of TLR, CLR, NLR, cytDNA genes instead of the RLR genes (see Methods).
doi:10.1371/journal.pcbi.1004553.g001

Differential expression upon infection. Next, we asked whether RLR pathway genes are differentially expressed upon virus infection. To answer this, we used in-house gene expression data of human lung epithelial cells (A549) exposed to four respiratory viruses (respiratory syncytial virus, human metapneumovirus, parainfluenza virus, or measles virus), for which gene expression was measured at 6, 12 and 24 hours after infection (see Methods). Analysis of the transcriptomes revealed that many RLR genes (31%) underwent substantial expression changes (log2 fold change > 0.5) in cells infected with the respiratory viruses, compared to the uninfected cells. This compares to a much smaller fraction of non-RLR genes (9%, 3.5-fold enrichment, P = 1.3 × 10^-5, one-tailed Fisher's exact test, Fig 1A and Table 1). The differentially expressed RLR genes include well-known interferon-stimulated genes (ISGs) like ISG15, DDX58 (RIG-I), IRF7, IFIH1 (MDA5), and TRIM25, which are the top five RLR genes most induced by the respiratory viruses studied (log2 fold change > 1.5 compared to uninfected cells, S3 Fig). Thus, even though we expect many RLR genes to already be expressed before viral infection, their expression levels are reinforced in infected cells.

Antiviral host factor.
RNAi screening potentially allows the identification of host factors that limit virus replication, such as genes involved in the innate antiviral response, although most studies focus on hits with the opposite effect (i.e. factors required by viruses for replication) [11]. We performed a meta-analysis of hits from seven large-scale RNAi studies in human cells, identifying 173 genes with antiviral activity against HIV-1, influenza, hepatitis C (HCV), West Nile, or enterovirus infection (S3 Table). In contrast to our expectation that RLR genes would be common among these antiviral host factors, this data set is one of the weaker predictors of RLR genes: the 173 antiviral host factors contain only a single gene (IRF3) that belongs to the set of 49 known RLR genes (~2%) compared to 56 of 5,818 non-RLR genes (~1%, 2.1-fold enrichment, non-significant P = 0.38, one-tailed Fisher's exact test, Fig 1A and Table 1).

Pathway-based signatures

Co-expression with RLR pathway. To aid in finding novel RLR genes, we screened >450 human expression studies for genes that co-express with known RLR pathway components using a two-step approach [32]. First, we weighed individual expression data sets for their propensity to predict new RLR genes: experiments in which the whole group of known RLR genes show high co-expression with each other received a higher weight and contributed most to the calculations. In the second step, we calculated the co-expression of all genes with the RLR genes. As expected, RLR genes display significantly higher co-expression scores with each other than with the rest of the genome or with non-RLR genes (P < 10^-27 for both comparisons, one-tailed Mann-Whitney U test, S4 Fig). However, RLR genes also score higher than components of other PRR signaling pathways (Toll-like receptor [TLR], C-type lectin receptor [CLR], NOD-like receptor [NLR], and cytosolic dsDNA sensing [cytDNA] pathways; P = 4.5 × 10^-13, one-tailed Mann-Whitney U test). Cross-validation by leave-one-out analysis confirms that the weighted co-expression approach retrieves RLR genes more readily than other PRR pathway genes, or genes involved in other aspects of innate immunity (S4D Fig), demonstrating specificity for identifying RLR genes in the co-expression data.

RLR pathway protein domain.
Analysis of RLR pathway protein sequences revealed the presence of 40 unique domains, 25 of which were significantly over-represented compared to the full human proteome (Benjamini-Hochberg-corrected Fisher's exact P < 0.01, S4 Table). These include protein kinase domains (12-fold enrichment, P = 1.1 × 10^-8; present in IKKα/β/ε, MAP kinases, TBK1), the TBK1/IKKi binding domain (TANK, TBK1BP1, AZI2), caspase and death domains (CASP8/10, FADD), IRF domains (IRF3/7), and the DExD/H box RNA helicase domain (15-fold enrichment, P = 2.7 × 10^-3; RIG-I, MDA5, LGP2). We then assessed the domain organizations of all human proteins and determined a set of 711 proteins containing one or more of the domains enriched in RLR components. These proteins are predictive for RLR components with an enrichment score of 8.9 (Table 1).

Innate antiviral transcription factor (TF) binding motifs. Signaling through the RLR pathway triggers the activation of key transcription factors (TFs) like IRF3, IRF7, AP-1 and NFκB. Activation of these TFs leads to the production of type I interferons and proinflammatory cytokines that eventually activate STAT1 and STAT2 [2]. STAT1 and STAT2 in turn stimulate transcription of interferon-stimulated genes (ISGs), which include many RLR pathway components. To further explore the transcription regulation of RLR pathway components, we analyzed their gene promoters for the presence of TF binding motifs that are highly conserved across the genomes of 29 placental mammals, such as primates, rodents and many farm animals [33]. Conserved IRF and NFκB motifs are highly abundant in the promoters of RLR genes (Fisher's exact P = 3.3 × 10^-3 and P = 2.0 × 10^-4, respectively; S5 Table), suggesting the pathway is partly self-regulating as has been observed for individual components. Interestingly, a conserved IRF motif was detected not only in the promoters of IRF7 itself and in all three RIG-I-like receptor family members (DDX58 [RIG-I], IFIH1 [MDA5], DHX58 [LGP2]), but also in TRIM25, ISG15, and CYLD, three factors controlling RIG-I signaling activation by regulating the level of K63 polyubiquitination. In order to predict novel RLR components, we
searched for genes containing conserved IRF binding motifs (several motif variants, collectively recognized by IRF1-9), STAT binding motifs (several motif variants, collectively recognized by STAT1-6), AP-1 binding motifs, or NFκB binding motifs (Fig 1A). We found 3,558 genes across the human genome containing one of these motifs in their promoters. This large number partially stems from similarities in the DNA binding preferences of TFs that belong to the same family, but does not mean that all identified genes are regulated by the RLR pathway. For example, STAT motifs not only occur in the promoters of ISGs, but also in genes involved in cellular proliferation, differentiation and apoptosis. Nevertheless, genes containing one of the four conserved TF motifs already show a good predictive value for RLR pathway components (enrichment score of 2.2, S1 Table). Genes with more than one motif are even more likely to be RLR genes: 789 genes contain two motifs (2.6-fold enrichment) and 161 genes contain three or all four motifs (2.4-fold enrichment).

NFκB activation mediator. Host factors that regulate NFκB activation often also affect the RLR pathway. Indeed, the 154 hits that were picked up in a genome-wide siRNA screen of Epstein-Barr virus-induced NFκB activation [34] include a much larger fraction of known RLR genes (6/49 = 12%) than non-RLR genes (36/5,818 < 1%, 20-fold enrichment, P = 1.0 × 10^-6, one-tailed Fisher's exact test, Fig 1A and Table 1). Thus, these 154 NFκB activation mediators are likely to contain novel RLR pathway components as well.

RLR pathway PPI. Finally, to find novel RLR genes we assessed the human protein interaction network connecting the RLR pathway. PPI databases [35] report 3,504 interactions between 1,750 unique proteins and 47 of the 49 RLR components, the only exceptions being DAK and NLRX1. Of the 47 RLR proteins with reported PPIs, 41 are involved in a total of 147 interactions within the pathway (i.e. between two pathway members). This network of RLR components has significantly more connections with each other than do random networks of the same size and interaction degree distribution (physical interaction enrichment score = 3.6, P < 1.0 × 10^-6 [36]).
Using the PPI data, we obtained for each human protein the number of interacting RLR pathway components (Fig 1A). We found a total of 1,397 proteins reported to interact with one or two RLR proteins. These proteins are predictive for RLR components with an enrichment score of 1.8 (S1 Table). A further 221 proteins interact with three or four RLR proteins (6.3-fold enrichment) and 132 proteins interact with five or more RLR proteins (15.8-fold enrichment). TBK1 (18 interactions), TRAF2 and MAVS (both 16) top the list, supporting their roles as central players in the RLR system [12]. Thus, an increasing number of interactions with RLR proteins indicates a higher likelihood that a protein is part of the RLR pathway.

Bayesian integration of molecular signatures provides genome-wide probabilities for RLR pathway components

The RLR pathway components published thus far probably constitute only part of the total proteins with a function in this pathway. To prioritize novel high-confidence genes for a role in the RLR pathway, we integrated the 10 identified molecular signatures of RLR genes in a naive Bayesian classifier [18,19] (see Methods). This approach weighs data sets based on their predictive value (i.e. their ability to separate known positives and negatives; Fig 1A, Tables 1 and S1) so that 'better' data contribute more to the predictions. Each human gene received a posterior probability score ('RLR score') reflecting the likelihood that the gene is part of the RLR pathway based on its behavior in the collected genomics data. A score of zero indicates equal probabilities of a gene being an RLR versus a non-RLR gene. S6 Table presents the genome-wide ranking of RLR scores (also available at http://rlr.cmbi.umcn.nl/).

As expected, known RLR pathway components have the highest RLR scores (Figs 1B and S5). Two-thirds (32/49) of these rank within the first 150 genes. The top ranking genes are IRF7, RIG-I, IKKε, subunits of NFκB, TRADD, TRAF2, MDA5, and IKKγ (NEMO) (Fig 1C).
Other examples of well-described RLR pathway components include IRF3 (rank 51), ISG15 (52), MAVS (102), and LGP2 (114). Genes that are unlikely to play a role in the pathway (the set of non-RLR genes) generally have very low RLR scores, although some of these received high scores as well (Fig 1B). This is not unexpected, as even though this large set of genes was selected from function annotations generally unrelated to the innate antiviral response, this does not preclude that individual genes (also) function in the RLR pathway.

To gain insight into what kind of genes are present among the RLR predictions, we examined their functions by pathway and gene ontology enrichment analyses. The top 354 genes with the highest RLR scores (corresponding to high-confidence predictions, see below) have strong links with other pathways of the innate immune response, such as TLR, NLR, interferon, and cytokine signaling (S6 and S7 Figs). Antiviral defense functions are also among the most frequent and significant terms associated with the high-scoring genes (S8 Fig and S7 Table). Other important biological processes include various apoptosis-related functions, cancer and cell cycle pathways, and regulation of metabolic processes and protein localization. Furthermore, the top predictions include a wide range of protein families, notably proteasome subunits, ubiquitin(-like) conjugating enzymes, and genes involved in phosphatidylinositol signaling (which was recently shown to affect the type I IFN response [14]). Finally, 22% of the top predictions are induced in cells treated with interferons (i.e. they are interferon-stimulated genes, ISGs) and ~18% are part of the common host transcription response to pathogens (Table 2). Together, these observations indicate that our framework successfully predicts genes with a likely role in the innate antiviral response and suggests other cellular systems and functions required for this response.

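The naive Bayesian integration described in this section can be illustrated with a minimal Python sketch. It assumes, as a simplification that is not taken from the Methods, that the RLR score is the log2 of the posterior odds: the log2 prior odds plus the sum of the log2 likelihood ratios P(D_i | RLR genes) / P(D_i | non-RLR genes) over the signatures. This is consistent with the statement that a score of zero means equal probabilities. The function name `rlr_score`, the prior odds and the example gene are hypothetical; only the three likelihood ratios are borrowed from Table 1.

```python
from math import log2


def rlr_score(likelihood_ratios, prior_odds):
    """Log2 posterior odds under a naive Bayes (independence) assumption.

    likelihood_ratios: one P(D_i | RLR) / P(D_i | non-RLR) value per signature,
                       taken from the bin the gene falls into.
    prior_odds:        assumed prior odds of RLR versus non-RLR membership.
    """
    return log2(prior_odds) + sum(log2(lr) for lr in likelihood_ratios)


# Hypothetical prior: about 1 RLR gene per 100 non-RLR genes (illustrative only).
PRIOR_ODDS = 1 / 100

# Hypothetical gene in the positive bins of three signatures. Likelihood ratios
# from Table 1: PPI with viruses (4.2), differential expression upon
# infection (3.5), RLR pathway protein domain (8.9).
example_lrs = [4.2, 3.5, 8.9]

score = rlr_score(example_lrs, PRIOR_ODDS)
print(f"illustrative RLR score: {score:.2f}")
```

Because the evidence is added in log space, a signature with a likelihood ratio near 1 (such as the viral miRNA target signature at 1.3) shifts the score only slightly, whereas a strong signature (such as NFκB activation mediator at 19.8) shifts it by several units.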
Table 2. Overlap between innate (antiviral) response data sets and the top 354 RLR predictions excluding known RLR genes.

Data set | References | Number of genes in data set | Fraction (number) of data set genes in top 354 predictions | One-tailed Fisher's exact P
Interferon-stimulated genes (ISGs) | [16] | 354 | 22.0% (78) | 1.2 × 10^-67
ISGs with validated antiviral activity | [16,91] | 45 | 42.2% (19) | 4.9 × 10^-23
Common host transcription response to pathogens | [92] | 496 | 17.7% (88) | 5.2 × 10^-68
Interactors of the type I IFN protein network during pattern recognition (HCIP) (a) | [12] | 241 | 11.2% (27) | 1.2 × 10^-15
HCIP with confirmed effects on IFNβ expression and antiviral activity | [12] | 22 | 22.7% (5) | 1.9 × 10^-5
Tripartite motif (TRIM) family genes | [17] | 71 | 12.7% (9) | 1.6 × 10^-6
TRIMs that enhance RIG-I-induced activation of IFNβ, NFκB and ISRE promoters | [17] | 34 | 14.7% (5) | 1.7 × 10^-4
Human interactors of innate immune-modulating viral ORFs (b) | [37] | 569 | 6.7% (38) | 4.7 × 10^-14
Genes expressed in PBMCs stimulated with Candida (CRG) | [38] | 89 | 43.8% (39) | 4.7 × 10^-47
CRG with altered expression in CMC patients | [38] | 23 | 65.2% (15) | 2.6 × 10^-22
Type I IFN response mediators | [14] | 226 | 4.0% (9) | 9.5 × 10^-3

(a) These PPIs were not part of the RLR interaction network used for the RLR predictions (i.e. for the 'RLR pathway PPI' signature)
(b) These interactions were not used to determine the virus-interacting human proteins used for the RLR predictions (i.e. for the 'PPI with viruses' signature)
doi:10.1371/journal.pcbi.1004553.t002

Performance estimates and independent data establish the reliability of the RLR score

We further computationally assessed the reliability of the integrated RLR score by estimating the sensitivity, specificity and false discovery rate (FDR) of the predictions using the positive (RLR genes) and negative (non-RLR genes) standards. Integration of the data sets achieved better sensitivity and specificity than any of the individual data sets (Fig 1D), thereby enriching for RLR genes and depleting false positives (S5, S9 and S10 Figs). At an RLR rank threshold of 354 (RLR score -1.10), the framework correctly predicts 78% of the known RLR genes with a
specificity of 98.4% (Fig 1D). At this threshold, only ~57% of the novel predictions are estimated to be false (S11 Fig, adjusted FDR to match the expected total number of genes involved in the RLR pathway, see Methods). This compares to a genome-wide false discovery rate (i.e. when predicting genes randomly) of ~99%. Thus, the integrated RLR score increases the probability of correctly identifying novel RLR genes by a factor of 43 compared to random classification.

Because we used the same gene sets for calculating the RLR scores and estimating the performance of the resulting predictions (i.e. without systematic cross-validation), there exists a danger of circular reasoning. Therefore, we also carefully validated the quality of the results using various independent and external data sets. First, we examined the high RLR scores for genes that have a known function in innate immunity, but not in the RLR pathway, and therefore were not part of our training set. Components of other PRR signaling pathways (TLR, CLR, NLR, cytDNA) have lower scores than RLR genes, but much higher scores than the rest of the genome (Fig 1B). The same is true for genes functioning in other aspects of the innate immune response (Fig 1B). Of the 225 novel predictions (i.e. those genes that are not part of the training sets) in the top 354 (FDR of 57%, see above), 142 (~63%) are part of these innate immunity gene lists (Fig 1C). Thus, the majority of high-scoring genes with no known link to the RLR pathway in fact have a function in other PRR pathways or other parts of innate immunity, supporting the relevance of our predictions.

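The factor of 43 quoted above follows directly from the two false discovery rates: the expected fraction of correct predictions is 1 - FDR, and the improvement is the ratio of these fractions. A short Python check of this arithmetic, together with the 142/225 fraction, is given below; the variable and function names are illustrative only, and the rounded FDR values are those stated in the text.

```python
def precision_from_fdr(fdr):
    """Expected fraction of correct predictions given a false discovery rate."""
    return 1.0 - fdr


FDR_TOP_354 = 0.57   # estimated FDR among the top 354 genes (from the text)
FDR_RANDOM = 0.99    # genome-wide FDR when predicting genes randomly (from the text)

fold_improvement = precision_from_fdr(FDR_TOP_354) / precision_from_fdr(FDR_RANDOM)
print(f"fold improvement over random classification: {fold_improvement:.0f}")

# Fraction of the 225 novel top-354 predictions found in innate immunity gene lists.
innate_fraction = 142 / 225
print(f"novel predictions with an innate immunity annotation: {innate_fraction:.0%}")
```

Since both FDR values are rounded in the text, the ratio should be read as approximate.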
Second, we compared our predictions to six recent data sets that are relevant to the innate (antiviral) response but that were in no way part of the RLR score calculations. The overlap with the 354 top genes, excluding known RLR genes, is significantly larger than expected by chance for all these data sets (Table 2). For example, the top predictions include: (i) 19 of 45 (42%) interferon-stimulated genes with validated antiviral activity against e.g. HIV-1, HCV, yellow fever, West Nile or chikungunya virus [16], (ii) 27 proteins from a set of 241 (11%) that interact with the type I IFN protein network during pattern recognition, among which are five confirmed modulators of IFNβ expression and antiviral activity [12], (iii) nine tripartite motif (TRIM) family genes, five of which enhance RIG-I-induced activation of IFNβ, NFκB and ISRE (IFN-stimulated response element) promoters [17], and (iv) 38 human proteins interacting with innate immune-modulating viral open reading frames (viORFs) from 30 viruses [37]. (v) Furthermore, the type I IFN response has recently been proposed to play a role in antifungal immunity [38,39] and the top RLR predictions are strongly enriched for genes expressed in PBMCs stimulated with the fungal pathogen Candida albicans: almost half (39/89 = 44%) of these occur in our top predictions (P = 4.7 × 10^-47, one-tailed Fisher's exact test, Table 2). (vi) Finally, the overlap between our predictions and a genome-wide screen for regulators of RIG-I-mediated IFNβ production is, at only nine, marginal but significant (9/226 genes = 4%, P = 9.5 × 10^-3, one-tailed Fisher's exact test, Table 2) [14]. In summary, these diverse and independent experimental data support the validity of our integrated RLR score for predicting genes with a role in the innate antiviral response.

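The fold enrichments and one-tailed Fisher's exact tests quoted throughout the Results can be reproduced in outline with the Python standard library. The sketch below uses the counts reported for the 'PPI with viruses' signature (35 of 49 RLR genes versus 1,000 of 5,818 non-RLR genes) and the 'NFκB activation mediator' signature (6 of 49 versus 36 of 5,818). The function names are illustrative, and the software the authors actually used is not specified in this section, so this is a sketch of the standard test rather than their implementation.

```python
from math import comb


def fold_enrichment(hits_pos, n_pos, hits_neg, n_neg):
    """Ratio of hit fractions: P(hit | positives) / P(hit | negatives)."""
    return (hits_pos / n_pos) / (hits_neg / n_neg)


def fisher_one_tailed(hits_pos, n_pos, hits_neg, n_neg):
    """One-tailed (enrichment) Fisher's exact P value.

    Probability of seeing at least hits_pos hits among the n_pos positives
    under the hypergeometric null, with all table margins held fixed.
    """
    total = n_pos + n_neg
    hits = hits_pos + hits_neg
    upper = min(n_pos, hits)
    favorable = sum(
        comb(hits, k) * comb(total - hits, n_pos - k)
        for k in range(hits_pos, upper + 1)
    )
    return favorable / comb(total, n_pos)


# Counts reported in the text: (hits in RLR genes, RLR genes,
#                               hits in non-RLR genes, non-RLR genes)
ppi_with_viruses = (35, 49, 1000, 5818)
nfkb_mediator = (6, 49, 36, 5818)

for name, counts in [("PPI with viruses", ppi_with_viruses),
                     ("NFkB activation mediator", nfkb_mediator)]:
    fold = fold_enrichment(*counts)
    p_value = fisher_one_tailed(*counts)
    print(f"{name}: {fold:.1f}-fold enrichment, one-tailed P = {p_value:.1e}")
```

Exact integer binomial coefficients are used so that very small tail probabilities are not lost to floating-point underflow before the final division.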
RNAi validation screens confirm the high predictive value of the integrated RLR score

To further determine the predictive power of our in silico predictions, we selected 187 candidate RLR genes for experimental validation (S6 and S8 Tables). These include 127 high-confidence candidates from the top 354, which have not been previously linked to the RLR pathway, supplemented with 60 candidates we selected from the top 1000 predictions, mainly on the basis of limited functional characterization in general (Fig 2A). Importantly, candidates with a known role in RLR signaling, other branches of PRR pathways, or apoptosis were excluded as we were most interested in finding novel components of the RLR pathway.

For the selected candidates we performed a medium-throughput RNAi screen (RNAi screen 1) using HeLa cells stably expressing an IFNβ promoter-controlled firefly luciferase reporter (HeLa-IFNβ-Fluc). To activate the RLR pathway and induce Fluc reporter expression we used a known small 5'-ppp-containing RIG-I ligand [40]. This setup led to specific activation of RIG-I, as RIG-I or MAVS siRNA transfection, but not MDA5 or scrambled siRNAs, resulted in loss of reporter activity (Figs 2B, S12 and S13). All negative controls (non-transfected, scrambled and MDA5 siRNAs) scored within 1.25 median absolute deviations of the plate normalized IFNβ induction levels (Z-score cutoff < -1.25 or > 1.25, Fig 2B). At this cutoff, siRNA knockdown of 94 candidates (50% of all candidates tested) affected RIG-I-mediated IFNβ induction (Figs 2A and S13A–S13D and S8 Table). Among these, knockdown of 59 genes decreased RIG-I-mediated IFNβ induction (down-hits) and 35 genes increased IFNβ induction (up-hits). It is important to note that the experimental approach only activates the RIG-I branch of the RLR pathway and will not confirm predicted RLR candidates that regulate MDA5 activation and downstream signaling to MAVS. Thus, among the 93 non-confirmed candidates, there might still be novel regulators of the MDA5-mediated IFNβ induction pathway, which should be further investigated. Altogether, the integrated RLR score is clearly a strong and reliable predictor for novel regulators of the RIG-I pathway.

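The hit-calling rule used in RNAi screen 1 (a Z-score beyond ±1.25 median absolute deviations) can be sketched in Python as follows. The exact normalization is described in the Methods, which are not reproduced here, so this sketch assumes a plain robust Z-score, (value - plate median) / MAD, without the 1.4826 scaling constant that is sometimes applied. The function names and the example plate values are hypothetical.

```python
from statistics import median


def robust_z_scores(values):
    """Robust Z-scores: distance from the median in units of the MAD.

    Assumption: no consistency constant (e.g. 1.4826) is applied.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-scores are undefined")
    return [(v - center) / mad for v in values]


def call_hits(z_scores, cutoff=1.25):
    """Return indices of down-hits (Z < -cutoff) and up-hits (Z > cutoff)."""
    down_hits = [i for i, z in enumerate(z_scores) if z < -cutoff]
    up_hits = [i for i, z in enumerate(z_scores) if z > cutoff]
    return down_hits, up_hits


# Hypothetical plate-normalized IFN-beta induction values (percent of plate level).
plate = [20, 90, 100, 100, 110, 250, 60, 160]

z = robust_z_scores(plate)
down, up = call_hits(z)
print("down-hits (reduced induction):", [plate[i] for i in down])
print("up-hits (increased induction):", [plate[i] for i in up])
```

In this convention a down-hit is a knockdown that lowers reporter induction and an up-hit is one that raises it, matching the terminology used in the text; the stricter cutoff of ±2 used to select genes for the second screen corresponds to calling `call_hits(z, cutoff=2)`.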
From the 94 confirmed hits, we picked the 57 top hits with the largest effect (stringent Z-score < -2 or > 2) for a second RNAi screen using a different set of siRNAs (RNAi screen 2, Fig 2A). In this second RNAi screen, only a single up-hit (7% of 15 up-hits tested) showed a Z-score > 1.25. Besides this hit, two negative control wells also had a Z-score > 1.25 (Figs 2B,