Evolutionary Genomics of Nuclear Receptors: From Twenty-Five Ancestral Genes to Derived Endocrine Systems Ste´phanie Bertrand, Fre´de´ric G. Brunet, Hector Escriva, Gilles Parmentier, Vincent Laudet, and Marc Robinson-Rechavi1 *Laboratoire deBiologie Mole´culaire dela Cellule,Ecole NormaleSupe´rieure deLyon,Lyon,France Bilateriananimalsarenotablycharacterizedbycomplexendocrinesystems.Thereceptorsformanysteroids,retinoids, andotherhormonesbelongtothesuperfamilyofnuclearreceptors,whicharetranscriptionfactorsregulatingmanyaspects ofdevelopmentandhomeostasis.Despiteadiversityofregulatorymechanismsandphysiologicalroles,nuclearreceptors shareacommonproteinorganization.Toobtainthebroadpictureofbilateriannuclearhormonereceptorevolution,we have characterized the complete set of nuclear receptor genes from nine animal genome sequences and analyzed it in aphylogeneticframework.Inaddition,expressedsequencetagsfromkeylineageswithnoavailablegenomesequence werealsosearched.Thisallowsustodatetheevolutionaryeventsthatledfromanancestralnuclearreceptorgene,inan D o earlymetazoan,topresentdaydiversity.Weshowthattherewere;25nuclearreceptorgenesinUrbilateria,theancestor w n ofbilaterians,atwhichpointthefundamentaldiversityofthesubfamilywasalreadyestablished.Surprisingly,differential lo a genelossplayedanimportantroleintheevolutionofdifferentnuclearreceptorsetsinbilaterianlineages.Thenuclear d e receptordistributionwasalsoshapedbyperiodsofgeneduplication,essentiallyinvertebrates,aswellasalineage-specific d duplicationburstinnematodes.Ourresultsimplythatthegenesformajorreceptorssuchassteroidreceptorsorthyroid fro m hormonereceptorswerepresentinUrbilateria. h ttp s Introduction ://a c One of the striking features of bilaterian animal genomic sequences (Robinson-Rechavi et al. 2001a) and ad e evolution is the development of complex endocrine sys- also robust phylogenetic reconstruction at the scale of the m ic tems, which allow the organism to coordinate its reaction superfamily (Laudet et al. 1992; Escriva, Laudet, and .o u to the environment, to regulate its development, and to Robinson-Rechavi 2003). This phylogenetic reconstruc- p .c maintain homeostasis. Among the players in these com- tion shows that receptors for similar ligands do not group o m plexsystems,thesuperfamilyofnuclearreceptors(NRs)is in the tree, but are interspersed with receptors for totally /m specific to animals and performs an abundance of func- different ligands, while orphans are widely distributed in be tfiroonms,hfroommeoemstabsriysonoifcvdaervieoluospmphenytsitoolomgeictaaml ofurpnhcotisoinssantod tehveoltvreede. fTrohmis laend otorpthhaenhryepcoetphteosrisththatatatchqeuisruepdersfeavmeirlayl /article the control of metabolism (for a review, see Laudet and times independently the capacity to bind ligands (Escriva -a b Gronemeyer 2002). Nuclear receptors are ligand-activated et al. 1997; Laudet 1997; Escriva, Delaunay, and Laudet stra ttrhaunsscbriinpdtiomnajfoarcthoorsr.mMonaensy, smucehmabserssteorofidthse, tshuypreorifdamhoilry- 2ac0i0d0s),.wTohuulds,htahveecbaepeancaitcyqutoirebdinsdev,efroarlteixmaemspblye,mruettiantiooinc ct/21 mones, or retinoids. These occupy a special position in of an orphan nuclear receptor in some ancestor of verte- /10 gene regulation, since they provide a direct link between /1 brates, allowing the establishment of nuclear retinoic acid 9 tehxeprleigssainodn, wthheiychretghuelyatbe.inTd,hearnedatrhee atalsrgoetmgaennye,nwuchloesaer stiiognnaolfinfgattpyatahcwidasysth(aRtAplRays,aRsXtrRucst)u.rTalherorleeciennstidiedetnhteifi3c-aD- 23/102 receptors that are not known to bind any ligand and are 5 structureofsomeorphan receptorssuggeststhatbonafide 1 tKchelupiestowcreasrllw,eLdhiec‘h‘homrwpahneanren, oannruidgcilWneaairllllyrseodcneispc1too9vr9se9’r’)e.(dGSaousmstoearfpsnhsuoacnnles1ah9ra9rv9ee-; ns2tu0ruc0lc3et)au.rraTlrhelceigeapinstoodrlsat(liirogenavniedowsf imannaAyourwhthaeovrlxeo,geDvrooofluvietnhd,eafnrvodemrLteabsurudacethet 76 by gue since been shown to bind small hydrophobic molecules, s suchasfattyacids,butothernuclearreceptorsappeartobe estrogen receptor (ER) from a mollusk has shown the t on true orphans (Ruse, Privalsky, and Sladek 2002; Wang et interest ofphylogeneticanalysisofnuclearreceptorsfrom 10 ainl.p2r0e0p3a;raHti.oGn)r.onemeyer,J.A.Gustafsson,andV.Laudet, dsiigstnaanTltihnegorepgvaatonhliwustmaioysns (ionTfhnodurencfitlnoeiannr,gNreeacenedpc,etosartnrsadl(oCrerneowdtohsce2rri0ng0oe3lno)e.gsy) April 2 Nuclear receptors share a common organization in can only be understood on the background of a reference 019 structural domains, which notably includes a very con- phylogeny of the species analyzed. One could hope that served DNA binding domain (DBD) and a moderately dealing with two nematode worms, two insects, and five conserved ligand-binding domain (LBD). These domains chordates, we would not encounter any problems defining allow relatively easy identification of nuclear receptors in this reference. But the phylogeny of animals has been the object of molecular studies that have resulted in contra- 1Present address: Joint Center for Structural Genomics, UCSD, dictory results concerning the relative positions of these LaJolla,California. threelineages.StudiesofribosomalRNAsuggestthatnem- Key words: bilaterian, development, endocrinology, gene duplica- atodes andinsectsform a clade, ‘‘Ecdysozoa’’(Aguinaldo tion,geneloss,phylogeny. et al. 1997; Mallatt and Winchell 2002), which is also E-mail:[email protected]. consistent with information from developmental genes Mol.Biol.Evol.21(10):1923–1937.2004 (Adoutteetal.2000).Butrecentstudies,usingmoregenes doi:10.1093/molbev/msh200 AdvanceAccesspublicationJune30,2004 and less species, have given renewed support to the Molecular Biology and Evolution vol. 21 no. 10 (cid:1) Society for Molecular Biology and Evolution 2004; all rights reserved. 1924 Bertrandetal. traditionalviewofacladegroupinginsectsandchordates, and Hasegawa 1999) as implemented in Tree-Puzzle, ‘‘Coelomata’’ (Blair et al. 2002; Dopazo, Santoyo, and correcting for rate heterogeneity between sites with Dopazo 2004; Wolf, Rogozin, and Koonin 2004). In this a gamma law. work,wehavechosentoconsiderallresultsinlightofboth Use of taxonomic and common names: We use the possibilities, the ‘‘ecdysozoan hypothesis’’ and the ‘‘coe- word ‘‘fish’’ for Actinopterygii, or ray-finned fishes; this lomatehypothesis.’’Thus,wehopeourresultstoberobust usage does not include cartilaginous fishes (sharks and to theoutcome ofthis interesting debate. rays) or flesh-finned fishes (lungfishes and coelacanths). In this work we take advantage of the wealth of data We use the word ‘‘pufferfish’’ for Tetraodontiformes, provided by complete genome projects, spanning several which include Takifugu rubripes and Tetraodon nigro- degreesofdivergenceamonganimals,toredefinerelation- viridis. We use the word ‘‘nematode worm’’ for the ships inside the superfamily and to characterize nuclear two Caenorhabditis species studied. We use the word receptorevolutionsincethedivergenceofmajorbilaterian ‘‘trematode’’ for Schistosoma mansoni. lineages. We are also in the process of characterizing D o experimentally all nuclear receptors from the zebrafish w Results n Danio rerio (unpublished data), and we took these lo How Does the Number of Nuclear Receptor Genes a sequences into account in our interpretations. Using a d Vary Among Bilaterian Lineages? e pthheyliongfoernmetaitcioanpparvoaaiclahb,laen,dwtehacnanksfotor tthheecfiormstptliemteendeessfinoef The original reports of several bilaterian genome d from overall nuclear receptor evolution in terms of gene gain projects include accounts on the nuclear receptor genes h andlossandshowwhichmechanismswereinstrumentalin found. While the result originally reported for the fruit fly ttps establishing modern endocrine systems. was correct (21 NR genes; Adams et al. 2000), theresults ://a c reported for the human genome were not: original reports a d Materials and Methods suggested60NRgenes,whereasfurtherexaminationonly em found 48 (Maglich et al. 2001; Robinson-Rechavi et al. ic .o A standard set of nuclear receptor sequences was 2001a). As for the nematode C. elegans, it has a very u p established for querying genome sequences by retrieving divergent nuclear receptor complement, with more than .c o one complete protein sequence per group (Nuclear 250 NR genes, which have been extensively studied m /m ReceptorsNomenclatureCommittee1999)fromNurebase elsewhere (Sluder et al. 1999; Maglich et al. 2001; Sluder b e (Ruau et al. 2004). These sequences were compared to and Maina 2001; unpublished data). Here we repeated the /a predicted peptides from each of the queried genomes by search for each available genome sequence, whatever the rtic le BlastP (Altschul et al. 1990). The data used for each published results. This yields consistent results with those -a queried genome is presented in table S3 in the online reportedfortheseasquirt(17NRgenes;Dehaletal.2002; bs SupplementaryMaterial.Thesamemethodologywasused Yagi etal. 2003) andC. briggsae (.250 NRgenes; Stein tra c to identify and classify potential nuclear receptor genes et al. 2003), but we found two more than reported for the t/2 1 as in Robinson-Rechavi et al. (2001a), including checks fugu (70 vs. 68 NR genes; Maglich et al. 2003). A recent /1 0 against genomic DNA sequences to detect annotation report analyzed nuclear receptor genes from the rat and /1 9 errors and pseudogenes. mouse genomes (Zhang et al. 2004). We confirm the 2 3 All new nuclear receptor protein sequences were nuclear receptor genes found in that study (47 in rat, 49 /1 0 alignedbyeye using Seaview (Galtier, Gouy, and Gautier in mouse). Because of the incomplete nature of the rat 25 1 1996) on the reviewed alignment of Nurebase. The align- genome (Rat Genome Sequencing Project Consortium 7 6 mentsusedforphylogeneticanalysisincludedallavailable 2004;Zhangetal.2004), andbecauseitdoesnotaddany b y sequences and were not limited to the species discussed newinsightsintotheevolutionofsuperfamily(i.e.,nonew g u in the text. Sequences with less than 200 amino acids genes discovered), we limit our discussion to the more es includingtheDBDwereexcluded,andonlycompletesites complete mouse genome (Waterston et al. 2002). We t o n (no gap,noX) were usedfor phylogenetic reconstruction. present the first analysis of nuclear receptor genes for the 1 0 Phylogenetic trees were built using (1) Neighbor-Joining mosquito (21 NR genes; Holt et al. 2002) and tetraodon A p (Saitou and Nei 1987) with distances corrected for rate (66 NR genes; unpublished genome). All genes we found ril 2 heterogeneity between sites, with a gamma law of param- are presented in table S1 (see online Supplementary 0 1 eter alpha estimated in Tree-Puzzle (Schmidt et al. 2002) Material), and an overall view of homology relations is 9 witheightcategories,andusing(2)PHYML(Guindonand presented in figure 1. We have also analyzed EST data Gascuel 2003), a fast and accurate maximum likelihood from several key lineages with no available genome heuristic,undertheJTTsubstitutionmodel(Jones,Taylor, sequence, which allowed us to clarify some unresolved and Thornton 1992), with a gamma distribution of rates phylogenies.GenesfoundbyExpressedSequenceTag(EST) between sites (eight categories, parameter alpha estimated analysis are presented in table S2 (online Supplementary by PHYML). Trees were built for the whole superfamily, Material). All protein alignments are available as online foreachsubfamily,andforeachgroup.Incaseofconflict, Supplementary Material and will be included in future trees built using groups were considered more reliable versions of Nurebase (Ruau et al. 2004). because they use more complete sites and less divergent Astrikingpatternintheresultsobtainedbycomplete sequences. All phylogenetic relations that are discussed genome analysis is that species that have diverged since were tested by a comparison of likelihoods between the Mesozoic (i.e., in the last 248 Myr: the diversification alternative topologies, using the SH test (Shimodaira of closely related animals species) have conserved very EvolutionaryGenomicsofNuclearReceptors 1925 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a b s tra c t/2 1 /1 0 /1 9 2 3 /1 0 2 5 1 7 6 b y g u e s t o n 1 0 A p ril 2 0 1 9 FIG.1.—Summarizedphylogenetictreeofthenuclearreceptorsuperfamily.Thisrepresentsasummaryofinformationfromgenetreesofeach groupand subfamilyof nuclearreceptors.Branchesthat arenot significantly supportedwerecollapsedintopolytomies.Althoughonlytaxonomic groupswithgenomesequencesavailablearerepresented,allavailablegeneswereusedforeachgenetree.NuclearreceptorgenenamesfollowtableS1 (onlineSupplementaryMaterial).Genesarecolor-codedaccordingtotaxonomicgroup:redforvertebrates,purpleforseasquirt,bluefordipteran insects,andgreenfornematodes.Forreadability,speciesphylogenyinsideeachofthesegroupsisnotindicated,norareevolutionaryeventsinsideeach ofthesegroups:secondarygeneloss,fish-specificgeneduplications,etc.Branchlengthsarearbitrary.Ontheright,genegroupswithofficialgene nomenclature(NuclearReceptorsNomenclatureCommittee1999)arelisted.Yellowbackgroundcirclesindicategenesthatareinferredtohaveexisted inthecommonancestorofinsects,nematodes,andchordates;yellowbackgroundnumbersattherootofeachsubfamilyindicatethenumberofsuch ancestralgenesinferredtohaveexistedinthissubfamily.Hatchedbackgroundcirclesrepresentcaseswheretheancestralgenestatecannotbeinferred withconfidence:blackhatchingrepresentsalternativeancestralgenesfortheNR1Hgroup;redhatchingrepresentsalternativeancestralgenesforthe NR2Egroup.Arrowsrepresentkeycontributionsoftaxaforwhichwedonothavecompletegenomes;taxonnamesareinitalicsifonlyESTsequences wereused.The broken linesleading toEcR, UNC-55,Rev-erbg, NHR67,and tothe coral TLL/DSFindicate thelack of significant resolution of phylogeneticmethodstopositionthesegenes. 1926 Bertrandetal. D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a b s tra c t/2 1 /1 0 /1 9 2 3 /1 0 2 5 1 7 6 b y g u e s t o n 1 0 A p allbiFlaIGte.ri2a.n—s.SNumamminargyaosfinnuficlgeuarrere1c.eEpvtoerngtsenareegmaianpapneddltoossaibnraanncimhaolfetvhoelutrteioen,.bButraonrcdherleonfgethvsenatrseoanrbeitarcahryb.rUanrbcihlaitserniaotisktnhoewcno.mTmheonbraonkceenstolirnoesf ril 2 0 leading to trematodes and mollusks indicate the key phylogenetic positions of these groups (placed according to fig. 1 of Adoutte et al. [2000]), 1 9 althoughwedonothaveanycompletegenomesequence.Aboveeachspeciesisthenumberofnuclearreceptorgenesfoundinitsgenome.‘‘Supnrs’’ arethesupplementary,divergent,nuclearreceptorsfoundinnematodegenomes.‘‘SRs’’standsfor‘‘steroidreceptors’’andcoverthefourcloseparalogs AR,MR,GR,andPR.Intwocaseswherethephylogenywasambiguous,thefavoredphylogenywasusedeventhoughitwasnotsignificant(see hatchedbackgroundcirclesinfig.1).(A)The‘‘ecdysozoan’’speciestreeisassumedtomapgeneduplicationandloss(Aguinaldoetal.1997).(B)The ‘‘coelomate’’speciestreeisassumedtomapgeneduplicationandloss(Wolf,Rogozin,andKoonin2004). similar nuclear receptor complements (fig. 2): the two conservedNRsinC.elegans(Gissendanneretal.2004)by dipterans (250 MYA) both have 21 NR genes, the two classifying the two least divergent supplementary nuclear mammals(80MYA)have48and49,thetwopufferfishes receptors as conserved orthologs of HNF4; because these (25 MYA) have 70 and 66, and the two nematodes (40 two genes belong to the monophyletic clade of supple- MYA) have 13 and 12 conserved NRs, plus more than mentary NRs, we consider them separately. The similar- 250 supplementary divergent NRs each (which form one itiesarenotlimitedtothenumberofgenes;weindeedfind clade,unpublisheddata).Ofnote,arecentstudyreports15 orthologousnuclearreceptorsinthesepairsofspecies.The EvolutionaryGenomicsofNuclearReceptors 1927 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a b s tra c t/2 1 /1 0 /1 9 2 3 /1 0 2 5 1 7 6 b y g u e s t o n 1 0 A p FIG.2.—Continued. ril 2 0 1 9 fly and the mosquito are separated by only one gene loss ofmajoranimalphyla)orthePaleozoic(543to248MYA: (of Knirps in the mosquito) and one gain (duplication of diversification inside major animal phyla), on the other TLL-related in the mosquito), human and mouse (and rat) hand, seem to have accumulated quite divergent comple- are separated by one gene loss (FXRb), the two puffer- ments: nematodes have five to ten times more nuclear fishes are separated by four gene losses (SHP2, DAX1-2, receptors than other triploblastic animals (600 MYA), RARb, and ERRd, all lost in tetraodon), and the two pufferfishes have 38% more than mammals (400 MYA), nematodes are separated by one duplication (plus the andtheseasquirthasconsiderablylessthananyvertebrate ongoing diversification of HNF4-derived NRs). These (555 MYA). Although the divergence dates are debated, figures should be compared to a total of more than 75 most gene duplication or loss in nuclear receptors clearly estimated events of duplication or gene loss since the occurred relatively early in animal evolution, more than divergence of bilaterian phyla (fig. 2). Species that 400 MYA. An interesting exception to the divergence diverged in the Vendian (650 to 543 MYA: first evidence between animal phyla istheGCNF group (NR6A), which 1928 Bertrandetal. phylogenetic position of fish ERRd (fig. 3; Bardet et al. 2004); the fourth ERR was secondarily lost in mammals. ERRs are orphan receptors closely related to estrogen receptors (ERs). Moreover, these ERs include a mollusk ortholog, which implies that the proto-ERR existed in the ancestor of bilaterians. So the absence of ERR in nema- todes must again correspond to secondary loss, and we can infer that there were three nuclear receptor genes of subfamily 3 in the urbilaterian (fig. 1; see also Thornton, Need, and Crews 2003). Theexamplerepresentedinfigure4ismorecomplex. The placement of nematode DAF-12 implies that the split between the groups of VDR/PXR, FXR, and LXR D o (hatched black arrow) occurred before the speciation of w n bilaterian lineages (orange arrow). However, further lo a inference is dependent on our understanding of the spe- d e c(Aiegsupinhaylldoogeetnayl..1If99th7e),tehcednysthoezopahnylhoygpenoethtiecsipsosisitioconrorefcat d from Drosophila gene represents protostomes as well as a h nematode gene does. Thus, any subfamily that contains ttps either of these species can be dated back to the ancestor ://a c of bilaterians. In that case, FXRs, EcR and LXRs derive a d from two paralogous genes in the urbilaterian (fig. 4B). em However, if the coelomate hypothesis is correct (Wolf, ic .o Rogozin,andKoonin2004),thenanambiguityarises:itis u p possible that there were a proto-FXR and a proto-LXR/ .c FIG.3.—Asimplecase:estimationofthenumberofancestralgenes EcRinUrbilateria,bothbeinglostinnematodes(fig.4C). om for estrogen related receptors. Maximum-likelihood tree of sequences /m fromNurebaseandpredictedfromgenomes;zebrafishsequenceswhich Itisalsopossiblethattherewasonlyoneancestralgenein b e were not yet in Nurebase were added from GenBank. Two ERRg Urbilateria, that it was this proto-FXR/EcR/LXR which /a sequences with partial DBDs were excluded from the alignment was lost in nematodes and that it was duplicated in the rtic (GST32848 from Tetraodon and AAS66636 from zebrafish). Bootstrap le support,inpercentageof2,000replicatesinaNeighbor-Joininganalysis, ancestor of coelomates before the speciation of chordates -a iaslteinrndaictiavteedpolnactihnegbroafncFhResUtPh1at34d4e6fi2ne1theAvAerSte6b6r6at3e7duapslicfiasthio-snpse.cTifihce awnhdicahrtwhreohpaovdesn(fiogf.ur4thDe)r.eTvhidiseniscethtoecohnolyosesubbeftawmeielyntfhoer bstra c duplicates of ERRa is significantly less likely (SH test: P ¼ 0.042). different solutions. However, the relative rarity of gene t/2 Branch length is proportional to estimated evolutionary change; the duplicationintheputativeancestorofcoelomates(fig.2B) 1/1 measurebarrepresents0.2substitutions/site.Namingandcolorcodesas 0 in figure 1. Arrows indicate gene duplications or gene losses, some of and the frequency of gene loss in the nematode lineage /19 which are not visible on the simplified tree of figure 1. Thick-hatched favor the hypothesis of separate proto-LXR/EcR and 23 branches indicate secondary gene loss. The yellow background circle proto-FXR paralogs in Urbilateria (fig. 4C). /1 0 indicates the gene inferred to have existed in the common ancestor of In several other subfamilies, where similar problems 25 bilaterians. 1 occur, the phylogenetic position of nuclear receptor genes 7 6 from coral, other nematodes, or trematode, notably from b y is the only group of nuclear receptor genes to have EST data (online Supplementary table S2), allows us to g u remained apparently stable since the ancestor of bilat- positionthecommonancestorofanimals(arrowsinfig.1). es erians, with one gene per genome in all animals studied. The most notable result from trematode EST data is the t o n first evidence for a thyroid hormone receptor ortholog in 1 0 an invertebrate, proving that there was a proto-TR in the A Number of Nuclear Receptor Genes in Urbilateria p urbilaterian. The phylogenetic position of these EST ril 2 More widely, we can infer the minimum NR com- sequences inside the TR clade is strongly supported both 0 1 plementofthecommonancestorofchordates,nematodes, whennuclearreceptorsfromallgroupsofthesuperfamily 9 and insects (i.e., the common ancestor of bilaterians, areincludedintheanalysis(65completeaasites;SHtest: Urbilateria; fig. 2) from the phylogenetic relationships of P¼0.0030) and when only closely related sequences are their NRs. The principle of this inference is that for each used (90 complete aa sites; fig. 5). groupofNRswecanlocatethespeciationeventthatledto Applying this method to the entire superfamily, it thedifferentbilaterianlineagesbythedivergencebetween appears that there were at least 22 to 25 nuclear receptor chordate, insect, and nematode orthologs. Then, all genes genes in the common ancestor of nematodes, insects, and resulting from duplications that occurred before this spe- chordates(fig.2);giventhedatadiscussedabove,themost ciation were probably present in Urbilateria. This reason- probablenumberseemstobe25ancestralnuclearreceptor ing remains correct even after secondary loss in some genes (yellow background points in fig. 1; table 1). Of lineages. For example, although there are only three note, the lack of phylogenetic resolution in some groups ERR paralogs in the human genome, we can infer that does not appear to be due simply to higher evolutionary there were four in the ancestor of vertebrates, due to the rates: NR2E sequences do evolve significantly faster than EvolutionaryGenomicsofNuclearReceptors 1929 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a b s tra c t/2 1 /1 0 /1 9 2 3 /1 0 2 5 1 7 6 b y g u e s t o n 1 0 A p ril 2 0 FIG.4.—Aproblematiccase:estimationofthenumberofancestralgenesforecdysonereceptorsandcloselyrelatedgenes.Namingandcolor 19 codes as in figure 1. (A) Maximum-likelihood tree; short predicted sequences are not included because they diminish dramatically the number of completesitesinthealignment.Inthecircle,subtreebuiltwithonlyLXRsequences,withoutinterferencebythelongbranchesoftheothersequences. Allotherconventionsasinfigure3.(B–D)Simplifiedtrees,asinfigure1,withevolutionaryreasoningsuperimposed.Orangearrowsindicatethefirst speciationseparatingtheselineages.Yellowbackgroundcirclesindicategenesthatareinferredtohaveexistedinthecommonancestorofinsects, nematodes, and chordates. ‘‘Proto-XVR’’¼proto-xenobiotic vitamin D receptor. Thick-hatched branches indicate secondary gene loss. The arrow indicatesthelastcommonancestorgeneofallnuclearreceptorsinthistree.Thebrokenlineindicatesthenonsignificantsupportforthephylogenetic positionofEcR(SHtest:P¼0.45).(B)The‘‘ecdysozoan’’speciestreeisassumedtomapgeneduplicationandloss(Aguinaldoetal.1997):theorange arrowsseparatethedeuterostome(chordates)andprotostome(insectsandnematodes)lineages.(CandD)The‘‘coelomate’’speciestreeisassumedto mapgeneduplicationandloss(Wolf,Rogozin,andKoonin2004):theorangearrowsseparatethecoelomateandnematodelineages.(C)Intheabsence offurtherevidence,thisspeciationisassumedonthelatestbranchpossible.(D)Intheabsenceoffurtherevidence,thisspeciationisassumedonthe earliestbranchpossible. 1930 Bertrandetal. Table 1 Nuclear Receptor GenesInferred to HaveExistedin the Urbilaterian UrbilaterianNRGene Group Comments Proto-TR 1A ESTsinatrematode Proto-RAR 1B Proto-PPAR 1C Proto-Rev-erb/E75 1D Proto-E78/sex1 1E1G Proto-ROR/HR3 1F Proto-LXR/EcR 1H Lowphylogeneticresolution; dependentonanimal phylogeny Proto-FXR D Proto-XVRa 1I1J1K DescendantsincludeVDR, o w PXR,CAR,DHR96, n DAF-12,NHR8,NHR48 loa Proto-HNF4 2A Descendantsinclude de Proto-RXR/USP 2B ESnTesmiantoadteresmupantordseand d fro m anematode h Proto-TR2-4/HR78/NHR41 2C1D ttp Proto-TLL 2E Lowphylogeneticresolution; s knowngeneinacoral ://a Proto-DSF ca Proto-PNR de FIG. 5.—Phylogenetic position of the trematode thyroid hormone Proto-FAX-1 mic receptor. Maximum-likelihood(ML) tree; the sequences predictedfrom Proto-COUP/SVP 2F Unclearrelationtounc-55 .o trematode EST sequences are on a yellow background. At the branch Proto-EAR2 up grouping trematode EST sequences with TR sequences, statistical Proto-ER 3A Knowngeneinamollusk .c o support: above the branch, percentage of 2,000 bootstrap replicates in Proto-ERR 3B m a Neighbor-Joining analysis; under the branch, P-value of an SH Proto-SR 3C /m likelihoodtest.DuetoshortESTsequences(90completeaminoacidsites Proto-NR4 4A Descendantsinclude be used), phylogenetic relationships among TR sequences are not well NURR1,NOR1,NGFIB, /a resolved,butthespeciestreeisalmostaslikelyastheMLtree:SHtest, DHR38,CNR8 rtic P¼0.61.Allotherconventionsasinfigure3. Proto-FTZF1/SF1/LRH1 5A le-a Proto-HR39 5B b s Proto-GCNF 6A tra other NR2 sequences (relative-rate test: dK ¼ 0.23 6 NOTE.—TheUrbilaterianNRgenesarethenuclearreceptorgenesinferredto ct/2 0.076; P ¼ 0.002), but NR1H sequences do not evolve haveexistedinthecommonancestorofinsects,nematodes,andchordates.Theyare 1/1 significantly faster than other NR1 sequences (dK ¼ representedbyyellowbackgroundcirclesinfigure1. 0 0.20 6 0.11; P ¼ 0.07). Moreover, there are other fast- aProto-XVR¼Proto-xenobioticvitaminDreceptor. /19 2 evolvingreceptorswhosephylogeneticclassificationisnot 3/1 problematic (i.e., PPARs, PXR/CAR). the EST databases (fig. 5), both genes that are missing in 025 1 insects and nematodes. 7 6 The insect and nematode lineages (ecdysozoans) b Nuclear Receptor Subfamilies that Distinguish y notably lost genes that code for major liganded receptors g Vertebrates and Insects u invertebrates.Thelethalorsterilephenotypesofknockout es While vertebrates have more NR genes than their mice for these genes (for a review, see Laudet and t o n urbilaterian ancestor (fig. 2), dipteran insects have fewer. Gronemeyer 2002) and their conservation among verte- 1 0 In addition, there are groups of receptors that are specific brates suggest that the ancestral organism(s) lost genes A p toeachlineage.Wheredothesedifferencescomefrom?A which did not yet have their present function. Contradic- ril 2 first factor in the divergence of these two lineages is tory to this, a reconstructed ancestor of steroid receptors 0 1 specific loss of different genes: seven nuclear receptors (ERs and SRs) appears to be regulated by estrogen 9 were lost in the lineage leading to dipteran insects since (Thornton,Need,andCrews2003).However,themollusk their divergence with deuterostomes (including chordates) ER is not regulated by estrogen, which implies that either and four were lost in the ancestor of chordates (fig. 2). the result on the ancestral receptor is an artifact or that it Strikingly,allnuclearreceptorgenesmissingininsectsare hadinsufficientfunctionalimportanceintheurbilaterianto also absent from nematodes, which may either mean that prevent loss of the gene (nematodes, insects) or of the they were lost in the ancestor of ecdysozoans, with little function(mollusk).Theseessentialhormonebindingfunc- latter loss in the insect lineage (fig. 2A), or that there was tions were probably either acquired or integrated into anextremelyhighparallelismingenelossbetweenthetwo signaling pathways secondarily, together with the devel- lineages (fig. 2B). Data on lophotrochozoans would be opment of the vertebrate endocrine system; thyroid especiallyusefultounderstandthisphenomenon,asshown hormone signaling is thought to be specific to chordates by the isolation of a mollusk ortholog of ER (Thornton, (see Dehal et al. 2002), as is nuclear retinoic acid Need,andCrews2003)orthefindingofatrematodeTRin signaling, although other forms of retinoid signaling may EvolutionaryGenomicsofNuclearReceptors 1931 exist in insects (Mansfield et al. 1998; Adam, Perrimon, and that these genes may also regulate derived features of and Noselli 2003). Concerning steroid signaling, the only nematodes. As a side note, there is an ortholog of fly subfamily3receptorininsectsistheorphanERR(Adams Trithorax in the mosquito; it is not considered a nuclear et al. 2000; Ostberg et al. 2003), whereas the only steroid receptor, although it includes a highly divergent nuclear hormone known in insects, ecdysone, is mediated by receptor like DBD (Stassen et al. 1995). EcR,asubfamily1nuclearreceptorcloselyrelatedtoother As opposed to gene loss, gain by gene duplication is vertebrate steroid receptors (FXRs and LXRs; fig. 4) but frequent in vertebrates, but almost absent in insects, and nottosubfamily3.Thefunctionalstudyofmoreorthologs absent in early evolution of bilaterian lineages (fig. 2). of liganded nuclear receptors from diverse lineages, such While there are ten nuclear receptors with one orthologin as trematode TR (fig. 5), is needed to clarify this issue. insectsversustwoormoreinvertebrates,thereisonlyone Itshouldbeemphasizedthatthedataunambiguously receptor with one ortholog in vertebrates versus two in support the hypothesis that these nuclear receptors were a dipteran insect, and it is a recent duplication of TLL lost in nematodes and insects, as opposed to vertebrate specifically in the mosquito. This abundance of vertebrate D o ‘‘innovations.’’Thisresultisconsistentlyfoundacrossthe specific duplications is not restricted to nuclear receptors w n different nuclear receptors cited in this paragraph and (Escriva, Laudet, and Robinson-Rechavi 2003), which lo a acrossphylogeneticmethods,withstrongsupport,notably reflect a general pattern of abundant gene (or genome) d e fsrtoermoidmraexciemputomrslwikeerleihaoocdh.orFdoarteexoarmvperlete,birfatseubinfvamenitliyon3, deluspelwichaetrioen(aHtotlhlaendorigeitnaol.f v1e9r9te4b;raDteeshtahlatetis adli.sc2u0ss0e2d; d from bothinsectERRandmolluskERshouldbranchatthebase McLysaght, Hokamp, and Wolfe 2002; Panopoulou et al. h ofthesubfamily3genetree,apossibilitystronglyrejected 2003; Robinson-Rechavi, Boussau, andLaudet2004). ttps by a likelihood ratio test (SH test: P¼0.0030). This view ://a c contradicts our previous conclusion that most liganded a Derived Nuclear Receptor Complements for Derived d nuclearreceptorsarechordate-specific(Escrivaetal.1997; Anatomies: Nematodes and the Sea Squirt em Laudet 1997). ic .o In a similar manner, the chordate lineage lost four The highest rates of loss of nuclear receptor groups u p genes that mediate insect-specific responses (fig. 2). Un- arefoundintwoverydistantgroupsoforganisms:eightin .c o like genes lost in insects and nematodes, all are orphan nematodes and five in the sea squirt (fig. 2). Both are m /m receptors. FAX1 and DSF are both members of the TLL/ characterized by comparatively simple anatomy, brought b e PNR group, which bind DNA as strict homodimers (for totheextreme inC.elegansadultmorphology,whichhas /a a review Laudet and Gronemeyer 2002), which may have fewerthan1,000cells.Theapparentcontradictionbetween rtic le favored their loss in chordates (see Krylov et al. 2003). thisdescriptionofnematodesashavinglostnuclearrecep- -a Two other genes lost in chordates, Drosophila HR39 and tors and the abundance of more than 250 nuclear receptor bs E78arebothactiveinecdysonesignaling,whichisabsent genes in C. elegans (Sluder et al. 1999) comes from our tra c in chordates. It should be noted that an alternative phylogeneticreasoning.Thereareeightindependentevents t/2 1 scenario, with E78 as the result of an arthropod-specific of loss of genes, which most probably already had an /1 0 duplicationofE75,cannotbesignificantlyexcluded(like- establishedfunctionalroleatthatpointinevolution.Onthe /1 9 lihood SH test: P ¼ 0.069). Data from more diverse other hand, the supplementary nuclear receptor genes in 2 3 arthropods would probably help us reach a significant nematodes come from a unique burst of lineage-specific /1 0 conclusion. Finally, if the coelomate tree of animals is duplications(unpublisheddata),concerningonlyonegroup 25 1 correct(fig.2B),chordatesalsolosttheancestorofKnirps, of the superfamily (HNF4). Thus, we obtain the apparent 7 6 KNRL, and Eagle, a monophyletic group of proteins with paradox that an evolutionary history dominated by loss of b y a typical nuclear receptor DNA-binding domain but with- genesofknownfunctionalsoproducedahugediversityof g u out any detectable similarity to a ligand-binding domain. novelnuclearreceptors,whosefunctionremainsunclear. es These unusual members of the superfamily were pre- Given the importance of nuclear receptors in devel- t o n viously only known in dipteran insects, which made them opment and gene regulation, it is tempting to establish a 1 0 appear to be a dipteran innovation. However, we have relationshipbetween lossofregulatorygenesandderived, A p foundclearrepresentativesofthisgroupintrematodeEST simple anatomies and developmental pathways,especially ril 2 data, which pushes back significantly their appearance by because there are several cases of parallel gene loss. All 0 1 domain recombination. In dipterans, they are active in the nuclear receptors lost in the sea squirt except TLL were 9 regulation of early developmental genes. also lost in parallel in the nematode lineage (fig. 2). Con- Moregenerally,theonlymembersofthesuperfamily cerningTLL,phylogeneticresolutionisnotsignificant,but known to lack the typical DBD-hinge-LBD structure are nematode NHR67 seems to be the nematode ortholog of always different between insects and vertebrates, forming TLL.Theparallellosseswithnematodes(orecdysozoans) another source of divergence. The common functional notably include estrogen and ‘‘classical’’ steroid hormone point of these atypical nuclear receptors in dipterans and receptors. Similarly, at least HR39 was lost in parallel in vertebrates seems to be their role in the regulation of the nematodes and in the ancestor of chordates; DSF was derived developmental features of dipterans and verte- probably also lost in nematodes and chordates, although brates(Laudet1997). Itmay benotedthatthediversityof phylogenetic support is not significant. Both these genes nuclear receptors in nematodes also includes several pro- are instrumental in the development of insect specialized teins lacking the LBD or the DBD (Sluder et al. 1999), structures, which nematodes lack. This suggests that both notably Odr-7 (Sengupta, Colbert, and Bargmann 1994), sea squirt and nematode lineages went through a process 1932 Bertrandetal. 2, 3, and 4). Of note, we found duplications for two pufferfish NR genes which were not reported previously (Maglich et al. 2003): TRa1/2 and RARg1/2. It is also worth mentioning that RARb had never been isolated in a fish and was thought to be amniote-specific, but it is foundinthepufferfishgenomes,aswellasinParalichthys olivaceus (accession numbers BAB71757, BAB71755). Outofatotalof53NRgenesthatcanbeestimatedtohave existed in the ancestor of vertebrates (48 in human 1 FXRb14lostinmammals),20aresecondarilyduplicated in fishes (38%). Due to their number, we will not detail these 20 duplications here; most are described in Maglich et al. (2003). D o AspecialcaseofgenelossisHNF4b,whichappears w n to have been lost in parallel in both the pufferfish and lo a mammalianlineages.Wewerealsounabletocloneitfrom d e zaellbtrhaefischom(upnlpetuebdlivsehretdebdraattae)g.eTnhoums,eHs.NTFh4ebonislyabpsreenvtiofurosmly d from known HNF4b was cloned in Xenopus (Holewa et al. h 1997). A search on the prerelease of the chicken genome ttps FIG. 6.—Phylogeny of HNF4 genes, including chicken genome showsthreeHNF4genes,clearorthologsofhumanHNF4a ://a c sequences. Maximum-likelihood tree. The prereleased version ‘‘wash- andHNF4gplusaclearorthologofXenopusHNF4b(fig.6). a d uc1’’ was queried at http://pre.ensembl.org/Gallus_gallus/. All other This sequence clearly establishes that HNF4b is indeed a em conventionsasinfigure3. vertebrate-specific duplicate, lost independently in fishes ic .o andmammals. u p All genes that were lost in the mammalian (or tetra- .c of anatomical simplification, losing the corresponding o pode)lineageorinthefishlineageareparalogsformedby m regulatorygenes.Itisconsistentwithdatathatsuggestthat /m the anatomy both of nematodes (Aboobaker and Blaxter vertebrate-specific duplications. The case of Rev-erbs is be 2ar0e03d)eraivndeds,eaandsquniortt a(Hncoelslatrnadl.aNnodtaGbilbys,oint-Bwroouwldnb2e00in3-) ianntdermesatimngmbaelsc.aTusheeoffavpororebdabplheyplaorgaelnleyll(obsysliinkepluihffoeorfidsahneds /article terestingtoinvestigatewhetherregulationcascadesdown- distance methods) shows a parallel loss of Rev-erbg in -a stream of nuclear receptors have also been simplified in taentraalptoerdneastiavnedpohfyRloegve-neryb,aininwtehtircahodtohneti‘‘fRoremv-eesr.bHgo’’wgeevneers, bstra these organisms (Gissendanner et al. 2004). c wouldbefastevolvingRev-erbagenes,isnotsignificantly t/2 1 less likely (SH test: P ¼ 0.62). Although the alternative /1 Recent Evolution of Nuclear Receptor Subfamilies 0 phylogeny isappealingbecause itdoesnotnecessitate the /1 9 Theonlyfunctionaldifferencebetweenthetwomam- parallel gene losses, we have cloned orthologs of both 2 3 malian genomes is that the mouse (and rat, Zhang et al. potential Rev-erbg genes in the zebrafish, which already /1 0 2004) genome contains FXRb, a paralog of FXR, which has a Rev-erba (unpublished data). This supports the 25 1 was lost in primates. Similarly, four genes (DAX1-2, hypothesis that Rev-erba was lost in tetraodontiformes, 7 6 SHP2, RARb, and ERRd) found in the fugu were lost which is consistent with the most likely phylogeny. Thus, b y specifically in tetraodon. When compared to Drosophila, we have retained this hypothesis of parallel loss of g u the mosquito genome includes a specific duplication of Rev-erbs, although data from more diverse vertebrates es TLL, but has lost Knirps. TLL is a key gene in the (notablyamphibians)willbeneededtogetasolidanswer. t o n establishmentofnonmetamericunitsoftheembryoaswell Theothercasesofgenelossarewellsupported(likelihood 1 0 as in nervous system development in Drosophila, while tests, P , 0.05). A p Knirps is a transcription repressor also involved in the ril 2 regulation of developmental genes during early develop- 0 Discussion 1 ment, some of them the same as TLL (i.e., Ftz). It is thus 9 Periods of Gene Duplication or Loss possiblethat this loss and this gain representa divergence in developmental regulation pathways between flies and Pregenome studies of nuclear receptor evolution mosquitoes. brought evidence for two ‘‘waves’’ (or periods) of gene Most differences between pufferfish and mammal duplication in the superfamily (Laudet et al. 1992; Laudet genomesareduetogeneduplicationsintheray-finnedfish 1997): one before the diversification of major bilaterian lineage(asalreadyobservedbyMaglichetal.2003),con- lineagesandtheotherspecificallyinvertebrates.Evidence sistent with the general pattern of abundant gene/genome from complete genomes confirms this picture, but refines duplication in fishes (Robinson-Rechavi et al. 2001b; it to a great extent. The most striking characteristic of Christoffels et al. 2004; Vandepoele et al. 2004). Differ- the evolution of the superfamily, in fact, is that gene dup- encesbetweenthetwolineagesarealsoduetodifferential lication and loss are not at all regularly distributed during gene loss: four genes were lost in the mammalian (or evolution of species (fig. 2); there are branches of the tetrapode)lineage,andthreewerelostinpufferfishes(figs. phylogeny that represent ‘‘duplication periods’’ and there
Description: