ARTICLE doi:10.1038/nature14447 Complex archaea that bridge the gap between prokaryotes and eukaryotes AnjaSpang1*,JimmyH.Saw1*,SteffenL.Jørgensen2*,KatarzynaZaremba-Niedzwiedzka1*,JoranMartijn1,AndersE.Lind1, RoelvanEijk1{,ChristaSchleper2,3,LionelGuy1,4&ThijsJ.G.Ettema1 The origin of the eukaryotic cell remains one of the most contentious puzzles in modern biology. Recent studies have provided support for the emergence of the eukaryotic host cell from within the archaeal domain of life, but theidentityandnatureoftheputativearchaealancestorremainasubjectofdebate.Herewedescribethediscovery of ‘Lokiarchaeota’, a novel candidate archaeal phylum, which forms a monophyletic group with eukaryotes in phylogenomicanalyses,andwhosegenomesencodeanexpandedrepertoireofeukaryoticsignatureproteinsthatare suggestiveofsophisticatedmembraneremodellingcapabilities.Ourresultsprovidestrongsupportforhypothesesin whichtheeukaryotichostevolvedfromabonafidearchaeon,anddemonstratethatmanycomponentsthatunderpin eukaryote-specific features were already present in that ancestor. This provided the host with a rich genomic ‘starter-kit’tosupporttheincreaseinthecellularandgenomiccomplexitythatischaracteristicofeukaryotes. Cellular life is currently classified into three domains: Bacteria, tantarchaealhomologuesofactin25andtubulin26,archaealcelldivision Archaea and Eukarya. Whereas the cytological properties of proteins related to the eukaryotic endosomal sorting complexes BacteriaandArchaeaarerelativelysimple,eukaryotesarecharacter- required for transport (ESCRT)-III complex27, and several informa- izedbyahighdegreeofcellularcomplexity,whichishardtoreconcile tion-processingproteinsinvolvedintranscriptionandtranslation2,17,23. giventhatmosthypothesesassumeaprokaryote-to-eukaryotetrans- These findingssuggest an archaeal ancestor of eukaryotes that might ition1,2.Inthiscontext,itseemsparticularlydifficulttoaccountforthe have been more complex than the archaeal lineages identified thus suggestedpresenceoftheendomembranesystem,thenuclearpores, far2,23,28.Yet,theabsenceofmissinglinksintheprokaryote-to-eukaryote thespliceosome,theubiquitinproteindegradationsystem,theRNAi transitioncurrentlyprecludesdetailedpredictionsaboutthenatureand machinery,thecytoskeletalmotorsandthephagocytoticmachinery timingofeventsthathavedriventheprocessofeukaryogenesis1,2,17,28. inthelasteukaryoticcommonancestor(ref.3andreferencestherein). Herewedescribethediscoveryofanewarchaeallineagerelatedtothe EversincetherecognitionofthearchaealdomainoflifebyCarlWoese TACKsuperphylumthatrepresentsthenearestrelativeofeukaryotesin andco-workers4,5,Archaeahavefeaturedprominentlyinhypotheses phylogenomic analyses, and intriguingly, its genome encodes many for the origin of eukaryotes, as eukaryotes and Archaea represented eukaryote-specificfeatures,providingauniqueinsightintheemergence sister lineages in Woese’s ‘universal tree’5. The evolutionary link ofcellularcomplexityineukaryotes. betweenArchaeaandeukaryoteswasfurtherreinforcedthroughstud- ies of the transcription machinery6 and the first archaeal genomes7, GenomicexplorationofnewTACKarchaea revealingthatmanygenes,includingthecoreofthegeneticinforma- While surveying microbial diversity in deep marine sediments influ- tion-processingmachineriesofArchaea,weremoresimilartothoseof encedbyhydrothermalactivityfromtheArcticMid-OceanRidge,16S eukaryotes8 rather than to Bacteria. During the early stages of the rRNA gene sequences belonging to uncultivated archaeal candidate genomic era, it also became apparent that eukaryotic genomes were lineageswereidentifiedinagravitycore(GC14)sampledapproximately chimaericbynature8,9,comprisinggenesofbotharchaealandbacterial 15km north-northwest of the active venting site Loki’s Castle29 at origin,inadditiontogenesspecifictoeukaryotes.Yet,whereasmanyof 3283m below sea level (73.763167N, 8.464000E) (Fig. 1a)30,31. the bacterial genes could be traced back to the alphaproteobacterial Subsequentphylogeneticanalysesofthesesequences,whichcomprised progenitorofmitochondria,thenatureofthelineagefromwhichthe ,10% of the obtained 16S reads, revealed that they belonged to the eukaryotic host evolved remained obscure1,10–13. This lineage might gammacladeoftheDeep-SeaArchaealGroup/MarineBenthicGroupB eitherdescendfromacommonancestorsharedwithArchaea(follow- (hereafter referred to as DSAG)31–33 (Fig. 1b–d and Supplementary ing Woese’s classical three-domains-of-life tree5), or have emerged Figs1and2),acladeproposedtobedeeply-branchingintheTACK fromwithinthearchaealdomain(so-calledarchaealhostoreocyte-like superphylum23.DSAGconstitutesoneofthemostabundantandwidely scenarios1,14–17).Recentphylogeneticanalysesofuniversalproteindata distributed archaeal groups in the deep marine biosphere, but so far setshaveprovidedincreasingsupportformodelsinwhicheukaryotes noneofitsrepresentativeshavebeenculturedorsequenced31. emergeassistertoorfromwithinthearchaeal‘TACK’superphylum18–22, Toobtaingenomicinformationforthisarchaeallineage,weapplied a clade originally comprising the archaeal phyla Thaumarchaeota, deepmetagenomicsequencingtotheGC14sedimentsample,resulting Aigarchaeota, Crenarchaeota and Korarchaeota23. In support of this inasmaller(LCGC14,8.6Gbp)andalarger,multiple-stranddisplace- relationship, comparative genomics analyses have revealed several mentamplified(MDA)metagenomedataset(LCGC14AMP,56.6Gbp; eukaryoticsignatureproteins(ESPs)24inTACKlineages,includingdis- Fig.2a;SupplementaryFig.3andSupplementaryTable1).Giventhe 1DepartmentofCellandMolecularBiology,ScienceforLifeLaboratory,UppsalaUniversity,SE-75123Uppsala,Sweden.2DepartmentofBiology,CentreforGeobiology,UniversityofBergen,N-5020 Bergen,Norway.3DivisionofArchaeaBiologyandEcogenomics,DepartmentofEcogenomicsandSystemsBiology,UniversityofVienna,A-1090Vienna,Austria.4DepartmentofMedicalBiochemistryand Microbiology,UppsalaUniversity,SE-75123Uppsala,Sweden.{Presentaddress:GroningenInstituteforEvolutionaryLifeSciences,UniversityofGroningen,NL-9747AGGroningen,TheNetherlands. *Theseauthorscontributedequallytothiswork. 00 MONTH 2015 | VOL 000 | NATURE | 1 G2015 MacmillanPublishersLimited.Allrightsreserved RESEARCH ARTICLE a c Figure1|Identificationofanovelarchaeal Thaumarchaeota lineage. a,Bathymetricmapofthesamplingsite (1.2 / 8.9) (GC14;redcircle)attheArcticMid-Ocean Aigarchaeota (0 / 0) Knipovich Ridge 0.1 7597 MCCrGen +a rCch3a (e0o.0ta3 (/0 0 /. 204)) TACK SaathcspsetreievlseaesfdmtvinereengnpttRrsoeiifdsteemg.nebit,c,ltroh1oce6baSiftarerladRcdtN1iiv5oAenkrsmaoimtfyftprihonliemcGorenCLs-op1bk4eaic.’stsBeivdCaerassotlne DSAG (10.0 / 75.0) prokaryotictaxaandbarsontherightdepict GC14 MHVG archaealdiversity.Numbersrefertooperational 550 m 89 100 Korarchaeota (0(0 /. 003) / 0.19) tHaxyodnrootmheicrmunalitVsefonrteGarcohugpr;oDuHp.VMEHGV-6G,D,Meeapr-isneea HydrothermalVentEuryarchaeotaGroup6; Mohns Ridge LCoaksit’lse EuryarDchPaAeNoNta ( 2(0.1 / /0 1)5.7) Mc,MBGa-xAimaunmd-lBik,eMlihaoroindepBheynlothgiecnGyroofutpheAaarcnhdaBea.l 3,500 m 16SrRNAreads(seeb),revealingthatDSAG EU731915 b d HM998486 sequencesclusterdeeplyintheTACKsuper- HM480258 Taxon abundance JN12G36Q19027603 phylum.Numbersbetweenbracketsindicate 0.8 0.2 0.1 0.0 57 53 DEUSJ6AQ8G31G 29L7Q3C839G92C9716448 rtoeltaatlivaendabaurcnhdaaenaclere(%ad)so,rfeesapcehctgivroeluyp.MreClaGtiv,eto Other archaea 2 1 DSAG/MBG-B HHMM224444007405 MiscellaneousCrenarchaeotaGroup;MHVG, All taxa CandidTaPhtleaOa undtUhmciventaioDrcsr miHlbcoahayVsnaccsE OeteiGfieotDere-tid61aas 391575657 15 9 211DMMGThHrHBoaVGVuuEGpm-GA Ca-r63chaeotaArchaeal taxa 75 EU7E3AU2Y07843K3215CD8HA49GG4QQ1BE21UDQ13U1765Q67908753720133382761614165838991268007110 γ liuMBinknodaeioitrlctii(shnarttoereiaondHpdgfyosptduhnhrpatoy)tpltobhothgereteelronmvDnyaagSloulsAfVetGs1oe6antoSbhtpoerGevRrDreaNoSt5uiAA0opnGg.aaedrlcen,teMscahlxsuaoeosxqwntiuemonrem..nucimces- JN123687 Bacteroidetes 5 GQ926464 c,d,Scaleindicatesthenumberofsubstitutions GQ926439 DSAG/MBG-B 1 EU732030 persite. 100 GaAmlpmhaapprrootteeoobbaacctteerriiaa 18 18 909090 α ββ21 0.0 0.1 0.2 0.6 0.1 deepercoverage,thelatterdatasetwasusedtoextractmarkergenesthat phylogeneticanalyseswereperformed,usingsophisticatedmodelsof carry an evolutionary coherent phylogenetic signal (Supplementary molecularsequenceevolution.Byimplementingrelaxedassumptions Tables2and3).Usingsinglegenephylogeniesofthesemarkers,contigs of homogeneous amino acid composition across sites or across attributable to either one of the archaeal lineages present in the branches of the tree, these models are less sensitive to long-branch LCGC14AMP metagenome (DSAG, DSAG-related, DPANN and attractionandotherphylogeneticartefacts.Bothmaximum-likelihood Thaumarchaeota),couldbeextracted.Thesetaxon-specificcontigswere andBayesianinferenceanalysesofconcatenatedalignmentscompris- usedastrainingsetsforsupervisedbinningofcontigspresentinboththe ing 36 conserved phylogenetic marker proteins20 (Supplementary LCGC14andLCGC14AMPmetagenomes(SupplementaryFig.4).This Tables 2 and 3) revealed that the DSAG and DSAG-related archaea approach resulted in the identification of two DSAG bins (from (hereafter referred to as ‘Lokiarchaeota’) represent a monophyletic, LCGC14andLCGC14AMP,respectively)aswellasoneDSAG-related deeplybranchingcladeoftheTACKsuperphylum.Loki3represented bin(binLoki2/3fromLCGC14AMP).WefocusedontheDSAGbin the deepest branch of the Lokiarchaeota, and Lokiarchaeum and fromthenon-amplifieddatasettoavoidpotentialbiasesintroducedby Loki2 were inferred to be sister lineages with maximum support MDA(seeMethods).Theanalysesofthelow-abundantDSAG-related (SupplementaryFig.7).Intriguingly,wheneukaryoteswereincluded lineageswerebasedontheMDA-amplifiedLCGC14AMPdataset. inourphylogeneticanalyses,theywereconfidentlypositionedwithin Afterremovalofsmall(,1kbp)andlow-coveragecontigs(Supple- theLokiarchaeota(posteriorprobability51;bootstrapsupport580; mentaryFig.5),readsmappingtotheremainingDSAGbincontigswere Fig. 2b; Supplementary Figs 8 and 9), as the sister group of the reassembledinto504contigs,yieldinga92%complete,1.4fold-redund- Loki3 lineage (Fig. 2b). Robust assessment of these phylogenetic ant composite genome (‘Lokiarchaeum’) of 5.1Mbp, which encodes inferences (SupplementaryFigs10–14andSupplementaryTable5) 5,381proteincodinggenesaswellassinglecopiesofthe16Sand23S revealed strong support for the Lokiarchaeota–Eukarya affiliation rRNAgenes(SupplementaryTable4andSupplementaryDiscussion1). (SupplementaryDiscussion2). The DSAG-related bin (Loki2/3 from LCGC14AMP) was found to The proposednamingofthe Eukarya-affiliatedcandidate phylum contain two low-abundant, distinct lineages, displaying slight but LokiarchaeotaandtheLokiarchaeumlineageismadeinreferencetothe marked differences in GC content of 32.8 and 29.9%, allowing for samplinglocation,Loki’sCastle29,whichinturnwasnamedafterthe separation into two distinct groups (Loki2 and Loki3) (Supple- Norsemythology’sshape-shiftingdeityLoki.Lokihasbeendescribed mentary Fig. 6). Since these two lineages represent low-abundance as‘‘astaggeringlycomplex,confusing,andambivalentfigurewhohas communitymembers,onlypartialgenomescouldberecovered.The beenthecatalystofcountlessunresolvedscholarlycontroversies’’34,in Loki2/3contigsdidnotcontain16SrRNAgenes,renderingitimposs- analogytotheongoingdebatesontheoriginofeukaryotes. ibletoattributethemtoanyoftheunculturedarchaeal16Sphylotypes identifiedintheGC14sediments,suchasthelow-abundanceMarine PresenceofdiverseandabundantESPs Hydrothermal Vent Group archaea (abundance ,0.05%; Fig. 1c). Asourphylogeneticanalysesstronglysupportacommonancestryof However,phylogeneticmarkergeneswereextractedfortheselineages Lokiarchaeotaandeukaryotes,weinvestigatedthepresenceofputat- aswell(21and34markersforLoki2andLoki3,respectively)sincetheir iveESPs24inthecompositeLokiarchaeumgenome.Theamountof inclusionwaspotentiallyusefulinresolvingthephylogeneticplacement genomicdataobtainedfortheLoki2/3lineageswastoolowtoper- oftheLokiarchaeumlineage. form detailed gene content analyses. A comparative taxonomic assessment of theLokiarchaeum composite proteome revealed that LokiarchaeotaandEukaryaaremonophyletic alargefraction(32%)ofitsproteinsdisplayednosignificantsimilarity To determine the phylogenetic affiliation of Lokiarchaeum and the to any known protein, and that roughly as many proteins display Loki2/Loki3 lineages, maximum-likelihood and Bayesian inference highestsimilarityto archaealandbacterialproteins(26%and29%, 2 | NATURE | VOL 000 | 00 MONTH 2015 G2015 MacmillanPublishersLimited.Allrightsreserved ARTICLE RESEARCH a proteome(175proteinsor3.3%)wasmostsimilartoeukaryoticpro- LCGC14AMP LCGC14 (57 Gbp, amplified DNA) (8.6 Gbp, non-amplified DNA) teins(Fig.2c)andrevealedadominanceofproteins,whichineukar- yotesareinvolvedinmembranedeformationandcellshapeformation LCGC14AMP contigs LCGC14 contigs processes, including phagocytosis37 (Extended Data Table 1 and (725 Mbp) (227 Mbp) Supplementary Table 6). Several lines of evidence support that the 59 markers (arCOGs) Binning & Filtering presenceoftheseproteinsisnottheresultofpotentialcontaminating Training sets eukaryoticsequencedata.First,genesencodingLokiarchaeumESPs PVhiysluoagle sncerteice ntrineges for 6(c coantteiggso)ries Lokiarc(h5a.1e uMmb )contigs andotherproteinsmostsimilartoeukaryoteswerealwaysflankedby prokaryoticgenes(SupplementaryFig.16),andmostwereencodedby Binning LoLkCiaGrcCh1a4eAumM Pbin Lo5k9i amrcahrkaeerusm contigs that also contained archaeal signature genes. Second, ESP- encodingcontigsdisplayedhigh(.203)readcoverage,whileeukar- Loki2 (high GC) yotic sequences could not be detected in the LCGC14 dataset, and Loki2/3 bin 21 markers Concatenated phylogenies, representedonlyanegligiblefractionoftheLCGC14AMPmetagen- ML (RAxML), ome.Furthermore,theamplicondatageneratedwithuniversal16S/ Split contigs based Loki3 (low GC) BI (Phylobayes) on GC content 34 markers 18Sprimersdidnotrevealany18SrRNAgenesofeukaryoticorigin (Fig. 1b). Third, phylogenetic analyses of several Lokiarchaeal ESPs Legend Reads Contigs Genes Process/software Input/output revealedtheiremergenceatthebaseofeukaryoticclades(seebelow), b indicating that these proteins represent archaeal out-groups of the Methanopyrus kandleri AV19 eukaryotic proteins rather than being truly eukaryotic in origin. 1 0.7 Euryarchaeota Fourth,Lokiarchaeumappearstocontainbonafidearchaealinforma- 100 tional processing machineries (Supplementary Discussion 4 and 0.98 0.7 DPANN SupplementaryTables7–9)and,irrespectiveofthesignificantamount 1 ofESPsinitsgenome,lacksmanyotherkeyeukaryoticfeatures.Finally, 0.7 100 Thermococcales wecouldalsoidentifyhighlysimilarhomologuesoftheLokiarchaeal 1 1 100 Eukarya ESPsinarecentandindependentlygeneratedmarinesedimentmeta- 1 75 Loki3 (low GC) L genome derived from a sediment core sample off the Shimokita 810 1 Loki2 (high GC) okiarcha PtheenimnsiuclraoboifalJacpoamn,miunnwithy3i8c.hADsStAhGe fcuonmctpiorinsesanadsiegvnoifluictaionnt poafrtthoef 1 100 Lokiarchaeum eota LokiarchaealESPsholdrelevanceforunderstandingtheoriginofthe 94 eukaryoticcell,wereviewsomeofthekeyfindingsinmoredetailbelow. Korarchaeota TACK 1 1 Potentialdynamicactincytoskeleton 98 Crenarchaeota Actinsrepresentkeystructuralproteinsofeukaryoticcellsandcom- 1 72 MCG prisefilamentsthatarecrucialforvariouscellularprocesses,including 1 0.99 1 cell division, motility, vesicle trafficking and phagocytosis39. The 99 Thaumarchaeota 0.25 100 Lokiarchaeum genome encodes five actin homologues that display 98 1 Aigarchaeota higher similarity to eukaryotic actins and actin-related proteins 100 (ARPs)thantocrenactins,agroupofarchaealactinhomologuesthat c were recently shown to be involved in cell shape formation25,37,40 Cellular organisms Lokiarchaeum Eukarya (SupplementaryTable6).Thisobservationwasconfirmedinaphylo- Archaea geneticanalysisoftheLokiarchaealactinsthatalsoincludedhomo- Euryarchaeota Caldiarchaeum Crenarchaeota logues identified in a recently published marine sediment Thaumarchaeota Other archaea metagenome38 (up to 99% identity) as well as in the LCGC14 and Korarchaeum Bacteria LCGC14AMP metagenomes (Fig. 3a and Supplementary Fig. 17). Proteobacteria Firmicutes Lokiarchaeal actins (‘Lokiactins’) comprise several distinct clusters, Other bacteria MCG Other someofwhichbranchatthebaseofdistinctiveeukaryoticactinand No hits ARP clusters, albeit with weak support (Fig. 3a). Despite the poor 0.0 0.2 0.4 0.6 0.8 1.0 resolution of several deeper nodes in the actin tree, strong support Normalized number of hits is provided for a common ancestry of Lokiactins and eukaryotic Figure2|Metagenomicreconstructionandphylogeneticanalysisof actins,indicatingthattheproliferationofactinsalreadyoccurredin Lokiarchaeum. a,Schematicoverviewofthemetagenomicsapproach.BI, thearchaealancestorofeukaryotes.Notably,theLokiarchaeumgen- Bayesianinference;ML,maximumlikelihood.b,Bayesianphylogenyof omealsoencodesseveralhypotheticalshortproteinscontaininggel- concatenatedalignmentscomprising36conservedphylogeneticmarkerproteins solin-likedomainsthatsofarappeartobeabsentfrombacterialand usingsophisticatedmodelsofproteinevolution(Methods),showingeukaryotes anyotherarchaealgenomes(ExtendedDataTable1,Supplementary branchingwithinLokiarchaeota.Numbersaboveandbelowbranchesreferto Tables 6 and 10 and Supplementary Discussion 5). In eukaryotes, Bayesianposteriorprobabilityandmaximum-likelihoodbootstrapsupport these protein domains are part of the villin/gelsolin superfamily of values,respectively.Posteriorprobabilityvaluesabove0.7andbootstrapsupport valuesabove70areshown.Scaleindicatesthenumberofsubstitutionspersite. proteins, which comprise various key regulators of actin filament c,PhylogeneticbreakdownoftheLokiarchaeumproteome,incomparisonwith assemblyanddisassembly41.Althoughthefunctionofthesehypothet- proteomesofKorarchaeota,Aigarchaeota(Caldiarchaeum)andMiscellaneous icalgelsolin-domainproteinsremainstobeelucidated,itistempting CrenarchaeotaGroup(MCG)archaea.Category‘Other’containsproteins tospeculatethatLokiarchaeumhasadynamicactincytoskeleton. assignedtotherootofcellularorganisms,tovirusesandtounclassifiedproteins. GenomicexpansionofsmallGTPases respectively;Fig.2candSupplementaryFig.15),whichisinaccord- SmallGTPasesbelongingtotheRassuperfamilycompriseoneofthe ance with recent findings that suggest major inter-domain gene largestproteinfamiliesineukaryotes,wheretheyareinvolvedinvarious exchange between Bacteria and Archaea35,36 (Supplementary regulatoryprocesses,includingcytoskeletonremodelling,signaltrans- Discussion 3). Most notably, a significant part of the predicted duction,nucleocytoplasmictransportandvesiculartrafficking42.Being 00 MONTH 2015 | VOL 000 | NATURE | 3 G2015 MacmillanPublishersLimited.Allrightsreserved RESEARCH ARTICLE a c 100 100 51100 1L0C0GC14AMPL LaCCnGGdC CL1o14k4AiAaMMrcPPh a ((2e5)u)m (4/1) LL(1Co1kG/i1aC)rc1h4aAeMuPm and 68 51 LokiarchL_o3k1ia9rL3co0hkaie3aur1cm5h4 a(24e5u)4m570 5(23 C8)6a5.C04al7d iMarecthhaaenuomca slduobcteorcrcaBEunuaser cuiyntmaeferrcirahn auaesn odta (19) Actin and related sequences Lokiarchaeum (3) Lokiarch_12880 Arp1 69 Lokiarchaeum (3) 96100 Arp2 Arp3 97 LokTiahrecrhmLaooekpuiahmrilc u(h4ma) esupm. ( 2(4)) LCGC14AMP_06532160 Lokiarchaeum (19) LCGC14AMP_05736710 95 LokiarchL_o4k5ia4r2c0haeum (3) 100 LCGC14AMP and 99 Rab-family (7) 100 LCGC14AMP (2) Lokiarchaeum (5/1) 82 LokiaLrockhiaaercuhma e(2u)m (6) b 0.4 100 83 LCGC14AMP/LokiarcChreaneuacmti n(11/2) 7987100998892 LokiaLrcohkaiaLBeroucakmhcRitaaa e(ern6rcui-)ahmf a_(5 3m(2)7il)1y1 R(07a)s-famRhilyo -(f5a)mily (7) Nitrosopumilus maritimus SCM1 0 99 Lokiarchaeum (5) Bacteria and Myxococcus xanthus DK1622 3 61 Euryarchaeota (12) Lokiarchaeum (8) KoCraarldchiaarecuhmae ucmryp stuobfitluermra OnePuFm8 1 2 93100 ASraf-rf1a-mfailmy i(l7y )(7) 170290521 Ca. Korarchaeum cryptofilum OPF8 TryApcPaMinydroeuoTstlbhiohpaaamrcnolaauofsAu lpbusnryirmaodurbusu caiismedre aiko r baboppnropssudeiohscluni eeltuderhi imio 9a ATnl2 iVI4aa7M61nn/9249aa 24 43111339 841097061100 100 97 EuryCarrAcehrnScaaLRheroEacbokehuetiaaaratya er((ac 55o(rh7))tcaa)h e(a2ue1moL)to La(k2o (i)a2kri)acrhcaheauemum (2 ()7) Reticulomyxa filosa 200 96 EuEruyrayracrhcaheaoetoat a(9 ()77) Saccharomyces cerevisiae S288c 34 503411226 Methanobacterium lacus Giardia intestinalis ATCC 50581 23 98 9096 LokiarchLaoekuiamrc (h2a)eum (3) Tetrahymena thermophila SB210 152 51 Bacteria (35) Homo sapiens 453 Lokiarch_01230 170174596 Ca. Korarchaeum cryptofilum OPF8 Dictyostelium discoideum 153 499329248 Methanopyrus kandleri Lokiarchaeum 92 74 EuEryuaryrcahrcaheaoetao t(a1 3(2)1) Naegleria gruberi0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 217.80 2 0.4 87 73100 BBaaccteteriraia a (n4dB6 )aCcrteenraiar c(8h)aeota (10) Per cent Figure3|IdentificationandphylogenyofsmallGTPasesandactin archaealandbacterialspecies.NumbersrefertototalamountofsmallGTPases orthologues. a,Maximum-likelihoodphylogenyof378alignedaminoacid perpredictedproteome.c,Maximum-likelihoodphylogenyof150aligned residuesofactinhomologuesidentifiedinLokiarchaeumandinthe aminoacidresiduesofsmallRas-andArf-typeGTPases(IPR006689and LCGC14AMPmetagenome,includingeukaryoticactins,ARP1–3homologues IPR001806)inalldomainsoflife.Numbersinbracketsrefertothenumberof andcrenactins25.Consecutivenumbersinbracketsrefertothenumberof sequencesintherespectiveclades.a,c,Sequenceclusterscomprising sequencesinarespectivecladefromLCGC14AMPandLokiarchaeum, Lokiarchaeumand/orLCGC14AMPsequences(red),eukaryotes(blue)and respectively.b,RelativeamountofsmallGTPases(assignedtoIPR006689and Bacteria/Archaea(grey)havebeencollapsed.Bootstrapvaluesabove50are IPR001806)intheLokiarchaeumgenomeincomparisonwithothereukaryotic, shown.Scaleindicatesthenumberofsubstitutionspersite. key regulators of actin cytoskeleton dynamics, these small GTPases GTPase MglA43. Hence, the Lokiarchaeal roadblock/LC7 proteins represent essential components for the process of phagocytosis in representpossiblecandidatesforalternativeGAPsinthisarchaeon. eukaryotes. Intriguingly, the analysis of Lokiarchaeal ESPs revealed amultitudeofRas-superfamilyGTPases,comprisingnearly2%ofthe PresenceofaprimordialESCRTcomplex Lokiarchaeal proteome (Fig. 3b). The relative amount of small In eukaryotes, the ESCRT machinery represents an essential com- GTPasesintheLokiarchaeumgenomeiscomparabletothatobserved ponent of the multivesicular endosome pathway for lysosomal in several unicellular eukaryotes, only being surpassed by the protist degradationofdamagedorsuperfluousproteins,anditplaysarole Naegleriagruberi.Incontrast,bacterialandarchaealgenomesencode in several budding processes including cytokinesis, autophagy and only few, if any, small GTPase homologues of the Ras superfamily viral budding44. The ESCRT machinery generally consists of the (Fig. 3b). ESCRT-I–III subcomplexes, as well as associated subunits45. The PhylogeneticanalysesoftheLokiarchaealsmallGTPasesrevealed analysis of the Lokiarchaeum genome revealed the presence of an thattheserepresentseveraldistinctclusters,eachofwhichcomprises ESCRT gene cluster (Fig. 4a), as well as of several additional pro- several GTPase sequences (Fig. 3c and Supplementary Fig. 18). teins homologous to components of the eukaryotic multivesicular Althoughphylogeneticanalysesfailedtoresolvemostofthedeeper endosome pathway. For instance, Lokiarchaeum encodes divergent SNF7domainproteinsoftheeukaryoticESCRT-IIIcomplex,which nodes,severaloftheeukaryoticsmallGTPasefamiliesappeartoshare appeartorepresentmembersoftheVps2/Vps24/Vps46andVps20/ acommonancestrywithLokiarchaealGTPases(Fig.3c),suggesting Vps32/Vps60 families, respectively. A phylogenetic analysis of the an archaeal origin of specific subgroups of the eukaryotic small Lokiarchaeal SNF7 domain proteins revealed that these branch at GTPases, followed by independent expansions in eukaryotes and thebaseofthesetwoeukaryoticESCRT-IIIfamilieswithlowboot- Lokiarchaeota.Thisscenariocontrastswithpreviousstudiesthathave strap support (Fig. 4b and Supplementary Fig. 19), not only indi- suggested that eukaryotic small GTPases were acquired from the cating that they might represent ancestral SNF7 copies, but also alphaproteobacterialprogenitorofmitochondria37. suggestingthatthelasteukaryoticcommonancestoralreadyinher- AlthoughgenesencodingcanonicaleukaryoticGTPase-activating ited two divergent SNF7-domain-encoding genes from its putative proteins (GAPs) were absent in Lokiarchaeota, twelve roadblock/ archaealancestorratherthanasinglegene46.Furthermore,thegene LC7-domain-containing proteins were identified (Supplementary cluster encodes an ATPase that displays closest resemblance to Tables6and10).Whilesuchproteinshavebeenimplicatedindynein eukaryotic VPS4-type ATPases, including katanin, membrane scaf- organizationineukaryotes,roadblock/LC7proteinMglBofthebac- foldprotein(MSP)andspastin(Fig.4candSupplementaryFig.20) teriumMyxococcusxanthuswasshowntoactasaGAPofthesmall as well as hypothetical proteins that show significant similarity 4 | NATURE | VOL 000 | 00 MONTH 2015 G2015 MacmillanPublishersLimited.Allrightsreserved ARTICLE RESEARCH a b EAP30-domain Vps25-like Vps4 Vps2/24/46 Vps20/32/60 96 LCGC14AMP_06867510 LCGC14AMP_06982930 Lokiarchaeum 61 7.5 kbp 8 kbp 8.5 kbp 9 kbp 9.5 kbp 10 kbp 10.5 kbp 11 kbp 11.5 kbp29 kbp 29.5 kbp 594999561 marine sediment metagenome 0.7 Lokiarch_37480 (Lokiarchaeum) 75 LCGC14AMP_08445690 96 LCGC14AMP_09186610 100 598791552 marine sediment metagenome 0 kbp 0.5 kbp 1 kbp 1.5 kbp 2 kbp 2.5 kbp 3 kbp 3.5 kbp 4 kbp 63 100 598849147 marine sediment metagenome Loki 2/3 598763469 marine sediment metagenome 58 Vps25-like Vps4 Vps2/24/46 Vps20/32/60 EAP30-domain LCGC14AMP_06524000 c LCGC14AMP_08417280 (Loki2/3) Archaeal Vps4 (group 3) (4) 90 Eukarya (Vps2) (24) 100 95 Cren-/thaum-/aigarchaeal Vps4 (group 1) (31) Eukarya (Vps24) (18) 100 Euryarchaeal Vps4 (group2) (15) Eukarya (Vps46) (18) 66 99 Eukarya (MSP) (4) 100 LCGC14AMP_09264370 95 Eukarya (spastin/fidgetin) (6) 92 93 Lokiarch_16760 (Lokiarchaeum) 89 100 Eukarya (Vps4) (4) LCGC14AMP_09392100 100 86 LCGC14AMP_10376800 95 Eukarya (katanin) (6) 595075673 marine sediment metagenome 100 595076476 marine sediment metagenome 97604350177 marine sediment metagenome 59 48 99 604295686 marine sediment metagenome 604262720 marine sediment metagenome 74 LCGC14AMP_06821030 93 96 598854014 marine sediment metagenome LCGC14AMP_08417270 100 LCGC14AMP_08417290 (Loki2/3) 50 LCGC14AMP_01505270 604290277 marine sediment metagenome 100 100 Lokiarch_37470 (Lo kiarchaeum) 59 71 11834658E5u Tkeatrryaah y(Vmpesn2a0 t) h(1e1rm)ophila LCGC14AMP_08445680 Eukarya (Vps32) (13) 0.3 52 595008158 marine sediment metagenome Eukarya (Vps60) (8) 92 LCGC14AMP_06982940 Cren-/Thaum-/Aigarchaeota (59) Figure4|IdentificationofESCRTcomponentsintheLokiarchaeum phylogenyof388alignedaminoacidresiduesofAAA-typeVps4ATPases genome. a,SchematicoverviewofESCRTgeneclustersidentifiedinLoki- includingrepresentativesforeachofthefourmajoreukaryoticsub-groups archaeumandLoki2/3.Intensityofshadingbetweenhomologoussequencesis (membranescaffoldprotein(MSP),katanin,spastin/fidgetinandVps4)aswell correlatedwithBLASTbitscore.b,Maximum-likelihoodphylogenyof207 ashomologuesidentifiedintheLokiarchaeumgenome,inLCGC14AMPand alignedaminoacidresiduesofESCRT-IIIhomologuesidentifiedinLoki- insequencedarchaealgenomes.Bootstrapsupportvaluesbelow45arenot archaeum,LCGC14AMPandotherarchaeallineages.Eukaryotichomologues shown.b,c,Scaleindicatesthenumberofsubstitutionspersite.Numbersin includethetwodistantlyrelatedfamiliesVps2/24/46andVps20/32/60. bracketsrefertothenumberofsequencesintherespectiveclades. Bootstrapsupportvaluesabove50areshown.c,Maximum-likelihood to EAP30-domain-containing proteins (Vps36/22) and Vps25, finger/RING-domain-containing proteins, some of which might respectively(Fig.4a;SupplementaryFigs21and22).Ineukaryotes, serveascandidatesforE3ubiquitinproteinligases(Supplementary Vps22, Vps25 and Vps36 are components of the ESCRT-II com- Tables6and10).Severalofthesecomponentshavealsobeeniden- plex, which comprises two to three of these proteins depending on tifiedinAigarchaeota49. the eukaryotic species46. In addition, a protein domain analysis of theLokiarchaeumproteomeidentifiedaVps28-likeprotein,acom- A‘complex’archaealancestorofEukarya ponent of the eukaryotic ESCRT-I subcomplex that links the ubi- WehaveidentifiedandcharacterizedthegenomeofLokiarchaeota,a quitinpathwaytovesiculartransportandwhich,apartfromVps28, novel, deeply rooting clade of the archaeal TACK superphylum, comprises Vps23 and Vps37 (Extended Data Table 1 and whichinphylogenomicanalysesofuniversalproteinsformsamono- Supplementary Fig. 23). The different subunits of the eukaryotic phyletic group with eukaryotes. While the obtained phylogenomic ESCRT-I complex share similar two-helix core domains and have resolution testifies to a deep archaeal ancestry of eukaryotes, the been suggested to have evolved from a single ancestral sequence47, Lokiarchaeumgenomecontentholdsvaluablecluesaboutthenature which we now propose to be of archaeal origin. ofthearchaealancestorofeukaryotes,andabouttheprocessofeukar- Finally,theLokiarchaeumproteomewasfoundtocontainhypo- yogenesis.ManyoftheESPspreviouslyidentifiedindifferentTACK thetical proteins containing Longin-like domains, as well as several lineages are united in Lokiarchaeum, indicating that the patchy proteins belonging to the BAR/IMD superfamily (Supplementary distribution of ESPs amongst archaea is most likely the result of Tables 6 and 10), comprising curvature sensing protein families lineage-specificlosses2(Fig.5).Moreover,theLokiarchaeumgenome involved in various aspects of vesicle/membrane trafficking or significantly expandsthetotalnumberofESPs inArchaea,lending remodelling processes in eukaryotes. These findings suggest that supporttotheobservedphylogeneticaffiliationofLokiarchaeotaand LokiarchaeumcontainsaprimordialversionofaeukaryoticESCRT eukaryotes.Finally,andimportantly,sequence-basedfunctionalpre- vesicle trafficking complex. In eukaryotes, ubiquitylation of target dictionsforthesenewESPsindicateapredominanceofproteinsthat proteinsrepresentsacriticalstepinESCRT-mediatedproteindegra- play pivotal roles in various membrane remodelling and vesicular dation through the multivesicular endosome pathway44,48. The trafficking processes in eukaryotes. It is also noteworthy that Lokiarchaeum genome contains a gene cluster that encodes several Lokiarchaeumappearstoencodethemost‘eukaryotic-like’ribosome components required for a functional ubiquitin modifier system, identifiedinArchaeathusfar(SupplementaryDiscussion4),includ- including homologues for ubiquitin-activating enzyme E1, ubiqui- ing a putative homologue of eukaryotic ribosomal protein L22e tin-conjugatingenzymeE2,and26Sproteasomeregulatorysubunit (Fig.5;SupplementaryFig.24andSupplementaryTables7and8). RPN11.Inaddition,severalhypotheticalproteinswithubiquitin-like Takentogether,ourdataindicatethatthearchaealancestorofeukar- domains were identified in Lokiarchaeum, as well as diverse zinc- yoteswasevenmorecomplexthanpreviouslyinferred2andallowusto 00 MONTH 2015 | VOL 000 | NATURE | 5 G2015 MacmillanPublishersLimited.Allrightsreserved RESEARCH ARTICLE r-proteir-n pLr4ot1eier-n prSo2t5erie-n prSo3t0eire-n prLo1t3eire-n prLo3t8eieEnl oLn2g2aetTioopn ofiascoRtoNmr,eA r (apElsoUlFe) yb Ii1mqBeuirt(iacnsr ees (nyr)spat(coea)tri,) mnt sGuabulGienlssboliEn-SliCkeR Td-EI:oS mVCapiRsnT-2EsII8S:- lCiEkRAeT-PEII3S: 0 CVdRpoTs-EI2mISI5a:i- ClinVRkpTe-sSII2I/m: 2aVll4p /s4LG26oTc0n/Pg3ais2n/e-lB 6ieA0kxeR /pIdaonMsiDm-oaliinknse superfamily FedacPinouciiasgrlktatoocurarushicrrbsryheyeutoaddh5tetiieoceas|tislntTrr.rpciehoblhSeeefuycaotEchlinafooSedlPnmmiisfslaeoapinni.tslniesEddcexamiascocabaavahrtsrejeceoerEdhnrviSnacibaPeeedrywaicoiclshhfoaaaadatfnleeefptadcp-halseiwerhcsltittainiertcdheyduealaodagasfcerarsaEnoSdssP. Eukarya whitecircles,respectively.aWhileeukaryotesand Lokiarchaeota Lokiarchaeotacontainbonafideactins,other archaeaencodethemoredistantlyrelatedCren- Korarchaeota actins.bOnlyfewmembersoftheThaumarchaeota Aigarchaeota TACK c(aorn-ttauibnudliinssta).nctTlyhraeulamte-d,Ahiogm-aonlodgssoomfetubulins Thaumarchaeota Crenarchaeotacontaindistanthomologuesof ESCRT-III(SNF7domainproteins). Crenarchaeota Euryarchaeota speculateonthetimingandorderofseveralkeyeventsintheprocessof 12. Rochette,N.C.,Brochier-Armanet,C.&Gouy,M.Phylogenomictestofthe hypothesesfortheevolutionaryoriginofeukaryotes.Mol.Biol.Evol.31,832–845 eukaryogenesis. For example, the identification of archaeal genes (2014). involvedinmembraneremodellingandvesiculartraffickingprocesses 13. Thiergart,T.,Landan,G.,Schenk,M.,Dagan,T.&Martin,W.F.Anevolutionary indicatesthattheemergenceofcellularcomplexitywasalreadyunder- networkofgenespresentintheeukaryotecommonancestorpollsgenomeson eukaryoticandmitochondrialorigin.GenomeBiol.Evol.4,466–485(2012). waybeforetheacquisitionofthemitochondrialendosymbiont,which 14. Henderson,E.etal.Anewribosomestructure.Science225,510–512(1984). nowappearstobeauniversalfeatureofalleukaryotes28,37,50.Indeed, 15. Koonin,E.V.Theoriginandearlyevolutionofeukaryotesinthelightof baseduponourresultsitseemsplausiblethatthearchaealancestorof phylogenomics.GenomeBiol.11,209(2010). 16. Lake,J.A.Originoftheeukaryoticnucleusdeterminedbyrate-invariantanalysisof eukaryotes had a dynamic actin cytoskeleton and potentially endo- rRNAsequences.Nature331,184–186(1988). and/orphagocyticcapabilities,whichwouldhavefacilitatedtheinva- 17. Williams,T.A.,Foster,P.G.,Cox,C.J.&Embley,T.M.Anarchaealoriginof ginationofthemitochondrialprogenitor. eukaryotessupportsonlytwoprimarydomainsoflife.Nature504,231–236 (2013). Thepresentidentificationandgenomiccharacterizationofanovel 18. Cox,C.J.,Foster,P.G.,Hirt,R.P.,Harris,S.R.&Embley,T.M.Thearchaebacterial archaealgroupthatsharesacommonancestrywitheukaryotesindi- originofeukaryotes.Proc.NatlAcad.Sci.USA105,20356–20361(2008). catesthatthegapbetweenprokaryotesandeukaryotesmight,tosome 19. Foster,P.G.,Cox,C.J.&Embley,T.M.Theprimarydivisionsoflife:aphylogenomic approachemployingcomposition-heterogeneousmethods.Phil.Trans.R.Soc. extent,bearesultofpoorsamplingoftheexistingarchaealdiversity. Lond.B364,2197–2207(2009). Environmental surveys have revealed the existence of a plethora of 20. Guy,L.,Saw,J.H.&Ettema,T.J.Thearchaeallegacyofeukaryotes:aphylogenomic unculturedarchaeallineages,andsomeoftheselikelyrepresenteven perspective.ColdSpringHarb.Perspect.Biol.6,a016022(2014). 21. Lasek-Nesselquist,E.&Gogarten,J.P.Theeffectsofmodelchoiceandmitigating closerrelativesofeukaryotes.Excitingly,thegenomicexplorationof biasontheribosomaltreeoflife.Mol.Phylogenet.Evol.69,17–38(2013). thesearchaeallineageshasnowcomewithinreach.Suchendeavours, 22. Williams,T.A.,Foster,P.G.,Nye,T.M.,Cox,C.J.&Embley,T.M.Acongruent combinedwithprospectivestudiesfocusingonuncoveringmetabolic, phylogenomicsignalplaceseukaryoteswithintheArchaea.Proc.R.Soc.Lond.B chemicalandcellbiologicalpropertiesoftheselineages,willuncover 279,4870–4879(2012). 23. Guy,L.&Ettema,T.J.Thearchaeal‘TACK’superphylumandtheoriginof furtherdetailsabouttheidentityandnatureofthearchaealancestor eukaryotes.TrendsMicrobiol.19,580–587(2011). ofeukaryotes,sheddingnewlightontheevolutionarydarkagesofthe 24. Hartman,H.&Fedorov,A.Theoriginoftheeukaryoticcell:agenomicinvestigation. eukaryoticcell. Proc.NatlAcad.Sci.USA99,1420–1425(2002). 25. Ettema,T.J.,Linda˚s,A.-C.&Bernander,R.Anactin-basedcytoskeletoninarchaea. Mol.Microbiol.80,1052–1061(2011). FullMethodsandanyassociatedreferencesareavailableintheonlineversionof 26. Yutin,N.&Koonin,E.V.Archaealoriginoftubulin.Biol.Direct7,10(2012). thepaper. 27. Linda˚s,A.-C.,Karlsson,E.A.,Lindgren,M.T.,Ettema,T.J.&Bernander,R.Aunique celldivisionmachineryintheArchaea.Proc.NatlAcad.Sci.USA105,18942– Received15December2014;accepted1April2015. 18946(2008). Publishedonline6May2015. 28. Martijn,J.&Ettema,T.J.Fromarchaeontoeukaryote:theevolutionarydarkagesof theeukaryoticcell.Biochem.Soc.Trans.41,451–457(2013). 1. Embley,T.M.&Martin,W.Eukaryoticevolution,changesandchallenges.Nature 29. Pedersen,R.B.etal.Discoveryofablacksmokerventfieldandventfaunaatthe 440,623–630(2006). ArcticMid-OceanRidge.Nat.Commun.1,126(2010). 2. Koonin,E.V.&Yutin,N.Thedispersedarchaealeukaryomeandthecomplex 30. Jørgensen,S.L.etal.Correlatingmicrobialcommunityprofileswithgeochemical datainhighlystratifiedsedimentsfromtheArcticMid-OceanRidge.Proc.Natl archaealancestorofeukaryotes.ColdSpringHarb.Perspect.Biol.6,a016188 Acad.Sci.USA109,E2846–E2855(2012). (2014). 31. Jørgensen,S.L.,Thorseth,I.H.,Pedersen,R.B.,Baumberger,T.&Schleper,C. 3. Koumandou,V.L.etal.Molecularpaleontologyandcomplexityinthelast QuantitativeandphylogeneticstudyoftheDeepSeaArchaealGroupinsediments eukaryoticcommonancestor.Crit.Rev.Biochem.Mol.Biol.48,373–396(2013). oftheArcticmid-oceanspreadingridge.Front.Microbiol.4,299(2013). 4. Woese,C.R.&Fox,G.E.Phylogeneticstructureoftheprokaryoticdomain:the 32. Inagaki,F.etal.Microbialcommunitiesassociatedwithgeologicalhorizonsin primarykingdoms.Proc.NatlAcad.Sci.USA74,5088–5090(1977). coastalsubseafloorsedimentsfromtheSeaofOkhotsk.Appl.Environ.Microbiol. 5. Woese,C.R.,Kandler,O.&Wheelis,M.L.Towardsanaturalsystemoforganisms: 69,7224–7235(2003). proposalforthedomainsArchaea,Bacteria,andEucarya.Proc.NatlAcad.Sci.USA 33. Vetriani,C.,Jannasch,H.W.,MacGregor,B.J.,Stahl,D.A.&Reysenbach,A.L. 87,4576–4579(1990). Populationstructureandphylogeneticcharacterizationofmarinebenthic 6. Pu¨hler,G.etal.ArchaebacterialDNA-dependentRNApolymerasestestifytothe Archaeaindeep-seasediments.Appl.Environ.Microbiol.65,4375–4384(1999). evolutionoftheeukaryoticnucleargenome.Proc.NatlAcad.Sci.USA86, 34. VonSchnurbein,S.ThefunctionofLokiinSnorriSturluson’sEdda.Hist.Relig.40, 4569–4573(1989). 109–124(2000). 7. Bult,C.J.etal.Completegenomesequenceofthemethanogenicarchaeon, 35. Deschamps,P.,Zivanovic,Y.,Moreira,D.,Rodriguez-Valera,F.&Lopez-Garcia,P. Methanococcusjannaschii.Science273,1058–1073(1996). Pangenomeevidenceforextensiveinterdomainhorizontaltransferaffecting 8. Rivera,M.C.,Jain,R.,Moore,J.E.&Lake,J.A.Genomicevidencefortwofunctionally lineagecoreandshellgenesinunculturedplanktonicthaumarchaeotaand distinctgeneclasses.Proc.NatlAcad.Sci.USA95,6239–6244(1998). euryarchaeota.GenomeBiol.Evol.6,1549–1563(2014). 9. McInerney,J.O.,O’Connell,M.J.&Pisani,D.ThehybridnatureoftheEukaryota 36. Nelson-Sathi,S.etal.Originsofmajorarchaealcladescorrespondtogene andaconsilientviewoflifeonEarth.NatureRev.Microbiol.12,449–455(2014). acquisitionsfrombacteria.Nature517,77–80(2015). 10. Gribaldo,S.,Poole,A.M.,Daubin,V.,Forterre,P.&Brochier-Armanet,C.Theorigin 37. Yutin,N.,Wolf,M.Y.,Wolf,Y.I.&Koonin,E.V.Theoriginsofphagocytosisand ofeukaryotesandtheirrelationshipwiththeArchaea:areweataphylogenomic eukaryogenesis.Biol.Direct4,9(2009). impasse?NatureRev.Microbiol.8,743–752(2010). 38. Kawai,M.etal.Highfrequencyofphylogeneticallydiversereductive 11. Yutin,N.,Makarova,K.S.,Mekhedov,S.L.,Wolf,Y.I.&Koonin,E.V.Thedeep dehalogenase-homologousgenesindeepsubseafloorsedimentary archaealrootsofeukaryotes.Mol.Biol.Evol.25,1619–1630(2008). metagenomes.Front.Microbiol.5,80(2014). 6 | NATURE | VOL 000 | 00 MONTH 2015 G2015 MacmillanPublishersLimited.Allrightsreserved ARTICLE RESEARCH 39. Pollard,T.D.&Cooper,J.A.Actin,acentralplayerincellshapeandmovement. LaboratoryatUppsalaUniversity,anationalinfrastructuresupportedbytheSwedish Science326,1208–1212(2009). ResearchCouncil(VR-RFI)andtheKnutandAliceWallenbergFoundation.Wethank 40. Bernander,R.,Lind,A.E.&Ettema,T.J.Anarchaealoriginfortheactin theUppsalaMultidisciplinaryCenterforAdvancedComputationalScience(UPPMAX) cytoskeleton:Implicationsforeukaryogenesis.Commun.Integr.Biol.4,664–667 atUppsalaUniversityandtheSwedishNationalInfrastructureforComputing(SNIC)at (2011). thePDCCenterforHigh-PerformanceComputingforprovidingcomputational 41. Pollard,T.D.&Borisy,G.G.Cellularmotilitydrivenbyassemblyanddisassembly resources.ThisworkwassupportedbygrantsoftheSwedishResearchCouncil(VR ofactinfilaments.Cell112,453–465(2003). grant621-2009-4813),theEuropeanResearchCouncil(ERCStartinggrant 42. Takai,Y.,Sasaki,T.&Matozaki,T.SmallGTP-bindingproteins.Physiol.Rev.81, 310039-PUZZLE_CELL)andtheSwedishFoundationforStrategicResearch 153–208(2001). (SSF-FFL5)toT.J.G.E.,andbygrantsoftheCarlTryggersStiftelsefo¨rVetenskaplig 43. Zhang,Y.,Franco,M.,Ducret,A.&Mignot,T.AbacterialRas-likesmallGTP-binding Forskning(toA.S.),theWenner-GrenStiftelserna(toJ.H.S.),andbyMarieCurieIIF proteinanditscognateGAPestablishadynamicspatialpolarityaxistocontrol (331291toJ.H.S.)andIEF(625521toA.S.)grantsbytheEuropeanUnion.S.L.J directedmotility.PLoSBiol.8,e1000430(2010). receivedfinancialsupportfromtheH2DEEPprojectthroughtheEuroMARCprogram, 44. Hurley,J.H.TheESCRTcomplexes.Crit.Rev.Biochem.Mol.Biol.45,463–487(2010). andbytheResearchCouncilofNorwaythroughtheCentreforGeobiology,Universityof 45. Field,M.C.&Dacks,J.B.Firstandlastancestors:reconstructingevolutionofthe Bergen.C.S.issupportedbytheAustrianScienceFund(FWFgrantP27017). endomembranesystemwithESCRTs,vesiclecoatproteins,andnuclearpore complexes.Curr.Opin.CellBiol.21,4–13(2009). AuthorContributionsT.J.G.E.,S.L.J.andC.S.conceivedthestudy.S.L.J.provided 46. Leung,K.F.,Dacks,J.B.&Field,M.C.EvolutionofthemultivesicularbodyESCRT deep-seasedimentsandisolatedcommunityDNA.R.v.E.,J.H.S.andA.E.L.prepared machinery;retentionacrosstheeukaryoticlineage.Traffic9,1698–1716(2008). sequencinglibraries.A.E.L.,J.H.S.,S.L.J.andJ.M.analysedenvironmentalsequence 47. Kostelansky,M.S.etal.StructuralandfunctionalorganizationoftheESCRT-I data.L.G.,K.Z.-N.andJ.H.S.performed,optimisedandanalysedmetagenomic traffickingcomplex.Cell125,113–126(2006). sequenceassemblies.L.G.,J.H.S.,A.S.,K.Z.-N.andT.J.G.E.analysedgenomicdataand 48. Raiborg,C.&Stenmark,H.TheESCRTmachineryinendosomalsortingof performedphylogeneticanalyses.A.S.,L.G.,S.L.J.andT.J.G.Eanalysedgenomic ubiquitylatedmembraneproteins.Nature458,445–452(2009). signaturesofDSAG.T.J.G.E.,A.S.,S.L.J.andL.G.wrote,andallauthorseditedand 49. Nunoura,T.etal.InsightsintotheevolutionofArchaeaandeukaryoticprotein approvedthemanuscript. modifiersystemsrevealedbythegenomeofanovelarchaealgroup.NucleicAcids Res.39,3204–3223(2011). AuthorInformationSequencedatahavebeendepositedtotheNCBISequenceRead 50. Poole,A.M.&Gribaldo,S.Eukaryoticorigins:howandwhenwasthe ArchiveunderstudynumberSRP045692,whichincludes16rRNAreads(experiment mitochondrionacquired?ColdSpringHarb.Perspect.Biol.6,a015990(2014). numberSRX872366).ProteinsequencesofLoki2/3weredepositedtoGenBankunder accessionnumbersKP869578–KP869724.TheLokiarchaeumgenomebinandthe SupplementaryInformationisavailableintheonlineversionofthepaper. LCGC14metagenomeprojectshavebeendepositedatDDBJ/EMBL/GenBankunder AcknowledgementsWededicatethispapertothememoryofRolfBernander.We theaccessionsJYIM00000000andLAZR00000000,respectively.Theversions thankP.OffreandI.deBruijnfortechnicaladviceandusefuldiscussions,andA.Denny describedinthispaperareversionsJYIM01000000andLAZR01000000.Reprints forimageprocessing.WealsoacknowledgethehelpfromchiefscientistR.B.Pedersen, andpermissionsinformationisavailableatwww.nature.com/reprints.Theauthors thescientificpartyandtheentirecrewonboardtheNorwegianresearchvesselG.O. declarenocompetingfinancialinterests.Readersarewelcometocommentonthe Sarsduringthesummer2010expedition.Allsequencingwasperformedbythe onlineversionofthepaper.Correspondenceandrequestsformaterialsshouldbe NationalGenomicsInfrastructuresequencingplatformsattheScienceforLife addressedtoT.J.G.E.([email protected])orL.G.([email protected]). 00 MONTH 2015 | VOL 000 | NATURE | 7 G2015 MacmillanPublishersLimited.Allrightsreserved RESEARCH ARTICLE METHODS (v8.0.22)63(GTRGAMMAmodelofnucleotidesubstitutionand100bootstraps). TheresultingtreewasimportedintoiTOLonline64tocollapsemajorclades. Nostatisticalmethodswereusedtopredeterminesamplesize. PhylogeneticanalysisofDSAG-relatedOTU’s.All16SrRNAgenesequences Samplingsiteandsampledescription.A2-mlonggravitycore(GC14)was classifiedasDSAGbyJørgensenetal.52wereusedasqueriesinaBLASTsearch retrievedfromtheArcticMid-OceanRidgeduringsummer2010(approximately (E,1025,identity.83%)againstallarchaealentriesintheSILVAdatabase 15kmnorth-northwestoftheactiveventingsiteLoki’sCastle;3283mbelowsea (release119)thatmetthefollowingcriteria:sequencelength.900bp,alignment level;73.763167N,8.464000E)(Fig.1a).Samplesforgeochemistryandmicro- identity.70,alignmentquality.75andpintailquality.75andthequalityof biologywerecollectedimmediatelyandeitherprocessedonboardorfrozenfor recoveredsequenceswaschecked(forexample,using‘cut-head’and‘cut-tail’ lateranalysis.Uponportarrival,thecorewasstoredinsealedcorelinersat4uC information).Thenumberofsequencesinthedatasetwasreducedwhilekeeping (coredepositoryfacility,UniversityofBergen,Norway).Comprehensivegeo- maximum diversity as follows. First, the retained 16S rRNA sequences were chemicalandmicrobialcharacteristicsfromthisandadjacentsiteshavebeen alignedwithSINA(v1.2.11)65,usingallarchaeaintheSILVAdatabaseasref- described elsewhere51,52. The core consists of hemipelagic-glaciomarine sedi- erence.ThealignmentwasmanuallycuratedwithSeaview(v4)66.Uponremoval ments receiving episodic hydrothermal input. The oxygen penetration depth ofgaps,sequenceswereusedtocreateOTUswithUCLUST(v1.2.22)67(94% wasestimatedto,50cmbelowseafloor(b.s.f.)andthecontentoforganiccarbon identitycut-offandthe‘–optimal’option).Allsequencesthatcorrespondedto variedbetween0.6-1.3%.Whilenomeasurableamountsofmethaneorsulphide OTUseedswereselectedto representfullDSAGgeneticdiversityand,upon couldbemeasured,highandfluctuatinglevelsofdissolvedironweredetected. addingarchaealoutgroupsequencesandthesingleampliconOTU,classified Therelativeabundanceofbacterialandarchaeal16SrRNAgenecopynumbers asDSAG,thefinaldatasetwasalignedwithSINA(v1.2.11)asdescribedabove, wasestimatedbyquantitativePCR(qPCR)previously52,indicatinghighabund- trimmedwithTrimAl(v1.4)(gapthresholdof50%)andsubjectedtoRAxML anceoftheDSAGinseveraloftheinvestigatedsedimenthorizons,especiallyat phylogeneticanalyses(v7.2.8;GTRGAMMAsubstitutionmodel,100rapidboot- 75cmb.s.f.(upto40%ofthetotalprokaryoticpopulation;2.73106copiesper straps).Allinternalbrancheswith#40bootstrapsupportwerecollapsedwith gramsediment).Thus,samplematerialfromhorizonat75cmb.s.f.wasusedfor Newick-Utilities(v1.6)68.TheresultingtreewasthenimportedintoiTOLonline64 alldownstreamanalysesincludingampliconandmetagenomelibraries. tocollapsemajorclades. DNA extraction and genomic DNA amplification. To obtain sufficient Metagenomesequencingandassembly. amountsofgenomicDNAforsequencinglibrarypreparation,newsamplemater- Librarypreparationandshotgunsequencing.Nexteralibraries(Illumina)were ialwasobtainedfromthe75-cm-b.s.f.layerofgravitycoreGC14insummer2013. prepared according to the manufacturer’s instructions, using unamplified After qPCR-based verification of high DSAG abundance in the re-sampled LCGC14(20ng)andamplifiedLCGC14AMP(50ng)asinputDNA.Sinceless material,DNAwasextractedfrom7.5gsedimentusingtheFastDNAspinkit startingmaterialwasusedforthegenerationoftheunamplifiedlibrary,atotalof forsoilinconjunctionwiththeFastPrep-24instrument(MPBiomedicals)fol- eightamplificationcycleswereusedinthePCRstepduringwhichtheIllumina lowing manufacturer’s protocol, except for the addition of polyadenosine as barcodesandadapters(NextEraIndexkit)werefused,ratherthanthedefaultfive describedinref.53.Theindividualextractionswerethenpooledandconcen- cycles.TheLCGC14andLCGC14AMPNextEralibrariesweresequencedwith tratedtoafinalvolumeof50mlusingAmiconUltra-0.5filters(50.000NMWL) threeandtwolanes,respectively,ofHiSeq2500(Illumina),usingrapidmode followingthemanufacturer’sprotocol(MerckMillipore).Duetolowyieldand setting, generating two 150-bp paired reads. These runs yielded 8.6Gbp and presenceofinhibitors,2.73ngofthisgenomicDNAwasamplifiedusingthe 56.6Gbpofdatawithanaverageinsertsizeof620and350bpfortheLCGC14 REPLI-gultrafastminikit(Qiagen)accordingtothestandardprotocolforpuri- andLCGC14AMPNextEralibraries,respectively. fiedgenomicDNA. Readpreprocessing.SeqPrep(v.b5efabc5f7,https://github.com/jstjohn/SeqPrep) Ampliconsequencingandanalysisof16SrDNAphylogeneticanalyses.Toget was used to merge overlapping paired-end reads and to trim adapters, with a better estimate of the microbial diversity of Loki’s Castle sediment core defaultsettings.Mergedreadsandnon-mergedpairsweretrimmedwithSickle LCGC14, ‘universal’ primer pairs (A519F (59-CAGCMGCCGCGGTAA-39) (v.1.210,https://github.com/najoshi/sickle),using‘‘se’’and‘‘pe’’options,respect- and U1391R (59-ACGGGCGGTGWGTRC-39)) were used to amplify a ively,anddefaultsettings. ,900bpfragmentofthe16SrRNAgenespresentinthenon-amplifiedgenomic Metagenomicassembly.Pre-processedpaired-endreadsandsinglereadswere community DNA (extracted from LCGC14, 75cmb.s.f.) using the following assembledwithSPAdesv.3.0.069insingle-cellmode,totakeintoaccountthe conditions:15minofheatactivationofpolymeraseat95uCand35cyclesof widelyvaryingcoverageofmetagenomicscontigsaswellastotrytoassemble 95uC(30s),54uC(45s),72uC(60s),followedbyfinalextensionat72uCfor contigswithlowcoverage.Thereadcorrectiontoolwasturnedonandkmers21, 7min.QiagenHotStarTaqDNApolymerasewasusedforthePCRreactions. 33, 55 and 77 were used. Mismatch correction was not performed on the LCGC14AMPdataset.Contigsshorterthan1kbpwerediscarded. Subsequently,PCRproductsofthecorrectsizewerepurifiedwithQiagenPCR purificationkit,andquantifiedusingaNanodropND-3300fluorospectrometer Genepredictions.Proteincodinggenes(CDS)wereidentifiedwithprodigalv. 2.6070,usingthe‘meta’optionformetagenomes.RibosomalRNA(rRNA)genes (ThermoScientific).CleanPCRproductswerethenusedasinputmaterialsfor werecalledwithrnammerv.1.271,usingthearchaealmodelandsearchingforall libraryconstructionusingTruSeqDNALTSamplePrepKit(Illumina)according three rRNA subunits. Transfer RNA genes (tRNA) were identified with tothemanufacturer’sinstructionsandappliedtosequencingwithanIllumina tRNAscan-SEv.1.2372,usingthe‘–G’optionformetagenomesand‘–A’option MiSeqinstrument.TheIlluminaMiSeqrunproducedtwo300-bppaired-end fortheLokiarchaeumcompositegenome(seesubsequentparagraphs).Forthe reads.RawMiSeqfastqsequencesweretreatedwithTrimmomatictool(v0.32)54 latter,theanalysiswasalsorunwithSPLITSX(noversionnumberavailable; using the following options: TRAILING:20, MINLEN:235 and CROP:235, to sourcecodedownloadedon14August2014)73todetecttRNAgenesthatare remove trailing sequences below a phred quality score of 20 and to achieve splitorthathavemultipleintrons. uniformsequencelengthsfordownstreamclusteringprocesses.Remainingtraces Proteinclustering.Archaea-specificclustersoforthologousgenes(arCOGs)74, ofIlluminaadaptorsequenceswereremovedbySeqPrep(https://github.com/ basedon120archaealproteomes(hereaftercalledarCOGs2012),wereextended jstjohn/SeqPrep) and by BLAST55 searches against NCBI Univec database. withproteomesfrom45recentlysequencedorganisms,including31single-cell Quality-filtered MiSeq reads were checked for correct orientation of the 16S amplifiedgenomes(SAGs)(SupplementaryTable1).First,existingarCOGswere rRNAsequenceinthepaired-endreadsandthosecontainingtheforwardprimer attributed to the new proteomes: protein sequences in each of the 10,323 sequence(A519F)wereextractedforOTUclusteringwithUPARSEpipeline56, arCOGs2012 were aligned with MAFFT L-INS-i v.7.130b61. Each alignment settingaOTUcutoffthresholdto97%.Chimericsequenceswerefilteredoutby was used as a query (-in_msa) to search the new proteomes using PSI- the Uchime tool57 integrated in the UPARSE pipeline. Remaining chimeric BLAST55,ignoringthemastersequence,using1024asanE-valuecut-off,fixing sequences,ifstillpresent,weremanuallycheckedandremoved.Abundancesof thedatabasesizeto108,gatheringatmost1,000sequences,andnotusingcom- eachOTUwerecalculatedbymappingthechimaera-filteredOTUsagainstthe position-basedstatistics.Hitswerethensortedpersubjectproteinand,foreach quality-filtered reads using the UPARSE pipeline. Using the mothur package subject, the highest-scoring query alignment was deemed the main arCOG. (v1.33.2)58,representativesequencesforeachOTUwerealignedtogetherwith Wheneverapplicable,thenext-highest,non-overlappingqueryalignmentwas theSilvaNR99release-11559alignmentfiletoclassifytheOTUs. deemedthesecondaryarCOG.Second,proteinswithoutarCOGattribution(sin- Phylogenetic analysis of archaeal 16S rRNA gene sequences. Twenty-nine gletons)inboththeoriginalandextendedsetofproteomesweregathered,and archaealOTUsidentifiedfromtheamplicondatawerealignedtogetherwith newarCOGs(arCOGs2014)werecreatedfromsymmetricalbesthits,usingthe 220 sequences representing the major clades in the archaeal 16S rRNA tree tools available in COG software suite, release 201204 (ref. 75). PSI-BLAST accordingtothestudybyDurbinandTeske60.Atotalof249sequenceswere searcheswereperformedaccordingtotheCOGsoftwareinstructions.Lineage- alignedwithMAFFTL-INS-i(v7.012b)61,trimmedwithTrimAl(v1.4)62,and specific expansions were identified with COGlse, using a job-description file subjected to a maximum-likelihood phylogeny analysis using RAxML containingallpossiblepairsoforganismsthatdonotbelongtothesamephylum. G2015 MacmillanPublishersLimited.Allrightsreserved ARTICLE RESEARCH COGtriangleswasrunwithdefaultsettings,andyielded3,570newarCOGs.Of thefollowingcategories:Lokiarchaeum,Loki2/3(distantLokiarchaeaum-related the325,405proteinsinthecombineddatasets(165proteomes),29,249(9%)had clades), Thaumarchaeota, DPANN, Diapherotrites, Mimivirus, Bacteria or noarCOGattribution. unknown.Classificationwasbasedonphylogeneticplacement.Insomecases AttributionofarCOGstometagenomesorcompositegenomesinthisstudy wherethephylogeneticplacementwasinconclusive,presenceonthesamecontig wasperformedwithPSI-BLASTasdescribedabove,usingthearCOGs2014as ofanothergenealreadyclassifiedwasusedtoaidclassification.Thefactthat queries. Lokiarchaeumistheonlycladeforwhichfourtosixdistinctbutcloselyrelated Phylogeneticanalysesof‘taxonomicmarker’proteinsforbinningandcon- strainsarepresentinLCGC14AMPgreatlyaidedclassification.Inaminorityof catenatedproteintrees. cases,somegeneswereclassifiedinacategorybutmarkedas‘putative’,astheir Phylogenetic inference. Maximum-likelihood phylogenies were inferred with attributionwasslightlyambiguous. RAxML8.0.963,calculating100non-parametricbootstraps.PROTGAMMALG Qualitycontrolofthetrainingset.Contigscontainingmarkersclassifiedinthe andGTRGAMMAwereusedforaminoacidandnucleotidealignments,respect- firstsixcategoriesmentionedabovewereextractedfromtheassembly,andtheir ively, unless otherwise stated. Bayesian inference phylogenies were calculated tetranucleotidefrequencies(TNF)werecalculated.Tothenassessthereliabilityof with PhyloBayes MPI 1.5a76, using the CAT model and a GTR substitution theclassification,lineardiscriminantanalysis(LDA)wasperformedinR83with matrix. Four chains were run, and runs were checked for convergence. packageMASS84,usingGCcontentandTNFasinputdata:halfofthecontigs Wheneverconvergencewasnotreached,thetopologyofindividualchainswas belongingtoeachofthesixselectedcategorieswererandomlyselected(excluding compared.Consensustreeswereobtainedwithbpcomp,usingallfourchainsand thecontigsmarkedas‘putative’),andusedtocalculateLDA(function‘lda’in aburn-inofatleasthalfthegenerations.Toaddbootstrapsupportvaluestothe MASS)(SupplementaryFig.4).Basedonthis,classificationwaspredictedusing Bayesianphylogenies,sumtrees.py(DendroPypackage77)wasused,withdefault theMASSfunctionpredict.lda.Incorrectpredictions(thatis,whentheprediction settings,takingtheBayesianinferencetreeasaguidetreeandthe100bootstraps basedonLDAwasnotcongruentwiththeclassificationbasedonthephylogen- asinput.Forconcatenatedphylogenies,amino-acidsequenceswerealignedagain etictrees)wererecorded.Theprocedurewasrepeated100times,andcontigsthat withMAFFTL-INS-iindividuallyforeachcluster.Positionswith.50%gaps wereattributedtothewrongcategory30timesormoreweremanuallyreviewed weretrimmedandalignmentswereconcatenated. andeventuallydiscardedfromthetrainingset(SupplementaryFig.4a).Contigs Aminoacidbiasfiltering.Toassesstheeffectofaminoacidbiasonthephylo- markedasputativewereattributedtothecategoryifthepredictionwascongruent genies,ax2filteringanalysiswasperformedontheconcatenatedalignment.Fora with the putative classification 90 times or more, or discarded otherwise. A completedescription,seerefs78and79.Inbrief,aglobalx2scoreiscalculatedfor furthercycleofLDAcalculationandpredictionwasperformed,withnocontigs theconcatenatedalignment,bysumming,foreachaminoacidandeachsequence, classifiedas‘putative’thistime(SupplementaryFig.4b).Tofurtherinvestigate thenormalizedsquareddifferencebetweentheexpectedandobservedfrequency therobustnessofthemethod,werandomizedthecategoriesoftheinputand oftheaminoacidinthisparticularsequenceanditsfrequencyexpectedfromthe performedthesameLDAcalculationandpredictionasabove,andassessedthe wholealignment.Eachpositioninthealignmentisindividuallytrimmedandthe numberofincorrectpredictionsineachcase(SupplementaryFig.4c).Thistest difference(Dx2)betweentheglobalx2scoreandthex2scorecalculatedonthe confirmedthatclassificationsbasedontreesweregenerallycongruentwithpre- trimmedalignmentprovidesanestimationoftherelativecontributionofeach dictions based on LDA, significantly more often than just by chance positiontotheglobalaminoacidcompositionheterogeneity.Positionsarethen (SupplementaryFig.4d–f).Thefinalsetofcontigswasusedasatrainingset rankedbytheirDx2values,andthemostorleastbiasedsitesuptoathresholdare forphymmBL85(seebelow),andcomprised839kbpforLokiarchaeum,544kbp removed. for Loki2/3, 521kbp for Thaumarchaeota, 646kbp for DPANN, 43kbp for Treetopologytests.Tocomparehowwelldifferenttreesexplainedthealigned Diapherotritesand21kbpforMimivirus. sequencedata,approximatelyunbiasedtests80wereperformedonconcatenated BinningusingPhymmBL.PhymmBLversion4.085wasrunseparatelyforbinning as well as single-gene alignments. Two maximum-likelihood hypothesis trees thecontigslargerthan1kbpfrombothLCGC14AMPandLCGC14.Astraining weretestedagainstthealignments.Thefirstone,showingLokiarchaeotagroup- sets,allprokaryoticgenomespublishedinGenBank(retrievedon2014-03-04, ingwitheukaryotes,wasobtainedfromtheconcatenationof36markers,shown 2716genomes)werecomplementedwiththe60newlysequencedgenomesused inFig.2b.Thesecondwasobtainedfromtheconcatenationofthe21ribosomal toconstitutethearCOGsetthatwasabsentfromGenBank(SupplementaryTable proteins present in the previous set, and shows Korarchaeota grouping with 1),andthesixtrainingsets(Lokiarchaeum,Loki2/3,Thaumarchaeota,DPANN, eukaryotes.Forindividualgenetrees,thetaxamissinginthealignmentwerealso DiapherotritesandMimivirus)obtainedfromLCGC14AMPasdescribedabove. prunedfromthehypothesistreesusingtheutilitynw_prunefromtheNewick ReassemblyofLokiarchaeumbin.IntheLCGC14assembly,3,165contigs(18.6 Utilitiespackage68.Foreachalignmenttested,per-sitemaximumlikelihoodwas Mbpintotal)werepredictedtobelongtotheLokiarchaeumgenus,indicatinga calculatedforbothhypothesistreeswithRaxML8.0.9,usingtheoption‘–fG’,and largedegreeofmicrodiversity.Inordertoreduceredundancy,contigsetswere the PROTGAMMALG model. CONSEL 0.2081 was then used to perform constituted,withincreasinglow-coveragecut-offs(from1to1003,witha13 approximatelyunbiasedtests,usingdefaultsettings. increment).Thecompletenessandredundancyofeachsetwasthenestimated Identificationoftaxonomicmarkers.Areferencesetof59highlyconserved,low- usingthemicompletescript(manuscriptinpreparation).Inbrief,micomplete orsingle-copygeneswereusedbothastaxonomicmarkersinthebinningprocess bases its predictions on the presence or absence of a set of single-copy pan- andforconcatenatedphylogenies(SupplementaryTable2).Fifty-sevenofthese, orthologs,inthiscase162markersdefinedinref.86.Toavoidoveremphasizing whichwereshowntobepronetoveryfewornohorizontalgenetransferswere thepresenceofmarkersthatareoftenveryclosetoeachother(forexample, takenfromref.79.TwofurtherarCOGs(arCOG04256andarCOG04267,sub- ribosomalproteins),eachmarkerreceivesaweightcoefficientbasedonthedis- unitsA0andA9oftheDNA-directedRNApolymerase,respectively)wereadded tancebetweenthismarkeranditsclosestneighboursbothupstreamanddown- totheset(seeSupplementaryInformationandSupplementaryTable2foralist stream,averagedoverarepresentativesetof70Archaea(setdescribedinref.79). overwhicharCOGisincludedineachphylogeny). Completenessisthefractionofweightedmarkerspresent,andisthusconstrained Unlessotherwisestated,alltreesincludedthesamesetof101referencegen- between0(nomarkerpresent)and1(allmarkerspresent).Redundancyiscal- omes:58archaealgenomesselected79fromthe120analysedbyWolfetal.74;21 culatedasthetotalnumberofcopiesofweightedmarkerspresentdividedbythe selectedfromthe45newlysequencedorganismsthatwerealsousedforcluster- numberofweightedmarkerspresent,andisthusalwaysgreaterthanone,where ing,someofthemalreadyanalysedinGuyetal.82;twogroupsofthreeclosely onewouldmeanthatallmarkerspresentaresinglecopy.Thesetwonumbers relatedSAGswerepooledtoprovidemorecompleteproteomes;tenbacteriaand werecalculatedforeachcontigset,andacut-offof243representedthebest teneukaryotes,asinGuyetal.82(SupplementaryTable1).Toremoveparalogues compromisebetweencompleteness(0.89)andredundancy(1.67)(Loki243set, andobtainsetswithatmostonehomologuepergenome,membersofeachofthe SupplementaryFig.5a). selectedarCOGswerealignedwithMAFFTL-INS-iandamaximum-likelihood To obtain a better assembly with longer contigs with only reads from phylogenywasinferredwithRAxML,underaPROTCATLGmodelwith100 Lokiarchaeum,readsbelongingtoLokiarchaeumcontigswerereassembledas slowbootstraps.Previouslyremovedparalogues79werenotincluded.Treeswere follows.ReadsfromtheLCGC14dataset,correctedbySPAdes,weremapped thenvisuallyinspectedandparaloguesremovedusingthesameguidelinesasin againstthewholeLCGC14assemblywithbwa-mem87,andreadsthatmatched ref.79.Thisset,includingatmostonecopyofeachofthe59referencearCOGsin contigsintheLoki243setwereextracted.Forpaired-endreads,bothreadswere 101genomes,ishereafterreferredtoas‘59ref’. retainedifatleastonereadmatchedtheLoki243set.Theseextractedreadswere Binning. assembledwithSPAdesasabove,butwithoutthesingle-cellmodeandwithout Training set. After arCOG attribution (see above), genes from LCGC14AMP readcorrection.Again,completenessandcoveragewereassessedforsetsofcon- belongingtotherespectivearCOGswereaddedtothe59refset.Sequenceswere tigswithincreasinglow-coveragecut-offs,andathresholdof203coveragewas alignedandindividualtreeswerebuiltforeacharCOG,asdescribedabove.Trees foundtogivethebestcompromisebetweencompleteness(0.92)andredundancy werethenvisuallyinspectedandsequencesfromLCGC14AMPwereclassifiedin (1.44)(SupplementaryFig.5b).Theselected504contigs,hereafterreferredtoas G2015 MacmillanPublishersLimited.Allrightsreserved RESEARCH ARTICLE ‘Lokiarchaeum’,represented5.14Mbpofsequence.TheN50andN90ofthis ‘Other’category.ResultsareshowninFig.2b.Usingthesameparameters,func- assemblywere15.4and5kbp,respectively. tionalCOGcategorieswereassignedtotheLokiarchaealproteometogetinsights Annotation and contamination assessment of Lokiarchaeum genome bin. into the functional and taxonomic affiliation of the Lokiarchaeal proteome AnnotationofallpredictedopenreadingframesoftheLokiarchaeumgenome (SupplementaryFig.15). binwasdoneusingprokka88,usingaconcatenationofthethreekingdom-specific Phylogenetic analyses of selected eukaryotic signature proteins (ESPs). proteindatabasesshippedwithprokkaasthemaindatabase,predictingtRNA Selection of ESCRT-III homologues. For the ESCRT-III phylogeny, eukaryotic and rRNA as above. Furthermore, proteins were compared to sequences in ESCRT-IIIhomologuesasdescribedinMakarovaetal.97(comprisingthefamilies NCBI’s non-redundant database and RefSeq using BLAST55 and results were Vps60/Vps20/Vps32andVps46/2/24),aswellasarchaealESCRT-IIIhomolo- inspectedusingMEGAN89.Additionally,anInterProScan590(whichintegrates guesbelongingtoarCOG00452,arCOG00453andarCOG00454familiespresent acollectionofproteinsignaturedatabases suchasBlastProDom, FPrintScan, in Crenarchaeota, Thaumarchaeota and Aigarchaeota were extracted from HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, HAMAP, GenBank.ThemoredistantlyrelatedSNF7-likearCOGfamilies(arCOG09747, PatternScan, SuperFamily, SignalPHMM, TMHMM, HMMPanther, Gene3D, arCOG09749andarCOG07402)97presentinafeweuryarchaealspecieswerenot PhobiusandCoils)wasperformedandthegenomewasviewedandanalysed includedinthealignment.Subsequently,respectivearCOGswereretrievedfrom inMAGE91.Selectedgenesofinterestfortheevolutionoftheeukaryoticcelland/ boththeLCGC14AMPmetagenomeandLokiarchaeumfinalbin(seesectionon or subjected to phylogenetic analyses were checked manually and annotated arCOGattribution).TheESCRToperonpresentonaLoki2/3contigrevealedthe according to their protein domains/signatures based on PSI-BLAST55 results, presence of an additional ESCRT-III homologue (most similar to eukaryotic arCOGattributions(SupplementaryTables6–10)aswellasproteinstructure Vps20/32/60sequences), whichwasnot attributedtoan archaealCOG.This predictions using Phyre292. To check for the presence of particular genes of homologuewasusedasanadditionalquerytoretrievehighlysimilarsequences interest,suchasspecificeukaryoticribosomalproteins,oreukaryoticribosomal fromtheLCGC14AMPmetagenomeaswellastheLokiarchaeumfinalbinusing proteinL41ewhichhasbeendetectedinseveralEuryarchaeota93,existingalign- blastp.Finally,eachofthetwodifferentSNF7-familyproteins,whicharepartof mentsfromarCOGsand/orKOGsweredownloadedfromeggNOG94andusedin theESCRToperonsofLokiarchaeumandLoki2/3,respectively,wereusedas PSI-BLASTsearchesasqueryagainsttheLokiarchaealcompositegenome. queriestosearchpublishedmetagenomes(NCBI)withblastp.Highlysimilar Severalcontrolswereperformedtoconfirmtheabsenceofobviouscontami- sequences(coverage.70%;identity.40%)wereretrievedandincludedinthe nantsinthefinalLokiarchaeumbin.Mostimportantly,allcontigscontaining phylogenyaswell. ESPsdiscussedinthemanuscriptweremanuallyinspectedtoverifythatthese SelectionofVps4homologues.ArchaealsequencesassignedtoarCOG01307(cell actuallybelongtoLokiarchaeum,by:(1)inspectingneighbouringgenesforthe division ATPase of the AAA1 class, ESCRT system component) as well as presenceofarchaealmarkers;(2)byqueryingallproteinspresentoncontigs eukaryoticVps4homologues,includingafewproteinsofthecdc48subfamily, containingESPsagainsttheLCGC17metagenometocheckwhetherhighlysim- wereretrievedfromGenBank.Thelatterproteinfamilyservedasoutgroup,as ilarhomologuescouldbefoundseveraltimesinthesample(generallybetween described in Makarova et al.97 Sequences assigned to arCOG01307 were also 3–7copies)accountingforthedifferent,highlyrelatedLokiarchaeotastrains;and extractedfromLCGC14AMPmetagenomeaswellasfromtheLokiarchaeum (3)byqueryingthesameproteinsagainstenvironmentalmetagenomespublicly bin, and sequences highly similar to the Vps4 of Lokiarchaeum were availableatNCBI,controllingthatmostofthemhadhighlysimilarhomologues retrievedfrompublishedmetagenomes(coverage.60%;identity.50%).The inanoceansedimentmetagenome95,butnotinanyothermetagenome.Thislast LCGC14AMPmetagenomecontainedalargeamountofsequencesassignedto checkwasbasedonourfindingthatallESPsofLokiarchaeumhadhighlysimilar arCOG01307,includinghitstoVps4homologuesofThaumarchaeota.However, homologuesinthismarinesedimentmetagenome(forexample,upto98%for ATPasesthat,basedonphylogeneticanalyses,turnedouttobeunrelatedtoVps4 Lokiactins)indicatingthatcloselyrelatedgenomesofmembersofLokiarchaeota wereremovedfromtheanalysis.Basedontheinitialphylogenythatincludedall arepresent,whichisinaccordancewiththefindingthatDSAGrepresentsan ofthesesequences,onlythoseLCGC14AMPVps4homologuesthatclustered abundantgroupinthesesub-seafloorsediments96. withtheVps4homologueoftheLokiarchaeumbinwereselectedtoavoidthe Finally,proteinscomprisinginformationalprocessingmachinerieswerealso inclusionoffalsepositives. investigatedusingMEGAN89.Theabsenceofbacterialinformationalprocessing SelectionofEAP30-domain(Vps22/36-like)andVps25homologues.EAP30and proteinsindicatedthatthereisnobacterialcontaminationinthefinalbin(see Vps25homologueshavesofarnotbeendetectedinArchaeaandthustherespect- SupplementaryDiscussion3). ive sequences present in Lokiarchaeum (Extended Data Table 1 and Identificationoftaxonomicmarkersinthebins.ForLokiarchaeum,arCOG SupplementaryTable6)havenotbeenassignedtoanarCOGfamily.Thus,only attributionwasperformedasdescribedabove,andtaxonomicmarkerswereiden- Lokiarchaeum homologues, as well as selected representative eukaryotic tifiedbytheirarCOGattribution.Wheneverthereweretwocopiesofthesame sequences spanning the eukaryotic diversity that were retrieved from the marker,thecopylocatedonthecontigwiththehighestcoveragewasselected. GenBank database were included in these phylogenetic reconstructions. ForLoki2/3,thecategoryhadtwocopiesof19outof36markerspresent,with Putative EAP30- and Vps25-like homologues were discovered in the divergentphylogeneticplacement.AclearGCcontentdifferencecouldalsobe Lokiarchaeum genome since they are part of the ESCRT operon present on observedbetweenthecopies,and,withasingleexception,thetwosetsofcopies contig119.Thesesequenceswereusedasqueriestoalsoretrievehomologues werenotoverlapping(SupplementaryFig.6).Theexceptionwasdiscardedand fromtheLCGC14AMPmetagenome(Ecut-off,0.1;qcoverage,85)aswellas theremainingtwo-copymarkersweredividedintotwobins,Loki2(highGC, frommetagenomesdepositedatNCBI. rangingbetween32.2–37.3%)andLoki3(lowGC,rangingbetween27.7–30.7%). SelectionofsmallGTPasefamilyhomologues(IPR006689andIPR001806).The Single-copymarkerswithaGCcontentfallingintotherangeofeitherofLoki2or investigationoftheLokiarchaeumproteomerevealedlargenumbersofproteins Loki3wereattributedtothecorrespondingbin,theothercopieswerediscarded. homologoustosmallGTPasesoftheRasandArffamilies.Inordertoreliably Loki2(highGC,average32.8%)consistedof21markersandLoki3(lowGC, identifyallputativesmallGTPasesintheLokiarchaeumbin,anInterProscan90,98 average29.9%)of34markers. was performed and all proteins assigned to IPR006689 (Ras type of small Taxonomic affiliation of the Lokiarchaeum proteome. To estimate how GTPases) and IPR001806 (Arf/Sar type of small GTPases) were extracted. Lokiarchaeum relates to its closest relatives, its proteome was aligned to Subsequently,archaealreferencesequencesbelongingtotheseIPRfamilieswere NCBI’snon-redundantdatabaseusingblastp,withanEthresholdof0.001.To retrieved from GenBank. Eukaryotic and bacterial reference sequences were provide a way to compare results, the complete proteomes of ‘Candidatus selectedbasedonapreviousstudybyDongetal.99thatinvestigatedthephylo- Korarchaeum cryptofilum’ OPF8, ‘Candidatus Caldiarchaeum subterraneum’ geneticrelationshipsofmembersoftheRassuperfamily.Duetothelargenumber andtheincompleteproteomeofSCGCAB-539-E09,solerepresentativeofthe ofGTPasehomologuesintheLokiarchaeumbin,andthedifficultyassigning MiscellaneousCrenarchaeotalgroup(MCG)weresimilarlyanalysed.Theresults theseproteinstoaparticulartaxon,itwasdecidednottoanalyseallGTPase oftheblastswerefilteredtoremoveself-hitsandhitstoorganismsbelongingto homologuespresentinmetagenomes.UponinspectionoftheMAFFTL-INS-i thesamephylum.In thecaseoftheMCGrepresentative,onlyself-hitswere alignment,partialsequencesandextremelydivergenthomologueswereremoved. removed.FilteredresultswerethenanalysedwithMEGAN5.4.0.Lastcommon Selectionofactinhomologues.Sofar,theonlyactin-relatedproteinsdetectedina ancestorparametersweresetasfollows:MinScore,50;MaxExpected,0.01;Top fewmembersofthearchaeabelongtoarCOG05583andhavebeenreferredtoas Percent,5;MinSupport,1;MinComplexity,0.0.Foreachresult,brancheswere crenactins100. Three proteins encoded by the Lokiarchaeum genome were uncollapsed at the level below super-kingdom. Profiles were compared using assignedtothisarCOG,andablastpsearchagainstRefSeqrevealedthatthese Absolute counts, and the results were exported and further analysed in R. proteinsaremorecloselyrelatedtobonafideactinsofEukaryotesthantoarchaeal Categoriestowhichlessthan100hitswereattributedinLokiarchaeumwere crenactins.Inordertoidentifyadditionalfull-lengthactinhomologues,blastp groupedunderthe‘OtherArchaea’or‘OtherBacteria’categories.Hitsto‘root’, (E-value cut-off,10210) searches were performed against the Lokiarchaeum viruses,unclassifiedsequencesandhitsnotassignedweregroupedunderthe genomeaswellastheLCGC14AMPmetagenome,usingthisLokiarchaeumactin G2015 MacmillanPublishersLimited.Allrightsreserved
Description: