The Evolution and Functional Impact of Human Deletion Variants Shared with Archaic Hominin Genomes Yen-Lung Lin,1 Pavlos Pavlidis,2 Emre Karakoc,3 Jerry Ajay,4 and Omer Gokcumen*,1 1DepartmentofBiologicalSciences,StateUniversityofNewYorkatBuffalo,NY,US 2InstituteofMolecularBiologyandBiotechnology(IMBB),FoundationofResearchandTechnology—Hellas,Heraklion,Crete, Greece 3DepartmentofEvolutionaryGenetics,MaxPlanckInstituteforEvolutionaryBiology,Plo€n,Germany 4DepartmentofComputerScienceandEngineering,StateUniversityofNewYorkatBuffalo,NY,US *Correspondingauthor:E-mail:[email protected]. Associateeditor:JoshuaAkey Abstract D o w n Allele sharing between modern and archaic hominin genomes has been variously interpreted to have originated from lo a ancestral genetic structure or through non-African introgression from archaic hominins. However, evolution of poly- d e d morphic human deletions that are shared with archaic hominin genomes has yet to be studied. We identified 427 fro polymorphichumandeletionsthataresharedwitharchaichominingenomes,approximately87%ofwhichoriginated m h beforetheHuman–Neandertaldivergence(ancient)andonlyapproximately9%ofwhichhavebeenintrogressedfrom ttp Neandertals (introgressed). Recurrence, incomplete lineage sorting between human and chimp lineages, and hominid- s://a specificinsertionsconstitutetheremainingapproximately4%ofallelesharingbetweenhumansandarchaichominins. c a We observed that ancient deletions correspond to more than 13% of all common (45% allele frequency) deletion de m variationamongmodernhumans.Ouranalysesindicatethatthegenomiclandscapesofbothancientandintrogressed ic deletionvariantswereprimarilyshapedbypurifyingselection,eliminatinglargeandexonicvariants.Wefound17exonic .ou p deletionsthataresharedwitharchaichominingenomes,includingthoseleadingtothreefusiontranscripts.Theaffected .c o genesareinvolvedinmetabolismofexternalandinternalcompounds,growthandspermformation,aswellassuscep- m /m tibilitytopsoriasisandCrohn’sdisease.Ouranalysessuggestthatthese“exonic”deletionvariantshaveevolvedthrough b e differentadaptiveforces,includingbalancingandpopulation-specificpositiveselection.Ourfindingsrevealthatgenomic /a structuralvariantsthataresharedbetweenhumansandarchaichominingenomesarecommonamongmodernhumans rtic le andcaninfluencebiomedicallyandevolutionarilyimportantphenotypes. -a b s Keywords:Neandertal,Denisovan,copynumbervariation(CNV),DMBT1,LCE3C,GHR,ACOT1,GSTT1. tra c t/3 2 Introduction /4/1 A 00 The release of ancient Neandertal and Denisovan genomes regions in modern human genomes that carry Neandertal 8 r /1 t allowed us to study the relationship between genomes of introgressedsequencesoverlapwithgeneslessthanexpected 07 ic ancient hominids and modern humans. Neandertal and bychance.Thisimpliesthatpurifying(i.e.,negativeselection 62 2 le Denisovan genomes are more closely related to each other against deleterious phenotypes) removed some Neandertal 4 b y thantheyaretomodernhumangenomesandtheydiverged allelesaftertheintrogressionevent. g u frommodernhumanancestorsapproximately500,000years Archaicadmixtureisnottheonly sourceof ancientvari- e s ago (Pru€fer et al. 2014). Recent studies have shown that ation in human genome. Previous studies identified highly t o n archaic hominins, including but not limited to Neandertals divergent haplotypes in the human genome, potentially 1 1 and Denisovans, contributed genetic material to modern indicating the presence of ancient structure in Africa that A p humans (Veeramah and Hammer 2014). The origin and has been maintained since before the expected coalescent ril 2 impactoftheseintrogressionsvarygeographicallyandinvolve date for modern human genetic variation (e.g., Barreiro 01 9 differentspecies(Greenetal.2010;Reichetal.2010;Hammer et al. 2005; Cagliani et al. 2008; Teixeira et al. 2014). One et al. 2011; Lazaridis et al. 2013). The exact timing and geo- hypothesis for preservation of these haplotypes is that they graphicaloriginoftheseintrogressionshavebeenthefocusof may have been under polymorphism-conserving balancing severalrecentstudies(Greenetal.2010;CurratandExcoffier selection(e.g.,heterozygoteadaptivefitnessadvantage). 2011; Wall et al. 2013; Hu et al. 2014). In 2014, two studies Genomicstructuralvariants,thatis,deletions,duplications, documented the genome-wide distribution of Neandertal inversions,andtranslocationsofgenomicsegments,havere- allelesacrossmodernhumangenomes(Sankararamanetal. cently been recognized as a major part of human genomic 2014; Vernot and Akey 2014). These studies found that variation (Conrad et al. 2009). It has been an ongoing (cid:2)TheAuthor2015.PublishedbyOxfordUniversityPressonbehalfoftheSocietyforMolecularBiologyandEvolution. ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionNon-CommercialLicense (http://creativecommons.org/licenses/by-nc/4.0/),whichpermitsnon-commercialre-use,distribution,andreproductioninany Open Access medium,providedtheoriginalworkisproperlycited.Forcommercialre-use,[email protected] 1008 Mol.Biol.Evol.32(4):1008–1019 doi:10.1093/molbev/msu405 AdvanceAccesspublicationJanuary2,2015 MBE Ancienthumandeletions . doi:10.1093/molbev/msu405 challengetodiscoverandgenotypegenomicstructuralvari- Torigorouslyidentifytheseoutliers,weassumedthatthe ants(Alkanetal.2011).However,inthelast5years,therehas read-depth/sizeratioinNeandertalandDenisovangenomes beenmajorprogressindiscoveryandgenotypingofdeletion across these intervals follows a normal distribution (fig. 1A polymorphisms(1000GenomesProjectConsortium2012). andB)withtheobservedmeanandstandarddeviation.We We previously described a common deletion polymor- then identify outliers that do not fit into this distribution phism in modern humans that is shared with Neandertal (P<0.01). Using this conservative estimate, we identified and Denisovan genomes (Gokcumen et al. 2013). We rea- 325 and 227 polymorphic 1KG deletions that are shared soned that this deletion has evolved before Human– withNeandertalandDenisovangenomes,respectively. Neandertal/Denisovan divergence in Africa and has been To ensure the accuracy of our genotyping pipeline, we maintained through balancing selection. Herein, we extend conducted multiple checks. First, to avoid any GC bias that ouranalysestotheentiregenometoidentifydeletionvariants mayaffectmapping,weinvestigatedtheGCcontentofallthe observed among modern humans that are shared with human deletions and those that we found to be shared NeandertalandDenisovangenomes. with archaic hominins. We found no significant difference D o (supplementary fig. S2, Supplementary Material online). w n Results Second, we have manually checked all calls in both the lo a d Neandertal and Denisovan genomes using Integrative e Identification of Polymorphic Human Deletions that d Are Shared with Archaic Hominins Genome Browser (Nicol et al. 2009) (e.g., supplementary fro fig. S3A, Supplementary Material online). Third, we were m Toidentifythedeletionpolymorphismsthatarealsopresent alsoabletotakeadvantageofrecentlypublishedexomese- http ihnigthh-ecoNnefiadnednecretadlealentdionDepnoilsyomvaonrpgheinsmomsdeos,cwumeestnatretdedbywtihthe qNueeanncdeesrtoalfuthseredeinNethanisdsetrutadly)getnoovmereisfy(itnhcelupdriensgentcheeoAflt1a6i s://ac a 1000GenomesProjectPhase1datarelease(1000Genomes exonic deletions in other Neandertal genomes (supplemen- de m ProjectConsortium2012).Briefly,thisdataset(referredtoas tary fig. S3B, Supplementary Material online). These exome ic “1KG deletions”) includes 14,422 deletion polymorphisms sequences also verified the accuracy of our observation for .ou p detected among 1,092 human genomes across 14 popula- one human deletion that we found to be deleted in .c o tions. The deletions were identified by comparing Denisovan,butnotinAltaiNeandertalgenome. m/m genomeresequencingdatatothehumanreferencegenome Last but not least, we used the software SPLITREAD b e (FHurgt1h9e)rmanodre,totheeabchreaoktphoeirntussionfgtmheuseltipploelydmisocropvheirsymtsooarlse. (1h0t,t2p0:/1/4s)pltiotrereamd.saopuNrceeafonrdgeer.ntaelta/,ndlaDstenaiscocveasnserdeadDsetcoemjubnecr- /article well characterized, and an extensive validation effort was tionsofthedeletionbreakpointsasdefinedin1KGdeletion -ab made to ensure the accuracy of these deletion polymor- dataset(supplementaryfig.S3CandtableS1,Supplementary stra phisms (see 1000 Genomes Project Consortium 2012). It is Material online). We were able to provide strong split-read ct/3 importanttonotethatbecauseoftheemphasisonaccuracy, supportforallofthe220nonrecurrentdeletionsweobserved 2/4 mostcomplexregionsofthegenome(e.g.,telomericregions) inDenisovangenome(pleaseseebelowfordiscussionofthe /1 0 thatcouldleadtofalse-positivegenotypingmaybeunderrep- recurrent, ancient, and introgressed deletions), with at least 08 /1 resentedin1KGdeletions.Assuch,thisdatasetprovidesan 20readsmappingtothebreakpointjunctions.Potentiallydue 0 7 accurate and straightforward starting point for genotyping todifferencesinsequencelengthsandwhole-genomeampli- 62 2 polymorphichumandeletionsinNeandertalandDenisovan fication artifacts, Neandertal sequences performed worse 4 b genomes. than Denisovan sequences for this analysis, with overall y g Recent Neandertal (Pru€fer et al. 2014) and Denisovan distributionofnumberofsplit-reads,areanorderofmagni- ue s (Meyeretal.2012)sequencesprovidehigh-depthcoverage tudesmallerthanthoseobservedforDenisovans.Eventhen, t o n (~30(cid:2))alignedtothehumanreferencegenome,thesame wewereabletoshowthatatleasttwosplit-readsoverlapthe 1 1 assembly against which 1KG deletions were compiled. breakpoint junctions for approximately 89% (280 of 315) of A p As such, to detect human deletion variants that are the nonrecurrentNeandertaldeletions. Notethat we found ril 2 shared with archaic hominins, we simply “genotyped” the strongsplit-readsupportfromDenisovanreadsforallofthe 01 9 1KG deletions using read-depth data from high-coverage 123nonrecurrentdeletionsthatwefoundinbothNeandertal Neandertal and Denisovan genomes. Briefly, we calculated andDenisovans.Basedontheseobservations,wearguethat thenumberofDenisovanandNeandertalreadsmappingto the reduced split-read support from Neandertals is due to a given interval in the human reference genome where a lowerpower,ratherthanfalsepositivesinourdeletiondata deletion polymorphism was previously detected among set. Regardless, we were able to provide split-read support modern humans. As expected the number of reads of either from Neandertals or Denisovans for approximately these regions correlate well with size for both Neandertal 95%ofthenonrecurrentdeletions. and Denisovan sequences (R2=0.8582 and 0.8713, respec- To further quantify our accuracy, we applied the same tively). However, there are intervals with obviously less procedure for genotyping deletions on a set of random than expected read depth as compared with their size intervals that match the size distribution of 1KG deletion (supplementary fig. S1A and B, Supplementary Material data set. Based on this analysis, we found seven Neandertal online). These outliers suggest potential deletions in the and nine Denisovan deletions, corresponding to a false availablearchaicgenomes. discovery rate of 0.02 and 0.04 for Neandertal and 1009 MBE Linetal. . doi:10.1093/molbev/msu405 Denisovan deletions, respectively. Overall, we conservatively estimate that at least 427 (~3%) of all polymorphic human deletionsaresharedwitheitherNeandertalsandDenisovans orboth(fig.1C). Ancient Genetic Structure, not Introgression, Explains the Majority of Polymorphic Deletions among Humans that Are Shared with Archaic Hominins We considered several scenarios to explain the origins of polymorphicdeletionsamongmodernhumanswithrespect to the Neandertal and Denisovan genomes (fig. 2). For the majorityofhumandeletions,wefoundnoevidenceofallele D o sharing with archaic hominins. We will refer to these as w n “human-specific deletions.” It is important to note that our lo a genotyping strategy in archaic hominin genomes is highly de d conservative and would not be able to pick up deletions fro that varied among archaic hominins at low frequencies. m h As such, the human-specific deletion data set may include ttp s several variantsthatareactuallysharedwithotherhominin ://a genomes. First, we considered recurrence of the deletion c a d polymorphisms in humans, Neandertals and/or Denisovans. e m Under this scenario, we expect that the breakpoints of the ic .o deletions differ between species. Indeed, we found evidence u p for 15 recurrent deletions in our manual inspection .co m for different breakpoints (e.g., supplementary fig. S3D, /m Supplementary Material online), explaining approximately be 3.5% of the deletions shared with archaic human genomes. /artic Wewillrefertothesedeletionsas“recurrentdeletions.”With le the high-quality Denisovan split-read support, we found no -ab s evidenceforsimilar,butnotexactbreakpointsthatwemissed tra c inourmanualinspection.Itisunlikely,butstillpossibleforthe t/3 2 deletionstoberecurrenteveniftheyshareexactbreakpoints. /4 The 1KG deletion data set was compiled with accuracy as /10 0 amainpriority.Assuch,evolutionarilycomplexregionsofthe 8/1 genomethatshowhighlevelsofrecurrence(e.g.,Gokcumen 07 6 et al. 2011) may have been underrepresented and further 2 2 4 studiesmay uncoverimportantrecurrentdeletions in these b y regions. Therefore, our estimate of 15 recurrent events is g u a lower bound for the recurrence of deletions among es Human/Neandertallineage. t on Second, we considered the possibility that these deletion 11 A FIG. 1. Identification of deletions shared with Neandertal and polymorphisms may actually be hominid-specific sequences p Denisovan genomes. (A) We counted the number of Neandertal (e.g., novel insertions or duplications) that evolved in the ril 2 0 sequence reads mapping to the genomic intervals where the human modernhumanlineageandremainpolymorphicwithinthe 1 9 deletionswerereportedby1000GenomesProject.Thehistogramshows species. Under this scenario, these regions then should be onthexaxisthedistributionofnumberofreadsineachintervaldivided observed as deletions in nonhuman outgroups, including bythesizeoftheintervalandontheyaxisthefrequencyoftheintervals chimpanzees and rhesus macaques. We found only two observed.Thedottedblueverticallineindicatesthemeanofthisdis- regionsthatmayfitthispattern,asallotherdeletionregions tribution. The solid blue line shows the normal distribution with the that we have investigated had an orthologous sequence in mean and the standard deviation of the observed distribution. Thedottedredverticallineindicatestheread-depth/sizeratio,where chimpanzeeorrhesusmacaquereferencegenomes(supple- theprobabilityofobservingasmallerratioundernormaldistributionis mentary table S1, Supplementary Material online). 0.01. For the intervals that fall into the red transparent box, the read-depth in Neandertal resequencing is lower than expected by chance/noise,andconsequentlyweassumedthatthesearedeletedin FIG.1.Continued theNeandertalgenome.Thesedeletionswereindicatedbyredhisto- deletions that are found only in humans (light brown), shared with gram bars. (B) The read depth/size ratio distribution analysis for the Neandertal genome(pink), shared with Denisovangenome (blue), or Denisovangenome.(C)Venndiagramshowingthenumberofhuman sharedwithboth(purple). 1010 MBE Ancienthumandeletions . doi:10.1093/molbev/msu405 In addition, we found two deletions, for which rhesus ma- caque but not the chimpanzee reference genome has an orthologoussequence.Themostlikelyexplanationofthisis incompletelineagesortinginhuman–chimpanzeelineagefor these variants (Caswell et al. 2008). Albeit interesting, dele- tionsthatareexplainedbythesetwoscenariosconstituteless than1%ofthedeletionsthataresharedwithNeandertaland Denisovan genomes and will be referred to as the “other deletions.” Third, we considered previously reported introgression from Neandertals into ancestors of modern Eurasians as a potential source of the observed allele sharing. Under this scenario, the Neandertals contributed genetic material to D o ancestorsofallnon-Africanpopulations.Onechallengewas w n thatwecouldnotmerelydependonthefrequencydistribu- loa d tionofdeletionsinAfricanandnon-Eurasianpopulationsto e d distinguish an introgression scenario from the ancient fro m structurescenario.Thefrequencydistributionofanygenetic h variant is highly susceptible to drift and recent migrations. ttp s As such, we further evaluated the overlap of deletion poly- ://a morphisms with Neandertal-introgressed regions that were ca d identified based on single-nucleotide variation-based haplo- em typeconstruction(VernotandAkey2014).Overall,weexpect ic.o that 1) most polymorphic deletions that introgressed from up .c Neandertals would have breakpoints precisely shared o m between the Neandertal genome and modern human dele- /m b tions(i.e.,nonrecurrent),2)thechimpanzeegenomewillhave e /a thesequencethatisdeletedinthehumangenome,3)these rtic deletions fall into previously reported regions where le -a introgressionwasdetected(VernotandAkey2014),4)such b s deletions do not exist or have lower frequency in Africa as tra c compared with Eurasia, and 5) if they exist in Africa, the t/3 2 haplotypes that carry them have lower nucleotide diversity /4 /1 than Eurasian haplotypes carrying the deletions. As 0 0 8 Denisovans contributed genetic materials mostly to the /1 0 Melanesian populations (Skoglund and Jakobsson 2011; but 7 6 seeforlowlevelDenisovanancestryinAsiaHuerta-Sa(cid:2)nchez 22 4 etal.2014),weonlyassumedNeandertalintrogressionforthe b y FIG.2. Possibleevolutionaryscenariosexplainingallelesharing(orlack present purpose as 1KG deletions do not include gu e thereof) between modern and archaic hominins. This figure shows Melanesians. In summary, we found that 38 (~9%) of s possible evolutionary scenarios, in the form of cartoon phylogenetic human deletions that are shared with archaic hominin t on trees,explainingthedeletionpolymorphismsacrossdifferentlineages. genomes can be explained by Neandertal introgression. We 11 Red color designates branches where the deletion was observed. A willrefertotheseas“introgresseddeletions”fromnowon. p The number shown under each tree is the number of polymorphic Toindependentlyestimatepotentialmiscategorizationof ril 2 deletions corresponding to the observation. The red headers indicate 0 thelikelymechanisms,whichwereseparatedfromeachotherbydotted low-frequencyancientdeletionsasintrogressed,weusedthe 19 polymorphic human deletions that are shared with the lines,throughwhichtheallelesharinghasevolved.The“human-specific deletion”scenariocoverspolymorphicdeletionswhicharesharednei- Denisovan genome, but not with the Neandertal genome. ther with Neandertal nor with Denisovan genomes. The “recurrent” As 1KG deletions do not include sample from Melanesian scenario covers Neandertal- or Denisovan-shared human deletions, or any other South East Asian populations, which were breakpoints of which vary among different lineages. The “Neandertal reported to have Denisovan introgression, we expect no introgression”scenarioindicatesallelesharingduetoNeandertalgene Denisovan introgression in the 1KG deletions. As such, the flow into non-African human populations. The “ancient genetic proportion of polymorphic deletions that are shared with structure” scenario indicates deletions that were evolved in the Human–Neandertal ancestral population and have been maintained FIG.2.Continued sincethen.The“primateincompletelineagesorting”scenarioindicates as deletions in chimpanzee and rhesus monkey, and show polymor- deletion polymorphisms that have potentially been maintained since phism in hominid genomes. This scenario represents likely novel se- before the Human–Chimpanzee divergence. The “hominid-specific quences that evolved inthe ancestral population of Neandertals and insertion” scenario covers polymorphic deletions that are genotyped humans. 1011 MBE Linetal. . doi:10.1093/molbev/msu405 been maintained since then in extant humans. Overall, 370 (~87%)ofthepolymorphichumandeletionsthatareshared witharchaichomininscanbetracedbacktoancientgenetic structure predating Human–Neandertaldivergence.We will refertotheseas“ancientdeletions”fromhereon. Ourobservations,takenasawhole,supporttheconclusion thatthevastmajorityofallelesharingbetweenhumansand archaichomininsaffectingdeletionvariationisduetoancient genetic structure, rather than introgression. In other words, ourfindingsareconsistentwiththenotionthatmostdeletion polymorphismssharedwitharchaicgenomesevolvedpriorto the Human–Neandertal divergence and have been main- tainedeversince. D o w n lo Deletions that Are Shared with Archaic Hominins a d e Constitute More than 13% of Common Deletion d Variation in Humans, but Are Smaller and Rarely fro m Overlap with Functional Regions of the Genome h ttp Wthaetfoaurendshtahraetdthweitahllealrecfhraeiqcuhenocmieinsionfgtheenodmeleestioanrevasirgiannifits- s://ac a cantly higher than human-specific deletion variants de (P<2.2(cid:2)10(cid:3)16,Wilcoxonranktest;fig.3Bandsupplemen- mic taryfig.S4A,SupplementaryMaterialonline).Wealsofound .ou p FIG. 3. Characterization of ancient deletion variants. (A) This figure thatthedeletionsthatwedetectedinbothNeandertaland .c o showstheproportionsofNeandertalintrogresseddeletions,maintained Denisovan genomes have significantly higher frequency in m ancient deletions, recurrent deletions, and human-specific deletions /m humans than those we detected only in one of the archaic b thatoverlapwithexonicregions.H,human-specificdeletions;R,recur- hominingenomes(P<0.001,Wilcoxonranktest).Although e/a rentdeletions;A,deletionsmaintainedfromancientgeneticstructure; ancient deletion variants correspond to only 2.5% of “all” rtic N,Neandertal-introgresseddeletions.Theorange-coloredsectionindi- le polymorphichumandeletions,theyconstituteapproximately -a catesthefractionofdeletionsthatoverlapswithexons.Deletionpoly- b morphismsareknowntobedepletedforexoniccontent.However,the 13%ofall“common”(allelefrequency 45%)deletionpoly- stra ancientdeletionvariantsareevenlessexonicthanhuman-specificde- morphismsreportedamong1KGdeletions. ct/3 letionvariants(P=0.0011,Chi-squaretest).(B)Cumulativefractionplot Theseobservationswouldimplythattheancestralpop- 2/4 ofallelefrequencyinmodernhumansfordeletionvariantssharedwith ulation that gave rise to the Human, Neandertal, and /1 0 Neandertalsand/orDenisovans(Shared—lightblue),andhuman-spe- Denisova lineages harbored a considerable number of 08 cific(Human—red)deletions.Thexaxisindicatesthefrequencyofthe common deletion variants that have been inherited by all /10 7 deletion variants and y axis indicates the cumulative fraction of the three species, and remain polymorphic in extant humans. 62 deletionsofallthedeletionsatagivenorlowerfrequency.Thesteep 2 Toinvestigatewhetherthesedeletionsarealsopolymorphic 4 slopetowardtheleftendofthehuman-specificdeletionscurveshows b thatthemajorityofthesepolymorphicdeletionshaveallelefrequencies in Neandertals, we manually checked 17 exonic deletions y g u that humans and archaic hominins share among recently e smallerthan0.1.AsfordeletionscommontoNeandertals/Denisovans, s about43%ofthemhaveallelefrequenciesover0.1.Overall,“Shared” released exome sequencing data for three Neandertal ge- t o n deletionsaresignificantlymorecommonthanHuman-specificdeletions nomes, including the Altaian individual that we used in 1 1 (P<2.2(cid:2)10(cid:3)16,Wilcoxonranktest). this study (Castellano et al. 2014) (supplementary fig. S3B, A p SupplementaryMaterialonline).Weverifiedthe16regions ril 2 where we previously observed deletions in the Altai 0 1 9 only Denisovans and categorized as “introgressed” with our Neandertal whole-genome sequence. Furthermore, we ob- pipelinewillgiveusanindirectestimateofmiscategorization. served that these regions are homozygously deleted in the Basedonthis,weestimatethatonly5of102deletionstobe othertwoNeandertalgenomesaswell.Thelikelihoodofnot miscategorizedasintrogressedwithourpipeline. detectinganywithinspeciesvariationacrossthe16deleted Fourth,weconsideredthescenariowheretheageofpoly- lociamongtwoadditionalindividualsisinfinitesimallysmall, morphic deletion variants we observed in humans actually unlesstheallelefrequenciesofthesedeletionsareextremely predates the Human–Neandertal divergence. It has been high (supplementary fig. S4B, Supplementary Material argued previously that some of the ancient variation that online). As such, we conclude that the majority of shared exists in the Human–Neandertal ancestral population has deletions described in our study are indeed fixed and not beenmaintainedinhumans.Assuch,thepolymorphicdele- necessarilypolymorphicwithinNeandertals.Thisisprobably tionsamongmodernhumansthatalsoexistinarchaichomi- duetohighinbreedingreportedforNeandertals(Castellano nins that cannot be explained by introgression should have etal. 2014; Pru€fer etal. 2014).Using the same data set, we originated before the Human–Neandertal divergence and found no evidence for a deletion in the three Neandertal 1012 MBE Ancienthumandeletions . doi:10.1093/molbev/msu405 exomesforoneexonichumandeletionwherewepreviously Totesttheseexpectations,wecalculatedbasicpopulation observed a deletion in Denisovan, but not for the Altai statisticsforancientandnonancientdeletionsincludingthe Neandertal. 10-kb sequence immediately flanking them. We performed Deletion variants have already been shown to be signifi- these analyses in sub-Saharan African populations, which cantly biased away from exonic sequences, indicating the havehighereffectivepopulationssizesthanEurasianpopula- effect of purifying selection (Conrad et al. 2009). We found tionsandconsequentlylesspronetoeffectsofgeneticdrift. thatthedeletionsthataresharedwitharchaicgenomeshave Our results showed that regions harboring ancient deletion even less overlap with exonic regions in the genome than variants indeed yield significantly higher (cid:2) and (cid:3) values as other deletion variants (P=0.0004, Chi-square test, fig. 3A). compared with regions harboring nonancient deletions Taking the heterogeneous nature of genome into consider- (P<10(cid:3)4 for both measures for both YRI (Yoruba in ation,we have simulatedgenomic intervalsusinga genome Ibadan,Nigeria) andLWK (Luhya in Webuye, Kenya) popu- structure correction (Bickel et al. 2010). Our results showed lations,Student’st-test,supplementaryfig.S6,Supplementary approximately 3.4- (P=0.0003) and5.3-fold (P=6.6(cid:2)10(cid:3)8) Material online). This observation is consistent with older D o depletionoftheexonicdeletionsinancientandintrogressed coalescent times for regions harboring ancient deletions. wn deletionsascomparedwithrandomexpectation,respectively. Our analysis also showed that Tajima’s D values for regions loa d We also found that ancient deletions are smaller than harboringancientdeletionsdonotsignificantlydeviatefrom ed human-specific deletions (P<2.2(cid:2)10(cid:3)16, Wilcoxon rank zero with means of (cid:3)0.14 and (cid:3)0.31 for YRI and LWK, re- fro m test,supplementaryfig.S5,SupplementaryMaterialonline). spectively.However,theseregionshavesignificantlylessneg- h Together, these observations are consistent with recent ative Tajima’s D values, when compared with regions ttp s studies that highlight purifying selection as a major force in harboring nonancient deletions (P<10(cid:3)5, one-tail ://a shapingthegenomicdistributionofNeandertal-introgressed Student’s t-test for both YRI and LWK, supplementary fig. ca d single-nucleotide variation in the human genome S6, Supplementary Material online). The most plausible ex- em (Sankararamanetal.2014;VernotandAkey2014).Ourresults planation for this observation is that regions harboring an- ic.o furtheredtheseobservationsforancientandintrogressedge- cientdeletionshavebeenaffectedtoalesserextentbyrecent up nomicstructuralvariants. humandemographythanothergenomicregionswithamore .com recentcommonancestor.Mostofthegenomicregionscon- /m b The Majority of Ancient Deletion Polymorphisms tainpolymorphismsaffectedbythejointeffectofbottleneck e/a Have Evolved under Neutral Conditions and recent expansion. These regions in African populations rtic The most likely evolutionary scenario to explain the lack of willbecharacterizedbyslightlynegativeTajima’sDasprevi- le-a b erixfyoinnigcosevleerclatiponobesleimrveindataemdonmgoasntcaiennctiednetlevtiaorniastiiosnth. aTthpeun-, ohrouaunsnldyd,siwhnogewaonnbcs(ieeer.ngv.te,dGdeatlrheratigitoanpnsoalaynrmdeoHcrhapamhraimscmteersr2ioz0en0d6o)bl.dyOrsneiggtnihoifiencsoatnshutelryr- stract/3 since before the Human–Neandertal divergence, the dele- 2 tions that were not affected by the initial filtering through greatervaluesofTajima’sD.Theseobservationsareconsistent /4/1 0 purifying selection evolved largely under neutral conditions. withthescenariothatthemajorityofancientdeletionsand 08 According to this hypothesis, we expect that demographic their surrounding haplotypes are indeed older than the /10 7 changes and not selective forces operating on ancient dele- genome-wideaverageandwerenotsubjecttomajoradaptive 62 tionsarethemainprocessesthatwouldaffectneutralitytests. pressures. 24 b Ancient deletions provide an especially interesting case. y g These variants are by definition old, often older than the ue Identifying Potentially Adaptive Ancient Deletions s variationobservedinotherpartsofthegenome.Weexpect t o among the Exonic Deletions n that surrounding haplotypes, depending on the recombina- 1 1 tionrateinthatregion,areasoldastheseancientdeletions. As mentionedabove, deletions thatare sharedwitharchaic A p Thus,weexpect,underneutrality,toobservehighervaluesfor hominins are depleted for exonic sequences, indicating the ril 2 Watterson’s (cid:2) estimator ((cid:2) ) and for Tajima’s (cid:2) estimator effectofpurifyingselection.However,wefound17instances 0 W 1 (measuredbytheaveragenumberofpairwisedifferences,(cid:3)) wherethesedeletionsoverlapwithexonicsequences(table1 9 as compared with haplotypes harboring nonancient dele- and fig. 4A–C). We reasoned that these exonic deletions, tions.Tajima’sDmeasuresthenormalizeddifferencebetween which lead to whole gene deletions, fusion transcripts and (cid:2) and(cid:3).Deviationsfrom0indicateeithernonneutralevo- loss-of-functionalleles,areunlikelytoevolveunderneutrality W lutionordemographicchanges.Recentexpansionsgenerate incontrasttoother,nonexonicancientalleles. negativevaluesforTajima’sDbecausemostofthepolymor- Tofurtherinvestigatetheevolutionoftheseexonicdele- phisms are recent and are characterized by low allele fre- tions,we calculatedpopulationdifferentiationbasedon the quency (Tajima 1989). Human demographic history is variationsinallelefrequencywithinandamongpopulations characterizedbyarecentexpansion,thusgeneratingnegative (F )(Hudsonetal.1992)forall1000Genomesdeletions(fig. ST values for Tajima’s D. However, for old regions surrounding 4D).Wealsousedpolymorphismsimmediatelyupstreamre- ancient deletions, we expect that Tajima’s D values will be gionsofthesedeletionstocalculateTajima’sD(Tajima1989) shifted toward higher values as a proportion of polymor- (fig.4E).Wethenanalyzedthehumanexonicdeletionsthat phismswillbeasoldastheregionitself. aresharedwithNeandertalsorDenisovanswithinthecontext 1013 MBE Linetal. . doi:10.1093/molbev/msu405 Table 1. List of Exonic Deletions Shared with Archaic Hominin Genomes. Chromosome Start End Gene Comment Ancestral State chr11 60,228,164 60,229,386 MS4A1 UTR Introgressed chr1 213,002,368 213,013,665 SPATA45 Loss-of-function Introgressed chr13 20,077,974 20,080,405 TPTE2 UTR Ancient chr17 79,285,360 79,286,612 TMEM105 UTR Ancient chr19 41,355,733 41,387,636 CYP2A6, CYP2A7 Fusion Ancient chr10 124,369,735 124,377,838 DMBT1 Partial-CDS Recurrent chr11 3,238,738 3,244,086 MRGPRG UTR Ancient chr8 144,634,064 144,636,239 GSDMD UTR Ancient chr15 25,436,588 25,438,493 SNORD115-12, SNORD115-13 Fusion Ancient chr12 27,648,142 27,655,163 SMCO2 Partial-CDS Ancient chr11 128,682,716 128,683,410 FLI1 UTR Ancient D o w chr7 99,461,389 99,463,562 CYP3A43 UTR Ancient n lo chr14 73,997,051 74,024,450 ACOT1 Loss-of-function Ancient a d chr4 70,124,301 70,230,600 UGT2B28 Loss-of-function Ancient ed chr1 152,555,542 152,587,742 LCE3C Loss-of-function Ancient fro m chr5 42,628,311 42,630,990 GHR Partial-CDS Ancient h chr22 24,343,050 24,397,301 GSTTP1, GSTT1 Fusion Ancient ttp s ://a c a d e m of all polymorphic human exonic deletions across the relativelylowoveralldifferentiationbetweencontinentalpop- ic .o genome. ulationsascomparedwithotherexonic deletion variantsin u p Our analysis highlighted several highly interesting exonic humans(fig.4D).PositiveTajima’sDvaluesandlowpopula- .co m loci (e.g., supplementary fig. S7, Supplementary Material tiondifferentiationarehallmarksofclassicalbalancingselec- /m online)andbelowwewilldiscussseveralofthesegenesindi- tion (e.g., Cagliani et al. 2008). This observation, in parallel b e vgridaupahlilcy.eHxpoawnesvioenr,siatnids bimotptloernteacnktstiontrnoodtuecethhaitgh1)ledveelmsoo-f wtaiitnhedouirnunhdigehrstaalnledleingfrtehqautenthciiessdeslientcioenbheafosrebeeHnummaanin–- /article noise,increasingthethresholdforsignificance(e.g.,falseneg- Neandertaldivergence,isconsistentwithbalancingselection -ab s atives for detecting balancing selection), 2) recombination actingontheLCE3Cdeletionvariant. tra c thatbrokethehaplotypesbetweenthedeletionsandflanking The UGT2B genes comprise evolutionary dynamic genes t/3 regionsmayhaveintroducednoiseintoourcalculations(e.g., thatareinvolvedinmetabolismofexternalandinternalcom- 2/4 bydecreasingotherwisehigherTajima’sD),and3)thatthere pounds, including several hormones and steroids. Adaptive /10 0 aremultiple(e.g.,incontrasttoasingleexternalpressure)and deletionvariantshavealreadybeenreportedforsomemem- 8 /1 complex (e.g., dependent on time, geography, frequency in bersofthisfamily,includingdeletionofUGT2B17gene(Xue 07 6 population,etc.)evolutionaryforcesthatcomplicatethein- etal.2008).Moreover,UGT2B17andUGT2B28,bothofwhich 2 2 terpretationofbothFSTandTajima’sD. are involved in steroid metabolism, have been found to be 4 b y commonlydeletedandthefunctionalimpactofthesedele- g u tions may have cumulative effects (Me(cid:2)nard et al. 2009). e Ancient and Introgressed Exonic, Loss-of-Function s Indeed, we found that the deletion that encompasses t o Deletions Are Associated with Xenobiotic and Lipid n UGT2B28 deletion is to be ancient. This deletion is very 1 Metabolism, Psoriasis, and Spermatogenesis common in Africa, reaching to almost 40% allele frequency. 1 A p Wefoundthatfourofthenonrecurrentexonicdeletionsthat SimilartowhatweobservedforLCE3C,Tajima’sDwascon- ril 2 aresharedwitharchaichominingenomeslikelyleadtoloss- sistently higher than genome-wide distribution for other 01 9 of-function alleles, where either the entire gene or entire exonicdeletionvariants(fig.4E),whereasF wasconsistently ST codingsequencesweredeleted(table1).One suchdeletion lowamongpopulations(fig.4D).Theseobservationsarecon- overlapswiththeLCE3Cgene,whichhasbeenstronglyasso- sistentwithbalancingselectionactingonUGT2B28. ciatedwithpsoriasis(deCidetal.2009).Theallelefrequency Another ancient loss-of-function deletion with very high of LCE3C gene deletion is extremely high among Eurasians, globalfrequencyoverlapswithACOT1gene.Thisgeneisin- reachingtoover70%allelefrequencyinsomeEuropeanand volvedinlipidbiosyntheticpathwaysandhasbeenarguedto Asianpopulations(fig.4A–C).Wefoundconsistentlypositive play a role in regulation of milk fat synthesis in mammals Tajima’sDvaluesacrossalleightnonadmixedEurasianpop- (Rudolph et al. 2007), which is critical in neonatal develop- ulationsanalyzedascalculatedforthevariationintheregion ment. Interestingly, this whole-gene deletion is shared with harboringthedeletion(fig.4E).Whencomparedwithother theNeandertalgenome.Theallelefrequencyofthisdeletion exonichumandeletions,theTajima’sDvaluesreachto95th shows considerable differences between continents, almost percentile in five populations. In contrast, differentiation as reaching fixation among Asian populations, but remains as measured by F between continental populations shows theminoralleleinothercontinents(fig.4BandC).Unlikethe ST 1014 MBE Ancienthumandeletions . doi:10.1093/molbev/msu405 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /m b e /a rtic le -a b s tra c t/3 2 /4 /1 0 0 8 /1 0 7 6 2 2 4 b y g u e s t o n 1 1 A p ril 2 0 FIG.4. Analysisofexonicvariants.(A–C)Theallelefrequencyofexonicdeletionvariants.Thex-andyaxesindicatetheallelefrequenciesinagiven 1 9 continent.Theheatmapcolorsrepresentnumberofobservations.Theredtodarkbluegradientcorrespondstodecreaseddensityofobserveddeletions. Here,weplotallelefrequenciesofall14,4221KGdeletionvariants.Assuch,redspotsdesignatethousandsofobservationsdecreasingtohundredsof observationsforyellowpixelsandsingleobservationsforpurpledots.Avastmajorityofdeletionshaveverylowallelefrequencies.Theexonicvariants sharedwithNeandertal/Denisovangenomesareshownwithwhite-coloredcircles.(D)HeatmapofthepercentilesofpairwiseF valuesmeasuredfor ST theflankingregionsoftheancientexonicdeletionvariantsbetweentennonadmixedpopulationsusingclusteringwithoutaprioriinput.Thecolorsin theheatmapcorrespondtothepercentileoftheF valuesascomparedwiththedistributionofallexonichumandeletions,withlight-yellow/white ST beingthehighestvaluesobserved(1indicatingthehighestpercentile)anddarkredcorrespondingtolowervalues(0indicatingthelowestpossible percentile).ExactvaluestogeneratethismapcanbefoundinsupplementarytableS2,SupplementaryMaterialonline.Onthexaxisaretheexonic deletionvariants,representedbythenamesofthegenesthattheyaffect.Theoneshighlightedinblueareintrogresseddeletions,theonehighlightedin greenisarecurrentdeletion,andthosethatarenothighlightedareancientdeletions.Onyaxisarepopulationpairsusedintheanalysis.AFR,African; ASN,Asian;EUR,European;CEU,Utahresidents(CEPH)withNorthernandWesternEuropeanancestry;CHB,HanChineseinBeijing;CHS,Southern HanChinese;FIN,FinishinFinland;GBR,BritishinEnglandandScotland;IBS,IberianpopulationinSpain;JPT,JapaneseinTokyo;LWK,Luhyain Webuye,Kenya;TSI,ToscaniinItalia;YRI,YorubainIbadan,Nigeria.(E)HeatmapofthepercentilesofTajima’sDvaluesobservedforancientexonic deletion variants within the Tajima’s D distribution of all exonic human deletions. The colors in the heatmap correspond to the percentile (continued) 1015 MBE Linetal. . doi:10.1093/molbev/msu405 aforementionedLCE3CandUGT2B28genevariation,thehap- deletionsinthisregion.DMBT1alsogotsomaticallydeleted lotypic variation around ACOT1 is characterized by consis- inmalignantbraintumorsandlikelycontributedtocancer tentlynegativeTajima’sDvaluesinhumanpopulationswhen progression (Mollenhauer et al. 1997). Indeed, the region compared with values calculated for other exonic human harboring DMBT1 was discussed within the context of ge- deletions(fig.4E).However,asexpectedfromhighfrequency nomic instability (Mollenhauer et al. 1999). Based on our differences,theF isconsistentlyhigherbetweenAsianand results, we can conclude that it is likely that the genomic ST non-Asian populations. This finding may indicate ongoing instabilityofthisgeneticregionalongchromosome10qhas positiveselectionpressureontheACOT1deletion,specifically likely been maintained since Human–Neandertal diver- favoringthedeletionalleleinAsianpopulations. gence. Moreover, the deletion variant has been associated We foundthat only one of theloss-of-function deletions withCrohn’sdisease(Renneretal.2007).Wefoundthatthe wasintrogressedfromNeandertals.Thisdeletionincludesall Tajima’sDcalculatedbasedonthehaplotypicvariationup- of the coding sequences of the spermatogenesis-associated streamoftheDMBT1deletionishighlynegativeandinthe gene SPATA45. Several evolutionarily important phenotypic 5th percentile for two European populations when com- D o trends,suchasreproductiveefficacyandsuccess,responsesto pared with Tajima’s D values calculated similarly for all w n sexualselectionpressuresorapoptoticpathwaysthatregulate human exonic deletions in these populations (fig. 4E). loa d spermselection,arelinkedtospermatogenesis.Assuch,the Moreover, a recent comprehensive analysis for balancing e d lossoffunctionofSPATA45duetotheintrogresseddeletion selection among human genomes identified DMBT1 as fro mentionedaboveisaprimecandidateforadaptiveforcesto one of the top candidates for balancing selection in both m h acting after introgression from Neandertals. Indeed, Vernot CEU and YRI populations for which the analysis was con- ttp s and Akey (2014) described a Neandertal-derived haplotype ducted (DeGiorgio et al. 2014). The low Tajima’s D values ://a for the functionally similar gene, SPATA18. Unlike the andthebalancingselectionreportedrecentlyseemtobein ca d SPATA18 haplotype however, the deletion variant affecting conflict. However, as mentioned before, it is plausible that em SPATA45 is relatively rare, found in less than 5% of human complex mutational and adaptive mechanisms may have ic.o genomes. shaped the haplotypes carrying some of the deletion vari- up .c ants,leavingcomplicatedsignaturesofadaptation.Itissafe o m Incomplete Gene Deletions Lead to Novel Transcripts, toargue,basedonourresultsandthoseofpreviouspubli- /m b Including Gene Fusions cationsthatDMBT1deletionvariationlikelyevolvedunder e /a Nineofthe17exonicdeletionssharedwitharchaichominins nonneutralpressures. rtic overlapwithpartsofgenes.Thesedeletionspotentiallylead Oneunexpectedobservationwasthatthreeoftheancient le-a to alternative protein products, rather than deleting the deletionsledtofusiontranscripts,wherebycodingsequences bs entire coding sequences. One such deletion overlaps with oftwoseparategenesarefused.Thesedeletionscombinethe trac the exon 3 of growth hormone receptor gene (GHR). This transcripts of SNORD115-12 and SNORD115-13, as well as t/32 well-studieddeletionvariantleadstoatranscriptthatmisses CYP2A6 and CYP2A7 genes. Another such deletion variant, /4/1 exon 3 (d3). The d3 haplotype was associated with smaller which is very common (435%) in all human populations, 00 8 birthsize(Sørensenetal.2010;Padidelaetal.2012).Thed3 fuses GSTT1 and GSTTP1 genes. GSTT1 is also involved in /1 0 haplotypewasalsolinkedtoa1.7–2timesincreaseingrowth metabolizingexternalcompounds.Thehaplotypesurround- 76 2 acceleration in children that are treated with growth hor- ing this gene shows one of the highest Tajima’s D values 24 mone and, consequently is a major target for pharmacoge- measuredforexonicdeletionsinhumans(fig.4E).Inaddition, by nomicsresearch(DosSantosetal.2004).Theallelefrequency thereisrelativelyhighpopulationdifferentiationasmeasured gu e ofthisdeletionisapproximately44%inAfrica,approximately by FST between continental populations, especially between st o 31%inEurope,butonly17%inAsia.Notsurprisingly,theFST African and Eurasian populations (fig. 4D). This observation n 1 between Asian and non-Asian populations is consistently may be explained by a scenario similar to that is often put 1 A hcaigthinegr tgheaongrgaepnhoymdee-wpeidnedeanvterapgoesi(tfiivge. 4sDel)e,cptiootnenstiimalliylairndtio- feosrswenacred,ftohresivcakrlieatcieolnlttrhaiattinleamdaslatroias-isctkrilcek-ceenllgteroagitrahpahsiebse.eInn pril 20 1 ACOT1. maintainedinthepopulationthroughgeography-specificbal- 9 Weidentifiedonlyone recurrent exonic deletion thatis ancing selection (reviewed in Dean et al. 2002). A similar shared with archaic hominins overlapping with DMBT1. scenario would explain the extremely high Tajima’s D in Unliketheotherexonicdeletionswhichare sharedamong African populations observed for GSTT1–GSTTP1 fusion hominingenomesbecauseofcommondescentorintrogres- deletion,aswellasthehighF valuesbetweenAfricanand ST sion,DMBT1deletionsharingismostlikelyduetorecurrent non-Africanpopulationsobservedforthispolymorphism. FIG.4.Continued oftheTajima’sDvaluesascomparedwithTajima’sDvaluescalculatedforallexonichumandeletions,withlight-yellow/whitebeingthehighestvalues observed(1indicatingthehighestpercentile)anddarkredcorrespondingtolowervalues(0indicatingthelowestpossiblepercentile).Exactvaluesto generatethismapcanbefoundinsupplementarytableS2,SupplementaryMaterialonline.Onthexaxisaretheexonicdeletionvariants,represented bythenamesofthegenesthattheyaffect.Theoneshighlightedinblueareintrogresseddeletions,theonehighlightedingreenisarecurrentdeletion, andthosethatarenothighlightedareancientdeletions.OntheyaxisarethepopulationsforwhichTajima’sDvaluesweremeasured.Thepopulation designationscanbefoundabove,inthelegendoffigure4D. 1016 MBE Ancienthumandeletions . doi:10.1093/molbev/msu405 Conclusion Materials and Methods Deletionvariants,thebestcharacterizedofallgenomicstruc- Genotyping of Neandertal and Denisovan Genomes turalvariants,havebeenshowntoplayanimportantrolein We used 1000 Genomes data set Phase 1 data set (http:// human evolution (McLean etal.2011).However,theevolu- www.1000genomes.org/data, last accessed December 10, tionaryroleofthesevariantswithinspecieshasnotbeenwell- 2014) (1000 Genomes Project Consortium 2012), as well as established. High-quality sequences and, more importantly, the high-coverage genome-wide sequencing data for highly improved discovery and genotyping tools primarily Neandertal (Pru€fer et al. 2014) and Denisovan (Meyer et al. developed within the context of the 1000 Genomes Project 2012) (available at http://cdna.eva.mpg.de, last accessed haverecentlyallowedforstudyofhumandeletionvariation December10,2014)asourmainstartingpoint.Thesequenc- inapopulationgeneticsframework.Weusedtheseexciting ing data from both Neandertal and Denisovan genomes, as resources to assess deletion variants in humans that are wellasthe1KGdeletionsarealignedtothesameversionof shared with the Neandertal and Denisovan genomes. In so thehumanreferencegenome(Hg19).Assuch,wewereable D doing, we were able to 1) identify hundreds of ancient and to directly measure the number of reads mapping to the ow n introgresseddeletionvariantsinhumans,2)investigatedele- intervals where 14,422 polymorphic human deletions were lo a tion variation within their haplotypic backgrounds, and 3) reported. To accomplish this, we used a custom bedtools de d shed light on the evolution of individual deletion variants (http://bedtools.readthedocs.org/, last accessed December fro thatmayhavephenotypiceffects.Tothebestofourknowl- 10,2014)script. m h edge, this study is the first to document and characterize We then constructed a normal distribution of the read- ttp s deletion variation in humans that are shared with depth(asnormalizedbysize)forNeandertalandDenisovan ://a Neandertal and Denisovan genomes in a genome-wide datasetwherethemeanandstandarddeviationwereequal c a d context. to the observed mean and standard deviation. We then es- e m Our results suggest that the majority of allele sharing in- tablishedathresholdvaluethatcorrespondsto0.01quantile ic .o volving deletion variants between modern humans and ar- inthenormaldistribution,belowwhichweassumedthein- u p chaichomininsisduetoancestralstructureand,notdueto tervals have lower than expected read-depth/size ratio indi- .co m introgression. This observation does not conflict with the catingadeletionintherespectivegenome. /m recentreportsregardingNeandertalandDenisovanintrogres- To determine the false discovery rate, we used the be spiloonretdodmeoedperanncheusmtryanfos,rbautcroantshiedrerhaibghleligphotrstiaonlar(g~e1ly3u%n)eox-f srahnudffloemBeidntfeurnvcatlsiotnhaotfmbeadtctohoeslstthoegseizneerdaitsetriabusteitonofo1f41,4K2G2 /article commondeletionvariationinhumans.Moreover,thegeno- deletions. We then followed the genotyping workflow de- -ab s micdistributionofthesevariantsshowssignaturesofancient scribed above to identify deletions. We found less than ten tra purifying selection, eliminating all but a few exonic variants. deletions in those random regions in both Neandertal and ct/3 Thisobservationcomplementssimilarobservationsmadefor Denisovan genomes, indicating that a false discovery rate is 2/4 essentiallylowerthan5%forbothgenomes. /1 deletion variants in general (Mills et al. 2011), and further 0 0 suggests the potential role of recent, rare deletion variants We used a modified version of SPLITREAD approach 8/1 (http://splitread.sourceforge.net/, last accessed December 0 indetrimentalphenotypesanddisease(Itsaraetal.2009). 7 6 Only a very small percentage (~4%) of these maintained 10, 2014) to remap the Neandertal and Denisovan reads 22 ancient and introgressed deletions are exonic; however, the to the junctions of the breakpoints of deletions as defined 4 b genesinvolvedaffectevolutionarilyrelevantphenotypes,such in 1KG deletion data set, rather than applying it to the y g u whole genome. Specifically, we extended the deletion e asgrowth,immunityandmetabolismofexternalandinternal s breakpoints 50,000 bases to the upstream and downstream t o compounds. Some of these deletions were also associated n to generate out mappable reference contigs. The mappings 1 with common human diseases, including Crohn’s disease 1 to these reference contigs were performed using BWA- A and psoriasis. Exonic deletions are functionally drastic p eventsthatarecomparabletoframeshift,orstop-codonin- MEM method (Li and Durbin 2009). The polymerase ril 2 troducing mutations, or if a gene still functions, to multiple chain reaction duplicates, identified by the reads that are 019 mapping exactly to the same positions, were removed nonsynonymoussinglenucleotidevariants.Assuch,weargue from the downstream analysis. Due to the fact thatunlikethemajorityofancientdeletions,thosethatover- that short reads are single ended, there is an increased lap with exons that have been maintained since Human– rate of support for the regions that are repetitive or in Neandertal divergence are unlikely to have evolved under segmental duplication regions. SPLITREAD method also neutral conditions. Instead, these ancient exonic deletions relies on the consistent mapping of the splits at the break- mayhavebeenmaintainedthroughacombinationof1)geo- points with base pair resolution. The deletions that are graphicallydifferent,potentiallyfrequency-dependent,adap- recurrent with different breakpoints are expected to have tive forces and 2) balancing selection. We argue that less support overall. We observed significantly reduced pathways that involve in these important phenotypes are number of split-read mapping with the Neandertal viable targets for some form of complex adaptive selection genome sequences as compared with what was observed thathelpedmaintainthegeneticstructuralvariationatthese for Denisovan genome sequences. We think this is due to lociforhundredsofthousandsofyears. inherent read-length variation and differential impact of 1017
Description: