RESEARCH ARTICLE

Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

Mark Lipson1*, Po-Ru Loh2,3, Sriram Sankararaman1,3, Nick Patterson3, Bonnie Berger3,4, David Reich1,3,5*

1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America, 2 Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America, 3 Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, 4 Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, 5 Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America

* [email protected] (ML), [email protected] (DR)

OPEN ACCESS

Citation: Lipson M, Loh P-R, Sankararaman S, Patterson N, Berger B, Reich D (2015) Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes. PLoS Genet 11(11): e1005550. doi:10.1371/journal.pgen.1005550

Editor: Graham Coop, University of California Davis, UNITED STATES

Received: February 18, 2015
Accepted: September 3, 2015
Published: November 12, 2015

Copyright: © 2015 Lipson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All data have previously been made available as part of refs. 21 and 24.

Funding: ML acknowledges support from the Simons Foundation (www.simonsfoundation.org) and National Institutes of Health (www.nih.gov; grant R01GM108348, to BB). PRL was supported by National Institutes of Health fellowship F32 HG007805 and SS by National Institutes of Health grant K99GM111744. NP and DR were supported by National Science Foundation (www.nsf.gov) HOMINID grant #1032255 and National Institutes of Health grant GM100233. DR is an Investigator at the Howard Hughes Medical Institute (www.hhmi.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

PLOS Genetics | DOI:10.1371/journal.pgen.1005550 | November 12, 2015

Abstract

The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of 1.61 ± 0.13 × 10⁻⁸ mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes.

Author Summary

The rate at which new heritable mutations occur in the human genome is a fundamental parameter in population and evolutionary genetics. However, recent direct family-based estimates of the mutation rate have consistently been much lower than previous results from comparisons with other great ape species. Because split times of species and populations estimated from genetic data are often inversely proportional to the mutation rate, resolving the disagreement would have important implications for understanding human evolution. In our work, we apply a new technique that uses mutations that have accumulated over many generations on either copy of a chromosome in an individual's genome. Instead of an external reference point, we rely on fine-scale knowledge of the human recombination rate to calibrate the long-term mutation rate. Our procedure accounts for possible errors found in real data, and we also show that it is robust to a range of model violations. Using eight diploid genomes from non-African individuals, we infer a rate of 1.61 ± 0.13 × 10⁻⁸ single-nucleotide changes per base per generation, which is intermediate between most phylogenetic and pedigree-based estimates. Thus, our estimate implies reasonable, intermediate-age population split times across a range of time scales.

Introduction

All genetic variation, the substrate for evolution, is ultimately due to spontaneous heritable mutations in the genomes of individual germline cells. The most commonly studied mutations are point mutations, which consist of single-nucleotide changes from one base to another. The rate at which these changes occur, in combination with other forces, determines the frequency with which homologous nucleotides differ from one individual's genome to another.

A number of different approaches have previously been used to estimate the human mutation rate [1–3], of which we mention four categories here. The first method is to count the number of fixed genetic changes between humans and another species, such as chimpanzees [4]. Population genetic theory implies that if the mutation rate remains constant, then neutral mutations (those that do not affect an organism's fitness) should accumulate between two genomes at a constant rate (the well-known "molecular clock" [5]). Thus, the mutation rate can be estimated based on the divergence time of the genomes, if this can be confidently inferred from fossil evidence. However, even if the age of fossil remains can be accurately determined, assigning their proper phylogenetic positions is often difficult. Moreover, because of shared ancestral polymorphism, the time to the most recent common ancestor is always older (and sometimes far older) than the time of species divergence, meaning that split-time calibrations cannot always be directly applied to genetic divergences.
A second common approach, which has only become possible within the last few years, is to count newly occurring mutations in deep sequencing data from family pedigrees, especially parent-child trios [6–10]. This approach provides a direct estimate but can be technically challenging, as it is sensitive to genotype accuracy and data processing from high-throughput sequencing. In particular, sporadic sequencing and alignment errors can be difficult to distinguish from true de novo mutations. Surprisingly, these sequencing-based estimates have consistently been much lower than those based on the first approach: in the neighborhood of 1–1.2 × 10⁻⁸ per base per generation, as opposed to 2–2.5 × 10⁻⁸ for those from long-term divergence [1–3].

A third method, and another that is only now becoming possible, is to make direct comparisons between present-day samples and precisely-dated ancient genomes. This method is similar to the first one, but by using two time-separated samples from the same species, it avoids the difficulty of needing an externally inferred split time. A recent study of a high-coverage genome sequence from a 45,000-year-old Upper Paleolithic modern human produced two estimates of this type [11]. Direct measurement of decreased mutational accumulation in this sample led to rate estimates of 0.44–0.63 × 10⁻⁹ per base per year (range of 14 estimates), or 1.3–1.8 × 10⁻⁸ per base per generation (assuming 29 years per generation [12]). An alternative technique, leveraging time shifts in historical population sizes, yielded an estimate of 0.38–0.49 × 10⁻⁹ per base per year (95% confidence interval), or 1.1–1.4 × 10⁻⁸ per base per generation, although a re-analysis of different mutational classes led to a total estimate of 0.44–0.59 × 10⁻⁹ per base per year (1.3–1.7 × 10⁻⁸), in better agreement with the first approach [11].
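The per-year and per-generation figures quoted for the ancient-genome study differ only by the assumed generation time. The short sketch below reproduces that unit conversion from the ranges given in the text; the dictionary labels and function name are ours, chosen for illustration.

```python
# Convert per-year mutation rates to per-generation rates,
# assuming the 29-year generation time quoted in the text.
GENERATION_YEARS = 29


def per_generation(rate_per_year, generation_years=GENERATION_YEARS):
    """Scale a per-base, per-year rate to a per-base, per-generation rate."""
    return rate_per_year * generation_years


# (low, high) per-year ranges quoted from the ancient-genome study [11].
# The labels are informal descriptions, not terms from the paper.
ranges_per_year = {
    "direct accumulation": (0.44e-9, 0.63e-9),
    "population-size time shift": (0.38e-9, 0.49e-9),
    "re-analysis by mutational class": (0.44e-9, 0.59e-9),
}

# Express each bound in units of 1e-8 and round to one decimal place.
converted = {
    label: tuple(round(per_generation(x) / 1e-8, 1) for x in bounds)
    for label, bounds in ranges_per_year.items()
}

for label, (low, high) in converted.items():
    print(f"{label}: {low}-{high} x 10^-8 per base per generation")
```

The rounded results match the per-generation ranges stated in the paragraph above.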
Finally, a fourth technique is to calibrate the rate of accumulation of mutations using a separate evolutionary rate that is better measured. In one such study, the authors used a model coupling single-nucleotide mutations to mutations in nearby microsatellite alleles to infer a single-nucleotide rate of 1.4–2.3 × 10⁻⁸ per base per generation (90% confidence interval) [13]. In principle, this general technique is appealing because it only involves intrinsic information, without any reference points, and yet can leverage the signal of mutations that have occurred over many generations.

In this study, we present a new approach that falls into this fourth category: we calibrate the mutation rate against the rate of meiotic recombination events, which has been measured with high precision in humans [14–17]. Intuitively, our method makes use of the following relationship between the mutation and recombination rates. At every site i in a diploid genome, the two copies of the base have some time to most recent common ancestor (TMRCA) T_i, measured in generations. The genome can be divided into blocks of sequence that have been inherited together from the same common ancestor, with different blocks separated by ancestral recombinations. If a given block has a TMRCA of T and a length of L bases, and if μ is the per-generation mutation rate per base, then the expected number of mutations that have accumulated in either copy of that block since the TMRCA is 2TLμ. This is the expected number of heterozygous sites that we observe in the block today (disregarding the possibility of repeat mutations). We also know that if the per-generation recombination rate is r per base, then the expected length of the block is (2Tr)⁻¹. Thus, the expected number of heterozygous sites per block (regardless of age) is μ/r.
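The μ/r relationship can be checked numerically. The sketch below is an illustration only: it assumes block length given T is exponentially distributed with mean 1/(2Tr), draws T from an arbitrary range, and uses placeholder values for μ and r. None of these choices come from the paper beyond the two expectations stated above.

```python
import random


def mean_hets_per_block(mu, r, n_blocks=100_000, seed=0):
    """Average the expected het count 2*T*L*mu over simulated blocks.

    Assumptions (illustrative only): T is drawn uniformly from an arbitrary
    range of generations, and block length L given T is exponential with
    mean 1/(2*T*r), matching the expectation stated in the text.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_blocks):
        t = rng.uniform(500.0, 5000.0)         # TMRCA in generations (hypothetical range)
        length = rng.expovariate(2.0 * t * r)  # block length in bases, mean 1/(2*t*r)
        total += 2.0 * t * length * mu         # expected heterozygous sites in this block
    return total / n_blocks


mu, r = 1.6e-8, 1.2e-8  # per base per generation; both values are placeholders
estimate = mean_hets_per_block(mu, r)
print(f"simulated mean: {estimate:.4f}, mu/r: {mu / r:.4f}")
```

Because T cancels in the product 2Tμ × 1/(2Tr), the simulated mean should sit close to μ/r whatever distribution is chosen for T.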
This relationship allows us to estimate μ given a good prior knowledge of r. Our full method is more complex but is based on the same principle. We show below how we can capture the signal of heterozygosity per recombination to infer the historical per-generation mutation rate for non-African populations over approximately the last 50–100 thousand years (ky). A broadly similar idea is also applied in an independent study [18], but over a more recent time scale (up to ~3 ky, via mutations present in inferred identical-by-descent segments), and the two final estimates are in very good agreement.

Results

Overview of methods

One difficulty of the simple method outlined above is that in practice we cannot accurately reconstruct the breakpoints between adjacent non-recombined blocks. Instead, we use an indirect statistic that captures information about the presence of breakpoints but can be computed in a simple way (without directly inferring blocks) and averaged over many loci (Fig 1).

Starting from a certain position in the genome, the TMRCA of the two haploid chromosomes as a function of distance in either direction is a step function, with changes at ancestral recombination points (Fig 1A). Heterozygosity, being proportional to TMRCA in expectation (and directly observable), follows the same pattern on average (Fig 1B).

If we consider a collection of starting positions having similar local heterozygosities, then as a function of the genetic distance d away from them, the average heterozygosity displays a smooth relaxation from the common starting value toward the global mean heterozygosity H̄ as the probability increases of having encountered recombination points (Fig 1C). We define a statistic H_S(d) that equals this average heterozygosity, where S is a set of starting points indexed by the local number of heterozygous sites per 100 kb (we also use S at times to refer to the heterozygosity range itself). The TMRCAs of these points determine the time scale over which our inferred value of μ is measured. Our default choice is to use starting points with a local total of 5–10 heterozygous sites per 100 kb (see Methods).
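To make the definition of H_S(d) concrete, here is a simplified sketch. It assumes sites lie on an evenly spaced grid so that an offset in sites stands in for genetic distance, which the real statistic does not assume; the function name, arguments, and toy data are all hypothetical.

```python
def h_s_curve(het, window, s_range, offsets):
    """Average heterozygosity at each offset from ascertained starting points.

    het      : list of 0/1 flags, one per site on an evenly spaced grid
    window   : number of sites used to measure local heterozygosity
    s_range  : (low, high) inclusive bounds on het count in the starting window
    offsets  : distances (in grid sites) at which to evaluate the curve
    """
    low, high = s_range
    n = len(het)
    max_off = max(offsets)
    # Ascertain starting points whose local window has a het count in S.
    starts = [
        i for i in range(max_off, n - window - max_off)
        if low <= sum(het[i:i + window]) <= high
    ]
    curve = {}
    for d in offsets:
        # Look d sites away in both directions, measuring a same-sized window.
        total = 0
        for i in starts:
            total += sum(het[i + d:i + d + window]) + sum(het[i - d:i - d + window])
        curve[d] = total / (2 * window * len(starts)) if starts else float("nan")
    return curve


# Toy genome: a low-heterozygosity stretch embedded in a high-heterozygosity background.
high = [1, 0] * 500          # 1000 sites, het density 0.5
low = ([1] + [0] * 9) * 20   # 200 sites, het density 0.1
het = high + low + high

curve = h_s_curve(het, window=20, s_range=(1, 3), offsets=[0, 100, 300])
for d, value in curve.items():
    print(d, round(value, 3))
```

In the toy data the curve rises from the low starting heterozygosity toward the background level as the offset grows, which is the relaxation the statistic is designed to capture.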
Fig 1. Explanation of the statistic H_S(d). (A) Ancestral recombinations separate chromosomes into blocks of piecewise-constant TMRCA (and hence expected heterozygosity). (B) From the data, we measure local heterozygosity as a function of genetic distance; red and blue circles represent heterozygous and homozygous sites, respectively, along a diploid genome. (C) Our statistic H_S(d) is an average heterozygosity as a function of genetic distance over many starting points with similar local heterozygosities, yielding a smooth relaxation toward the genome-wide average.
doi:10.1371/journal.pgen.1005550.g001

To estimate μ, we use the fact that the probability of having encountered a recombination as one moves away from a starting point is a function of both d and the starting heterozygosity H_S(0), since smaller values of H_S(0) correspond to smaller TMRCAs, with less time for recombination to have occurred, and hence longer unbroken blocks. This relationship allows us to calibrate μ against the recombination rate r via the relaxation rate of H_S(d). Our inference procedure involves using coalescent simulations to create matching "calibration data" with known values of μ and then solving for the best-fit mutation rate for the test data (see Methods and Fig 2). We note that when comparing H_S(d) for real data to the calibration curves, a larger value of μ will correspond to a lower curve. This is because H_S(0) is fixed, which means that the TMRCAs at the starting points are proportionally lower for larger values of μ. Thus, recombinations are less frequent as a function of d, leading to a slower relaxation.

In order for our inferences to be accurate, the calibration curves must recapitulate as closely as possible all aspects of the real data that could affect H_S(d) (see Methods, S1 Text, and Fig 2).
First, because coalescent probabilities depend on ancestral population sizes, we use PSMC [19] to learn the demographic history of our samples. Next, we adapt a previously developed technique [20] to infer the fine-scale uncertainty of our genetic map. Finally, we correct our raw inferred values of μ for three additional factors in order to isolate the desired mutational signal: (1) we multiply by a correction for genotype errors; (2) we subtract the contribution of non-crossover gene conversion, using a result from [21] adjusted for local recombination rate; and (3) we scale the final value to correspond to genome-wide base content and mutability (see Methods and S1 Text). We also test additional potential model violations through simulations (see S1 Text and S1 Fig). We account for statistical uncertainty using a block jackknife and incorporate confidence intervals for model parameters; all results are given as mean ± standard error.

Simulations

First, for seven different scenarios, including a range of possible model violations, we generated 20 simulated diploid genomes with a known true mutation rate (μ = 2.5 × 10⁻⁸ per generation except where otherwise specified) and ran our procedure as we would for real data, with perturbed genetic maps for both the test data and calibration data (variance parameter α = 3000 M⁻¹; see Methods). To measure the uncertainty in our estimates, we performed 25 independent trials of each simulation, and we also compared the standard deviations of the estimates across trials with jackknife-based standard errors (as we would measure uncertainty for real data). Full details of the simulation procedures can be found in Methods and S1 Text.
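For readers unfamiliar with the block jackknife used above for standard errors, the following is a minimal sketch of the textbook delete-one form for a simple mean. It assumes equally weighted blocks and hypothetical per-block values; the block definition and weighting actually used are described in Methods and may differ.

```python
import math


def jackknife_se(block_values):
    """Delete-one block jackknife for the mean of per-block values.

    Assumes equally weighted blocks; a weighted jackknife would be needed
    if blocks contributed unequal amounts of data.
    """
    n = len(block_values)
    total = sum(block_values)
    # Estimate recomputed with each block left out in turn.
    leave_one_out = [(total - x) / (n - 1) for x in block_values]
    center = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((est - center) ** 2 for est in leave_one_out)
    return total / n, math.sqrt(variance)


# Hypothetical per-block estimates of mu (units of 1e-8), for illustration only.
blocks = [1.4, 1.7, 1.6, 1.5, 1.8, 1.6, 1.7, 1.5]
mean, se = jackknife_se(blocks)
print(f"{mean:.3f} +/- {se:.3f}")
```

For a plain mean this reduces to the sample standard deviation divided by the square root of the number of blocks; the jackknife becomes useful when the estimator is a more complicated function of the data, as it is here.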
In all cases, the H_{5–10}(d) curves matched quite well between the test data and the calibration data, and our final results were within two standard errors of the true rate (Fig 3). Furthermore, our jackknife estimates of the standard error were comparable to the realized standard deviations and on average conservative, especially for the most complex simulation (G), despite not incorporating PSMC uncertainty (see Methods): 0.08 × 10⁻⁸, 0.04 × 10⁻⁸, 0.04 × 10⁻⁸, 0.06 × 10⁻⁸, 0.09 × 10⁻⁸, 0.05 × 10⁻⁸, and 0.11 × 10⁻⁸, respectively, for the seven scenarios (see Fig 3 for empirical standard deviations). The fact that all of the inferred rates are close to the true values leads us to conclude that none of the aspects of the basic procedure or the tested model violations create a substantial bias.

Fig 2. Illustration of the steps of our inference procedure. (A) Overview: from the data, we compute both the statistic H_S(d) and other parameters necessary to create matching calibration curves with known values of μ. (B) Details of capturing aspects of the real data for the calibration data. (C) Computation of H_S(d): the statistic captures the average heterozygosity as a function of genetic distance d from a starting point with heterozygosity in a defined range S, averaged over many such points. (D) For the final inferred value of μ, we compare matched H_S(d) curves for the real data and calibration data (with known values of μ).
doi:10.1371/journal.pgen.1005550.g002

Fig 3. Results for simulated data. Means and standard deviations of 25 independent trials are given, and the curves displayed are for representative runs matching the 25-trial means. The true simulated rate is μ = 2.5 × 10⁻⁸ unless otherwise specified. (A) Baseline simulated data; the inferred rate is μ = 2.47 ± 0.05 × 10⁻⁸. (B) Basic simulated data with a true rate of 1.5 × 10⁻⁸; the inferred rate is μ = 1.57 ± 0.04 × 10⁻⁸. (C) Data with a true rate of 1.5 × 10⁻⁸ plus gene conversion; the inferred rate is μ = 1.49 ± 0.05 × 10⁻⁸ (corrected from a raw value of 1.70 × 10⁻⁸ with gene conversion included). (D) Data with simulated genotype errors; the inferred rate is μ = 2.39 ± 0.06 × 10⁻⁸ (corrected from a raw value of 2.71 × 10⁻⁸ with genotype errors included). (E) Data simulated with variable mutation rate; the inferred rate is μ = 2.61 ± 0.08 × 10⁻⁸. (F) Data from a simulated admixed population; the inferred rate is μ = 2.57 ± 0.07 × 10⁻⁸. (G) Simulated data with all three complications as in (D)–(F); the inferred rate is μ = 2.53 ± 0.06 × 10⁻⁸ (corrected from a raw value of 2.77 × 10⁻⁸).
doi:10.1371/journal.pgen.1005550.g003

Error parameters

Before obtaining mutation rate estimates from real data, we quantified two important error parameters: the rate of false heterozygous genotype calls and the degree of inaccuracy in our genetic map.

We estimated the genotype error rate by taking advantage of the fact that methylated cytosines at CpG dinucleotides are roughly an order of magnitude more mutable than other bases [3, 7, 8, 10] (see Methods). Thus, such mutations are strongly over-represented among true heterozygous sites as compared to falsely called heterozygous sites. By counting the proportion of CpG mutations out of all heterozygous sites around our ascertained starting points, we inferred an error rate of approximately 1 per 100 kb (1.08 ± 0.28 × 10⁻⁵ per base; see Methods and S1 Text), consistent with previous results [22].
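The CpG argument can be read as a two-component mixture: observed heterozygous calls are a blend of true sites (CpG-rich) and false calls (CpG-poor). The sketch below shows one simple way such a mixture could be inverted. It is our illustration of the logic rather than the estimator given in Methods, and every numeric input is a hypothetical placeholder.

```python
def false_het_fraction(cpg_frac_observed, cpg_frac_true, cpg_frac_false):
    """Solve observed = (1 - e) * true + e * false for the false-call fraction e."""
    if cpg_frac_true == cpg_frac_false:
        raise ValueError("CpG fractions must differ for the mixture to be identifiable")
    return (cpg_frac_true - cpg_frac_observed) / (cpg_frac_true - cpg_frac_false)


# All three fractions are hypothetical placeholders, not values from the paper.
e = false_het_fraction(cpg_frac_observed=0.16, cpg_frac_true=0.18, cpg_frac_false=0.02)
print(f"fraction of heterozygous calls that are false: {e:.3f}")
```

The wider the gap between the CpG fractions of true and false sites, the better conditioned the inversion is, which is why the high mutability of CpG sites makes them a useful marker.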
It was also necessary for us to estimate the accuracy of our genetic map. We used the "shared" version of the African-American (AA) map from [17] as our base map and a modified version of the error model of [20]: Z ~ Gamma(αγ(g + πp), α), where Z is the true genetic length of a map interval, g is the observed genetic length, p is the physical length, α is the parameter measuring the accuracy of the map, and γ and π are constants (see Methods). Based on pedigree crossover data from [23], we estimated α = 2802 ± 14 M⁻¹ for the full AA map and α = 3414 ± 13 M⁻¹ for the "shared" map, which should serve as lower and upper bounds (see Methods). For our analyses, we took α = 3100 M⁻¹ (with a standard error of 300 M⁻¹ to account for our uncertainty in the precise value). This means that 1/α ≈ 0.03 cM can be thought of as the length scale for the accuracy of genetic distances according to the base map (see Methods for details). In order to translate the uncertainty in α into its effect on the inferred μ, we repeated our primary analysis with a range of alternative values of α (S2 Fig).

We note that the values of α reported in [20] are substantially lower than ours, which we suspect is because our validation data have much finer resolution than those used previously. (When using the same validation data, the "shared" and HapMap LD [15] maps appear to be relatively similar in accuracy.) If we substitute our new α values for the original application of inferring the date of Neanderthal gene flow into modern humans, we obtain a less distant time in the past, 28–65 ky (most likely 35–49 ky), versus 37–86 ky (most likely 47–65 ky) reported in [20]. While relatively recent, this date range is not in conflict with archaeological evidence or with an estimate of 49–60 ky (95% confidence interval) based on an Upper Paleolithic genome [11].
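The map error model described above, Z ~ Gamma(αγ(g + πp), α), can be sampled directly once the parameters are fixed. The sketch below reads the second argument as a rate, so the mean of Z is γ(g + πp); that reading and the placeholder values γ = 1 and π = 0 are our assumptions, since the constants are only defined in Methods.

```python
import random


def perturb_interval(g, p, alpha, gamma_const=1.0, pi_const=0.0, rng=random):
    """Draw a 'true' genetic length Z ~ Gamma(shape=alpha*gamma*(g + pi*p), rate=alpha).

    g     : observed genetic length of the interval (Morgans)
    p     : physical length of the interval (bases)
    alpha : map accuracy parameter (per Morgan)
    gamma_const, pi_const : the constants called gamma and pi in the text; the
        defaults here are placeholders, not the paper's values.
    """
    shape = alpha * gamma_const * (g + pi_const * p)
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / alpha)


rng = random.Random(1)
# A hypothetical 0.1 cM (0.001 M) interval spanning 100 kb, with alpha = 3100 per Morgan.
samples = [perturb_interval(0.001, 100_000, 3100.0, rng=rng) for _ in range(20_000)]
mean_z = sum(samples) / len(samples)
sd_z = (sum((z - mean_z) ** 2 for z in samples) / (len(samples) - 1)) ** 0.5
print(f"mean = {mean_z * 100:.4f} cM, sd = {sd_z * 100:.4f} cM")
```

Under these placeholder constants the variance of Z is g/α, so the typical perturbation of an interval scales with the square root of its length times 1/α, which is one way to read the statement that 1/α sets the length scale of map accuracy.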
Estimates for Europeans and East Asians

Our primary results (Fig 4) were obtained from eight diploid genomes of European and East Asian individuals (two each French, Sardinian, Han, and Dai) using our standard parameter settings (see above and Methods). For all real-data applications, to minimize noise from the randomized elements of the procedure (namely, coalescent simulation and generation of the perturbed calibration map), we averaged 25 independent calibrations of the data to obtain our final point estimate. With all eight individuals combined, we estimated a mutation rate of μ = 1.61 ± 0.13 × 10⁻⁸ per generation (Fig 4A). Using this value of μ, our starting heterozygosity H_S(0) ≈ 7.4 × 10⁻⁵ corresponds to a TMRCA of approximately 1550–3100 generations, or 45–90 ky, assuming an average generation time of 29 years [12].

It is possible that our full estimate could be slightly inaccurate due to population-level differences in either the fine-scale genetic map or demographic history (see S1 Text). However, we expect Europeans and East Asians to be compatible in our procedure both because they are not too distantly related and because they have similar population size histories [19, 24]. To test empirically the effects of combining the populations, we estimated rates for the four Europeans and four East Asians separately (Fig 4B and 4C). Using the same genotype error corrections, we found that the H_{5–10}(d) curves as well as the final inferred values were similar to those for the full data: μ = 1.72 ± 0.14 × 10⁻⁸ for Europeans and μ = 1.55 ± 0.14 × 10⁻⁸ for East Asians. Thus, in conjunction with our simulation results, it appears that the full eight-genome estimate is robust to the effects of population heterogeneity.
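The TMRCA range quoted above for the starting points follows from inverting H = 2Tμ. The sketch below redoes that arithmetic for the two ends of the default 5–10 heterozygous sites per 100 kb range and for the reported mean H_S(0); the constant and function names are ours.

```python
MU = 1.61e-8           # inferred mutation rate per base per generation
GENERATION_YEARS = 29  # generation time assumed in the text
WINDOW = 100_000       # bases per starting window


def tmrca_generations(heterozygosity, mu=MU):
    """Invert H = 2 * T * mu for the TMRCA T in generations."""
    return heterozygosity / (2.0 * mu)


for hets_per_window in (5, 7.4, 10):
    t = tmrca_generations(hets_per_window / WINDOW)
    print(f"{hets_per_window} hets/100 kb -> {t:.0f} generations, "
          f"{t * GENERATION_YEARS / 1000:.0f} ky")
```

The ends of the range land near 1550 and 3100 generations, or roughly 45 and 90 ky, in line with the figures in the text.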
Fig 4. Results for Europeans and East Asians. (A) All eight individuals together; the inferred rate is μ = 1.61 ± 0.13 × 10⁻⁸ per generation. (B) Results for the four Europeans; the inferred rate is μ = 1.72 ± 0.14 × 10⁻⁸. (C) Results for the four East Asians; the inferred rate is μ = 1.55 ± 0.14 × 10⁻⁸. For all real-data results, the curves displayed are for representative calibrations matching the overall means. The reported values are also corrected for gene conversion, genotype error, and base content, which explains the apparent discrepancy between the final estimates and the curves (for example, the estimate in (A) is corrected from a raw value of 2.00 × 10⁻⁸).
doi:10.1371/journal.pgen.1005550.g004

Additionally, to investigate the influence of different mutational types, we estimated rates separately for CpG transitions and all other mutations (see Methods). We inferred values of μ = 0.50 ± 0.06 × 10⁻⁸ for CpGs and μ = 1.36 ± 0.13 × 10⁻⁸ for non-CpGs (S3 Fig), with a sum (1.87 ± 0.14 × 10⁻⁸) that is somewhat higher than our full-data estimate. Since CpG transitions are known to comprise approximately 17–18% of all mutations [8], our full-data and non-CpG estimates appear to be in very good agreement, whereas the CpG-only estimate is likely inflated, perhaps because our method performs poorly with the low density of heterozygous sites (only 1 per 100 kb window for our CpG-only starting points). As a result, we believe that our value of μ = 1.61 × 10⁻⁸ is accurate, or at most slightly underestimated, as a total mutation rate for all sites.
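The consistency argument in the preceding paragraph can be followed with a few lines of arithmetic. The sketch below uses only the rounded values quoted in the text and assumes the two class-specific errors are independent when combining them; with rounded inputs the sum comes to 1.86 rather than the reported 1.87, which presumably reflects unrounded values.

```python
import math

cpg, cpg_se = 0.50, 0.06          # CpG-transition estimate (units of 1e-8)
non_cpg, non_cpg_se = 1.36, 0.13  # non-CpG estimate (units of 1e-8)
full = 1.61                       # full-data estimate (units of 1e-8)

# Sum of the two class-specific estimates; errors combined in quadrature,
# which assumes the two estimates are independent.
total = cpg + non_cpg
total_se = math.sqrt(cpg_se**2 + non_cpg_se**2)

# If CpG transitions are 17-18% of all mutations, the non-CpG estimate
# alone implies this range for the total rate.
implied_from_non_cpg = [non_cpg / (1 - f) for f in (0.17, 0.18)]
# Conversely, the CpG-only estimate implies a much larger total.
implied_from_cpg = [cpg / f for f in (0.18, 0.17)]

print(f"sum of classes: {total:.2f} +/- {total_se:.2f}")
print("total implied by non-CpG:", [round(x, 2) for x in implied_from_non_cpg])
print("total implied by CpG:", [round(x, 2) for x in implied_from_cpg])
```

The total implied by the non-CpG estimate sits close to the full-data value of 1.61, while the total implied by the CpG-only estimate is far higher, which is the sense in which the CpG-only figure looks inflated.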
Estimates for other populations

We also ran our procedure for three other non-African populations: aboriginal Australians, Karitiana (an indigenous group from Brazil), and Papua New Guineans. Using two genomes per population and computing curves for starting regions with 1–15 heterozygous sites per 100 kb (to increase the number of test regions, with a potential trade-off in accuracy), we inferred rates of μ = 1.86 ± 0.19 × 10⁻⁸, μ = 1.37 ± 0.19 × 10⁻⁸, and μ = 1.62 ± 0.17 × 10⁻⁸ for Australian, Karitiana, and Papuan, respectively (Fig 5). We note that the relatively high (but not statistically significantly different) per-generation value for Australians is consistent with the high average ages of fathers in many aboriginal Australian societies [12, 25]. Overall, given the expected small differences for historical, cultural, or biological reasons (including, as mentioned above, our use of the same "shared" genetic map for all groups), we do not see evidence of substantial errors or biases in our procedure when applied to diverse populations.

Discussion

Using a new method for estimating the human mutation rate, we have obtained a genome-wide estimate of μ = 1.61 ± 0.13 × 10⁻⁸ single-nucleotide mutations per generation. Our approach counts mutations that have arisen over many generations (a few thousand, i.e., several tens of thousands of years) and relies on our excellent knowledge of the human recombination rate to calibrate the length of the relevant time period.

We have shown that our estimate is robust to many possible confounding factors (S1 Fig).
In addition to statistical noise in the data, our method directly accounts for ancestral gene conversion and for errors in genotype calls and in the genetic map. We have also demonstrated, based on simulations, that heterogeneity in demographic and genetic parameters, including the mutation rate itself, does not cause an appreciable bias. However, we acknowledge that our estimate requires a large number of modeling assumptions, and while we have attempted to justify each step of our procedure and to incorporate uncertainty at each stage into our final standard error, it is possible that we have not precisely captured the influence of every confounder. Similarly, while we consider a broad range of possible sources of error, we cannot guarantee that there might not be others that we have neglected.

The meaning of an average rate

It is important to note that the mutation rate is not constant at all sites in the genome [26]. As we have discussed, we believe that this variability does not cause a substantial bias in our inferences, but to the extent that some bases mutate faster than others, a rate is only meaningful when associated with the set of sites for which it is estimated. For example, methylated cytosines at CpG positions accumulate point mutations roughly an order of magnitude faster than other bases because of spontaneous deamination [3, 7, 8, 10]. Such effects can lead to larger-scale patterns, such as the higher mutability of exons as compared to the genome as a whole [27].

In our work, we filter the data substantially, removing more than a third of the sites in the genome. The filters tend to reduce the heterozygosity of the remaining portions [24, 28], which is to be expected if they have the effect of preferentially removing false heterozygous sites. We also make a small adjustment to our final value of μ to account for differences in base composition between our ascertained starting points and the (filtered) genome as a whole (see Methods). For reference, in S1 Table, we give heterozygosity levels and human–chimpanzee divergence statistics for sites passing our filters, i.e., the subset of the genome for which our inferred rates are applicable.