Chapman University
Chapman University Digital Commons
Biology, Chemistry, and Environmental Sciences Faculty Articles and Research, 2015

The Genetic Ancestry of African Americans, Latinos, and European Americans Across the United States

Katarzyna Bryc, Harvard University
Eric Y. Durand, 23andMe
J. Michael Macpherson, Chapman University, [email protected]
David Reich, Harvard University
Joanna Mountain, 23andMe

Recommended Citation
Bryc, Katarzyna, Eric Y. Durand, J. Michael Macpherson, David Reich, and Joanna L. Mountain. "The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States." The American Journal of Human Genetics 96.1 (2015): 37-53. doi: 10.1016/j.ajhg.2014.11.010

Comments
This article was originally published in The American Journal of Human Genetics, volume 96, issue 1, in 2015 and has accompanying data. DOI: 10.1016/j.ajhg.2014.11.010

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Copyright
The authors

This article is available at Chapman University Digital Commons: http://digitalcommons.chapman.edu/sees_articles/27

ARTICLE
The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States
Katarzyna Bryc,1,2,* Eric Y.
Durand,2 J. Michael Macpherson,3 David Reich,1,4,5 and Joanna L. Mountain2

1 Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; 2 23andMe, Inc., Mountain View, CA 94043, USA; 3 School of Computational Sciences, Chapman University, Orange, CA 92866, USA; 4 Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA; 5 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.ajhg.2014.11.010. ©2015 The Authors
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
The American Journal of Human Genetics 96, 37–53, January 8, 2015

Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.

Introduction

Over the last several hundred years, the United States has been the site of ongoing mixing of peoples of continental populations that were previously separated by geography. Native Americans, European immigrants to the Americas, and Africans brought to the New World largely via the trans-Atlantic slave trade came together in the New World. Mating between individuals with different continental origins, which we refer to here as “population admixture,” results in individuals who carry DNA inherited from multiple populations. Although US government census surveys and other studies of households in the US have established fine-scale self-described ethnicity at the state and county level (see the US 2010 Census online), the relationship between genetic ancestry and self-reported ancestry for each region has not been deeply characterized. Understanding genetic ancestry of individuals from a self-reported population, and differences in ancestry patterns among regions, can inform medical studies and personalized medical treatment.1 The genetic ancestry of individuals can also shed light on the history of admixture and migrations within different regions of the US, which is of interest to historians and sociologists.

Previous studies have shown that African Americans in the US typically carry segments of DNA shaped by contributions from peoples of Europe, Africa, and the Americas, with variation in African and European admixture proportions across individuals and differences in groups across parts of the country.2–4 More recent studies that utilized high-density genotype data provide reliable individual ancestry estimates, illustrate the large variability in African and European ancestry proportions at an individual level, and are able to detect low proportions of Native American ancestry.3–11 Latinos across the Americas have differing proportions of Native American, African, and European genetic ancestry, shaped by local historical interactions with migrants brought by the slave trade, European settlement, and indigenous Native American populations.12–18 Individuals from countries across South America, the Caribbean, and Mexico have different profiles of genetic ancestry molded by each population’s unique history and interactions with local Native American populations.1,19–25 European Americans are often used as proxies for Europeans in genetic studies.26 European Americans, however, have a history of admixture of many genetically distinct European populations.27,28 Studies have shown that European Americans also have non-European ancestry, including African, Native American, and Asian, though it has been poorly quantified with some discordance among estimates even within studies.29–32

That genetic ancestry of self-described groups varies across geographic locations in the US has been documented in anecdotal examples but has not previously been explored systematically. Most early studies of African Americans had limited resolution of ancestry because of small sample sizes and few genetic markers, and recent studies typically have limited geographic scope. Though much work has been done to characterize the genetic diversity among Latino populations from across the Americas, it is unclear the extent to which Latinos within the US share or mirror these patterns on a national or local scale. Most analyses have relied on mitochondrial DNA, Y chromosomes, or small sets of ancestry-informative markers, and few high-density genome-wide SNP studies have explored fine-scale patterns of African and Native American ancestry in individuals living across the US.

Here, we describe a large-scale, nationwide study of African Americans, Latinos, and European Americans by using high-density genotype data to examine subtle ancestry patterns in these three groups across the US. To improve the understanding of the relationship between genetic ancestry and self-reported ethnic and racial identity, and to characterize heterogeneity in the fine-scale genetic ancestry of groups from different parts of the US, we inferred the genetic ancestry of 5,269 self-reported African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers living across the US, by using high-density SNPs genotype data from 650K to 1M arrays. 23andMe customers take an active role in participating in research by submitting saliva samples, consenting for data to be used for research, and completing surveys. We generated cohorts of self-reported European American, African American, and Latino individuals from self-reported ethnicity and identity. We obtained ancestry estimates from genotype data by using a Support Vector Machine-based algorithm that infers population ancestry with Native American, African, and European reference panels, leveraging geographic information collected through surveys (see Durand et al.33). For details on genotyping and ancestry deconvolution methods, see Subjects and Methods.

Subjects and Methods

Human Subjects
All participants were drawn from the customer base of 23andMe, Inc., a consumer personal genetics company. This data set has been described in detail previously.34,35 Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRFP-accredited IRB, Ethical & Independent Review Services (E&I Review).

Genotyping
Participants were genotyped as described previously.36 In short, DNA extraction and genotyping were performed on saliva samples by National Genetics Institute (NGI), a CLIA-licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples have been genotyped on one of four genotyping platforms. The V1 and V2 platforms were variants of the Illumina HumanHap550+ BeadChip, including about 25,000 custom SNPs selected by 23andMe, with a total of about 560,000 SNPs. The V3 platform was based on the Illumina OmniExpress+ BeadChip, with custom content to improve the overlap with our V2 array, with a total of about 950,000 SNPs. The V4 platform in current use is a fully custom array, including a lower redundancy subset of V2 and V3 SNPs with additional coverage of lower-frequency coding variation and about 570,000 SNPs. Samples that failed to reach 98.5% call rate were reanalyzed. Individuals whose analyses failed repeatedly were recontacted by 23andMe customer service to provide additional samples, as is done for all 23andMe customers. Customer genetic data have been previously utilized in association studies and studies of genetic relationships.34–43

Research Cohorts
23andMe customers were invited to fill out web-based questionnaires, including questions on ancestry and ethnicity, on state of birth, and current zip code of residence. They were also invited to allow their genetic data and survey responses to be used for research. Only data of customers who signed IRB-approved consent documents were included in our study. Survey introductions are explicit about their applications in research. For example, the ethnicity survey introduction text states that the survey responses will be used in ancestry-related research (Table S1 available online).

Self-Reported Ancestry
It is important to note that ancestry, ethnicity, identity, and race are complex labels that result both from visible traits, such as skin color, and from cultural, economic, geographical, and social factors.23,44 As a result, the precise terminology and labels used for describing self-identity can affect survey results, and care in choice of labels should be utilized. However, we chose to maximize our available self-reported ethnicity sample size by combining information from questions asking for customer self-reported ancestry. We used two survey questions, with different nomenclature, to gauge responses about identity, which here we view as “the subjective articulation of group membership and affinity.”45

The first question is modeled after the US census nomenclature and is a multiquestion survey that allows for choice of “Hispanic” or “Not Hispanic,” and participants were asked “Which of these US Census categories describe your racial identity? Please check all that apply” from the following list of ethnicities: “White,” “Black,” “American Indian,” “Asian,” “Native Hawaiian,” “Other,” “Not sure,” and “Other racial identity.” For inclusion into our European American cohort, individuals had to select “Not Hispanic” and “White,” but not any other identity. For inclusion into our Latino cohort, individuals had to select “Hispanic,” with no other restrictions. For inclusion into our African American cohort, individuals had to select “Not Hispanic” and “Black” and no other identity.

The second question on identity is a single-choice question, where respondents were asked to choose “What best describes your ancestry/ethnicity?” from “African,” “African American,” “Central Asian,” “Declined,” “East Asian,” “European,” “Latino,” “Mideast,” “Multiple ancestries,” “Native American,” “Not sure,” “Other,” “Pacific Islander,” “South Asian,” and “Southeast Asian.” Because individuals could select only one response, we included individuals who selected “European” in our European American cohort, those who selected “African American” in our African American cohort, and those who selected “Latino” in our Latino cohort.

Some African American participants included in this study were recruited through 23andMe’s Roots into the Future project (accessed October 2013), which aimed to increase understanding of how DNA plays a role in health and wellness, especially for diseases more common in the African American community. Individuals who self-identified as African American, black, or African were recruited through 23andMe’s current membership, at events, and via other recruitment channels.

In the present work, we do not include individuals who self-report as having multiple identities, because this represents only a small fraction of individuals in our data set. Low rates of reporting as multiracial or multiethnic is in line with previous studies; an analysis of the 2000 US Census shows that 95 percent of blacks and 97 percent of whites acknowledge only a single identity.45 Future studies including multiracial individuals might further illuminate patterns of genetic ancestry and the complex relationship with self-identity.

Differences among states, where different proportions of people self-report as mixed race, might explain some regional differences in genetic ancestry. However, we note that, first, proportionally fewer people identify as mixed race than as a single identity, and second, it remains important to establish regional differences in genetic ancestry of self-reported groups even if these differences are driven, to some degree, by regional changes in self-reported identity. More work is needed to determine to what extent regional differences are a result of how people today report their ancestry. Lastly, when available, we excluded individuals who answered “No” to a question whether they are living in the US. In total, our final sets included 5,269 African Americans, 8,663 Latinos, and 148,789 European Americans.

Notes on Terminology and Selection of Populations
Throughout the manuscript, the term “Native American ancestry” refers to estimates of genetic ancestry from indigenous Americans found across North, Central, and South America, and we distinguish this term from present-day Native Americans living in the US. We use the term “Native American” to refer to indigenous peoples of the Americas, acknowledging that some people may prefer other terms such as “American Indian.” Our estimates of African ancestry specifically aim to infer ancestry of sub-Saharan Africa and does not include ancestry from North Africa. We note that the term “Latino” has many meanings in different contexts, and in our case, we use it to refer to individuals living in the US who self-report as either “Latino” or “Hispanic.”

Our work represents a snapshot in time of genetic ancestry and identity, and future work is needed to inform the dynamic changes and forces that shape social interactions.

We note that our cohorts are likely to have ancestry from many African populations, but because of current reference sample availability, our resolution of West African ancestries is outside the scope of our study. Likewise, our estimates of Native American ancestry arise from a summary over many distinct subpopulations, but we are limited in scope because of insufficient sample sizes from subpopulations, so we currently use individuals from Central and South American together as a reference set (see Durand et al.33 for a list of populations and sample sizes).

Validation of Self-Reported Identity Survey Results
To verify that our self-reported ethnicities were reliable, we examined the consistency of ethnicity survey responses when individuals completed both ancestry and ethnicity surveys. Because the structure of the two surveys is different and multiple selections were allowed in one survey but not the other, we examined the replication rate of the primary ethnicity from the single-choice ethnicity survey in the multiple-selection survey.

In addition to structural differences, the survey content used very different nomenclature, and therefore we believe our estimated error rates to be overestimates of the true error rate, because it is likely that some individuals choose to identify with one label but not the other (i.e., “African American” but not “black”). Discrepancies in the question nomenclatures are likely to increase the error rate. Furthermore, because the two surveys could be completed at different times, either before or after obtaining personal ancestry results, it is possible that viewing genetic ancestry results might have led to a change in self-reported ancestry. Such a change would be tallied as an error in our estimates, but instead reflects a true change in perceived self-identity over time. Overall, we expect that our survey data represent highly reliable ancestry information, with errors affecting fewer than 1% of survey responses.

Geographic Location Collection
Self-reported state-of-birth survey data was available for 47,473 customers of 23andMe. However, because overlap of these customers with our cohorts was poor, we also chose to include data from a question on current zip code of residence. This provided an additional 34,351 zip codes of current residence. In cases where both the zip code of residence and state of birth were available, we used state-of-birth information. To obtain state information from zip codes, we translated zip codes to their state locations via an online zip code database (accessed October 2013).

In total, we had 50,697 individuals with available location information. About one third of each of our cohorts had location information: 1,970 African Americans, 2,944 Latinos, and 45,783 European Americans were used in our geographic analyses.

Ancestry Analyses

Ancestry Composition
We apply Ancestry Composition, a three-step pipeline that efficiently and accurately identifies the ancestral origin of chromosomal segments in admixed individuals, which is described in Durand et al.33 We apply the method to genotype data that have been phased via a reimplementation of Beagle.46 Ancestry Composition applies a string kernel support vector machines classifier to assign ancestry labels to short local phased genomic regions, which are processed via an autoregressive pair hidden Markov model to simultaneously correct phasing errors and produce reconciled local ancestry estimates and confidence scores based on the initial assignment. Lastly, these confidence estimates are recalibrated by isotonic regression models. This results in both precision and recall estimates that are greater than 0.90 across many populations, and on a continental level, have rates of 0.982–0.994 for precision and recall rates of 0.935–0.993, depending on populations (see Table 1 from Durand et al.33). We note that here, and throughout the manuscript, African ancestry corresponds to sub-Saharan African ancestry (including West African, East African, Central, and South African populations, but excluding North African populations from the reference set). For more details on our ancestry estimation method, see Durand et al.33

Aggregating Local Ancestry Information
23andMe’s Ancestry Composition method provides estimates of ancestry proportions for several worldwide populations at each window of the genome. To estimate genome-wide ancestry proportions of European, African, and Native American ancestry, we aggregate over populations to estimate the total likelihood of each population, and with a majority threshold of 0.51, if any window has a majority of a continental ancestry, we include it in the calculation of genome-wide ancestry, which is estimated as the number of windows passing the threshold for each ancestry over the total number of windows. Some windows might not pass our threshold for any population, so they remain unassigned, making it possible for estimates for all ancestries to not sum to 100%, resulting in population averages that likewise might not sum to 100%. We allow for this unspecified ancestry to reduce the error rates of our assignments, so, in some sense, our estimates might be viewed as lower bounds on ancestry, and it is possible that individuals carry more ancestry than estimated. In practice, we typically assign nearly all windows, with an average of about 1%–2% unassigned ancestry, so we do not expect it to affect our results, with the exception of Native American ancestry, which we discuss below.

Generating the Distribution of Ancestry Tracts
We generate ancestry segments as defined as continuous blocks of ancestry, estimating the best guess of ancestry at each window to define segments of each ancestry. Assigning the most likely ancestry at each window results in fewer spurious ancestry breaks and allows for a smaller upward bias in admixture dates, because breaks in ancestry segments push estimates of dates further back in time. We measure segment lengths by using genetic distances, by mapping segment start and end physical positions to the HapMap genetic map.

Admixture Dating
To estimate the time frame of admixture events, we test a simple two-event, three-population admixture model via TRACTS.47 We use a grid-search optimization to find four optimal parameters for the times of two admixture events and the proportions of admixture. We are limited to simple admixture models resulting from the computationally intensive grid search, because we were unable to obtain likelihood convergence with any of the built-in optimizers. The model tested is as follows: two populations admix t1 generations ago, with proportion frac1 and 1 − frac1, respectively. A third population later mixes in t2 generations ago, with proportion frac2.

Both our ancestry segments and prior results supported a model with an earlier date of Native American admixture.25,47 We estimated likelihoods over plausible grid of admixture times and fractions for African Americans, Latinos, and European Americans to estimate dates of initial Native American and European admixture and subsequent African admixture. These dates are estimated as the best fit for a pulse admixture event: because they represent an average over more continuous or multiple migrations, initial admixture is likely to have begun earlier.

Lower Estimates of African Ancestry in 23andMe African Americans
Unlike previous estimates of the mean proportion of African ancestry, which typically have ranged from 77% to 93% African ancestry,2–4,48–62 our estimates, depending on exclusions, are 73% or 75%. There are several possible explanations for our low mean African ancestry. If our Ancestry Composition estimates are downward biased, then the African Americans might have levels of African ancestry consistent with other studies, and our results are simply underestimates. However, our Ancestry Composition estimates are extremely well calibrated for African Americans from the 1000 Genomes Project and their consensus estimates, and we see no evidence of a downward bias (see Figure 5 from Durand et al.33).

The mean ancestry proportion of 23andMe self-reported African Americans is about 73%. A small fraction, about 2%, of African Americans carry less than 2% African ancestry, which is far less than typically seen in most African Americans (Figure S18A available online). Further investigation reveals that the majority of these individuals (88%) have predominantly European ancestry, and others carry East Asian, South Asian, and Southeast Asian ancestry, roughly in proportion to the frequencies found in the 23andMe database overall. Given the large number of non-African American individuals in the 23andMe database, even an exceedingly low survey error rate of 0.02% could be sufficient to account for the number of outlier individuals we detect. Hence, we posit that these individuals represent survey errors rather than true self-reported African Americans. Exclusion of these 108 self-reported African Americans with less than 2% African ancestry from mean ancestry calculations results in a moderate rise, to 74.8%, of the mean proportion of African ancestry in African Americans.

To quantify differences in African ancestry driving mean state differences, we examined the distributions of estimates of African ancestry in African Americans from the District of Columbia (D.C.) and Georgia, which had at least 50 individuals with the lowest and highest mean African ancestry proportions (Figure S1E). We find a qualitative shift in the two distributions of African ancestry, with D.C. showing a reduced mode, higher variance, and a heavier lower tail of African ancestry, corresponding to more African Americans with below-average ancestry than Georgia. Qualitative differences in the distributions of African ancestry proportions in African Americans from states with higher and lower mean ancestry appear to be driven by both a shift in the mode of the distribution as well as a heavier left tail reflecting more individuals with a minority of African ancestry (Figure S1). We posit that differences among states could be due to differences in admixture, differences in self-identity, or differences in patterns of assortative mating, whereby individuals with similar ancestry might preferentially mate. For example, greater levels of admixture with Europeans would both shift the mode and result in more African American individuals who have a minority of African ancestry. Alternatively, a shift toward African American self-identity for individuals with a majority of European ancestry (possibly because of changes in cultural or social forces) would likewise result in lower estimates of mean African ancestry. Lastly, assortative mating would work to maintain or increase the variance in ancestry proportions, though assortative mating alone could not shift the mean proportion of African ancestry in a population.

Sex Bias in Ancestry Contributions
Sex bias in ancestry contributions, often assessed through ancestry of mtDNA and Y chromosome haplogroups, is also manifested in unequal estimates of ancestry proportions on the X chromosome, which has an inheritance pattern that differs between males and females. The X chromosome more closely follows female ancestry contributions because males contribute half as many X chromosomes. Comparing ancestry on the X chromosome to the autosomal ancestry allows us to infer whether that ancestry historically entered via males (lower X ancestry) or by females (higher X ancestry). Under equal ancestral contributions from both males and females, the X chromosome should show the same levels of admixture as the genome-wide estimates. To look for evidence of unequal male and female ancestry contributions in our cohorts, we examined ancestry on the X chromosome (NRY region), which follows a different pattern of inheritance from the autosomes. In particular, estimates of ancestry on the X chromosome have been shown to have higher African ancestry in African Americans.9 We calculate ancestry on the X chromosome as the estimate of ancestry on just windows on the X, and we compare to genome-wide estimates (which do themselves include the X chromosome). It should be noted that these calculations differ among males and females, because the X chromosome is diploid in females and thus has twice as many windows in calculation of genome-wide mean proportions. However, our results still allow a peek into sex bias because the overall contribution of the X chromosome to the genome-wide estimates is small. We note that because our ancestry estimation method conservatively assigns Native American ancestry, we expect that much of the remaining unassigned ancestry might be due to Native American ancestry assigned as broadly East Asian/Native American, which is not included in these values (see Figure 5 in Durand et al.33).

To infer estimates of male and female contributions from each ancestral population, we estimated the male and female fractions of ancestry that total the genome-wide estimates and minimize the mean square error of the X chromosome ancestry estimates. We assume that overall male and female contributions are each 50% ($\sum_{pop} f_{pop,male} = 0.5$ and $\sum_{pop} f_{pop,female} = 0.5$). We assume that the total contribution from males and females of a population gives rise to the autosomal ancestry fraction ($f_{pop,male} + f_{pop,female} = auto_{pop}$). We then compute, via a grid search, the predicted X chromosome estimates from $f_{pop,male}$, $f_{pop,female}$ for each $pop \in \{\text{African}, \text{Native American}, \text{European}\}$, which are calculated, as in Lind et al.,6 as

$$\hat{X}_{pop} = \frac{f_{pop,male} \cdot 1 + 2 \cdot f_{pop,female}}{0.5 + 0.5 \cdot 2} = \frac{f_{pop,male} + 2 \cdot f_{pop,female}}{1.5}$$

We choose the parameters of male and female contributions that minimize the mean squared error of the X ancestry estimates and the predicted $\hat{X}_{pop}$. These are the estimates of male and female ancestry fractions under a single simplistic population mixture event that best fit our X chromosome ancestry estimates observed.

Population Size Correlations
From the 2010 Census Brief “The Black Population” available online, we calculated the correlation between the number of reported African Americans living in a state and our sample of African Americans from that state. The correlation is strong, with p value of 9.5 × 10^−14, suggesting that our low sample sizes from states in the US Mountain West is expected from estimates of population sizes.

African ancestry in European Americans most frequently occurs in individuals from states with high proportions of African Americans and is rare in states with few African Americans. This observation led us to look at the correlation between population size (as a percent of state population using self-reported ethnicity from the 2010 US Census) and state mean levels of ancestry.

To examine the interaction between proportions of minorities and ancestry, we used the 2010 US Census demographic survey by state. We compare the state population proportion to the mean estimated admixture proportion of individuals from that state, fitting linear regressions, and generating figures with geom_smooth(method="lm", formula=y~x) from the ggplot2 package in R.

We find that African ancestry in European Americans is strongly correlated with the population proportion of African Americans in each state. We find that the higher the state proportion of African Americans, the more African ancestry is found in European Americans from that state, reflecting the complex interaction of genetic ancestry, historical admixture, culture, and self-identified ancestry.

Logistic Regression Modeling of Self-Identity
We examine the probabilistic relationship between self-identity and genetically inferred ancestry. To explore the interaction between genetic ancestry and self-reported identity, we estimated the proportion of individuals that identify as African American and European American, partitioned by levels of African ancestry. Jointly considering the cohorts of European Americans and African Americans, we examined the relationship between an individual’s genome-wide African ancestry proportion and whether they self-report as European American or African American. We note a strong dependence on the amount of African ancestry, with individuals carrying less than 20% African ancestry identifying largely as European American, and those with greater than 50% reporting as African American. To test the significance of this relationship, we fit a logistic regression model, using Python’s statsmodels package, predicting self-reported ancestry by using proportion African ancestry, sex, age, intercept, and interaction variables.

Validation of Non-European Ancestry in African Americans and European Americans
Although our Ancestry Composition estimates are well calibrated and have been shown to accurately estimate African, European, and Native American ancestry in tests of precision and recall,33 we were concerned that low levels of non-European ancestry in European Americans that we detected might represent an artifact of Ancestry Composition. Hence, we pursued several lines of investigation to provide evidence that estimates of African and Native American ancestry in European Americans are robust and not artifacts.

Comparison with 1000 Genomes Project Consensus Estimates
Comparisons of our estimates with those published by the 1000 Genomes Consortium show the high consistency across populations and individuals. We compare estimates across Americans of African Ancestry in SW USA (ASW), Colombians from Medellin, Colombia (CLM), Mexican Ancestry from Los Angeles USA (MXL), and Puerto Ricans from Puerto Rico (PUR). We note that our estimates of Native American ancestry are conservative. Indeed, when our Ancestry Composition assignment probabilities do not pass over the confidence threshold, including signals of Native American ancestry together with general East Asian/Native American ancestry (but not East Asian) recapitulates estimates from the 1000 Genomes Project consensus estimates. Five individuals from the ASW population from the 1000 Genomes Project have poor consistency in their estimates. These individuals have a large amount of Native American ancestry that was not modeled by the 1000 Genomes Project estimates. That these particular individuals were sampled in Oklahoma, and carry significant Native American ancestry, is supported by our own high estimates of Native American ancestry in 23andMe self-reported African Americans from Oklahoma.

Estimates of African and Native American Ancestry in Europeans
We looked at whether all individuals who are expected to carry solely European ancestry also have similar rates of detection of non-European ancestry. To this end, we generated a cohort of 15,289 customers of 23andMe who reported that all four of their grandparents were born in the same European country. The use of four-grandparent birth-country has been utilized as a proxy for assessing ancestry.27,63 We then examined Ancestry Composition results for these individuals and calculated at what rate we detected at least 1% African and at least 1% Native American ancestry.

Independent Validation of African Ancestry in European Americans via f4 Statistics
We used f4 statistics from the ADMIXTOOLS software package to confirm the presence of African ancestry.64 We used the f4 ratio test, designed to estimate the proportion of admixture from a related ancestral population, to compare admixture in European Americans versus reference European individuals. We tested whether European Americans with estimated African ancestry showed any admixture from Africans by using our cohorts of individuals with estimated African ancestry and reference populations from the 1000 Genomes Project data set. Admixture would be expected to result in estimates of α significantly different from 1.

Detection of Native American mtDNA in European Americans and African Americans
The mitochondrial DNA (mtDNA) haplogroups A2, B2, B4b, C1b, C1c, C1d, and D1 are most prevalently found in the Americas and are likely to be Native-American-specific haplogroups because they are rarely found outside of the Americas. We assessed the fraction of individuals that carry these haplogroups to validate the likelihood of Native American ancestry in European Americans and

ican, East Asian, sub-Saharan African, Middle Eastern, and Oceanian ancestry proportions. We use the supervised algorithm for K = 6, with 9,694 reference individuals representing the six aforementioned populations. We ran ADMIXTURE on 269,229 autosomal markers after pruning SNPs to have r2 < 0.5, via PLINK.67 To reduce computation time, we examined consistency of methods on the African Americans whom we estimated to have at least 1% Native American ancestry, European Americans estimated to have at least 1% Native American ancestry, and European Americans estimated to have at least 1% African ancestry.

Table 1. Comparison of Genome-wide Ancestry Estimates and X Chromosome Estimates in African Americans, Latinos, and European Americans

                                      Ancestry
Estimate                              African            Native American    European

African Americans
Genome-wide                           73.2%              0.8%               24.0%
X chromosome                          76.9%              0.9%               19.8%
Relative increase or decrease on X    +5.1%              +13.6%             −17.7%
p value                               4.4 × 10^−17***    0.078              7.8 × 10^−24***

Latinos
Genome-wide                           6.2%               18.0%              65.1%
X chromosome                          6.8%               19.4%              56.7%
Relative increase or decrease on X    +9.0%              +7.4%              −13.0%
p value                               0.008**            2.4 × 10^−10***    4.2 × 10^−94***

European Americans
Genome-wide                           0.19%              0.18%              98.6%
X chromosome                          0.19%              0.22%              98.4%
Relative increase or decrease on X    −0.04%             +23.73%            −0.1%
p value                               0.99               6.6 × 10^−10***    8.0 × 10^−5***

Mean estimates of African, Native American, and European ancestry are shown. p values provided are calculated by two-sided Student’s t test on individual ancestry estimates for each cohort per ancestry, with no multiple testing correction. Significance is assigned as *p < 0.05, **p value < 0.01, and ***p value < 0.001. Relative increase on the X chromosome is calculated as the absolute difference, X chromosome estimate minus genome-wide estimate, divided by the genome-wide estimate.

Results

Self-reported survey data was used to generate cohorts of African Americans, Latinos, and European Americans. Out of 35,524 self-reported “European” individuals, 35,279 selected “white” on the ethnicity survey, yielding a per-survey error estimate of 0.2%. Out of 1,560 self-reported “Latino” individuals, 1,540 selected “Hispanic,” giving a per-survey error estimate of 0.7%. Lastly, out of 1,327 self-reported “African American” individuals, 1,287 selected “black,” resulting in a per-survey error rate estimate of 1.1%. For more details on our cross-survey validation, see Subjects and Methods.

The Genetic Landscape of the US

Patterns of Genetic Ancestry of Self-Reported African Americans
Genome-wide ancestry estimates of African Americans show average proportions of 73.2% African, 24.0% European, and 0.8% Native American ancestry (Table 1). We find systematic differences across states in the US in mean ancestry proportions of self-reported African Americans (Figure 1 and Table S2). On average, the highest levels of African ancestry are found in African Americans living in or born in the South, especially South Carolina and Georgia (Figure 1A and Table S3). We find lower proportions of African ancestry in the Northeast, the Midwest, the Pacific Northwest, and California.
The amount of AfricanAmericansandshowthatthesehaplogroupsarevirtually NativeAmericanancestryestimatedforAfricanAmericans absentinEuropeancontrols.BecausemtDNAhaplogroupsareas- alsovariesacrossstatesintheUS.Morethan5%ofAfrican signedbyclassificationwithSNPsthatsegregateontheselineages, Americansareestimatedtocarryatleast2%NativeAmer- theseorthogonalresultsprovideanindependentlineofsupport ican ancestry genome-wide (Figures S1 and 1D). African for ourestimated NativeAmericanancestry in European Ameri- Americans in the West and Southwest on average carry cansandAfricanAmericans. DistributionofAncestrySegmentStartPositions higher levels of Native American ancestry, a trend that is Regions of the genome that have structural variation or show largely driven by individuals with less than 2% Native strong linkage disequilibrium (LD) have been shown both to Americanancestry (Figure 1B). With a lower threshold of confoundadmixturemappingandtoinfluencethedetectionof 1% Native American ancestry, we estimate that about population substructure in studies using Principal Components 22% of African Americans carry some Native American Analysis (PCA).27,63,65 If such regions were to drive artifacts of ancestry(FigureS2). spuriousancestry,wewouldexpectthatsegmentsoflocalancestry We used the lengths of segments of European, African, wouldprobablyoccuraroundtheseregions,ratherthaninauni- andNativeAmericanancestrytoestimateabest-fitmodel form distribution acrossthe genome. To this end, we examined ofadmixturehistoryamongthesepopulationsforAfrican thestartingpositionsofallAfricanandNativeAmericanancestry Americans(FigureS3).Weestimate that initialadmixture segmentsinEuropeanAmericansandNativeAmericanancestryin between Europeans and Native Americans occurred 12 AfricanAmericans. 
ComparisonwithADMIXTUREGenome-wideEstimates generations ago, followed by subsequent African admix- WeappliedADMIXTURE,66amodel-basedestimationofancestry ture 6 generations ago, consistent with other admixture proportions, to estimate proportions of European, Native Amer- inferencemethodsdatingAfricanAmericanadmixture.A 42 TheAmericanJournalofHumanGenetics96,37–53,January8,2015 A B Mean Mean Native African American ancestry ancestry 0.80 0.0100 0.0075 0.75 0.0050 0.70 0.0025 0.65 0.0000 C D Self−reported African Americans with > 2% Native American ancestry Mean European Percent of ancestry African Americans 0.30 0.25 10 0.20 5 0.15 0 Figure1. TheDistributionofAncestryofSelf-ReportedAfricanAmericansacrosstheUS (A)DifferencesinlevelsofAfricanancestryinAfricanAmericans(blue). (B)DifferencesinlevelsofNativeAmericanancestryinAfricanAmericans(orange). (C)DifferencesinlevelsofEuropeanancestryofAfricanAmericans(red),fromeachstate.Stateswithfewerthantenindividualsare excludedingray. (D)Thegeographicdistributionofself-reportedAfricanAmericanswithNativeAmericanancestry.TheproportionofAfricanAmericans ineachstatewhohave2%ormoreNativeAmericanancestryisshownbyshadeofgreen.Stateswithfewerthan20individualsare excludedingray. sex bias in African American ancestry, with greater male highest mean levels of African ancestry in Latinos living EuropeanandfemaleAfricancontributions,hasbeensug- in or born in states in the South, especially Louisiana, gested through mtDNA, Y chromosome, and autosomal the Midwest, and Atlantic (Figure 2A). 
Further stratifica- studies.6 On average, across African Americans, we esti- tion of individuals by their self-reported population matethattheXchromosomehasa5%increaseinAfrican affiliation (e.g., ‘‘Mexican,’’ ‘‘Puerto Rican,’’ or ‘‘Domin- ancestryand18%reductioninEuropeanancestryrelative ican’’) reveals a diversity in genetic ancestry, consistent togenome-wideestimates(seeTable1).Throughcompar- with previous work studying these populations (see ison of estimates of X chromosome and genome-wide FigureS5andTableS5).10,20,24,25,68,69WefindthatLatinos African and European ancestry proportions, we estimate who, besides reporting as ‘‘Hispanic,’’ also self-report as that approximately 5% ofancestors of African Americans MexicanorCentralAmerican,carrymoreNativeAmerican were European females and 19% were European males ancestrythanLatinosoverall;thosealsowhoself-reportas (TableS4). black, Puerto Rican, or Dominican have higher levels of PatternsofGeneticAncestryofSelf-ReportedLatinos African ancestry; and those who additionally self-report LatinosencompassnearlyallpossiblecombinationsofAf- as white, Cuban, or South American have on average rican,NativeAmerican,andEuropeanancestries,withthe higherlevelsofEuropeanancestry. exception of individuals who have a mix of African and Admixture date estimates for Latino admixture suggest NativeAmericanancestrywithoutEuropeanancestry(see thatNativeAmericanandEuropeanmixtureoccurredfirst, FiguresS4AandS1).Onaverage,weestimatethatLatinos about11generationsago,followedbyAfricanadmixture7 in the US carry 18.0% Native American ancestry, 65.1% generations ago. Consistent with previous studies that European ancestry, and 6.2% African ancestry. 
We find show a sex bias in admixture in Latino populations,12–18 the highest levels of estimated Native American ancestry weestimate13%lessEuropeanancestryontheXchromo- inself-reportedLatinosfromstatesintheSouthwest,espe- somethangenome-wide(Table1),showingproportionally cially those bordering Mexico (Figure 2C). We find the greater European ancestry contributions from males. We TheAmericanJournalofHumanGenetics96,37–53,January8,2015 43 A B Mean Mean African European ancestry ancestry 0.20 0.85 0.15 0.80 0.10 0.75 0.70 0.05 0.65 C Mean Native American ancestry 0.20 0.15 0.10 0.05 Figure2. TheDistributionofAncestryofSelf-ReportedLatinosacrosstheUS DifferencesinmeanlevelsofAfrican(A),European(B),andNativeAmerican(C)ancestryinLatinosfromeachstateisshownbyshadeof blue,red,andorange,respectively.Stateswithfewerthantenindividualsareexcludedingray. inferred elevated African and Native American ancestry instatesintheSouththaninotherpartsoftheUS:about5% on the X chromosome, corresponding to higher female of self-reported European Americans living in South Car- ancestry contributions from both Africans and Native olina and Louisiana have at least 2% African ancestry. Americans. Lastly, Latinos show higher proportions of Lowering the threshold to at least 1% African ancestry inferred Iberian ancestry than both European Americans (potentiallyarisingfromoneAfricangenealogicalancestor andAfricanAmericans(FigureS6). withinthelast11generations),EuropeanAmericanswith Patterns of Genetic Ancestry of Self-Reported European African ancestry comprise as much as 12% of European Americans Americans from Louisiana and South Carolina and about Wefindthatmanyself-reportedEuropeanAmericans,pre- 1in10individualsinotherpartsoftheSouth(FigureS8). 
dominantlythoselivingwestoftheMississippiRiver,carry Mostindividualswhohavelessthan28%Africanancestry NativeAmericanancestry(Figure3B).WeestimatethatEu- identifyasEuropeanAmerican,ratherthanasAfricanAmer- ropeanAmericanswhocarryatleast2%NativeAmerican ican(Figures4and5A).Logisticregressionofself-identified ancestry are found most frequently in Louisiana, North European Americans and African Americans reveals that Dakota,andotherstatesintheWest.Usingalessstringent the proportion of African ancestry predicts self-reported thresholdof1%,ourestimatessuggestthatasmanyas8% ancestry significantly, with a coefficient of 20.1 (95% CI: of individuals from Louisiana and upward of 3% of indi- 18.0–22.2)(TableS6andFigureS9).Forafullcharacteriza- vidualsfromsomestatesintheWestandSouthwestcarry tionoftermsandlogisticmodels,seeTableS6andFigureS9. NativeAmericanancestry(FigureS7). Fitting a model of European and Native American Consistent with previous anecdotal results,32 the fre- admixture followed later by African admixture, we find quencyofEuropeanAmericanindividualswhocarryAfri- the best fit with initial Native American and European canancestryvariesstronglybystateandregionoftheUS admixtureabout12generationsagoandsubsequentAfri- (Figure3A).Weestimatethatasubstantialfraction,atleast cangeneflowabout4generationsago. 1.4%,ofself-reportedEuropeanAmericansintheUScarry Non-EuropeanancestryinEuropeanAmericansfollowsa at least 2% African ancestry. Using a less conservative sexbiasinadmixturecontributionsfrommalesandfemales, threshold, approximately 3.5% of European Americans asseeninAfricanAmericansandLatinos.Theratiobetween have1%ormoreAfricanancestry(FigureS8).Individuals Xchromosomeandgenome-wideNativeAmericanancestry withAfricanancestryarefoundatmuchhigherfrequencies estimates in European Americans shows greater Native 44 TheAmericanJournalofHumanGenetics96,37–53,January8,2015
