GBE

The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses

Eran Elhaik1,2,*
1Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health
2McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine
*Corresponding author: E-mail: [email protected].
Accepted: December 5, 2012

Abstract
The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The “Rhineland hypothesis” depicts Eastern European Jews as a “population isolate” that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the “Khazarian hypothesis” suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco–Roman Jews continuously reinforced the Judaized empire until the 13th century. Following the collapse of their empire, the Judeo–Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo–Khazars. Thus far, however, the Khazars’ contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian hypothesis and compare it with the Rhineland hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian hypothesis and portray the European Jewish genome as a mosaic of Near Eastern-Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe a major difference among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications for the demographic forces that shaped the genetic diversity in the Caucasus and for medical studies.
Key words: Jewish genome, Khazars, Rhineland, Ashkenazi Jews, population isolate, Eastern European Jews, Central European Jews, population structure.

Introduction
Contemporary Eastern European Jews comprise the largest ethno-religious aggregate of modern Jewish communities, accounting for approximately 90% of over 13 million Jews worldwide (Ostrer 2001). Speculated to have emerged from a small Central European founder group and thought to have maintained high endogamy, Eastern European Jews are considered a “population isolate” and invaluable subjects in disease studies (Carmeli 2004), although their ancestry remains debatable between geneticists, historians, and linguists (Wexler 1993; Brook 2006; Sand 2009; Behar et al. 2010). Recently, several large-scale studies have attempted to chart the genetic diversity of Jewish populations by genotyping Eurasian Jewish and non-Jewish populations (Conrad et al. 2006; Kopelman et al. 2009; Behar et al. 2010). Interestingly, some of these studies linked Caucasus populations with Eastern European Jews, at odds with the narrative of a Central European founder group. Because correcting for population structure and using suitable controls are critical in medical studies, it is vital to examine the hypotheses purporting to explain the ancestry of Eastern and Central European Jews. One of the major challenges for any hypothesis is to explain the massive presence of Jews in Eastern Europe, estimated at eight million people at the beginning of the 20th century. We investigate the genetic structure of European Jews, by applying a wide range of analyses (including three population test, principal component, biogeographical origin, admixture, identity by descent (IBD), allele sharing distance, and uniparental analyses) and test their veracity in light of the two dominant hypotheses depicting either a sole Middle Eastern ancestry or a mixed Middle Eastern–Caucasus–European ancestry to explain the ancestry of Eastern European Jews.
© The Author(s) 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Genome Biol. Evol. 5(1):61–74. doi:10.1093/gbe/evs119 Advance Access publication December 14, 2012

The “Rhineland hypothesis” envisions modern European Jews to be the descendants of the Judeans, an assortment of Israelite–Canaanite tribes of Semitic origin (figs. 1 and 2) (supplementary note S1, Supplementary Material online). It proposes two mass migratory waves: the first occurred over the 200 years following the Muslim conquest of Palestine (638 CE) and consisted of devoted Judeans who left Muslim Palestine for Europe (Dinur 1961). Whether these migrants joined the existing Judaized Greco–Roman communities is unclear, as is the extent of their contribution to the Southern European gene pool. The second wave occurred at the beginning of the 15th century by a group of 50,000 German Jews who migrated eastward and ushered in an apparent hyper-baby-boom era for half a millennium (Atzmon et al. 2010). The Rhineland hypothesis predicts a Middle Eastern ancestry to European Jews and high genetic similarity among European Jews (Ostrer 2001; Atzmon et al. 2010; Behar et al. 2010).

The competing “Khazarian hypothesis” considers Eastern European Jews to be the descendants of Khazars (supplementary note S1, Supplementary Material online). The Khazars were a confederation of Slavic, Scythian, Hunnic–Bulgar, Iranian, Alans, and Turkish tribes who formed in the central–northern Caucasus one of the most powerful empires during the late Iron Age and converted to Judaism in the 8th century CE (figs. 1 and 2) (Polak 1951; Brook 2006; Sand 2009). The Khazarian, Armenian, and Georgian populations forged from this amalgamation of tribes (Polak 1951) were followed by relative isolation, differentiation, and genetic drift in situ (Balanovsky et al. 2011). Biblical and archeological records allude to active trade relationships between Proto-Judeans and Armenians in the late centuries BCE (Polak 1951; Finkelstein and Silberman 2002), which likely resulted in a small-scale admixture between these populations and a Judean presence in the Caucasus. After their conversion to Judaism, the population structure of the Judeo–Khazars was further reshaped by multiple migrations of Jews from the Byzantine Empire and Caliphate to the Khazarian Empire (fig. 1). Following the collapse of their empire and the Black Death (1347–1348) the Judeo–Khazars fled westward (Baron 1993), settling in the rising Polish Kingdom and Hungary (Polak 1951) and eventually spreading to Central and Western Europe.

FIG. 1. Map of Eurasia. A map of Khazaria and Judah is shown with the state of origin of the studied groups. Eurasian Jewish and non-Jewish populations used in all analyses are shown in square and round bullets, respectively (supplementary table S3, Supplementary Material online). The major migrations that formed Eastern European Jewry according to the Khazarian and Rhineland hypotheses are shown in yellow and brown, respectively.

FIG. 2. An illustrated timeline for the relevant historical events. The horizontal dashed lines represent controversial historical events explained by the different hypotheses, whereas solid black lines represent undisputed historical events.

The Khazarian hypothesis posits that European Jews are comprised of Caucasus, European, and Middle Eastern ancestries. Moreover, European Jewish communities are expected to be different from one another both in ancestry and genetic heterogeneity. The Khazarian hypothesis also offers two explanations for the genetic diversity in Caucasus groups first by the multiple migration waves to
Khazaria during the 6th–10th centuries and second by the Judeo–Khazars who remained in the Caucasus.

Genetic studies attempting to infer the ancestry of European Jews yielded inconsistent results. Some studies pointed to the genetic similarity between European Jews and Caucasus populations like Adygei (Behar et al. 2003; Levy-Coffman 2005; Kopelman et al. 2009), whereas some pointed to the similarity to Middle Eastern populations such as Palestinians (Hammer et al. 2000; Nebel et al. 2000), and others pointed to the similarity to Southern European populations like Italians (Atzmon et al. 2010; Zoossmann-Diskin 2010). Most of these studies were done in the pregenome-wide era using uniparental markers and including different reference populations, which makes it difficult to compare their results. More recent studies employing whole genome data reported high genetic similarity of European Jews to Druze, Italian, and Middle Eastern populations (Atzmon et al. 2010; Behar et al. 2010).

Although both the Rhineland and Khazarian hypotheses depict a Judean ancestry and are not mutually exclusive, they are well distinguished, as Caucasus and Semitic populations are considered ethnically and linguistically distinct (Patai and Patai 1975; Wexler 1993; Balanovsky et al. 2011). Jews, according to either hypothesis, are an assortment of tribes who accepted Judaism, migrated elsewhere, and maintained their religion up to this date and are, therefore, expected to exhibit certain differences from their neighboring populations. Because both hypotheses posit that Eastern European Jews arrived at Eastern Europe roughly at the same time (13th and 15th centuries), we assumed that they experienced similar low and fixed admixture rates with the neighboring populations, estimated at 0.5% per generation over the past 50 generations (Ostrer 2001). These relatively recent admixtures have likely reshaped the population structure of all European Jews and increased the genetic distances from the Caucasus or Middle Eastern populations. Therefore, we do not expect to achieve perfect matching with the surrogate Khazarian and Judean populations but rather to estimate their relatedness.

Materials and Methods

Data Collection
The complete data set contained 1,287 unrelated individuals of 8 Jewish and 74 non-Jewish populations genotyped over 531,315 autosomal single nucleotide polymorphisms (SNPs). A linkage disequilibrium (LD)-pruned data set was created by removing one member of any pair of SNPs in strong LD (r² > 0.4) in windows of 200 SNPs (sliding the window by 25 SNPs at a time) using indep-pairwise in PLINK (Purcell et al. 2007). This yielded a total of 221,558 autosomal SNPs that were chosen for all autosomal analyses except the identical by descent (IBD) analysis that utilized the complete data set. Both data sets were obtained from http://www.evolutsioon.ut.ee/MAIT/jew_data/ (last accessed December 19, 2012) (Behar et al. 2010). Mitochondrial DNA (mtDNA) and Y-chromosomal data were obtained from previously published data sets as appeared in Behar et al. (2010). These markers were chosen to match the phylogenetic level of resolution achieved in previously reported data sets and represent a diversified set of markers. A total of 11,392 samples were assembled for mtDNA (6,089) and Y-chromosomal (5,303) analyses from 27 populations (supplementary tables S1 and S2, Supplementary Material online).

Terminology
In common parlance, Eastern and Central European Jews are practically synonymous with Ashkenazi Jews and are considered a single entity (Tian et al. 2008; Atzmon et al. 2010; Behar et al. 2010). However, the term is misleading, for the Hebrew word “Ashkenaz” was applied to Germany in medieval rabbinical literature, contributing to the narrative that modern Eastern European Jewry originated on the Rhine. We thus refrained from using the term “Ashkenazi Jews.” Jews were roughly subdivided into Eastern (Belorussia, Latvia, Poland, and Romania) and Central (Germany, Netherlands, and Austria) European Jews. In congruence with the literature that considers “Ashkenazi Jews” distinct from “Sephardic Jews,” we excluded the latter. Complete population notation is described in supplementary table S3, Supplementary Material online.

Choice of Surrogate Populations
As the ancient Judeans and Khazars have been vanquished and their remains have yet to be sequenced, in accordance with previous studies (Levy-Coffman 2005; Kopelman et al. 2009; Atzmon et al. 2010; Behar et al. 2010), contemporary Middle Eastern and Caucasus populations were used as surrogates. Palestinians were considered proto-Judeans because they are assumed to share a similar linguistic, ethnic, and geographic background with the Judeans and were shown to share common ancestry with European Jews (Bonné-Tamir and Adam 1992; Nebel et al. 2000; Atzmon et al. 2010; Behar et al. 2010). Similarly, Caucasus Georgians and Armenians were considered proto-Khazars because they are believed to have emerged from the same genetic cohort as the Khazars (Polak 1951; Dvornik 1962; Brook 2006).

The Three Population Test
The f3 statistic uses allele frequency differences to assess the presence of admixture in a population X from two other populations A and B, written f3(X; A, B) (Reich et al. 2009). If X is a mixture of A and B, rather than the result of genetic drift, f3 would be negative. A significant negative f3 indicates that the ancestors of group X experienced a history of admixture subsequent to their divergence from A and B. The f3 statistics were calculated with the threepop program of TreeMix (Pickrell and Pritchard 2012) with k = 500 over the set of 221,558 SNPs. This test differs from ADMIXTURE (Alexander et al. 2009), which reports the proportions of admixture with the most likely ancestor.

Principal Component Analysis
Although the commonly used “multipopulation” principal component analysis (PCA) has many attractive properties, it should be practiced with caution to avoid biases due to the choice of populations and varying sample sizes (Price et al. 2006; McVean 2009). To circumvent these biases, we developed a simple “dual population” framework consisting of three “outgroup” populations that are available in large sample sizes and are the least admixed (Mbuti and Biaka Pygmies [South Africa], French Basques [Europe], and Han Chinese [East Asia]) and two populations of interest, all of equal sample sizes. The cornerstone of this framework is that it minimizes the number of significant PCs to four or fewer (Tracy-Widom test, P < 0.01) and maximizes the portion of explained variance to over 20% for the first two PCs. PCA calculations were carried out using smartpca of the EIGENSOFT package (Patterson et al. 2006). Convex hulls were calculated using the Matlab “convhull” function and plotted around the cluster centroids. Relatedness between two populations of interest was estimated by the commensurate overlap of their clusters. Small populations (<7 samples) were excluded from the analysis.

Estimating the Biogeographical Origins of Population
Novembre et al. (2008) proposed a PCA-based approach, accurate to a few hundred kilometers within Europe, to identify the current biogeographical origin of a population. Although this approach has no implied historical model, it correlates genetic diversity with geography and can thus be a useful tool to study biogeography. To decrease the bias caused by multiple populations of uneven sizes (Patterson et al. 2006; McVean 2009), we adopted the dual-population framework with three outgroup populations and two populations of interest: a population of known geographical origin during the relevant time period shown to cluster with the population in question (e.g., Armenians) and the population in question (e.g., Eastern European Jews). The first four populations were used as a training set for the population in question. PCA calculations were carried out as described earlier. The rotation angle of PC1–PC2 coordinates was calculated as described by Novembre et al. (2008). Briefly, in each figure the PC axes were rotated to find the angle that maximizes the summed correlation of the median PC1 and PC2 values of the training populations with the latitude and longitude of their countries. Latitudinal and longitudinal data were obtained from the literature or by the country’s approximate centroid. Geodesic distances were calculated in kilometers using the Matlab function “distance.”

Admixture Analysis
A structure-like approach was applied in a supervised learning mode as implemented in ADMIXTURE (Alexander et al. 2009). ADMIXTURE provides an estimation of the individual’s ancestries from the allele frequencies of the designated ancestral populations. ADMIXTURE’s bootstrapping procedure with default parameters was used to calculate the standard errors. We observed low (<0.05) standard errors in all our analyses. With the exception of Southern Europeans, populations were sorted by their mean African and Asian ancestries. In this analysis, the three Netherland Jews were grouped with Eastern European Jews.

IBD Analysis
To detect IBD segments, we ran fastIBD 10 times using different random seeds and combined the results as described by Browning and Browning (2011). Segments were considered to be IBD only if the fastIBD score of the combined analysis was less than e–10. This low threshold corresponds to long shared haplotypes (≥1 cM) that are likely to be IBD. Short gaps (<50 indexes) separating long domains were assumed to be false-negative and concatenated (Browning and Browning 2011). Pairwise-IBD segments between European Jews and different populations were obtained by finding the maximum total IBD sharing between each European Jew and all other individuals of a particular population.

Allele Sharing Distances
Allele sharing distance (ASD) was used for measuring genetic distances between populations as it is less sensitive to small sample sizes than other methods. Pairwise ASD was calculated using PLINK (Purcell et al. 2007), and the average ASD between populations I and J was computed as

$$W_{IJ} = \left( \sum_{i \in I} \sum_{j \in J} W_{ij} \right) / nm, \qquad (1)$$

where $W_{ij}$ is the distance between individuals i and j from populations I and J of sizes n and m, respectively. To verify that these ASD differences are significant, a bootstrap approach was used with the null hypothesis $H_0: \mathrm{ASD}(p_1, p_2) = \mathrm{ASD}(p_1, p_3)$, where the ASD between populations $p_1$ and $p_2$ is compared with the ASD between populations $p_2$ and $p_3$ (supplementary note S2, Supplementary Material online). To compare continental Jewish communities, individuals were grouped by their continent and the comparison was carried out as described.

Uniparental Analysis
To infer the migration patterns of European Jews, we integrated haplogroup data from over 11,300 uniparental chromosomes with geographical data. The haplogroup frequencies were compared between populations to obtain a measure of distance between populations. Pairwise genetic distances between population haplogroups (supplementary tables S1 and S2, Supplementary Material online) were estimated by applying the Kronecker function as implemented in Arlequin version 3.1 (Excoffier et al. 2005). In brief, similarity between populations was defined as the fraction of I haplogroups that the two populations shared as measured by the Kronecker function $\delta_{xy}(i)$:

$$d_{xy} = \sum_{i=1}^{I} \delta_{xy}(i), \qquad (2)$$

which equals 1 if the haplogroup frequency of the ith haplogroup is nonzero for both populations and equals 0 otherwise. In other words, populations sharing the same exact haplogroups or their mutual absence are considered more genetically similar than populations with different haplogroups. For brevity, we considered only haplogroups with frequencies higher than 0.5%. This measure has several desirable properties that make it an excellent measure for estimating genetic distance between populations, such as a simple interpretation in terms of homogeneity and applicability to both mtDNA and Y-chromosomal data.

Results
To confirm that the Rhineland and Khazarian hypotheses indeed portray distinct ancestries, we assessed the degree of background admixture between Caucasus and Semitic populations. We calculated the f3 statistics between Palestinians and six Caucasus and Eurasian populations using African San as an outgroup, for example, f3(Palestinians, San, Armenians). The f3 results for Turks (–0.0013), Armenians and Georgians (–0.0019), Lezgins and Adygei (–0.0015), and Russians (–0.0011) indicated a minor but significant admixture (–26 < Z-score < –13) between Palestinians and the populations tested. Because Armenians and Georgians diverged from Turks 600 generations ago (Schonberg et al. 2011), we can assume that the lion’s share of their admixture derived from that ancestry and is within the expected levels of background admixture typical to the region rather than recent admixture with Semitic populations. Therefore, similarities between European Jews and Caucasus populations are unlikely to be due to a shared Semitic ancestry.

PCA was next used to identify independent dimensions that capture most of the information in the data. PCA was applied using two frameworks: the “multipopulation” carried out for all populations (fig. 3) and separately for Eurasian populations along with Pygmies and Han Chinese (supplementary fig. S2, Supplementary Material online) and our novel “dual-population” framework (supplementary fig. S3, Supplementary Material online). In all analyses, the studied samples aligned along the two well-established geographic axes of global genetic variation: PC1 (sub-Saharan Africa vs. the rest of the Old World) and PC2 (east vs. west Eurasia) (Li et al. 2008). Our results reveal geographically refined groupings, such as the nearly symmetrical continuous European rim extending from Western to Eastern Europeans, the parallel Caucasus rim, and the Near Eastern populations (supplementary fig. S1, Supplementary Material online) organized in Turk–Iranian and Druze clusters (fig. 3). Middle Eastern populations form a gradient along the diagonal line between Bedouins and Near Eastern populations that resembles their geographical distribution. The remaining Egyptians and the bulk of Saudis distribute separately from Middle Eastern populations.

FIG. 3. Scatter plot of all populations along the first two principal components. For brevity, we show only the populations relevant to this study. The inset magnifies Eurasian and Middle Eastern individuals. Each letter code corresponds to one individual (supplementary table S3, Supplementary Material online). A polygon surrounding all of the individual samples belonging to a group designation highlights several population groups.

European Jews are expected to cluster with native Middle Eastern or Caucasus populations according to the Rhineland or Khazarian hypotheses, respectively. The results of all PC analyses (fig. 3, supplementary figs. S2 and S3, Supplementary Material online) show that over 70% of European Jews and almost all Eastern European Jews cluster with Georgian, Armenian, and Azerbaijani Jews within the Caucasus rim (fig. 3 and supplementary fig. S3, Supplementary Material online). Approximately 15% of Central European Jews cluster with Druze and the rest cluster with Cypriots. All European Jews cluster distinctly from the Middle Eastern cluster. Strong evidence for the Khazarian hypothesis is the clustering of European Jews with the populations that reside on opposite ends of ancient Khazaria: Armenians, Georgians, and Azerbaijani Jews (fig. 1). Because Caucasus populations remained relatively isolated in the Caucasus region and because there are no records of Caucasus populations mass-migrating to Eastern and Central Europe prior to the fall of Khazaria (Balanovsky et al. 2011), these findings imply a shared origin for European Jews and Caucasus populations.

To assess the ability of our PCA-based approach to identify the biogeographical origins of a population, we first sought to identify the biogeographical origin of Druze. The Druze religion originated in the 11th century, but the people’s origins remain a source of much confusion and debate (Hitti 1928). We traced Druze biogeographical origin to the geographical coordinates 38.6 ± 3.45° N, 36.25 ± 1.41° E (supplementary fig. S4, Supplementary Material online) in the Near East (supplementary fig. S1, Supplementary Material online). Half of the Druze clustered tightly in Southeast Turkey, and the remaining were scattered along northern Syria and Iraq. These results are in agreement with the findings of Shlush et al. (2008) using mtDNA analysis. The inferred geographical positions of Druze were used in the subsequent analyses.

The geographical origins of European Jews varied for different reference populations (fig. 4 and supplementary fig. S5, Supplementary Material online), but all the results converged to Southern Khazaria along modern Turkey, Armenia, Georgia, and Azerbaijan. Eastern European Jews clustered tightly compared with Central European Jews in all analyses.

FIG. 4. Biogeographical origin of European Jews. First two principal components were calculated for Pygmies, French Basques, Han Chinese (black), Armenians (blue), and Eastern or Central European Jews (red), all of equal size. PCA was calculated separately for Eastern and Central European Jews and the results were merged.
Using the first four populations as a training set, Eastern (squares) and Central (circles) European Jews were assigned to geographical locations by fitting independent linear models for latitude and longitude as predicted by PC1 and PC2. Each shape represents an individual. Major cities are marked in cyan.

The smallest deviations in the geographical coordinates were obtained with Armenians for both Eastern (38 ± 2.7° N, 39.9 ± 0.4° E) and Central (35 ± 5° N, 39.7 ± 1.1° E) European Jews (fig. 4). Similar results were obtained for Georgians (supplementary fig. S5, Supplementary Material online). Remarkably, the mean coordinates of Eastern European Jews are 560 km from Khazaria’s southern border (42.77° N, 42.56° E) near Samandar, the capital city of Khazaria from 720 to 750 CE (Polak 1951).

The duration, direction, and rate of gene flow between populations determine the proportion of admixture and the total length of chromosomal segments that are identical by descent. Admixture calculations were carried out using a supervised learning approach in a structure-like analysis. This approach has many advantages over the unsupervised approach that not only traces ancestry to K abstract unmixed populations under the assumption that they evolved independently (Chakravarti 2009; Weiss and Long 2009) but also is problematic when applied to study Jewish ancestry, which can be dated only as far back as 3,000 years (fig. 2). Moreover, the results of the unsupervised approach vary based on the particular populations used for the analysis and the choice of K, rendering the results incomparable between studies. Admixture was calculated with a reference set of seven populations representing largely genetically distinct regions: Pygmies (South Africa), Palestinians (Middle East), Armenians (Caucasus), Turk–Iranians (Near East), French Basque (West Europe), Chuvash (East Europe), and Han Chinese (East Asia) (fig. 5). The ancestral components grouped all populations by their geographical regions with European Jews clustering with Caucasus populations. As expected, Eastern and Western European ancestries exhibit opposite gradients among European populations. The Near Eastern–Caucasus ancestries are dominant among Central (38%) and Eastern (32%) European Jews followed by Western European ancestry (30%). Among non-Caucasus populations, the Caucasus ancestry is the largest among European Jews (26%) and Cypriots (31%). These populations also exhibit the largest fraction of Middle Eastern ancestry among non–Middle Eastern populations. As both Caucasus and Middle Eastern ancestries are absent in Eastern European populations, our findings suggest that Eastern European Jews acquired these ancestries prior to their arrival in Eastern Europe. Although the Rhineland hypothesis explains the Middle Eastern ancestry by stating that Jews migrated from Palestine to Europe in the 7th century, it fails to explain the large Caucasus ancestry, which is nearly endemic to Caucasus populations.

FIG. 5. Admixture analysis of European, Caucasus, Near Eastern, and Middle Eastern populations. The x axis represents individuals from populations sorted according to their ancestries and arrayed geographically roughly from North to South. Each individual is represented by a vertical stacked column (100%) of color-coded admixture proportions of the ancestral populations.

Although they cluster with Caucasus populations (fig. 5), Eastern and Central European Jews share a large fraction of Western European and Middle Eastern ancestries, both absent in Caucasus populations. According to the Khazarian hypothesis, the Western European ancestry was imported to Khazaria by Greco–Roman Jews, whereas the Middle Eastern ancestry alludes to the contribution of both early Israelite Proto-Judeans and Mesopotamian Jews (Polak 1951; Koestler 1976; Sand 2009). Central and Eastern European Jews differ mostly in their Middle Eastern (30% and 25%, respectively) and Eastern European ancestries (3% and 12%, respectively), probably due to late admixture.

Druze exhibit a large Turk–Iranian ancestry (83%) in accordance with their Near Eastern origin (supplementary fig. S4, Supplementary Material online). Druze and Cypriots appear similar to European Jews in their Middle Eastern and Western European ancestries, though they differ largely in the proportion of Caucasus ancestry. These results can explain the genetic similarity between European Jews, Southern Europeans, and Druze reported in studies that excluded Caucasus populations (Price et al. 2008; Atzmon et al. 2010; Zoossmann-Diskin 2010). Overall, our results portray the European Jewish genome as a mosaic of Near Eastern-Caucasus, Western European, Middle Eastern, and Eastern European ancestries in decreasing proportions.

To glean further details of the genomic regions contributing to the genetic similarity between European Jews and the respective populations, we compared their total genomic regions shared by IBD. If European Jews emerged from Caucasus populations, the two would share longer IBD regions than with Middle Eastern populations. The IBD analysis exhibits a skewed bimodal distribution embodying a major Caucasus ancestry with a minor Middle Eastern ancestry (fig. 6), consistent with the admixture results (fig. 5). The total IBD regions shared between European Jews and Caucasus populations (9.5 cM on average) are significantly larger than regions shared with Palestinians (5.5 cM) (Kolmogorov–Smirnov goodness-of-fit test, P < 0.001). To the best of our knowledge, these are the largest IBD regions ever reported between European Jews and non-Jewish populations. The decrease in total IBD between European Jews and other populations combined with the increase in distance from the Caucasus supports the Khazarian hypothesis.

FIG. 6. Proportion of total IBD sharing between European Jews and different populations. Populations are sorted by decreasing distance from the Caucasus. The maximal IBD between each European Jew and an individual from each population is summarized in box plots. Lines pass through the mean values.

We next estimated the level of endogamy among Eurasian Jewish communities and compared their genetic distances with non-Jewish neighbors, Caucasus, and Middle Eastern populations. Our results expand the previous report of high endogamy in Jewish populations (Behar et al. 2010) and narrow the endogamy to regional Jewish communities (table 1, left panel). Jews are significantly more similar to members of their own community than to other Jewish populations (P < 0.01, bootstrap t test), with the conspicuous exception of Bulgarian, Turkish, and Georgian Jews. These results stress the high heterogeneity among Jewish communities across Eurasia and even within communities, as in the case of the Balkan and Caucasus Jews.

When compared with non-Jewish populations, all Jewish communities were significantly (P < 0.01, bootstrap t test) distant from Middle Eastern populations and, with the exception of Central European Jews, significantly closer to Caucasus populations (table 1, right panel). Similar findings were reported by Behar et al. (2010) although they were dismissed as “a bias inherent in our calculations.” However, we found no such bias. The close genetic distance between Central European Jews and Southern European populations can be attributed to a late admixture. The results are consistent with our previous findings in support of the Khazarian hypothesis. As the only commonality among all Jewish communities is their dissimilarity from Middle Eastern populations (table 1, right panel), grouping different Jewish communities without correcting for their country of origin, as is commonly done, would increase their genetic heterogeneity.

Finally, we carried out uniparental analyses on mtDNA and Y-chromosome comparing the haplogroup frequencies between European Jews and other populations. The Rhineland hypothesis depicts Middle Eastern origins for both the paternal and maternal ancestries of European Jews, whereas the Khazarian hypothesis depicts a Caucasus ancestry along with Southern European and Near Eastern contributions of migrants from Byzantium and the Caliphate, respectively. Because Judaism was maternally inherited only since the 3rd century CE (Patai and Patai 1975), the mtDNA is expected to show a stronger local female-biased founder effect compared with the Y-chromosome. Haplogroup similarities between European Jews and other populations were plotted as heat maps on the background of their geographical locations (fig. 7). The pairwise distances between all studied populations are shown in supplementary fig. S6, Supplementary Material online.

Our results shed light on sex-specific processes that, although not evident from the autosomal data, are analogous to those obtained from the biparental analyses. Both mtDNA and Y-chromosomal analyses yield high similarities between European Jews and Caucasus populations rooted in the Caucasus (fig. 7) in support of the Khazarian hypothesis. Interestingly, the maternal analysis depicts a specific Caucasus founding lineage with a weak Southern European ancestry (fig. 7A), whereas the paternal ancestry reveals a dual Caucasus–Southern European origin (fig. 7B). As expected, the maternal ancestry exhibits a higher relatedness scale with narrow dispersal compared with the paternal ancestry.

Dissecting uniparental haplogroups allows us to delve further into European Jews’ migration routes. As the results do not specify whether the Southern Europe–Caucasus migration was ancient or recent nor indicate the migration’s direction, that is, from Southern Europe to the Caucasus or the opposite, there are four possible scenarios. Of these, the only historically supportable scenarios are ancient migrations from Southern Europe toward Khazaria (6th–13th centuries) and more recent migrations from the Caucasus to Central and Southern Europe (13th–15th centuries) (Polak 1951; Patai and Patai 1975; Straten 2003; Brook 2006; Sand 2009). A westward migration from the diminished Khazaria toward Central and Southern Europe would have exhibited a gradient from the Caucasus toward Europe for both matrilineal and patrilineal lines. Such a gradient was not observed. By contrast, Judaized Greco–Roman male-driven migration directly to Khazaria is consistent with historical demographic migrations and could have created the observed pattern. Moreover, we found little genetic similarity between European Jews and populations

Table 1
Genetic Distances (ASD) between Regional and Continental Jewish Communities (Left Panel) and between Regional Jewish Communities and Their Non-Jewish Neighboring Populations, Caucasus, and Middle Eastern Populations (Right Panel)

Regional Jewish     Jewish Populations                     Non-Jewish Populations
Community           Self     European  Asian    African    Neighboring Population (name, ASD)   Caucasus  Middle Eastern
Eastern European    0.2318   0.2328    0.2381   0.2446     Hungarian   0.2346                   0.2340    0.2387
Central European    0.2312   0.2326    0.2378   0.2445     Italians    0.2335                   0.2338    0.2385
Bulgarian           0.2326   0.2331    0.2376   0.2439     Romanian    0.2347                   0.2337    0.2380
Turkish             0.2336   0.2336    0.2376   0.2439     Turkish     0.2353                   0.2337    0.2379
Iraqi               0.2303   0.2351    0.2375   0.2447     Iranian     0.2363                   0.2338    0.2381
Georgian            0.2304   0.2345    0.2372   0.2442     Georgian    0.2332                   0.2332    0.2378
Azerbaijani         0.2304   0.2365    0.2386   0.2465     Lezgins     0.2367                   0.2352    0.2398
Iranian             0.2310   0.2364    0.2391   0.2434     Iranian     0.2414                   0.2361    0.2383

NOTE. Underlined entries are significantly smaller throughout each panel. The geographically nearest non-Jewish populations were considered neighboring populations.
ThedistancesinthelasttwocolumnsarebetweenaJewishcommunityandoneCaucasus(ArmeniansorGeorgians)orMiddleEastern(Palestinians,Bedouins,orJordanians) populationthatexhibitedthelowestmeanASD. eastwardtotheCaspianSeaandsouthwardtotheBlackSea, hypotheses.Weshowedthatthehypothesesarealsogenet- delineatingthegeographicalboundariesofKhazaria(table1 ically distinct and that the miniscule Semitic ancestry in andfig.1). Caucasus populations cannot account for the similarity be- tweenEuropeanJewsandCaucasuspopulations.Therecent availability of genomic data from Caucasus populations Discussion allowed testing the Khazarian hypothesis for the first time Eastern and Central European Jews comprise the largest andpromptedustocontrastitwiththeRhinelandhypothesis. group of contemporary Jews, accounting for approximately To evaluate the two hypotheses, we carried out a series of 90% of over 13 million worldwide Jews. Eastern European comparativeanalysesbetweenEuropeanJewsandsurrogate Jews made up over 90% of European Jews before World KhazarianandJudeanpopulationsposingthesamequestion War II. Despite their controversial ancestry, European Jews eachtime:areEasternandCentralEuropeanJewsgenetically are an attractive group for genetic and medical studies closer to Khazarian or Judean populations? Under the due to their presumed genetic history (Ostrer 2001). Rhineland hypothesis, European Jews are also expected to Correcting for population structure and using suitable con- exhibithighendogamy,particularlyacrosstheirEurasiancom- trols are critical in medical studies, thus it is vital to deter- munities,andbemoresimilartoMiddleEasternpopulations mine whether European Jews are of Semitic, Caucasus, or compared with their neighboring non-Jewish populations, other ancestry. whereastheKhazarianhypothesispredictstheoppositescen- ThoughJudaismwasbornencasedintheological–historical ario. 
We emphasize that these hypotheses are not exclusive myth,noJewishhistoriographywasproducedfromthetime andthatsomeEuropeanJewsmayhaveotherancestries. ofJosephusFlavius(1stcenturyCE)tothe19thcentury(Sand OurPC,biogeographicalestimation,admixture,IBD,ASD, 2009). Early historians bridged the historical gap simply by and uniparental analyses were consistent in depicting a linking modern Jews directly to the ancient Judeans (fig. 2), CaucasusancestryforEuropeanJews.Ourfirstanalysesre- aparadigmthatwaslaterembeddedinmedicalscienceand vealed tight genetic relationship of European Jews and crystallizedasanarrative.Manyhavechallengedthisnarrative Caucasus populations and pinpointed the biogeographical (Koestler1976;Straten2007),mainlybyshowingthatasole origin of European Jews to the south of Khazaria (figs. 3 Judean ancestry cannot account for the vast population of and 4). Our later analyses yielded a complex ancestry with EasternEuropeanJewsinthebeginningofthe20thcentury a slightly dominant Near Eastern–Caucasus ancestry, large without the major contribution of Judaized Khazars and by Southern European and Middle Eastern ancestries, and a demonstrating that it is in conflict with anthropological, his- minor Eastern European contribution; the latter two differ- torical, and genetic evidence (Patai and Patai 1975; Baron entiated Central and Eastern European Jews (figs. 4 and 5 1993;Sand2009). andtable1).AlthoughtheMiddleEasternancestryfadedin With uniparental and whole genome analyses providing the ASD and uniparental analyses, the Southern European ambiguous answers (Levy-Coffman 2005; Atzmon et al. ancestry was upheld, probably attesting to its later time 2010; Behar et al. 2010), the question of European Jewish period(table1andfig.7). ancestry remained debated mainly between the supporters WeshowthattheKhazarianhypothesisoffersacompre- of the Rhineland and Khazarian hypotheses. 
Although both hensive explanation for the results, including the reported theoriesoversimplifycomplexhistoricalprocessestheyareat- Southern European (Atzmon et al. 2010; Zoossmann-Diskin tractive due to their distinct predictions and testable 2010)andMiddleEasternancestries(Nebeletal.2000;Behar 70 GenomeBiol.Evol.5(1):61–74. doi:10.1093/gbe/evs119 AdvanceAccesspublicationDecember14,2012