PHYSICAL REVIEW X 4, 041009 (2014)

Matchmaker, Matchmaker, Make Me a Match: Migration of Populations via Marriages in the Past

Sang Hoon Lee (이상훈),1,2,* Robyn Ffrancon,3 Daniel M. Abrams,4 Beom Jun Kim (김범준),5 and Mason A. Porter2,6

1 Integrated Energy Center for Fostering Global Creative Researcher (BK 21 plus) and Department of Energy Science, Sungkyunkwan University, Suwon 440-746, Korea
2 Oxford Centre for Industrial and Applied Mathematics (OCIAM), Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
3 Department of Physics, University of Gothenburg, 412 96 Gothenburg, Sweden
4 Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60208, USA
5 Department of Physics, Sungkyunkwan University, Suwon 440-746, Korea
6 CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, United Kingdom

(Received 27 January 2014; revised manuscript received 27 May 2014; published 16 October 2014)

The study of human mobility is both of fundamental importance and of great potential value. For example, it can be leveraged to facilitate efficient city planning and improve prevention strategies when faced with epidemics. The newfound wealth of rich sources of data (including banknote flows, mobile phone records, and transportation data) has led to an explosion of attempts to characterize modern human mobility. Unfortunately, the dearth of comparable historical data makes it much more difficult to study human mobility patterns from the past. In this paper, we present an analysis of long-term human migration, which is important for processes such as urbanization and the spread of ideas. We demonstrate that the data record from Korean family books (called "jokbo") can be used to estimate migration patterns via marriages from the past 750 years. We apply two generative models of long-term human mobility to quantify the relevance of geographical information to human marriage records in the data, and we find that the wide variety in the geographical distributions of the clans poses interesting challenges for the direct application of these models. Using the different geographical distributions of clans, we quantify the "ergodicity" of clans in terms of how widely and uniformly they have spread across Korea, and we compare these results to those obtained using surname data from the Czech Republic. To examine population flow in more detail, we also construct and examine a population-flow network between regions. Based on the correlation between ergodicity and migration in Korea, we identify two different types of migration patterns: diffusive and convective. We expect the analysis of diffusive versus convective effects in population flows to be widely applicable to the study of mobility and migration patterns across different cultures.

DOI: 10.1103/PhysRevX.4.041009
Subject Areas: Complex Systems, Interdisciplinary Physics, Statistical Physics

*Corresponding author.

Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

2160-3308/14/4(4)/041009(23)

I. INTRODUCTION

Since Quetelet's advocacy of "social physics" in the 1830s [1] and Ravenstein's seminal work later in the nineteenth century [2], quantitative studies of human mobility have suggested that human movements follow statistically predictable patterns [3–10]. Such systems-level studies are an important complement to individual-based approaches, as they can reveal population-level phenomena that are difficult to deduce by focusing on the characteristics of isolated members [11].

Research that takes a physics-based approach has focused predominantly on modern mobility, rather than historical mobility and migration, because of the disproportionate availability of large, rich data sets from modern life [12–16]. By contrast, historical data tend to be sparse, incomplete, and noisy. These constraints limit the scope of conclusions that one can draw about how humans mingled, mixed, and migrated over long time scales [17,18]. In this paper, we investigate historical human mobility and associated human migration by studying the matchmaking process for traditional marriages in Korea combined with modern census data in South Korea. We obtain our data from Korean "family books" called jokbo (족보 in Korean). Such a confluence of historical and modern data is rare, and it allows a novel test of generative models for human mobility.

According to Korean tradition, family names are subdivided into clans called bon-gwan (본관), which are identified by a unique place of origin. For example, the two Korean authors of this paper belong to the clans "Kim from Gimhae (김해 김)" and "Lee from Hakseong (학성 이)," and the clan "Lee from Hakseong" is distinct from the clan "Lee from Jeonju (전주 이)" [the royal clan of the Joseon dynasty and the Great Korean Empire (1392–1910)]. When two Koreans marry, the bride's clan and her birth year are customarily recorded in the jokbo owned by the groom's family. These jokbo are kept in the groom's family and passed down through the generations; they serve primarily as a record of the names and birth years of all male descendants [19,20]. In previous work, researchers used the marriage data contained in these books to estimate the population sizes and distributions of clans in Korea as far as 750 years in the past [21–23]. Such distributions are useful for understanding quantitative aspects of human culture, and we proceed even further by conducting a systematic investigation of the geographical information embedded in jokbo.

We examine a set of ten jokbo to try to understand how geographical separation affected human interaction in the past in Korea. Specifically, we examine how interclan marriage rates can be predicted by physical distance and how clans themselves have spread across the country during the past several hundred years. To do this, we apply two generative models for describing human mobility patterns to jokbo records of past marriages between two clans. Note that the identification of clans with specific geographical origins is not unique to Korea. For example, the origins of British and Czech surnames were also the subject of recent investigations [24,25].

Our analysis consists of two parallel approaches. First, we use marriages recorded in jokbo to obtain snapshots of migration (mainly of individual women) for a "marriage-flux analysis." We apply two generative models for population flow, discuss the results of applying these models, and explain the limitations that arise from the wide variety in the geographical-distribution patterns of the clans. Second, to consider the geographical spread of clans in more detail, we conduct an "ergodicity analysis." We use the modern geographical distribution of clans from census data to infer "ergodicity" of clans (mainly caused by past movement of male descent lines). To provide an additional perspective, we also use these data to construct a network model of population flows. To the best of our knowledge, the notion of diffusive versus convective population flow is new for data-driven studies of human mobility and migration, and we believe that this kind of approach can provide valuable insights for many problems in population mobility and migration.

In the present paper, we focus on long-term migration, which has significant effects on many processes over a variety of spatial and temporal scales. Such processes include urban population growth and the demographic structure of cities [26]; city infrastructure and planning [27]; unemployment [28]; and the spread of culture, religion, and other ideas [29]. Most early studies of migration emphasized so-called "internal migration" (i.e., movement within a country) [2,3,30,31] like the phenomena that we investigate, though international migration is also a prominent field of study [32,33]. It has been more popular to study international migration than internal migration during the past few decades, but present-day urbanization processes in Asia, Africa, and Latin America have led to renewed interest in internal migration [26,31]. We hope that our work provides useful ideas to help solve some of the fundamental questions in the migration literature: who migrates, why people migrate, and the consequences of migration (e.g., rural depopulation).

The remainder of our paper is organized as follows. In Sec. II, we introduce the jokbo and census data that we use in our investigation. In Sec. III, we present our primary methodology for data analysis: the gravity and radiation models for marriage-flux analysis, a special case of the gravity model that we call the population-product model, and a diffusion model for ergodicity analysis. We present our main results in Sec. IV, and we conclude in Sec. V. We include detailed information on the data sets, data cleaning, additional results and practical considerations for our analysis, an investigation of a network model for population flow, and various other results in Appendices A–I.

II. DATA SETS

A. Jokbo data sets

For our marriage-flux analysis, we use the same ten jokbo data sets that were employed in [21–23]. An individual book contains between 1873 and 104356 marriage entries, and there are a total of 221598 entries across all books. (See Table I and Figs. 8 and 9 in Appendix A for details.) Each entry contains the bride's clan and year of birth [34]. The oldest book has entries that date back to the 13th century.

Previous studies of this data set [21–23] did not use any of the information that is encoded implicitly in the geographical origins of each clan. Such information, together with the modern geographical distribution of clans, comprises a key ingredient of our analysis. We convert location names to geographical coordinates using the Google Maps Application Programming Interface (GMAPI) [36]. Because of the much sparser coverage of North Korean regions by Google Maps (see Fig. 12 in Appendix D), these geolocation data are a biased sample of the full data. However, data for the southern half of the Korean peninsula are rich [37], and they are sufficient to draw interesting and robust conclusions. For example, the effect of a change in the legality of intraclan marriage in 1997 is clearly observable in the data.

TABLE I. Number of entries and other information available in each jokbo, values that we determined by using additional data that we obtained from other sources, and a summary of some of our computational results for the clan corresponding to each jokbo. For each jokbo, we indicate the ID (1–10), the year $t_0$ of its earliest entry, its number of entries $N_e$, and the number of distinct clans (including at least one bride for each clan) $N_c$ among those entries [21]. The quantity $N_{\gamma=0}$ gives the number of clans from the 2000 census (which is 4303) plus the number of clans in each jokbo that are not already in the census. We can use these $N_{\gamma=0}$ clans in the gravity model when $\gamma = 0$ (i.e., for the population-product model, which is applicable without geographical information) and $\alpha = 1$. (See the discussion in Appendix B.) We also indicate the best values for the fitting parameters $\alpha$ and $\gamma$ of the gravity model in Eq. (1). We apply this fit to the brides' side of marriages, and we calculate these values by minimizing the sum of squared differences using the scipy.optimize package in Python [35] (with initial values of $\alpha = \gamma = a_G = 1.0$ in our computations). We compute the number of administrative regions $N_{\mathrm{admin}}$ in which the clan that corresponds to each jokbo (i.e., the grooms' side) appears based on census data from 1985 and 2000. We use the census data to compute a radius of gyration $r_g$ (km) for both 1985 and 2000 and to estimate a diffusion constant $D$ (km$^2$/year) for diffusion of clans between those two years. We consider clans with $N_{\mathrm{admin}}^{2000} \geq 150$ to be ergodic (see Fig. 5). Based on this definition, all ten clans in the jokbo data are ergodic.
ID   t_0    N_e      N_c    N_{γ=0}   α        γ         N_admin^1985   N_admin^2000   r_g (1985)   r_g (2000)   Ergodic?   D
1    1513   104356   2657   5510      1.0749   −0.0349   199            199            115.5        113.5        Y          0.062
2    1562   29139    1274   4796      1.0145   0.2305    199            199            124.4        128.7        Y          0.737
3    1752   3500     390    4364      1.0853   0.2000    199            199            132.7        151.5        Y          0.426
4    1698   15445    915    4524      0.9678   0.1210    199            199            132.7        151.5        Y          0.426
5    1439   17911    923    4551      0.9452   0.2346    198            199            101.2        97.4         Y          0.062
6    1476   16379    727    4462      1.1102   0.5377    130            196            144.6        128.8        Y          2.253
7    1802   1873     289    4359      1.4930   −0.0961   199            199            110.2        116.1        Y          −0.062
8    1254   15006    958    4570      0.9651   0.1285    198            198            114.1        109.6        Y          0.101
9    1458   6463     548    4376      1.1253   0.3650    196            195            118.6        121.5        Y          0.784
10   1475   11526    736    4463      0.9947   0.4502    198            196            117.7        127.7        Y          0.461

B. Modern name distributions

In addition to the jokbo data sets that we employ for marriage-flux analysis, we also use data from two Korean census reports (1985 and 2000) to evaluate the current spatial distribution of clans in Korea [38,39]. As illustrated in Fig. 1, some clans have dispersed rather broadly, but others remain localized (usually near their place of origin). Drawing on ideas from statistical mechanics [40,41], we use the term "ergodic" as an analogy to describe clans that have spread broadly throughout Korea. We suppose that such clans have reached a dynamic equilibrium: An ergodic clan is "spread equally" throughout Korea in the sense that one expects it to have roughly the same geographical distribution as the population as a whole. Note that we do not expect an ergodic clan to reach a spatially uniform state for the same reasons that the full population is not spatially uniform (e.g., inhomogeneities in natural resources, advantages to congregating in cities, etc.).

Nonergodic clans should have rather different distributions from those that we dub ergodic because their distribution must differ significantly from that of the full population. One can construe the notion of ergodicity as a natural extension of other physical analogies that were used in previous quantitative studies (including the original ones) on human migration [1–10]. As we discuss later, we can quantify the extent of clan ergodicity.

III. METHODS

A. Generative models for marriage-flux analysis

We compute a "marriage flux" (the rate of marriage of women from clan $i$ into clan $j$) for all clan pairs $(i,j)$ in our data [42]. Historically, professional matchmakers were employed to travel between families to arrange marriages [43], so we posit that physical distance plays a significant role in determining marriage flux. We examine this hypothesis using two generative models: a conventional gravity model with adjustable parameters that incorporates the distance between regions and the effects (or lack thereof) of each region's population [4,5], and a recently developed, parameter-free radiation model [44,45].

The gravity model has been used to explain phenomena such as commuting patterns and disease spread [46–49]. In this model, the flux of population $G_{ij}$ from site $i$ to site $j$ is

$$G_{ij} = \frac{m_i^{\alpha} m_j^{\beta}}{r_{ij}^{\gamma}}, \tag{1}$$

where $\alpha$, $\beta$, and $\gamma$ are adjustable exponents. For our purposes, $G_{ij}$ is proportional to the flux of women from clan $i$ to clan $j$ through marriage. The total population of clan $i$ is given by $m_i$, and the variable $r_{ij}$ is the distance between the centroids of clans $i$ and $j$. We employ census data from 2000 to calculate centroids using the spatial population distribution for each clan [38]. Importantly, note that choosing $\gamma = 0$ in the gravity model yields a special case in which flux is independent of distance. As we will see in Sec. IV A, this situation arises when large uncertainties in geographical locations (due to clan ergodicity) hinder the accuracy of estimations of distances.

Determining the centroid locations of clans from modern census data is more accurate than attempting to determine the locations where clans originated [50] for two reasons. First, for many clans, origin-place names have differed from geographical clan centers from the beginning of recorded Korean history, which, in particular, predates the period that spans our jokbo data sets [20,51]. Second, the origin-place names for many clans have become outdated and cannot be located accurately via the names of modern administrative regions. For instance, the clan origin "Hakseong" of the first author is an old name for the city Ulsan in South Korea, but the name Hakseong is currently only used to describe the small administrative region "Hakseong-dong" in Ulsan. However, as we demonstrate in Fig. 1, using the centroid location of "Lee from Hakseong" correctly gives the modern city Ulsan. This procedure works in part because Lee from Hakseong is a nonergodic clan; for ergodic clans such as Kim from Gimhae, the spatial precision is much worse. This is an important observation that we will discuss in detail later.

FIG. 1. Examples of (a) ergodic and (b) nonergodic clans. We color the regions of South Korea based on the fraction of the total population composed of members of the clan in the year 2000. We use arrows to indicate the origins of the two clans: Gimhae on the left and Ulsan ("Hakseong" is the old name of the city) on the right. In this map, we use the 2010 administrative boundaries [39]. See the appendices for discussions of data sets and data cleaning.

We use a version of the radiation model that takes finite-size effects into account [45]. The population flux $R_{ij}$ from clan $i$ to clan $j$ is

$$R_{ij} = \frac{\Omega_i}{1 - m_i/N} \times \frac{m_i m_j}{(m_i + s_{ij})(m_i + m_j + s_{ij})}, \tag{2}$$

where $\Omega_i = \sum_j R_{ij}$ is proportional to the total population that marries from clan $i$ into any other clan, $N$ is the total population, and $s_{ij}$ is the exclusive population within a circle of radius $r_{ij}$ centered on the centroid of clan $i$. Note that members of clans $i$ and $j$ are not included in computing $s_{ij}$ [44]. As before, $m_i$ is the population of clan $i$, members of clan $i$ marry into clan $j$, and clan $j$ keeps the marriage records. In contrast to the gravity model, the radiation model does not include any external parameters. Importantly, this renders it unable to describe the geographically-independent situation that we need to consider in our study (and which we can obtain by setting $\gamma = 0$ in the gravity model).

For both the gravity and radiation models, we use census data from the year 2000 [38] as a proxy for past populations. This allows us to compute the quantities $r_{ij}$, $m_i$, and $s_{ij}$. Our approximation is supported by previously reported estimates of stability in Korean society. Historically, most clans have grown in parallel with the total population, so we assume that the relative sizes of clans have remained roughly constant [23]. In both Eqs. (1) and (2), only the relative sizes $m_i/N$ and $s_{ij}/N$ matter for calculating the flux (up to a constant of proportionality).

B. Human diffusion and ergodicity analysis

One way to quantify the notion of clan ergodicity is to examine what we call the "clan-density anomaly," which describes the local deviation in density of members of a given clan. The clan-density anomaly is $\phi_i(\mathbf{r},t) = c_i(\mathbf{r},t) - [m_i(t)/N(t)]\,\rho(\mathbf{r},t)$ at position $\mathbf{r} = (x,y)$ and time $t$, where $c_i(\mathbf{r},t)$ is the (spatially and temporally varying) local clan concentration (i.e., the clan population
density), $m_i(t)$ is the total clan population, $\rho(\mathbf{r},t)$ is the local population density (i.e., the total population of all clans at point $\mathbf{r}$ and time $t$, divided by the differential area), and $N(t)$ is the total population of all of the clans at time $t$. If a clan were to occupy a constant fraction of the total population everywhere in the country, then $\phi_i = 0$ everywhere because its local concentration would be $c_i = (m_i/N)\rho$. (This situation corresponds to perfect ergodicity.) The range of typical values for the clan-density anomaly depends on a clan's aggregate concentration in the country. Examining the anomaly relative to clan concentration, the year-2000 numbers for $\phi_i/(m_i\rho/N)$ range from −1700 to 7400 for Kim from Gimhae and from −19000 to 87000 for Lee from Hakseong. Clearly, the distribution of the latter is much more heterogeneous (see Fig. 17 in Appendix I).

Combining the notion of clan-density anomaly with traditional arguments about migration from population gradients [2–10] (flow ideas based on Ohm's law and "molecular weights for population" are mentioned explicitly in [6,10]) suggests a simple Fickian law [52] for human transport on long time scales. We propose that the flux of clan members is $\mathbf{J}_i \propto \nabla\phi_i$, so individuals move preferentially away from high concentrations of their clans. This implies that $\partial c_i/\partial t = \nabla \cdot \mathbf{J}_i \propto \nabla^2 \phi_i$ (where we have assumed that the constant of proportionality is independent of space), which yields the diffusion equation

$$\frac{\partial \phi_i}{\partial t} = D_i \nabla^2 \phi_i. \tag{3}$$

We thereby identify the constant of proportionality as a mean diffusion constant $D_i$ with dimensions [length$^2$/time]. This prediction of diffusion of clan members is consistent with existing theories that posit human diffusion (e.g., cultural [53] and demic [54] diffusion). An important distinction is that we are proposing a process of diffusive mixing of clans rather than diffusive expansion of an idea or group. If this theory is correct, then one should expect clan-density anomalies to simply diffuse over time. One should also be able to estimate diffusion constants by comparing the spatial variation at two points in time.

One can gain insight into the above diffusion process by calculating the radius of gyration (a second moment) of the clan-density anomaly as a proxy for measuring ergodicity. Suppose that clan $i$'s concentration $c_i(\mathbf{r},t)$ is known on a set of discrete regions $\{S_k\}$ with areas $\{A_k\}$. We define the centroid coordinates for the $k$th region as

$$\mathbf{r}(k) = \frac{1}{|S_k|} \sum_{\mathbf{r} \in S_k} \mathbf{r}, \tag{4}$$

where $|S_k|$ is the total number of coordinate points $\mathbf{r}$ in $S_k$ for normalization, and we henceforth use $\phi_i(k,t)$ to indicate $\phi_i[\mathbf{r}(k),t]$. The centroid of the clan's anomaly has coordinates

$$\mathbf{r}_{i,C}(t) = \frac{1}{\phi_{i,\mathrm{tot}}(t)} \sum_k \mathbf{r}(k)\, \phi_i(k,t)\, A_k, \tag{5}$$

where $\mathbf{r}(k) = [x(k), y(k)]$ gives the coordinates of the centroid of region $k$ and the normalization constant is

$$\phi_{i,\mathrm{tot}}(t) = \sum_k \phi_i(k,t)\, A_k, \tag{6}$$

where $\phi_i(k,t)$ is the anomaly of clan $i$ in region $k$ at time $t$. Note that we calculate the centroid of population for the $i$th clan (as opposed to the centroid of its anomaly) using analogous formulas to Eqs. (5) and (6) in which $\phi_i$ is replaced by the concentration $c_i$. The radius of gyration (i.e., the spatial second moment) $r_{g,i}(t)$ of clan $i$ at time $t$ is then defined by

$$r_{g,i}^2(t) = \frac{1}{\phi_{i,\mathrm{tot}}(t)} \sum_k \lVert \mathbf{r}(k) - \mathbf{r}_{i,C}(t) \rVert^2\, \phi_i(k,t)\, A_k, \tag{7}$$

where $\lVert \cdot \rVert$ is the Euclidean norm. We can use the set of radii of gyration $\{r_{g,i}(t)\}$ from Eq. (7) as a proxy for ergodicity because (by construction) $r_{g,i}(t)$ quantifies how widely the clan-density anomaly of clan $i$ has spread across Korea [55].

We simulate Eq. (3) between the known anomaly distributions from census data at $t_1 = 1985$ and $t_2 = 2000$ to estimate a best-fit diffusion constant $D_i$ for each clan. We compare our results to a null model in which movement is diffusive but driven by the aggregate population density in each region rather than by clan-population anomaly. Our clan-based diffusion model performs better than the null model for approximately 84% of the clans.

IV. RESULTS

A. Marriage-flux analysis based on jokbo and modern census data

We apply a least-squares fit on a doubly logarithmic scale to determine the coefficients $\alpha$ and $\gamma$ from Eq. (1) (along with the proportionality coefficient $a_G$, which is essentially a normalization constant, for the total number of marriages). The parameter $\beta$ is irrelevant for the aggregated entries in a single jokbo because $m_j$ is constant (and is equal to the total number of grooms in that jokbo). The strongest correlation between the gravity-model flux and the number of entries for each clan in jokbo 1 occurs for $\alpha \approx 1.0749$ and $\gamma \approx -0.0349$, which suggests that the frequency of marriage between two families is proportional to the product of the populations of the two clans and, in particular, that there is little or no geographical dependence. The likely explanation is that the clan in jokbo 1 is ergodic, so the grooms could have been almost anywhere in the country, which would indeed make geographical factors irrelevant. (In the context of population genetics, this corresponds to "full mixing" [56–59].) In other words, as we discussed in Sec. III A, this special case of the gravity model (for which we use $\gamma = 0$ in our analysis) corresponds to having geographical independence. Consequently, we will henceforth use the term "population-product model" for the gravity model with $\gamma = 0$. For our analysis of other jokbo and additional details, see Appendix A (and Tables I and II).

TABLE II. Gravity-model parameters $\alpha$ and $\gamma$ in Eq. (1) calculated for temporally-divided entries of jokbo 1 by minimizing the sum of squared differences using the scipy.optimize package in Python [35]. (We again use initial values of $\alpha = \gamma = a_G = 1.0$ in these computations.) We sort the list of brides according to birth year, (temporally) partition the data such that each time window (except for the last one) has 10001 entries, and indicate the mean and median birth year in each window.

Window           Year (mean)   Year (median)   α        γ
1–10001          1739.72       1756            1.0943   −0.1019
10002–20002      1828.51       1829            1.1130   −0.0396
20003–30003      1865.08       1865            1.1186   −0.0776
30004–40004      1890.72       1891            1.1277   −0.0272
40005–50005      1910.91       1911            1.0802   0.0209
50006–60006      1926.80       1927            1.0463   0.0270
60007–70007      1938.99       1939            1.0886   −0.0146
70008–80008      1949.64       1950            1.0405   0.0027
80009–90009      1958.01       1958            1.0443   −0.0807
90010–100010     1964.90       1965            1.0030   −0.0247
100011–104356    1971.78       1971            1.0240   −0.1077

With little loss of accuracy for the fit, we take $\gamma = 0$ (i.e., we use the population-product model) to avoid divergence in the rare cases in which a bride comes from the same clan as the groom (for which the distance is $r_{ij} = 0$). We also take $\alpha = 1$ with little loss of accuracy. Using $\gamma = 0$ allows us to include data from the approximately 22% of clans for which geographical origin information is not available. In Fig. 2, we show the fit for jokbo 1, where we have used linear regression to quantify the correlation between the population-product-model flux and the number of entries for each clan in the jokbo. The noticeably lower outlier to the right of the line is the data point that corresponds to the clan of jokbo 1, and we remark that this deviation results from a cultural taboo against marrying into one's own clan. Women from the same clan as the owners of a jokbo have traditionally been strongly discouraged from marrying men listed in the jokbo (it is possible that they were even recorded under false clans in the book), and it was illegal until 1997 [60]. For the other jokbo, see Fig. 6 in Appendix A. In the bottom panel of Fig. 2, we illustrate that the radiation model does not give a good fit to the data. Recall from our discussion in Sec. III A that the lack of parameters in the radiation model does not allow us to explicitly consider a geographically independent special case when using it. We emphasize, however, that this does not imply that the gravity model is "better" than the radiation model, as a direct comparison between the two models is hampered by the ergodicity of clans. In other words, the standard formulations of the gravity and radiation models do not provide a solution for how to estimate fluxes between the clan centroids. Consequently, to investigate population fluxes, we incorporate modern census data. See our discussions in the next subsection and in Appendix H.

FIG. 2. Flux predictions from the population-product model (i.e., the special case of the gravity model with $\gamma = 0$) with $\alpha = 1$ and the radiation models for jokbo 1. (a) Scatter plot of the number of clan entries in jokbo 1 versus the corresponding centroid in 2000 using the population-product-model flux with $\alpha = 1$. We compute the line using a linear regression to find the fitting parameter $a_G \approx 6.55(4) \times 10^{-11}$ (with a 95% confidence interval) to satisfy the expression $N_i = a_G G_{ij}$, where $G_{ij}$ is the population-product-model flux of women from clan $i$ to clan $j$ and $N_i$ is the total number of entries from clan $i$ in the jokbo. (b) We compare the same clan entries using the radiation model. We compute the line using a linear regression to find the fitting parameter $a_R \approx 0.049(2)$ to satisfy the expression $N_i = a_R R_{ij}$, where $R_{ij}$ is the radiation-model flux of women from clan $i$ to clan $j$ and $N_i$ is the total number of entries from clan $i$ in the jokbo. In both panels, we color the points using the number of administrative regions that are occupied by the corresponding clans [see Figs. 3(a) and (b)]. The red markers (outliers) in both panels correspond to the clan of jokbo 1 (i.e., the case $i = j$).

B. Ergodicity analysis based on modern census data and a simple diffusion model

We use census data from the year 2000 [38] to examine the ergodicity of clans in three different ways: (1) The number of administrative regions quantifies how "widely" each clan is distributed; (2) the radius of gyration, which we calculate from the clan-density anomaly using Eq. (7), quantifies how "uniformly" each clan is distributed; and (3) the standard deviation of anomaly values measures how much the anomaly varies across regions. For instance, using data from the 2000 census and considering all of the clans and the 199 standardized regions, we find that 3.04% of the clans have a member in every region but that 22.1% of the clans have members in ten or fewer regions.

We illustrate the dichotomy of ergodic versus nonergodic clans with the bimodal distribution in Fig. 3(a). However, from the perspective of individual clan members [see Fig. 3(b)], such a dichotomy is not apparent. We show the radii of gyration that we calculate from the 2000 census data in Figs. 3(c) and 3(d). We can again see the bimodality in Fig. 3(c). In Fig. 17 in Appendix I, we illustrate the dichotomy for Kim from Gimhae and Lee from Hakseong.

FIG. 3. Distribution of the number of different administrative regions occupied by clans. (a) Probability distribution of the number of different administrative regions occupied by a Korean clan in the year 2000. (b) Probability distribution of the number of different administrative regions occupied by the clan of a Korean individual selected uniformly at random in the year 2000. The difference between this panel and the previous one arises from the fact that clans with larger populations tend to occupy more administrative regions. [That is, we select a clan uniformly at random in panel (a), but we select an individual uniformly at random in panel (b).] Note that the rightmost bar has a height of 0.17, but we have truncated it for visual presentation. (c) Probability distribution of radii of gyration (in km) for clans in 2000. (d) Probability distribution of radii of gyration (in km) for clans of a Korean individual selected uniformly at random in 2000. The difference between this panel and the previous one arises from the fact that clans with larger populations tend to occupy more administrative regions. Solid curves are kernel density estimates (from MATLAB R2011a's ksdensity function with a Gaussian smoothing kernel of width 5).

As we indicate in Table I, all ten of the clans for which we have jokbo are ergodic or at least reasonably ergodic, so the variables associated with the $j$ indices (i.e., the grooms) in Eqs. (1) and (2) have already lost much of their geographical precision, which is consistent with both $\gamma = 0$ (i.e., with using the population-product model) and $\alpha = 0$. Again, see the scatter plots in Fig. 2, in which we color each clan according to the number of different administrative regions that it occupies. Note that the three different ergodicity diagnostics are only weakly correlated with each other (see Fig. 18 in Appendix I).

Our observations of clan bimodality for Korea contrast sharply with our observations for family names in the Czech Republic, where most family names appear to be nonergodic [25] (see Fig. 19 in Appendix I). One possible explanation of the ubiquity of ergodic Korean names is the historical fact that many families from the lower social classes adopted (or even purchased) names of noble clans from the upper classes near the end of the Joseon dynasty (19th–20th centuries) [20,61]. At the time, Korean society was very unstable, and this process might have, in essence, introduced a preferential growth of ergodic names.

In Fig. 4, we show the distribution of the diffusion constants that we computed by fitting to Eq. (3). Some of the values are negative, which presumably arises from finite-size effects in ergodic clans as well as basic limitations in estimating diffusion constants using only a pair of nearby years. In Fig. 20 in Appendix I, we show the correlations between the diffusion constants and other measures.

FIG. 4. Distribution of estimated diffusion constants (in km$^2$/year) computed using 1985 and 2000 census data and Eq. (3). The solid curve is a kernel density estimate (from MATLAB R2011a's ksdensity function with default smoothing). See Appendix G for details of the calculation of diffusion constants. [Axes: probability distribution versus diffusion constant (km$^2$/year).]

C. Convection in addition to diffusion as another mechanism for migration

The assumption that human populations simply diffuse is a gross oversimplification of reality. We will thus consider the intriguing (but still grossly oversimplified) possibility of simultaneous diffusive and convective (bulk) transport. In the past century, a dramatic movement from rural to urban areas has caused Seoul's population to increase by a factor of more than 50, tremendously outpacing Korea's population growth as a whole [62]. This suggests the presence of a strong attractor or "sink" for the bulk flow of population into Seoul, as has been discussed in rural-urban labor migration studies [28]. The density-equalizing population cartogram [63] in Fig. 21 in Appendix I clearly demonstrates the rapid growth of Seoul and its surroundings between 1970 and 2010.

If convection (i.e., bulk flow) directed towards Seoul has indeed occurred throughout Korea while clans were simultaneously diffusing from their points of origin, then one ought to be able to detect a signature of such a flow. In Fig. 5(a), we show what we believe is such a signature. We observe that the fraction of ergodic clans increases with the distance between Seoul and a clan's place of origin. This would be unexpected for a purely diffusive system or, indeed, in any other simple model that excludes convective transport. By allowing for bulk flow, we expect to observe that a clan's members preferentially occupy territory in the flow path that is located geographically between the clan's starting point and Seoul. For clans that start closer to Seoul, …

… many of the small clans in this calculation because we cannot estimate the locations of their centroids from our data (see Appendix B).

We assume that clans that have moved a larger distance have also existed for a longer time and hence have undergone diffusion longer; we thus also expect such clans to be more ergodic. This is consistent with our observations in Fig.

FIG. 5. Fraction of ergodic clans and distance scales of clans. For this figure, we use the 3900 clans from the 2000 census data for which we were able to identify the origin location (see Appendix D). (a) Fraction of ergodic clans versus distance to Seoul. The correlation between the variables is positive and statistically significant. (The Pearson correlation coefficient is $r \approx 0.83$, and the p-value is $p \approx 0.0017$.) For the purpose of this calculation, we call a clan "ergodic" if it is present in at least 150 administrative regions. We estimate this fraction separately in each of 11 equally sized bins for the displayed range of distances. The gray regions give 95% confidence intervals. (b) Fraction of ergodic clans versus the distance between the location of clan origin and the present-day centroid. We measure ergodicity as in the left panel, and we estimate the fraction separately for each range of binned distances. (We use the same bins as in the left panel.) The correlation between the variables is positive and significant up to 150 km ($r \approx 0.94$, $p \approx 0.0098$) and is negative and significant for larger distances ($r \approx -0.98$, $p \approx 2.4 \times 10^{-4}$). [Axes: fraction ergodic versus (a) distance from clan origin location to Seoul (km) and (b) distance from clan origin location to current clan centroid (km).]
5(b) for distances less than about 150 km, but it is thispathisshort;forthosethatstartfartheraway,thelonger difficult to use the same logic to explain our observations flow path ought to contribute to an increased number of fordistancesgreaterthan150km.However,ifoneassumes occupied administrative regions and hence to a greater that long-distance moves are more likely to arise from aggregate ergodicity. We plot the fraction of ergodic clans convective effects than from diffusive ones, then our versusthedistanceaclanhasmoved(whichweestimateby observations for both short and long distances become calculatingdistancesbetweenclanoriginlocationsandthe understandable. The fraction of moves from bulk-flow corresponding modern clan centroids) in Fig. 5(b). This effects like resettlement or transplantation is larger for also supports our claim that both convective and diffusive long-distance moves, and they become increasingly dom- transporthaveoccurred.Tofurtherexamineclanergodicity, inant as the distance approaches 325 km (roughly the size we also compare each clan’s radius of gyration r to the of the Korean peninsula). We speculate that the clans that g distance from its origin location to (1) Seoul and (2) its moved farther than 150 km are likely to be ones that present-daycentroid(seeFig.22inAppendixI).Thelatter originated in the most remote areas of Korea, or even showsthesamegeneraltendencyasinFig.5.Wespeculate outsideofKorea,andthattheyhaveonlyrelativelyrecently thattheabsenceofstatisticalsignificanceinthecorrelation beentransplantedtomajorKoreanpopulationcenters,from betweenr andthedistancesbetweenclanoriginlocations whichtheyhavehadlittletimetospread.Thisobservation g and Seoul is a sampling issue, as we could not include is necessarily speculative because the age of a clan is not 041009-8 MATCHMAKER, MATCHMAKER, MAKE ME A MATCH: … PHYS. REV. 
X 4, 041009 (2014) easytodetermine.Thefirstentryinajokbo(seeTableIfor and their consequences for human locations on long time our ten jokbo) could have resulted from the invention of scales (human migration via clan ergodicity). An in- charactersorprintingdevicesratherthanfromthetruebirth teresting further wrinkle would be to compare such of a clan [20]. mobility-derived time scales for human mixing patterns Ultimately,ourdataareinsufficienttodefinitivelyaccept to genetically-derived patterns [56–59]. or reject the hypothesis of human diffusion. However, as From a more general perspective, our research has our analysisdemonstrates, our data areconsistentwith the allowed us to test the idea of using a physical analogy theory of simultaneous human “diffusion” and “convec- formodelinghumanmigration—anideaputforth(butnot tion.”Furthermore,ouranalysissuggeststhatifthehypoth- quantified) as early as the 19th century [1–10]. Physics- esisofpurediffusioniscorrect,thenourestimateddiffusion inspired ideas have been very successful for the study of constants indicate a possible time scale for relaxation to a human mobility, which occurs on shorter time scales than dynamic equilibrium and thus for mixing in human soci- human migration, and we propose that Ravenstein was eties.InmainlandSouthKorea,itwouldtakeapproximately correct when he posited that such ideas are also useful for ð100000 km2Þ=ð1.5 km2=yearÞ≈67000 years for purely human migration. diffusive mixing to produce a well-mixed society. A convectiveprocessthusappearstobeplayingtheimportant ACKNOWLEDGMENTS roleofpromotinghumaninteractionbyacceleratingmixing We thank Hawoong Jeong (정하웅) for providing data in the population. Despite the limitations imposed by our fromKoreanfamilybooksandJosefNovotnýforproviding data,wetrytoestimateandquantifythecentralityofSeoul data on surnames in the Czech Republic. 
We thank Tim using a network-flow model for population, and we find Evans for introducing us to helpful references and Erik suggestivedifferencesbetweentheflowpatternsofergodic Bollt, Valentin Danchev, Sandra González-Bailón, James and nonergodic clans. For details, see Appendix H. Irish, Philip Kreager, Michael Murphy, and Tommy Murphy for helpful comments and discussions. We thank V. CONCLUSIONS MarcBarthelemyandRichardMorrisfordetailsabouttheir The long history of detailed record-keeping in Korean workonconstructingflownetworks[64],andwethankthe culture provides an unusual opportunity for quantitative anonymous referees for their helpful comments and sug- research on historical human mobility and migration, and gestions. M.A.P. and S.H.L. acknowledge support from our investigation strongly suggests that both “diffusive” the Engineering and Physical Sciences Research Council and “convective” patterns have played important roles (EPSRC) through Grant No. EP/J001759/1. B.J.K. was in establishing the current distribution of clans in Korea. supported by a National Research Foundation of Korea By studying the geographical locations of clan origins (NRF) grant funded by the Korean government in jokbo (Korean family books), we have quantified the (No. 2014R1A2A2A01004919). D.M.A. was supported extent of “ergodicity” of Korean clans as reflected in by Grant No. 220020230 from the James S. McDonnell time series of marriage snapshots. This underscores the Foundation. S.H.L. did the majority of his work at utility of investigating the location distributions of indi- University of Oxford. vidual clans. Additionally, by comparing our results from S.H. Lee and R. Ffrancon contributed equally to Korean clans to those from Czech families, we have also this work. demonstrated that our approach can give insightful indi- cations of different mobility and migration patterns in APPENDIX A: JOKBO DATA different cultures. 
Our ergodicity analysis using modern census data clearly illustrates that there are both ergodic In our investigation, we examine ten digitized jokbo and nonergodic clans, and we have used these results to that were first studied in Ref. [21]. In Table I in the suggesttwodifferentmechanismsforhumanmigrationon main text, we give basic information about the ten jokbo, long time scales. Many mobility processes involve a and we now summarize the results of some of our balance between diffusivespreading and the attractiveness computations. ofoneormorecentrallocations(andbetweenmoregeneral First, we apply the same gravity-model fit that we used diffusive and convective fluxes), so we believe that our for jokbo 1 to all of the jokbo data, and the results do not approach in the present paper will be valuable for many deviate much from those for jokbo 1. That is, γ≈0 and situations. α≈1,sowe canapply the population-productmodel with Anoteworthyfeatureofouranalysisisthatweusedboth α¼1. The largest deviations in the two parameter values data with high temporal resolution but low spatial reso- are α≈1.4930 (for jokbo 7) and γ≈0.5377 (for jokbo 6). lution(jokbodata)anddatawithhighspatialresolutionbut Interestingly, we could not find any empirical value of low temporal resolution (census data). This allowed us to γ <0.6reportedintheliterature[4,5,45–49],andit seems consider both the patterns of human movement on short tobeextremelyraretoreportanyempiricalvaluesatallfor time scales (mobility via individual marriage processes) gravity-model parameters. As one can see in Fig. 6, the 041009-9 LEE et al. PHYS. REV. X 4, 041009 (2014) FIG. 6. Scatter plots of the number of clan entries in jokbo 2–10 versus the corresponding centroid in 2000 using the population- product-modelfluxwithα¼1.Weshowourresultsinnumerical orderofthejokboinpanels(a)–(i),sojokbo2isinpanel(a),etc. 
In each panel, we calculate the line using linear regression to determine the fitting parameter a for N ¼a G , where G is G i G ij ij thepopulation-product-modelfluxofwomenfromclanitoclanjandN isthetotalnumberofentriesfromclaniinthegivenjokbo. i The parameter values are (a) a ≈2.36ð1Þ×10−9 [jokbo 2], (b) a ≈6.6ð1Þ×10−11 [jokbo 3], (c) a ≈5.15ð5Þ×10−9 [jokbo 4], G G G (d)a ≈5.15ð5Þ×10−8[jokbo5],(e)a ≈5.8ð1Þ×10−9[jokbo6],(f)a ≈5.1ð2Þ×10−16[jokbo7],(g)a ≈4.25ð5Þ×10−8[jokbo G G G G 8],(h)a ≈1.44ð1Þ×10−9[jokbo9],and(i)a ≈4.71ð8Þ×10−8[jokbo10].Theredmarkersinpanels(a),(c),and(h)correspondto G G theclansofthedepictedjokbo,andN j ¼0foralloftheotherjokbo.Ineachcase,weusea95%confidenceintervalandcolor i i¼j¼ownclan the points according to the number of administrative regions occupied by the corresponding clans. choiceofα¼1andγ ¼0fitsthedatareasonablywellfor examinethenumberofdistinctclansineachjokboversus jokbo 2–10. Note that the suppressed case of a bride and the total number of entries in that jokbo. In Fig. 9, we groombeingfromthesameclanisapparentinFig.6.This showthefractionofentriesineachjokboasafunctionof is indicated by the red markers, which are significantly thebirthyearofthebridesinthatjokbo.Theseplotssuggest below the other points in some of the panels and do not thatjokboofdifferentsizesatdifferenttimestendtofollow exist at all in other panels. We show the radiation-model theaggregatetrendofpopulationchangethroughoutthelast results for jokbo 2–10 in Fig. 7. several hundred years of Korean history. Additionally,wecanseethatalloftheclansinthejokbo datathatwestudy(i.e.,thegrooms’sideofmarriages)are APPENDIX B: CENSUS DATA, POPULATIONS, “ergodic” in the sense that they were widespread across AND NUMBERS OF CLANS the nation in 2000. This is not surprising, as the avail- ability of digitized jokbo data themselves reflects clan Since 1925, the South Korean government has con- popularity.Wepresentthegravity-modelfittingresultsfor ducted a census every five years [38]. 
The only years in temporally-divided jokbo 1 data in Table II in the main which the populations of different clans were recorded text, and we give results that use clan origin locations separately for different administrative regions were 1985 instead of population centroid in 2000 in Table III. (We and 2000. These data make it possible to estimate dis- also temporally divided the data from jokbo 6—because, tributionstatistics(e.g.,centroidandradiusofgyration)for asweshowedinTableIinthemaintext,ithasthelargestγ eachclan.Allofthedataarepubliclyavailabletodownload valueamongthetenclans—andwefoundthatitdoesnot from Ref. [38]. exhibit systematic changes over time either.) With these The total population reported in the 1985 South Korean calculations, we again find that α≈1and γ≈0 appear to census was 40419647, and clan information is available be reasonable. The general trend of population change in for 40315813 individuals. In the 2000 South Korean Korea is also reflected in the jokbo data. In Fig. 8, we census, a population of 45985289 was reported, and a 041009-10
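
The distribution statistics named in Appendix B (a clan's centroid and radius of gyration), together with the region count used as the first ergodicity diagnostic, can be illustrated with a minimal Python sketch. It is not the procedure of the paper: the paper computes the radius of gyration from the clan-density anomaly using Eq. (7), whereas this sketch assumes plain population weights and planar coordinates in km. The function names and the `(x_km, y_km, population)` input layout are hypothetical. The 150-region threshold is the one stated in the caption of Fig. 5.

```python
import math


def centroid_and_radius_of_gyration(regions):
    """Population-weighted centroid and radius of gyration of one clan.

    `regions` is a list of (x_km, y_km, population) tuples, one per
    administrative region (hypothetical layout, planar coordinates assumed).
    """
    total = sum(n for _, _, n in regions)
    if total <= 0:
        raise ValueError("clan has no recorded members")
    cx = sum(x * n for x, _, n in regions) / total
    cy = sum(y * n for _, y, n in regions) / total
    # Mean squared distance of clan members from the centroid.
    mean_sq = sum(n * ((x - cx) ** 2 + (y - cy) ** 2) for x, y, n in regions) / total
    return (cx, cy), math.sqrt(mean_sq)


def occupied_region_count(regions):
    """Number of administrative regions with at least one clan member."""
    return sum(1 for _, _, n in regions if n > 0)


def is_ergodic(regions, threshold=150):
    """Threshold rule from the Fig. 5 caption: present in >= 150 regions."""
    return occupied_region_count(regions) >= threshold
```

For real census data, region coordinates would first have to be projected from latitude and longitude to a planar system, a step this sketch leaves out.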

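The time scale for purely diffusive mixing quoted in the discussion of convection is an area divided by a diffusion constant. The short Python sketch below reproduces that arithmetic with the two values given in the text (100 000 km$^2$ and 1.5 km$^2$/year). The function name is hypothetical, and the calculation is only the order-of-magnitude estimate made in the text, not a solution of a diffusion equation.

```python
def mixing_time_years(area_km2, diffusion_km2_per_year):
    """Order-of-magnitude mixing time: area divided by diffusion constant."""
    if diffusion_km2_per_year <= 0:
        raise ValueError("the estimate needs a positive diffusion constant")
    return area_km2 / diffusion_km2_per_year


# Values quoted in the text for mainland South Korea.
estimate = mixing_time_years(100_000.0, 1.5)
print(f"{estimate:.0f} years")  # about 67 000 years after rounding
```

The guard against nonpositive diffusion constants matters here because some of the fitted values in Fig. 4 are negative, and the estimate has no meaning for those.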