Uncovering the spatial structure of mobility networks

Thomas Louail (1,2), Maxime Lenormand (3), Miguel Picornell (4), Oliva García Cantú (4), Ricardo Herranz (4), Enrique Frias-Martinez (5), José J. Ramasco (3), Marc Barthelemy (1,6,∗)

(1) Institut de Physique Théorique, CEA-CNRS (URA 2306), F-91191, Gif-sur-Yvette, France
(2) Géographie-Cités, CNRS-Paris 1-Paris 7 (UMR 8504), 13 rue du four, FR-75006 Paris, France
(3) IFISC, Instituto de Física Interdisciplinar y Sistemas Complejos (CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
(4) Nommon Solutions and Technologies, calle Cañas 8, E-28043 Madrid, Spain
(5) Telefonica Research, E-28050 Madrid, Spain
(6) Centre d'Analyse et de Mathématique Sociales, EHESS-CNRS (UMR 8557), 190-198 avenue de France, FR-75013 Paris, France

∗ Correspondence and requests for materials should be addressed to MB (Email: [email protected]).

arXiv:1501.05269v1 [physics.soc-ph] 21 Jan 2015

The extraction of a clear and simple footprint of the structure of large, weighted and directed networks is a general problem that has many applications. An important example is given by origin-destination matrices, which contain the complete information on commuting flows but are difficult to analyze and compare. We propose here a versatile method which extracts a coarse-grained signature of mobility networks, in the form of a 2×2 matrix that separates the flows into four categories. We apply this method to origin-destination matrices extracted from mobile phone data recorded in thirty-one Spanish cities. We show that these cities essentially differ by their proportion of two types of flows: integrated flows (between residential and employment hotspots) and random flows, whose importance increases with city size. Finally, the method makes it possible to determine categories of networks and, in the mobility case, to classify cities according to their commuting structure.

The increasing availability of pervasive data in various fields has opened exciting possibilities of renewed quantitative approaches to many phenomena. This is particularly true for cities and urban systems, for which different devices at different scales produce a very large amount of data potentially useful to construct a 'new science of cities' [1].

A new problem we have to solve is then to extract useful information from these huge datasets. In particular, we are interested in extracting coarse-grained information and stylized facts that encode the essence of a phenomenon, and that any reasonable model should reproduce. Such meso-scale information helps us to understand the system, to compare different systems, and also to propose models. This issue is particularly striking in the study of commuting in urban systems. In transportation research and urban planning, individuals' daily mobility is usually captured in Origin-Destination (OD) matrices, which contain the flows of individuals going from one point to another (see [2, 3]). An OD matrix thus encapsulates the complete information about individual flows in a city, at a given spatial scale and for a specific purpose. It is a large network, and as such does not provide clear, synthetic and useful information about the structure of the mobility in the city. More generally, it is very difficult to extract high-level, synthetic information from large networks, and methods such as community detection [4] and stochastic block modeling (see for example [5] and [6, 7]) were recently proposed. Both these methods group nodes in clusters according to certain criteria, and nodes in a given cluster have similar properties (for example, in stochastic block modeling, nodes in a given group have similar neighborhoods). These methods are very interesting when one wants to extract meso-scale information from a network, but they are unable to construct expressive categories of links and to propose a classification of weighted (directed) networks. This is particularly true in the case of commuting networks in cities, where edges represent flows of individuals that travel daily from their residential neighborhood to their main activity area. Several types of links can be distinguished in these mobility networks: some constitute the backbone of the city by connecting major residential neighborhoods to employment centers, while other flows converge from smaller residential areas to important employment centers, or diverge from major residential neighborhoods to smaller activity areas. In addition, the spatial properties of these commuting flows are fundamental in cities, and a relevant method should be able to take this aspect into account.

There is an important literature in quantitative geography and transportation research that focuses on the morphological comparison of cities [8–11] and notably on multiple aspects of polycentrism, ranging from schematic pictures proposed by urban planners and architects [12] to quantitative case studies and contextualized comparisons of cities [13–15]. So far most comparisons of large sets of cities have been based on morphological indicators [8, 9] (built-up areas, residential density, number of sub-centers, etc.) and aggregated mobility indicators [10, 11] (motorization rate, average number of trips per day, energy consumption per capita per transport mode, etc.), and have focused on the spatial organization of residences and employment centers. But these previous studies did not propose generic methods to take into account the spatial structure of commuting trips, which consist of both an origin and a destination. Such comparisons based on aggregated indicators thus fail to give an idea of the morphology of the city in terms of daily commuting flows. We still need generic methods that are expressive in an urban context, and that could constitute the quantitative equivalent of the schematic pictures of city forms that urban planners have long drawn [12].

In this paper we propose a simple and versatile method designed to compare the structure of large, weighted and directed networks. In the next section we describe this method in detail. The guiding idea is that a simple and clear picture can be provided by considering the distribution of flows between different types of nodes. We then apply the method to commuting (journey to work) OD matrices of thirty-one cities extracted from a large mobile phone dataset. We discuss the urban spatial patterns that our method reveals, and we compare these patterns observed in empirical data to those obtained with a reasonable null model that generates random commuting networks. Finally, the method makes it possible to determine categories of networks with respect to their structure, and here to classify cities according to their commuting structure. This classification highlights a clear relation between commuting structure and city size.

Results

Extracting coarse-grained information from OD matrices

For the sake of clarity, we will use here the language of OD matrices, but the method could easily be applied to any weighted and directed network from which we want to extract high-level information.

We assume that for a given city, we have the $n \times n$ matrix $F_{ij}$, where $n$ is the number of spatial units that compose the city at the spatial aggregation level considered (for example a grid composed of square cells of size $a$, see Methods). This OD matrix $F_{ij}$ represents the number of individuals living in location $i$ and commuting to location $j$, where they have their main, regular activity (work or school for most people). By convention, when computing the numbers of inhabitants and workers in each cell we do not consider the diagonal of the OD matrix. This means that we omit the individuals who live and work in the same cell (considered as 'immobile' at this spatial scale).

In order to extract a simple signature of the OD matrix, we proceed in two steps. We first extract both the residential and the work locations with a large density, the so-called 'hotspots' (see [16]). The number of residents of cell $i$ is given by $\sum_{j \neq i} F_{ij}$ and its number of workers is given by $\sum_{j \neq i} F_{ji}$. The hotspots then correspond to local maxima of these quantities. It is important to note that the method is general, and does not depend on how we determine these hotspots.

Once we have determined the cells that are the residential and the work hotspots (some cells can possibly be both), we proceed to the second and main step of the method. We reorder the rows and columns of the OD matrix in order to separate hotspots from non-hotspots. We put the $m$ residential hotspots on the top rows, and do the same for columns by putting the $p$ work hotspots on the left columns. The OD matrix then becomes a four-quadrant matrix where the flows $F_{ij}$ are positioned with respect to their nature: at the top left, the individuals that live in hotspots and work in hotspots; at the top right, the individuals that live in hotspots and do not work in hotspots; at the bottom left, the individuals that do not live in hotspots but work in hotspots; and finally, in the bottom right corner, the individuals that neither live nor work in hotspots. For each quadrant we sum the number of commuters and normalize it by the total number of commuters in the OD matrix, which gives the proportion of individuals in each of the four categories of flows. In other words, for a given city, we reduce the OD matrix to a 2×2 matrix

\[ \Lambda = \begin{pmatrix} I & D \\ C & R \end{pmatrix} \qquad (1) \]

where

\[ I = \sum_{i \in 1..m,\; j \in 1..p} F_{ij} \Big/ \sum_{i,j \in 1..n} F_{ij} \]

is the proportion of Integrated flows that go from residential hotspots to work hotspots;

\[ C = \sum_{i \in m+1..n,\; j \in 1..p} F_{ij} \Big/ \sum_{i,j \in 1..n} F_{ij} \]

is the proportion of Convergent flows that go from random residential places to work hotspots;

\[ D = \sum_{i \in 1..m,\; j \in p+1..n} F_{ij} \Big/ \sum_{i,j \in 1..n} F_{ij} \]

is the proportion of Divergent flows that go from residential hotspots to random activity places;

\[ R = \sum_{i \in m+1..n,\; j \in p+1..n} F_{ij} \Big/ \sum_{i,j \in 1..n} F_{ij} \]

is the proportion of Random flows that occur 'at random' in the city, i.e. that go from and to places that are not hotspots.

[Figure 1: schematic with a HOME column (residential hotspots, other places) and a WORK column (work hotspots, other places), linked by the four flow types I, C, D and R.]

FIG. 1: Illustration of the ICDR method. The method decomposes the commuting flows in the city in four categories: the Integrated flows (I) from hotspot to hotspot, the Convergent flows (C) to hotspots, the Divergent flows (D) originating at hotspots and finally the Random flows (R), which are neither starting nor ending at hotspots. For each city with its origin-destination matrix, we can compute the importance of each commuting flow category and get a simple picture of the mobility structure in the city.

By construction, we have $I, C, D, R \in [0; 1]$ and $I + C + D + R = 1$. This matrix $\Lambda$ is thus a very simple footprint of the OD matrix that gives an expressive picture of the structure of commuting in the city, as illustrated by Fig. 1.

Comparison of the mobility networks of thirty-one cities

... is built, which can be dictated by administrative units (divisions in wards, counties, municipalities, etc.) or by technical reasons such as the density of antennas in the case of mobile phone data. Given this variety of data collection protocols, it is thus particularly remarkable that when considering the commuting flows at a city scale, various sources of pervasive data provide very similar mobility information when compared to the OD matrices built from surveys [24]. This result needs to be confirmed for other cities and countries, but it already opens the door to a systematic use of pervasive, geolocated data as a relevant substitute for traditional transport surveys.

In the following we apply our ICDR method to OD matrices that have been extracted from mobile phone records in thirty-one Spanish urban areas during a five-week period (see the Methods section for details on the dataset and the calculation of the OD matrices).

As described above, the first step consists in determining hotspots. Several possible methods have been proposed in the literature [15, 25, 26], and we use here a parameter-free method based on the Lorenz curve of the densities that we have proposed in a recent study (see [16] and the Methods section). Once we have determined both the origin (residential) hotspots and the destination (work) hotspots in each city, we first observe how their numbers scale with the population size of the city. The numbers of both residential and employment hotspots scale sublinearly with the population size (see Supplementary Figures 4 and 5). The number of work hotspots grows significantly more slowly than the number of residential hotspots, showing that residential areas are (i) more dispersed in the city, and (ii) more numerous than activity centers, as intuitively expected (see Supplementary Figure 6, which displays the locations of Home and Work hotspots in four cities that exhibit different spatial organizations). We also note here that the sublinear scaling of the work hotspots confirms previous results obtained with a totally different dataset (the number of employment centers in US cities) [27].
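As an illustration, the reduction of an OD matrix to its ICDR signature (Eq. 1) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: the function name `icdr_signature`, its arguments and the toy matrix are hypothetical, the hotspots are assumed to be already known and given as sets of cell indices, and skipping the diagonal (people who live and work in the same cell) is an assumption made here by analogy with the convention stated for residents and workers.

```python
from typing import Dict, Iterable, Sequence


def icdr_signature(
    od: Sequence[Sequence[float]],
    home_hotspots: Iterable[int],
    work_hotspots: Iterable[int],
    skip_diagonal: bool = True,
) -> Dict[str, float]:
    """Return the proportions I, C, D, R of an OD matrix.

    od[i][j] is the number of commuters living in cell i and working in cell j.
    home_hotspots / work_hotspots are the indices of the residential and work
    hotspots (how they are detected is left to the caller).
    skip_diagonal=True ignores od[i][i]; this default is an assumption.
    """
    home = set(home_hotspots)
    work = set(work_hotspots)
    sums = {"I": 0.0, "C": 0.0, "D": 0.0, "R": 0.0}
    for i, row in enumerate(od):
        for j, flow in enumerate(row):
            if skip_diagonal and i == j:
                continue
            if i in home and j in work:
                key = "I"  # hotspot -> hotspot
            elif j in work:
                key = "C"  # non-hotspot home -> work hotspot
            elif i in home:
                key = "D"  # home hotspot -> non-hotspot workplace
            else:
                key = "R"  # neither end is a hotspot
            sums[key] += flow
    total = sum(sums.values())
    if total == 0:
        raise ValueError("the OD matrix contains no commuting flow")
    return {key: value / total for key, value in sums.items()}


# Toy 3-cell city (made-up numbers): cells 0 and 1 are residential hotspots,
# cell 0 is the only work hotspot.
example_od = [
    [5, 10, 0],
    [20, 3, 10],
    [40, 20, 7],
]
signature = icdr_signature(example_od, home_hotspots={0, 1}, work_hotspots={0})
print(signature)  # {'I': 0.2, 'C': 0.4, 'D': 0.2, 'R': 0.2}
```

Classifying each flow directly from hotspot membership gives the same four sums as reordering rows and columns into quadrants, without having to permute the matrix.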
Commuting data

Large scale individual mobility networks are nowadays extracted from pervasive geolocated data, such as mobile phone, GPS, public transport cards or social apps data [17–21]. In particular, if an individual's mobile phone geolocated activity is available during a sufficiently long period of time, it is possible, under certain regularity conditions, to infer the most likely locations of her home and her workplace, and by aggregation to construct OD matrices [22–24]. Several parameters however impact the construction of OD matrices, such as the nature of the data source (survey or user-generated geolocated data), or the spatial scale at which the OD matrix ...

ICDR values

We now apply the second part of the method in order to calculate the I, C, D and R values for each OD matrix. For the 31 Spanish urban areas under study (see Supplementary Figure 1), we obtain the values shown in Fig. 2. In Fig. 2(a), we plot these values versus the population size of these cities. For this sample of cities we see that globally the proportion I of individuals that commute from hotspot to hotspot decreases as the population size increases, while the proportion R of 'random' flows increases and the proportions C and D of convergent and divergent flows seem surprisingly constant whatever the city size. In Fig. 2(b) we plot the same values but sorted by decreasing values of I, which shows clearly that the I and R values are the relevant parameters for distinguishing cities from each other.

We also notice that the values obtained for another spatial scale of data aggregation confirm this trend (see Supplementary Figure 7). The decay of I ('integrated') flows in favor of R ('random') flows when the population P increases shows that the population growth among Spanish cities goes with a decentralization of both activity places and residences. As cities get bigger, their numbers of residential and employment hotspots grow (sublinearly), but these hotspots catch a smaller part of the commuting flows.

FIG. 2: Results for 31 Spanish cities. (a) I (integrated), C (convergent), D (divergent) and R (random) values versus population size for 31 Spanish urban areas. (b) Same ICDR values as in (a) but sorted by decreasing order of I (note that by definition, we have for each city I + C + D + R = 1). It is remarkable that I and R dominate and seem almost sufficient to distinguish cities, while C and D are almost constant whatever the city size (see Supplementary Figure 7 for the values obtained with another size a of grid cells).

A null model

In order to evaluate to what extent the ICDR signatures of cities are characteristic of their commuting structure, we compare these values to the ones returned by a null model of commuting flows. For each city we generate random OD matrices of the same size as the reference OD matrix but with random flows of individuals that preserve the in- and out-degree of each node (see Supplementary Note 5). Fig. 2(c) shows the average values and standard deviations obtained for 100 replications. In Fig. 2(d) we plot the Z-scores of the I, C, D and R values of each city when compared to the values $I^*$, $C^*$, $D^*$ and $R^*$ returned by our null model (e.g. for the quantity R of city $i$, the Z-score is given by $Z(R_i) = (R_i - R_i^*)/\sigma(R_i^*)$). Essentially, we observe that the Z-scores of I and R are positive and large, while those of C and D are negative (and large in absolute value). Also, as cities grow, the Z-scores of I and R increase while those of C and D decrease. These results demonstrate that the larger a city, the less random it appears. This is in contrast with the naive expectation that the larger a city, the more disordered the structure of individuals' mobility. For large cities, there seems to be a commuting backbone which cannot result from purely random movements of individuals. This backbone is the footprint of the city's structure and history, and probably results from strong constraints and efficiency considerations.

Robustness

Since our method first requires determining origin and destination hotspots, one could argue that the interpretation of the I, C, D and R values will crucially depend on the particular method chosen to define these hotspots. The identification of hotspots is a problem that has been broadly discussed in urban economics (see Supplementary Note 3). Roughly speaking, starting from a spatial distribution of densities, the goal is to identify the local maxima, which amounts to choosing a threshold $\rho^*$ for the density $\rho$ of individuals: a cell $i$ is a hotspot if the local density of people is such that $\rho(i) > \rho^*$. In order to test the impact of the choice of a particular threshold $\rho^*$ on the resulting ICDR values, we measure the sensitivity of these values as a function of the density threshold $\rho^*$ (see Supplementary Figure 8). As expected, the lower the density threshold, the larger the number of origin and destination hotspots, and consequently the larger I and the smaller R. In contrast, changing the density threshold has little impact on the C and D terms. More importantly, the conclusions drawn from the comparison of the ICDR values across cities remain the same: whatever the density threshold $\rho^*$ chosen to define residential/employment hotspots, we still observe the same qualitative behavior, a decay of integrated flows (I) in favor of 'random' flows (R) when the population size increases.

Distance of each type of flows

We now want to characterize these different flows spatially, and the relation between city size and the commuting distances traveled by individuals. In each city we compute the average distance traveled by individuals per type of flow I, C, D and R. The resulting average distances measured in the data are plotted in red in Fig. 3(a). We observe that the average distance for all categories of flows increases with population size, an expected effect as the city's area also grows with population size. We also observe that the average distance of convergent flows C increases faster than for other types of flows (we note that the average distance associated with convergent flows increases more than the distance associated with divergent flows D, showing that these flows are not symmetric as one could have naively expected). This result means that for this set of Spanish cities, commuters from small residential areas to important activity centers travel on average a longer distance than all other individuals. This observation could be an indication that for our set of cities, residential areas have expanded while activity centers remained at their location, leading to longer commuting distances (see Supplementary Figures 10 and 11 when considering various spatial scales of aggregation).

Another interesting piece of information is provided by the comparison of distances measured in the data with average distances measured from the random OD matrices generated by the null model. The average distances associated with the null model are plotted in blue in Fig. 3(a). We see that for all types of flows the distances measured in the empirical data are shorter than those generated by the null model. This is another clear indication of the spatial organization of individual flows in cities. It also highlights the importance of the travel time budget in the choice of residential locations. Remarkably enough, the distance of convergent flows (C) is both the largest and the one that increases the fastest with population, indicating a low degree of efficiency.
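The null model and the Z-score used in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, since the authors' procedure is specified in Supplementary Note 5, which is not reproduced here. The sketch assumes integer flows, randomly re-pairs each commuter's home cell with a work cell so that every row sum (residents) and column sum (workers) is preserved, and uses the population standard deviation for $\sigma$; the function names `random_od_matrix` and `z_score` are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence


def random_od_matrix(od: Sequence[Sequence[int]], rng: random.Random) -> List[List[int]]:
    """Return a randomized square OD matrix with the same row and column sums.

    Each commuter is one (home, work) pair; shuffling the list of work cells
    re-pairs homes and workplaces at random. Assumption: self-flows (i -> i)
    may be created and are not treated specially here.
    """
    n = len(od)
    origins: List[int] = []       # home cell of every commuter
    destinations: List[int] = []  # work cell of every commuter
    for i, row in enumerate(od):
        for j, flow in enumerate(row):
            origins.extend([i] * flow)
            destinations.extend([j] * flow)
    rng.shuffle(destinations)
    shuffled = [[0] * n for _ in range(n)]
    for i, j in zip(origins, destinations):
        shuffled[i][j] += 1
    return shuffled


def z_score(observed: float, null_values: Sequence[float]) -> float:
    """Z = (observed - mean of null values) / standard deviation of null values."""
    sigma = pstdev(null_values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("null values have zero standard deviation")
    return (observed - mean(null_values)) / sigma
```

A typical use would be to generate 100 randomized matrices per city, compute I, C, D and R on each of them with the same hotspots as in the data, and pass the observed value and the 100 null values of each quantity to `z_score`.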
FIG. 2 (continued): (c) I, C, D, R average values and standard deviations obtained for 100 replications of a null model, where the inflow and outflow at each node are kept constant while flows are randomly distributed between nodes. (d) Z-scores obtained by comparing the empirical data and the values returned by the null model. Large values of Z-scores show that the actual commuting networks cannot be considered as resulting from connecting the nodes at random. The I, C, D and R values of a specific city are then a signature of its structure.

The comparison of this behavior with the null model leads to interesting results. In Fig. 3(b) we plot the ratio $D_{Null}/D_{Data}$ for the four types of flow. Values lower than 1 indicate that the average commuting distance generated by the null model is shorter than the distance observed in the city. Surprisingly, we observe that small cities display a value less than one, indicating the lesser importance of space at this short scale. We also see that this ratio increases faster for random flows (R) than for the others (D, C, I), suggesting a remarkable spatial structure of these R flows.

FIG. 3: Distance per type of flow and population size. (a) Average commuting distance vs. population size, per type of flows; (b) ratio $D_{Null}/D_{Data}$ per type of flows. The LouBar criterion [16] is used here to define residential and employment hotspots.

We also consider the fraction of total commuting distance by type of flow (Fig. 4). We see that for each type of flow, the respective fraction is constant and independent of city size. With the LouBar hotspots detection method [16] (see Supplementary Figure 3) and with a grid of 1 km² square cells, we measure that roughly 40% of the total commuting distance is made on random flows while the other types each represent about 20%. This result shows that the method is able to identify where most of the commuting distance is traveled. In particular, we see that the natural, obvious flows (I) from residential centers to activity centers are not the most important ones, and that the decentralization of commuting flows seems to be the rule for the Spanish cities in our sample.

FIG. 4: Total commuting distance. Contributions to the total commuting distance of each type of flows vs. population size. We observe that the variations are small and that the largest contribution comes from the R flows.

Classification of cities

Finally, the ICDR signature of their OD matrix makes it possible to cluster cities with respect to the structure of their commuting patterns. We measure the Euclidean distance between the cities' ICDR signatures and we then perform a hierarchical cluster analysis. Fig. 5 shows the dendrogram resulting from the classification. Four well-separated clusters are identified on this dendrogram, and Table I gives the average value of each term along with the average population of the cities composing the cluster. Remarkably, these summary statistics show that the largest cities are clustered together and are characterized by a larger proportion of 'random' flows (R) of individuals both living and working in parts of the city that are not the dominant residential and activity centers. This can be interpreted as an increased facility in bigger urban areas to commute from any part of the city to any other part. Further studies on other cities and countries are needed at this stage in order to discuss the relevance of the proposed classification.

It is also important to test the robustness of this classification, and we show that introducing a reasonable amount of noise in the OD matrices does not change the classification (see Methods and Supplementary Figure 12). This sensitivity test confirms that the clustering is robust against possible errors in the data source and in the extraction of the mobility networks. The classification of cities based on their ICDR values is also reasonably robust to a change of the method used to define residential and employment hotspots (see Supplementary Figure 13).

FIG. 5: Classification of cities. Dendrogram resulting from the hierarchical clustering on cities based on their ICDR values. In front of each city name we indicate its rank in the hierarchy of population sizes. The largest cities are clustered together. As cities get bigger, the 'random' component (R) of their commuting flows increases, which signals that it is easier to commute from any place to any other in large cities.
Leaves, in dendrogram order (population rank, city): 15 Coruna, 6 Zaragoza, 22 Santander, 26 Elche, 7 Malaga, 28 Almeria, 10 SantaCruz, 29 Leon, 16 Vigo, 17 Pamplona, 14 Valladolid, 13 Alicante, 24 Fuenjirola, 20 Donostia, 11 Palmas, 31 Salamanca, 21 Gijon, 18 Cordoba, 30 Jerez, 25 Vitoria, 8 Murcia, 2 Barcelona, 19 Oviedo, 5 Bilbao, 1 Madrid, 27 Cartagena, 9 Palma, 12 Granada, 4 Sevilla, 3 Valencia, 23 Castellon.

Discussion

We have proposed a method to extract high-level information from large weighted and directed networks, such as origin-destination matrices. This method relies on the identification of origin and destination hotspots, and this first step can be performed with any reasonable method. The important second step consists in aggregating flows in four different types, depending on whether they start and end at a hotspot or not.

We have applied this method to commuting networks extracted from mobile phone data available in thirty-one Spanish cities. The method has allowed us to highlight several remarkable patterns in the data:

• Independently of the density threshold chosen to determine hotspots, the proportion of integrated flows (I) decreases with city size, while the proportion of random flows (R) increases;

• On average and for all cities considered here, individuals that live in residential main hubs and
that work in employment main hubs (I flows) travel shorter distances than the others (C, D, R flows);

• When the city size increases, the largest impact is on convergent flows (C) of individuals living in smaller residential areas (typically in the suburbs) and commuting to important employment centers;

• The classification of cities based on the ICDR values leads to groups with consistent population size, highlighting a clear relationship between the population size of cities and their commuting structure.

Cluster      Cities                                        P̄           Ī      R̄      D̄      C̄
Orange       Salamanca, Gijon, Cordoba, ...                255,330     0.43   0.27   0.16   0.14
Dark blue    La Coruña, Zaragoza, Santander, Elche, ...    392,970     0.37   0.36   0.15   0.13
Green        Cartagena, Palma, Granada, ...                732,992     0.31   0.41   0.16   0.13
Light blue   Murcia, Barcelona, Bilbao, Madrid, ...        2,463,551   0.25   0.46   0.17   0.12

TABLE I: Classification of cities. Average ICDR values and average population sizes of the cities composing each of the four clusters represented in Fig. 5. As the population grows, the proportion of Random flows increases while the proportion of Integrated flows decreases. The weights of Convergent and Divergent flows stay constant among the groups.

In addition, the comparison with a null model led to interesting conclusions. Flows in cities display a high level of spatial organization, and as the population size of the city grows, the increase (in absolute value) of the Z-scores of I, C, D and R shows that the structure of the mobility is more and more specific, and far from a random organization. We note that an interesting direction for future research would be to find some analytical arguments, using simple models of city organization, for estimating these flows and how they vary with population size.

Our coarse-graining method provides a large scale picture of individual flows in the city. In this respect it could be particularly useful for validating synthetic results of urban mobility models (such as [28] for example), and for comparing different models. An accurate modeling of mobility is indeed crucial in a large number of applications, including the important case of epidemic spreading, which needs to be better understood, in particular at the intra-urban level [29, 30].

It would also be interesting to apply the proposed method to other mobility datasets, with different time and spatial scales, in order to test the robustness of these commuting patterns. Another direction for future studies could be to inspect the time evolution of the I, C, D and R values for datasets describing travel to work journeys over several decades (such as national travel surveys). The method could also be applied at larger spatial scales, for example to capture dominant effects in international migration flows. More generally, we believe that an important feature of the ICDR method is its versatility, as it could be applied to any type of data that is naturally represented by a weighted and directed network.

Methods

Data

The dataset used for our analysis comprises 55 days of aggregated and anonymized records for 31 urban areas of more than 200,000 inhabitants. No individual information or records were available for this study. The records included the set of Base Transceiver Stations (communication antennas, BTS) used for the communications as presented in the Call Detail Records (CDR). A CDR is produced for each active phone event, including call/SMS sending or reception. The number of anonymized users represents on average 2% of the total population and at most 5% of the total population. These percentages are almost the same for all the urban areas. From the CDR data obtained for 20 weekdays (from Mondays to Thursdays only), we extracted home and work places for all the anonymized mobile phone users in the dataset. The output of this processing phase is an OD commuting matrix for each urban area, at the scale of the BTS point pattern. In order to facilitate the calculations and the comparison of the results between different cities, the OD matrices are then transposed onto regular grids of square cells of varying size $a$ (see Supplementary Note 2).

Extraction of OD matrices from mobile phone data

In order to extract OD matrices from phone calls, we select a subset of users with a mobility displaying a sufficient level of statistical regularity. For this analysis we considered commuting patterns during workdays only. The users' Home and Work locations are identified as the Voronoi cells which are the most frequently visited on weekdays by each user between 8 pm and 7 am (Home) and between 9 am and 5 pm (Work). We assume that there must be a daily travel between the home and work locations of each individual. Users with a call activity larger than 40% of the days under study at home or work are considered as valid. We then aggregate the complete flow of users and construct the OD matrix with the flows between a Voronoi cell classified as home and another cell classified as a workplace. Since the Voronoi areas do not exactly match the grid cells, we use a transition matrix to change the spatial scale of the OD matrix, that is to transform the $F_{ij}$ values of the OD matrix, where $i$ and $j$ are Voronoi cells, into $F'_{i'j'}$ values, where $i'$ and $j'$ are the cells of a regular grid (see Supplementary Note 2, and Supplementary Figure 2 for an example of the partition of an urban area in BTS Voronoi cells).

Spatial scale of the OD matrix

The OD matrix is the standard object in mobility studies and transport planning [2] and contains information about the movement of individuals in a given area. More precisely, an OD matrix is an $n \times m$ matrix where $n$ is the number of different 'Origin' zones, $m$ is the number of 'Destination' zones and $F_{ij}$ is the number of people commuting from place $i$ to place $j$ during a given period of time. In transport surveys the size of the OD matrix depends on the spatial scale at which the mobility data has been collected. Traditionally the zones that are used to partition the city are the administrative units, whose size can vary from census and electoral units to whole departments or states, depending on the purpose for building the OD matrix.

In this study we applied our ICDR method to cities divided in square cells that are smaller than administrative units, allowing for a better spatial resolution. In the case of OD matrices extracted from CDR mobile phone data, the maximal resolution corresponds to the BTS (antennas) point pattern. The ICDR method proposed in this paper does however not depend on a particular spatial scale and can be applied to OD matrices available at coarser spatial resolutions as well. For a given territory the results obtained with the ICDR method (the I, C, D and R values in the first place) will obviously depend on the spatial resolution and will also depend on the method used to define hotspots. It is important to note that when the ICDR method is used for comparing cities, the spatial resolution and the hotspots identification method should be the same for all cities (see Supplementary Notes 4 and 6, and Supplementary Figures 8 and 9 for results obtained with another hotspots delimitation method, and Supplementary Figure 7 for results obtained when considering another spatial scale of aggregation).

Robustness of the classification of cities

In order to ensure that the classification of cities based on their ICDR matrices is robust, we introduce noise in the flows $F_{ij}$ (for all the thirty-one cities). We focus on the case where the workplace of an individual can be modified and where the number of individuals living in each cell $i$ is kept constant. More precisely, the noise is introduced as follows:

1. We pick a uniform random positive integer $g$, the number of individuals whose workplace is reshuffled. This number $g$ varies from 1 to $N = \sum_{i,j} F_{ij}$, the total number of commuters in the city.

... k groups and the classification in k groups obtained for the noisy OD matrices. The Jaccard index measures the similarity of two partitions $P$ and $P'$ of same size:

\[ J_I = \frac{a}{a+b+c} \qquad (2) \]

where

• a: number of city pairs that are in the same group for both $P$ and $P'$

• b: number of city pairs that are in different groups in $P$ but in the same one in $P'$

• c: number of city pairs that are in the same group in $P$ but not in $P'$ (or conversely)

The Jaccard index $J_I$ is in the range [0; 1], and the closer it is to 1, the larger the similarity between $P$ and $P'$.

We generate 100 noisy matrices for each value of $f$ and compute the average value $\bar{J}_I$ of the Jaccard index. This average value encodes the distance between the reference partition $P$ of cities in k groups and the partitions of cities in k groups obtained for the noisy OD matrices. Supplementary Figure 12 shows the values of $\bar{J}_I$ versus the proportion $f$ of reshuffled individuals, for different numbers of groups k. The red shaded rectangle on each panel corresponds to the mean value +/- the standard deviation obtained for 1,000 replications of a null model, in which permutations of cities among the k groups are randomly performed. We observe here that for up to 20% of reshuffled individuals, the average value $\bar{J}_I$ obtained is always significantly larger than the one obtained for the null model, indicating that the classification is robust even for important values of the noise.

Acknowledgments

We thank the anonymous referees for interesting and constructive comments. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement 318367 (EUNOIA project). ML and JJR received partial financial support from the Spanish Ministry of Economy (MINECO) and FEDER (EU) under projects MODASS (FIS2011-24785) and INTENSE@COSYP (FIS2012-30634). The work of ML has been funded under the PD/004/2013 project, from the
Conselleria de Educacin, Cultura y Universidades of the Government of the Balearic Islands and from the Euro- 2. Werepeatg timesthefollowingoperation: wepick pean Social Fund through the Balearic Islands ESF op- up randomly a residence and a workplace (a cou- erational program for 2013-2017. ple of values (i,j)) and move one individual from her workplace j to put another randomly chosen workplace j(cid:48): F →F −1 and F →F +1 ij ij ij(cid:48) ij(cid:48) Author contributions The parameter of this workplace reshuffling is then f = g/N. In order to evaluate how much the classification of T.L. designed the study, processed and analyzed the cities is affected by this noise, we compute the Jaccard dataandwrotethemanuscript; M.L.processedandana- index J between the reference classification of cities in lyzedthedata;O.G.CandM.P.processedthedata;R.H. I 10 and J.J.R. coordinated the study; E.F.-M. obtained and Competingfinancialinterests:Theauthorsdeclare processed the data; M.B. coordinated and designed the no competing financial interests. study, andwrotethemanuscript. Allauthorsread, com- mentedandapprovedthefinalversionofthemanuscript. Additional information Supplementary Information accompanies this pa- per. [1] Batty, M. The New Science of Cities, MIT Press (2013). man urban mobility. Plos ONE 7:e37027 (2012). [2] Ortuzar, J.D. & Willumsen, L.G. Modelling transport [20] Wu,L.,Zhi,Y.,Sui,Z.&Liu,Y.Intra-urbanhumanmo- (1994). bilityandactivitytransition: evidencefromsocialmedia [3] Weiner,E.UrbanTransportationPlanningintheUnited check-in data.Plos ONE 9(5)e97010 (2014) States. An Historical Overview (1986). [21] Zhong, C., Arisona, S.M., Huang, X., Batty, M. & [4] Fortunato, S. Community detection in graphs. Physics Schmitt, G. Detecting the dynamics of urban structure Reports 486(3-5), 75–174 (2010). through spatial network analysis. Int J Geo Inf Sc 28, [5] Faust, F. & Wasserman, S. Social Network Analysis, 2178–2199 (2014). Cambridge University Press (1992). 
[22] Kung K.S., Sobolevsky, S. & Ratti, C. Exploring uni- [6] Karrer,B.&Newman,M.E.Stochasticblockmodelsand versal patterns in human home/work commuting from community structure in networks. Physical Review E, mobile phone data. Plos ONE 9(6) (2014). 83(1), 016107 (2011). [23] Jiang, S. et al. A Review of Urban Computing [7] Aicher, C., Jacobs, A.Z. & Clauset, A. Learning La- for Mobile Phone Traces:Current Methods, Challenges tent Block Structure in Weighted Networks. Preprint at and Opportunities. Proceedings of the 2nd ACM http://arXiv.org/abs/1404.0431 (2014). SIGKDDInternationalWorkshoponUrbanComputing, [8] Tsai, Y.H. Quantifying urban form: compactness versus doi:10.1145/2505821.2505828 (2013) sprawl. Urban Stud. 42, 141–161 (2005). [24] Lenormand, M. et al. Cross-checking different sources [9] Gu´erois, M. & Pumain, D. Built-up encroachment of mobility information. PLoS ONE 9(8): e105184. and the urban field: a comparison of forty european doi:10.1371/journal.pone.0105184 (2014). cities.Environ. Plann. A 40, 2186–2203 (2008). [25] McMillen D.P. Nonparametric employment subcenter [10] Schwarz, N. Urban form revisited - selecting indica- identification. J. Urban Econ. 50, 448–473 (2001). tors for characterising european cities. Landscape Urban [26] Griffith, D.A. Modelling urban population density in a Plan. 96, 29–47 (2010). multi-centered city. J. Urban Econ. 9, 298–310 (1981). [11] LeN´echet,F.Urbanspatialstructure,dailymobilityand [27] Louf, R. & Barthelemy, M. Modeling the polycentric energy consumption: a study of 34 european cities. Cy- transition of cities. Phys. Rev. Lett. 111, 198702 (2013). bergeo 580 (2012). [28] Eubank,S.etal.Modellingdiseaseoutbreaksinrealistic [12] Bertaud, A. & Malpezzi, S. The spatial distribution of urban social networks. Nature 429, 180-184 (2004) populationin48worldcities: implicationsforeconomies [29] Balcan, D. et al. Multiscale mobility networks and the in transition. World Bank Report (2003). 
spatial spreading of infectious diseases. PNAS 106, [13] Salomon, I., Bovy, P., & Orfeuil, J.P. A billion trips a 21484–21489 (2009). day: tradition and transition in European travel pat- [30] Dalziel, B.D., Pourbohloul, B. & Ellner, S.P. Human terns. Kluwer Academic Publishers (1993). mobility patterns predict divergent epidemic dynamics [14] Cattan,N.(Ed.)CitiesandnetworksinEurope: acritical among cities. Proc R Soc B 280:20130763 (2013). approachofpolycentrism.JohnLibbeyEurotext(2007). [31] Molloy,M.&Reed,B.Acriticalpointforrandomgraphs [15] Berroir, S., Mathian, H., Saint-Julien, T. & Sanders, with a given degree sequence. Random structures & al- L. [The role of mobility in the building of metropolitan gorithms, 6(23), 161–180 (1995). polycentrism],Modellingurbandynamics[Desrosiers,F.& Th´eriault, M. (eds)][1–25](ISTE-Wiley) (2011). [16] Louail, T. et al. From mobile phone data to the spatial structure of cities. Scientific reports 4:5276 (2014). [17] Schneider,C.M.,Belik,V.,Couronn´e,T.,Smoreda,Z.& Gonza´lez,M.C.Unravellingdailyhumanmobilitymotifs. J R Soc Interface 10:20130246 (2013). [18] Roth,C.,Kang,S.M.,Batty,M.&Barthelemy,M.Struc- tureofurbanmovements: polycentricactivityandentan- gled hierarchical flows. Plos ONE 6:e15923 (2011). [19] Noulas,A.,Scellato,S.,Lambiotte,R.,Pontil,M.&Mas- colo, C. A tale of many cities: universal patterns in hu-
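As an illustration of the robustness test described under "Robustness of the classification of cities", the following Python sketch implements the workplace reshuffling and the pair-counting Jaccard index of Eq. (2). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `reshuffle_workplaces` and `jaccard_index` are hypothetical, NumPy is our choice of tool, and we assume that the individual to move is drawn uniformly among all commuters (so a couple $(i,j)$ is drawn with probability proportional to $F_{ij}$), which the text does not specify.

```python
import itertools

import numpy as np


def reshuffle_workplaces(F, g, rng):
    """Return a noisy copy of the OD matrix F in which g commuters change workplace.

    Row sums (number of residents per home cell i) are kept constant.
    Assumption: the moved commuter is drawn uniformly among all commuters.
    Requires at least two destination columns.
    """
    F = F.copy()
    n_dest = F.shape[1]
    total = F.sum()  # N, unchanged by the moves
    for _ in range(g):
        # Draw a couple (i, j) with probability F[i, j] / N, so F[i, j] >= 1.
        flat = rng.choice(F.size, p=F.ravel() / total)
        i, j = divmod(int(flat), n_dest)
        # Draw a new workplace j' different from j, uniformly at random.
        j_new = int(rng.choice([c for c in range(n_dest) if c != j]))
        F[i, j] -= 1
        F[i, j_new] += 1
    return F


def jaccard_index(P, P_prime):
    """Pair-counting Jaccard index a / (a + b + c) between two partitions.

    P and P_prime are sequences of group labels, one label per city.
    """
    a = b = c = 0
    for u, v in itertools.combinations(range(len(P)), 2):
        same_P = P[u] == P[v]
        same_P_prime = P_prime[u] == P_prime[v]
        if same_P and same_P_prime:
            a += 1  # same group in both partitions
        elif same_P_prime:
            b += 1  # different groups in P, same group in P'
        elif same_P:
            c += 1  # same group in P, different groups in P'
    # Convention (assumption): two partitions with no co-grouped pair are identical.
    return a / (a + b + c) if (a + b + c) else 1.0
```

In a full robustness study one would, for each value of $f = g/N$, call `reshuffle_workplaces` on every city's matrix, recompute the ICDR values and the classification in $k$ groups, and average `jaccard_index` over the 100 noisy replicates; the classification step itself is outside the scope of this sketch.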
