Size-Dependency of Income Distributions and Its Implications

Jiang Zhang∗ and You-Gui Wang†
Department of Systems Science, School of Management, Beijing Normal University
(Dated: January 26, 2011)

arXiv:1012.2279v3 [q-fin.GN] 25 Jan 2011

∗ [email protected]; http://www.swarmagents.cn/jake
† [email protected]

This paper highlights the size-dependency of income distributions, i.e. the systematic change of the income distribution with the population of a country. By fitting the generalized Lotka-Volterra model to the empirical income data of the United States during 1996-2007, we found that an important parameter λ scales as a power law, with exponent β, of the size (population) of the U.S. in that year. We point out that the size-dependency of income distributions, a very important property that is seldom addressed by previous studies, has two non-trivial implications: (1) the allometric growth pattern, i.e. the power law relationship between population and GDP in different years, which can be mathematically derived from the size-dependent income distributions and is also supported by the empirical data; (2) a connection with the anomalous scaling of the probability density function in critical phenomena, since the re-scaled form of the income distributions has exactly the same mathematical expression as the asymptotic limit distribution of the sum of many correlated random variables.

PACS numbers: 89.65.Gh, 89.75.Da, 89.75.Kd
Keywords: Income Distribution, Size-Dependency, Allometric Growth, Anomalous Scaling

I. INTRODUCTION

The power law distribution of incomes in a nation is one of the most important universal patterns found in economic systems, due to the seminal work of Pareto [1]. It holds not only for incomes and wealth in different countries and different years [2–5] but also for other complex systems, e.g. languages [6] and complex networks [7]. Although this statistical law is supported by many empirical data [8] and theoretical works [9, 10], it can only describe the distribution of high incomes. Some recent studies have shown that the distribution for the great majority of the population can be described by an exponential function, which is very different from the power law at high incomes [5, 11]. Silva and Yakovenko [5] defined these different income intervals as thermal and super-thermal regions, whose dynamics may follow very different rules.

A recent paper of our group discussed how the income distribution curves in China change with time [12], so a question arises: do the distribution curves change with the system size? Some early studies of family names have pointed out that distributions can change with the size of the system [13–15]. This size-dependency of distributions is also found in languages [16]. In this paper, we propose that income distributions also have this size-dependency property, which means that the distribution curves change systematically with the system size (the population). In Section II, we use a revised form of the generalized Lotka-Volterra model to fit the empirical income data of the United States during 1996-2007. In this formula, we insert a scale factor λ which changes with the population as a power law with exponent β in different years. So the size-dependency of the income distribution is implicitly revealed by this power law relationship.

In Section III, we also point out that the size-dependency of income distributions actually implies the power law relationship between population and GDP, i.e. the allometric growth (scaling) phenomenon, which is also found in various complex systems such as ecological systems [17–20], cities [21–23] and countries [24, 25]. We have also tested this relationship with the empirical data. Some studies of family names [13] and languages [26, 27] have linked the patterns of power law distributions and power law relationships of two variables. However, in this study, we argue that the exponent of the power law relationship between population and GDP does not depend on the Pareto exponent of the income distribution but on the size-dependency exponent.

Furthermore, the re-scaled form of the income distribution curves can be re-expressed in a generalized mathematical form in Section IV. This formula has actually been found to describe the anomalous scaling of the probability density function in critical phenomena, e.g. spin systems [28, 29], where the re-scaled distribution form can be treated as a limit distribution of the sum of a large number of correlated random variables [28, 29]. Therefore, the size-dependency of income distributions also implies a connection with the central limit theorem of correlated variables.

II. SIZE-DEPENDENT INCOME DISTRIBUTIONS

The personal-income distribution data of the United States during 1996-2007 are available. These data are compiled by the Internal Revenue Service (IRS) from the tax returns in the USA for the period 1996-2007 (presently the latest available year [30]). The original data give the percentage in given income intervals.
We can plot the cumulative distribution of incomes in the different years, as shown in Figure 1. [...]

FIG. 1. Income distributions of the U.S. in the period 1996-2007. [Log-log plot of cumulative probability against income, one curve per year; the legend lists the fitted λ of each year.]

As pointed out in [5], the income distribution has an exponential form at low incomes and a power law form at high incomes. However, we use the generalized Lotka-Volterra (LV) model [9, 31] instead of the method in [5], since the generalized LV model only needs two free parameters. We assume the density curves of the income distributions in different years follow the equation

$$ f(x) = \lambda \, \frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)} \, \frac{\exp\left(-\frac{\alpha-1}{\lambda x}\right)}{(\lambda x)^{1+\alpha}}, \tag{1} $$

where α and λ are the parameters to be estimated and Γ(·) is the Euler gamma function. Note that in the original form of the generalized LV model [31] there is no factor λ, since the main purpose of that paper is to explain the shape of the income distribution.
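As a rough numerical sanity check of Equation 1, the following sketch evaluates the density and integrates it on a logarithmic grid. It is only an illustration under stated assumptions: the helper names `glv_density` and `integrate_log_grid` are hypothetical, α is set to the mean value reported in this paper, and the value of λ is merely of the order of magnitude of the fitted values, not a fitted value itself.

```python
import math

def glv_density(x, alpha, lam):
    """Equation 1: f(x) = lam * (alpha-1)^alpha / Gamma(alpha)
    * exp(-(alpha-1)/(lam*x)) / (lam*x)^(1+alpha), evaluated in log space."""
    y = lam * x
    log_norm = alpha * math.log(alpha - 1.0) - math.lgamma(alpha)
    return lam * math.exp(log_norm - (alpha - 1.0) / y - (1.0 + alpha) * math.log(y))

def integrate_log_grid(func, lo, hi, n=20000):
    """Trapezoid rule for the integral of func(x) dx on a logarithmic grid."""
    u_lo, u_hi = math.log(lo), math.log(hi)
    h = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(u_lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(x) * x  # dx = x du with u = log x
    return total * h

ALPHA = 1.74076  # mean Pareto exponent reported in the paper
LAM = 2.0e-5     # assumed illustrative scale factor, not a fitted value

# Integrate over lam*x in [1e-3, 1e9]; the density is negligible outside.
lo, hi = 1e-3 / LAM, 1e9 / LAM
norm = integrate_log_grid(lambda x: glv_density(x, ALPHA, LAM), lo, hi)
mean = integrate_log_grid(lambda x: x * glv_density(x, ALPHA, LAM), lo, hi)

print("normalization:", norm)     # should be close to 1
print("mean * lambda:", mean * LAM)  # should be close to 1
```

The second printed quantity illustrates why λ acts as a pure scale factor: with this parameterization the mean of the distribution is 1/λ for any admissible α > 1.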
However, we must insert this factor in Equation 1 because we care not only about the shape of the income distribution curves but also about their dependency on size (the population of a country) in different years, and this size-dependency can only be reflected by λ. In addition, α is the Pareto exponent in the high-income regime, because Equation 1 has a truncated power law form. Nevertheless, we use the cumulative distribution function instead of Equation 1 directly to reduce the estimation errors:

$$ C(x) = \int_{x}^{+\infty} f(x')\,dx' = 1 - \frac{\Gamma\left(\alpha, \frac{\alpha-1}{\lambda x}\right)}{\Gamma(\alpha)}, \tag{2} $$

where C(x) is the probability that a person's income is larger than x, and Γ(α, z) is the upper incomplete gamma function.

A two-step fitting method is used in this paper. First, we apply Equation 2 to the empirical data year by year and obtain the best estimates of the parameters α and λ. Here, "best" means that the total distance between the empirical data and the theoretical curves in log-log coordinates is minimized. The values of α derived in the first step are (1.59043, 1.60112, 1.61717, 1.67562, 1.75147, 1.76187, 1.71051, 1.61152, 1.80999, 1.95683, 1.87374, 1.92885); they fluctuate around the mean value 1.74076. Second, we fix α = 1.74076 and use the same method to obtain the best estimate of λ again. We will show that the size-dependency and its implications are actually independent of α. The reason for using this two-step fitting method is to get a better estimate of λ, which is more important than α.

From Figure 1, we can see that the distributions change regularly over time. As time goes by, the distribution curves shift. This trend is more obvious in the scaling regions (high income tails). The relationship between the best estimates of λ and the years is shown in the legend of Figure 1. Furthermore, we know that the population of the U.S. increases with time in these years. Therefore, λ is actually a function of the population in a given year. This functional relationship can be presented as a power law relationship between population and λ, as Figure 2 shows. From this figure, we observe an apparent trend that λ decreases with population. This trend can be approximated by a power law relationship between λ and population,

$$ \lambda \sim P^{-\beta}, \tag{3} $$

where β, estimated as 4.365, is obtained from the slope of the line in Figure 2. Therefore, we conclude that the income distributions are size (population) dependent. This dependency is described by the power law relationship between the scale parameter λ and the population.

FIG. 2. The power law relationship between the population and λ during 1996-2007. [Plot of λ (×10^-5) against population (×10^8); fitted line λ ∼ P^-β, β = 4.365, R² = 0.740.]

As a result, the income distributions in different years can be re-scaled by P^-β,

$$ f(x) \sim P^{-\beta} \, \frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)} \, \frac{\exp\left(-\frac{\alpha-1}{P^{-\beta}x}\right)}{(P^{-\beta}x)^{1+\alpha}}. \tag{4} $$

The re-scaled curve of income distribution is shown in the inset of Figure 2.

Although the power law relationship in Equation 3 is acceptable because its R² is large enough, there are still large deviations from the empirical data in Figure 2. We conjecture that the main errors come from the fitting of the income distributions. In Figure 1, there are some deviations of the theoretical income distributions from the empirical data, and these errors can influence the estimates of λ dramatically since the values of λ are very small. The second reason is that we have very few samples here (only 12 years), so the noise in the original data cannot be eliminated. Thus, Equation 3 is just an approximation; however, this does not prevent us from obtaining an asymptotic theory. Next, we will discuss the two important implications of this size-dependency.
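The power law fit of Equation 3 amounts to a straight-line fit in log-log coordinates. The sketch below shows one common way to do this, an ordinary least-squares regression of log λ on log P; the paper does not state which regression procedure was used, so this is an assumption. The input data are synthetic and noise-free (generated from Equation 3 itself), and the names `fit_power_law`, `populations` and `lambdas` are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y).
    Returns (exponent, prefactor) of the model y = prefactor * x**exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, math.exp(my - slope * mx)

# Synthetic stand-in for 12 yearly populations (NOT the empirical data).
populations = [2.70e8 + 0.03e8 * k for k in range(12)]

# Generate lambda exactly from Equation 3 with the exponent reported in the paper.
BETA_TRUE = 4.365
prefactor_true = 2.0e-5 * (2.85e8) ** BETA_TRUE  # assumed so that lambda ~ 2e-5
lambdas = [prefactor_true * p ** (-BETA_TRUE) for p in populations]

slope, prefactor = fit_power_law(populations, lambdas)
beta_hat = -slope  # Equation 3 is lambda ~ P^(-beta), so beta is minus the slope

print("estimated beta:", beta_hat)
```

With real, noisy estimates of λ the recovered exponent would of course carry an uncertainty, which is consistent with the modest R² reported for Figure 2.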
III. POWER LAW RELATIONSHIP BETWEEN POPULATION AND GDP

We will show that the size-dependency of income distributions implies the power law relationship between population and GDP. First, the GDP of a country is proportional to the total income of all people [32]. Second, the total income can be read from the income distribution curve. We can write down the equation

$$ X \sim P\langle I\rangle, \tag{5} $$

where X is the GDP, P is the population in a given year, I is the random variable income in that year, and ⟨I⟩ stands for the ensemble mean of incomes. Therefore, P⟨I⟩ is just the total income of the whole country in the given year.

Then we can calculate the mean income from the cumulative probability function (Equation 2) as follows:

$$ \langle I\rangle = \int_{0}^{+\infty} x f(x)\,dx = \int_{0}^{+\infty} C(x)\,dx = \frac{1}{\lambda}. \tag{6} $$

Therefore,

$$ X \sim P/\lambda. \tag{7} $$

Substituting Equation 3 into Equation 7, we get

$$ X \sim P^{1+\beta}. \tag{8} $$

Equation 8 is just the power law relationship (allometric growth) between population and GDP, with exponent 1+β. We have estimated the exponent β ≈ 4.365 from the income distributions. Therefore, we predict that the GDP scales as the 5.365 power of the population in the United States during 1996-2007.

On the other hand, we can obtain the real data of the population and the GDP of the United States during the given period. The two variables have a power law relationship, which is shown in Figure 3. From the empirical data, we estimate the power law exponent to be about 5.065, which is close to the exponent predicted from the size-dependent income distribution data (the relative error is |5.365−5.065|/5.065 ≈ 6%). However, there is still a small deviation between the empirical exponent and the predicted one. The possible error sources include (1) the fitting of the income curves; (2) the estimation of β; (3) the deviation between GDP and total income.

FIG. 3. Power law relationship between the population and the GDP of the U.S. during 1996-2007. [Plot of GDP (×10^12) against population (×10^8); fitted line X ∼ P^γ, γ = 5.065, R² = 0.992.]

IV. GENERALIZED SIZE-DEPENDENT INCOME DISTRIBUTION

One interesting fact which deserves more attention is that the size-dependency of the income distribution and its implication, the power law relationship between population and GDP, are independent of the Pareto exponent α in the income distribution formula, Equation 1. Therefore, we can further hypothesize that the size-dependency of the distribution is a unique property independent of the concrete form of the density function.

Actually, from Equation 4, we can generalize to an abstract form of the probability density function,

$$ f(x) \sim P^{-\beta} g(P^{-\beta}x), \tag{9} $$

where g(y) is an arbitrary probability density function with size-independent argument y. When we set g(y) to the concrete form $\frac{(\alpha-1)^{\alpha}}{\Gamma(\alpha)}\frac{\exp\left(-\frac{\alpha-1}{y}\right)}{y^{1+\alpha}}$, we get the generalized LV model in Equation 4.

Actually, the power law relationship between population and GDP discussed in Section III can be derived from the abstract form (Equation 9), because

$$ \langle x\rangle = \int_{0}^{+\infty} x f(x)\,dx \sim \int_{0}^{+\infty} x P^{-\beta} g(P^{-\beta}x)\,dx. \tag{10} $$

Replacing the integration variable x with y = P^-β x, so that x = P^β y and dx = P^β dy, we obtain

$$ \langle x\rangle \sim \int_{0}^{+\infty} P^{\beta} y \, P^{-\beta} g(y) \, P^{\beta}\,dy = P^{\beta}\int_{0}^{+\infty} y\,g(y)\,dy \sim P^{\beta}, \tag{11} $$

where $\int_{0}^{+\infty} y\,g(y)\,dy$ is a constant because g(y) is size-independent. Finally, we can also obtain the power law relationship between population and GDP,

$$ X = P\langle x\rangle \sim P^{1+\beta}. \tag{12} $$

So we can conclude that the essence of size-dependency in the income distribution is captured by Equation 9. Actually, this re-scaled form of distribution is not first discovered by this paper. In [28, 29], the authors also gave a similar formula to describe the anomalous scaling of the probability density function in critical phenomena,

$$ f(x) \sim n^{-D} g(n^{-D}x), \tag{13} $$

where n is the size of the system (the number of addends) and D is also a re-scaling exponent. The same mathematical form points to ubiquitous natural laws, so the individual income can be viewed as a sum of many correlated random variables related to each person in the same country. However, we will not discuss the details of this discovery and leave it to future studies because of the size limitation of this paper.

V. CONCLUDING REMARKS

This paper discussed the size-dependency of the income distribution, which is a very important property largely ignored by previous studies. The size-dependency has two important implications: 1. The power law relationship between population and GDP (which is also known as allometric growth); 2. The re-scaled income distribution has the same mathematical form as the anomalous-scaling probability density function of the sum of many correlated random variables.

However, due to the limitation of our data, the results discussed in this paper hold only for the United States of America, a particular developed country, and only for the period 1996-2007, which was a very stable time for the United States. We have observed with another data set that the allometric growth pattern is not found for some countries, especially nations encountering convulsions or inflation. Thus, we hypothesize that the size-dependency of the income distribution, especially the power law relationship between λ and population, will not be observed in these cases either.

In addition, we have found the same size-dependency phenomena in human online behaviors [33]; therefore, it is reasonable to accept some results of this paper as common ones for stably developing complex systems.

ACKNOWLEDGMENTS

This paper is supported by the National Natural Science Foundation of China under Grant No. 61004107 and Grant No. 70771012.

[1] V. Pareto, Cours d'Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino (Librairie Droz, Geneva, Paris, 1964)
[2] V. M. Yakovenko and J. B. Rosser, Rev. Mod. Phy., 81, 1703 (2009)
[3] F. Clementi and M. Gallegati, Physica A, 350, 427 (2005)
[4] N. Ding and Y.G. Wang, Chinese Phys. Let., 24, 2434 (2007)
[5] A. C. Silva and V. M. Yakovenko, Europhys. Lett., 69, 304 (2005)
[6] G. K. Zipf, Selected Studies of the Principle of Relative Frequency in Language, first ed. (Harvard University Press, 1932)
[7] R. Albert and A. L. Barabasi, Rev. Mod. Phy., 74, 47 (2002)
[8] W. Souma, cond-mat/0011373 (2000)
[9] P. Richmond, S. Hutzler, R. Coelho, and P. Repetowicz, in Econophysics and Sociophysics (Wiley-VCH Verlag GmbH & Co. KGaA, 2006) pp. 131–159, ISBN 9783527406708
[10] A. Chatterjee, B. K. Chakrabarti, and S. S. Manna, Physica A, 335, 155 (2004)
[11] A. Dragulescu and V. Yakovenko, Physica A, 299, 213 (2001)
[12] Y. Xu, L.P. Guo, N. Ding, and Y.G. Wang, Chinese Phys. Let., 27, 078901 (2010)
[13] S. Miyazima, Y. Lee, T. Nagamine, and H. Miyajima, Physica A, 278, 282 (2000)
[14] B. J. Kim and S. M. Park, Physica A, 347, 683 (2005), ISSN 0378-4371
[15] H. Anh, T. Kiet, S. Baek, B. Kim, and H. Jeong, J. Korean. Phys. Soc., 51, 1812 (2007)
[16] S. Bernhardsson, L. E. C. da Rocha, and P. Minnhagen, New J. Phys., 11, 123015 (2009)
[17] S. Nordbeck, Geor. Ann. B, 53, 54 (1971)
[18] M. Kleiber, Hilgardia, 6, 315 (1932)
[19] J. Brown and G. West, Scaling in Biology (Oxford University Press, 2000)
[20] G. B. West and J. H. Brown, J. Exp. Biol., 208, 1575 (2005)
[21] A. Isalgue, H. Coch, and R. Serra, Physica A, 382, 643 (2007)
[22] L. M. Bettencourt, Res. Policy, 36, 107 (2007)
[23] L. M. Bettencourt, J. Lobo, D. Helbing, C. Kuhnert, and G. B. West, P. Natl. Acad. of Sci. USA, 104, 7301 (2007)
[24] B. Roehner, Int. J. Syst. Sci., 15, 917 (1984)
[25] J. Zhang and T. K. Yu, Physica A, 389, 4887 (2010)
[26] L. Lu, Z. Zhang, and T. Zhou, PloS ONE, 5, e14139 (2010)
[27] D. C. van Leijenhorst and T. P. van der Weide, Inform. Sciences, 170, 263 (2005)
[28] F. Baldovin and A. L. Stella, Phys. Rev. E, 75, 020101 (2007)
[29] A. L. Stella and F. Baldovin, J. Stat. Mech-theory E, 2010, P02018 (2010)
[30] "http://www.irs.gov/taxstats/indtaxstats/article/0,,id=134951,00.html,"
[31] O. Malcai, O. Biham, P. Richmond, and S. Solomon, Phys. Rev. E, 66, 031102 (2002)
[32] O. Blanchard, Macroeconomics (Prentice Hall, 2000)
[33] L. F. Wu, J. Zhang and J. J. Zhu, "Allometric scaling and Size-Dependent distributions of collective human online behaviors," in preparation