On the Measurement of Dissimilarity and Related Orders ∗

Francesco Andreoli a,b,c,† and Claudio Zoli a

a University of Verona, DSE, Via dell’Artigliere 19, I-37129 Verona, Italy
b Université de Cergy-Pontoise, THEMA, 33 Boulevard du Port, F-95011 Cergy-Pontoise, France
c ESSEC Business School, 1 Avenue Bernard Hirsch, F-95021 Cergy-Pontoise, France

July 1, 2013

Abstract

We consider populations partitioned into groups, whose members are distributed across a finite number of classes such as, for instance, types of occupation, residential location, social class of fathers, levels of education, health or income. Our aim is to assess the dissimilarity between the patterns of distributions of the different groups. These evaluations are relevant for the analysis of multi-group segregation, socioeconomic mobility, equalization of opportunity and discrimination. We conceptualize the notion of dissimilarity making use of reasonable transformations of the groups’ distributions, based on sequences of transfers and exchanges of population masses across classes and/or groups. Our analysis clarifies the substantial differences underlying the concept of dissimilarity when applied to ordered or to permutable classes. In both settings, we illustrate the logical connection of dissimilarity evaluations with matrix majorization pre-orders, and provide equivalent implementable criteria to test unambiguous dissimilarity reductions. Furthermore, we show that inequality evaluations can be interpreted as special cases of dissimilarity assessments and discuss relations with concepts of segregation and discrimination.

Keywords: Dissimilarity, Matrix Majorization, Zonotopes, Multi-group Segregation, Discrimination.

JEL Codes: J71, D31, D63, C16.

∗ We would like to thank Martin Foster, Arnaud Lefranc, Eugenio Peluso and Gaston Yalonetzky for useful comments.
We are also grateful to all participants at the Fourth ECINEQ meeting (Catania 2011), LAGV#11 Marseille (2012), 11th SSCW (New Delhi 2012), EEA-ESEM (Malaga 2012) and seminars at IT2012 Winter School, THEMA, Verona, CORE and Bocconi University. Francesco Andreoli kindly acknowledges the financial support from the Università Italo-Francese (UIF/UFI), Bando Vinci 2010. The usual disclaimer applies.

† Contacts: [email protected] (F. Andreoli) and [email protected] (C. Zoli).

1 Introduction

There is a growing interest, among policymakers and social scientists, in phenomena involving the distribution of a population, partitioned into well defined social groups, across ordered or alternatively non-ordered classes representing discrete realizations of economically relevant indicators. Segregation, discrimination, social mobility, equality of life chances, and certainly inequality assessments, are all situations in which the distributions of different groups across well defined classes do not coincide, but rather a form of dissimilarity among these distributions prevails. This paper studies incomplete and complete orderings for ranking pairs of situations according to the degree of dissimilarity they exhibit.

Dissimilarity comparisons have a long tradition in the statistical and economic literature. In one of his earlier works, Gini (1914, p. 189) defines two distributions (identified as “groups”) on the same discrete support (whose realizations are denoted as “classes”) as similar when “the overall populations of the two groups take the same values with the same [relative] frequency.”1 This model for similarity can be straightforwardly generalized to the multi-group case, where more than two discrete distributions are involved. Every set of discrete distributions that does not reveal the similarity representation is said to display some degree of dissimilarity.
The objective of the paper is to characterize comparisons of sets of distributions, displaying different degrees of dissimilarity, in an “objective” manner, i.e. depending only on a restricted set of intuitive and compelling transformations that, when applied to the data, reduce the overall dissimilarity.

The notion of similarity is used, often implicitly, as a reference case for a variety of economic problems discussed above. For instance, univariate inequality prevails if the distribution of income shares across demographic units (such as individuals, families or groups) differs from the distribution of population shares. Through treating income shares and population shares as two distinct distributions across sampled units, one can rationalize inequality comparisons within the dissimilarity model (Marshall, Olkin and Arnold 2011). Furthermore, multivariate inequality analysis extends the dissimilarity comparisons to dimensions other than income, such as wealth, well-being, assets or consumption (Kolm 1977, Atkinson and Bourguignon 1982, Koshevoy and Mosler 1996). Similarity represents the case in which every unit receives an income or a bundle of attributes whose size is proportional to her population share.
If, in addition, the weighting scheme is uniform, then income shares or attribute bundles coincide across units, which is the case when perfect equality is reached.2

1 The text, translated from the Italian, proceeds with a very simple formalization of similarity: “If $n$ is the size of group α, $m$ is the size of group β, $n_x$ the size of group α which is assigned to class $x$ and $m_x$ the size of group β assigned to the same class, then it should hold [under similarity] that, for any value of $x$, $\frac{n_x}{m_x} = \frac{n}{m}$.”

Furthermore, the emerging interest in social inclusion issues (Atkinson and Marlier 2010) has shifted the focus away from how unequally realizations are distributed within each group and toward the pattern of dissimilarity between groups’ distributional heterogeneity. For instance, in a non-segregated society, the social groups are similarly distributed across non-ordered classes defined, for instance, by the residential location, the school assignment or the occupational types of the inhabitants.3 When classes are ordered, dissimilarity prevails in all situations where a form of discrimination is at stake. A policy motivated by equality of opportunity objectives may not judge as socially acceptable any form of discrimination in income, education or health achievements across groups identified by ethnic, racial or socioeconomic indicators.4 Even intergenerational equity assessment comes down to how much dissimilarity exists in the chances of reaching a given class of destination from different classes of departure. In this situation both groups and classes are ordered.

Despite the conceptual proximity, these phenomena are assessed on the basis of strongly heterogeneous measurement models. Segregation, discrimination and inequality indicators have played a central role in the sociological and economic literature.
Despite the consolidated belief that indicators should be consistent with more partial criteria based on segregation curves, discrimination curves or Lorenz curves, there is much less agreement on the properties that should be satisfied by their extensions to cases where more than two distributions are involved in the comparisons. This inconsistency stems from the fact that the movements of population masses from one class to another, which have a meaningful interpretation in the bivariate analysis, can hardly be reformulated in a setting with many distributions.5 This paper moves beyond these issues, providing a common analytical background by modeling the underlying notion of dissimilarity.

2 See Hardy, Littlewood and Polya (1934) and Marshall et al. (2011), Kolm (1969, 1977), Koshevoy and Mosler (1996) for traditional results and multivariate extensions.

3 See Duncan and Duncan (1955), Hutchens (1991, 2001), Reardon and Firebaugh (2002), Reardon (2009), Flückiger and Silber (1999), Chakravarty and Silber (2007), Alonso-Villar and del Rio (2010), Frankel and Volij (2011) and Silber (2012), for a survey on the methodology. For an analysis of the economic motivations behind segregation comparisons, see for instance Echenique, Fryer and Kaufman (2006) and Borjas (1992, 1995).

4 For a formal treatment of discrimination in the statistical and economic literature, see Butler and McDonald (1987), Jenkins (1994), Van de Gaer, Schokkaert and Martinez (2001), Fusco and Silber (2011), Le Breton, Michelangeli and Peluso (2012) or Lefranc, Pistolesi and Trannoy (2009).

5 This argument has a parallel in the inequality literature, where progressive Pigou-Dalton transfers characterizing unambiguous reductions of univariate inequality cannot be easily implemented or extended to the multivariate case.

A century after the seminal work by Gini, we reconcile bivariate and multi-group dissimilarity comparisons building on intuitive operations of movements of groups’ masses
across well defined ordered or non-ordered classes of realizations. We make use of operations retained from the literature on information (Grant, Kajii and Polak 1998), segregation (Frankel and Volij 2011, Reardon 2009) and mobility (Van de Gaer et al. 2001), which we use to characterize axiomatically partial, as well as complete, orders of dissimilarity that (i) permit comparing only differences across groups’ distributional patterns, leaving aside within group inequality concerns; (ii) build upon a parsimonious structure involving operations that are valid in the multi-group and two-group settings; (iii) provide a common ground for evaluating a variety of phenomena, among others segregation, discrimination and inequality; (iv) are coherent with more traditional two-group comparisons implemented through segregation, discrimination and Lorenz curves.

In our analysis, we make use of two reasonable dissimilarity-reducing transformations. A merge operation levels the distributional disparities by bringing together, distribution by distribution, the population associated to different groups in two distinct classes (e.g., two schools, two jobs or two neighborhoods) to obtain a new class of larger size. Following Grant et al. (1998), this operation decreases the informativeness of the data when classes are not ordered, but it might lead to counterintuitive results if applied to exogenously ordered classes (such as income intervals, years of education or health scores). For ordered classes, we consider the exchange operations. An exchange trades off an improvement (i.e., an upward shift of one class) enjoyed by a given proportion of a group that is relatively over-represented in a given class, with a deterioration (i.e., a downward shift of one class) borne by the same proportion of another group that is relatively under-represented in the same class.
If a set of distributions can be obtained from another through a sequence of merge or, alternatively, exchange operations, then the former set can be unambiguously ranked as displaying less dissimilarity than the latter according to all dissimilarity partial or complete orders coherent with these operations. Our approach is model free and does not rely on welfare evaluations associated with changes in dissimilarity.

Our main result is that every dissimilarity order coherent with a set of dissimilarity preserving operations and with merge or exchange operations is equivalently represented via partial orders induced by matrix majorization (Dahl 1999) and, ultimately, its verification can be implemented empirically by checking the inclusion of geometric bodies, corresponding to multidimensional generalizations of the segregation (Hutchens 1991) and the discrimination curves (Le Breton et al. 2012). Our results consist of equivalences, in the spirit of the Hardy et al. (1934) inequality analysis, between transformations of the data, implementable criteria and unanimity among different classes of dissimilarity indicators. However, the dissimilarity analysis involves a broader set of cases than the ones covered by (multivariate) inequality analysis. In fact, while similarity only requires that within group inequalities are equalized across groups, equality calls for their elimination. Hence, equality implies similarity but not the reverse. To support our assertion, we show that the transformations underlying every progressive transfer can be refined into a precise sequence of dissimilarity preserving and reducing transformations, while the opposite argument is false.

The rest of the paper is organized as follows. Section 2 introduces the relevant notation.
Sections 3 and 4 present the main results of this paper in the form of equivalent conditions between multi-group transfers, majorization conditions and testable criteria both for the case of non-ordered and ordered classes. Section 5 formalizes the relations between dissimilarity and inequality, segregation and discrimination, along with a discussion of novel indicators. Section 6 concludes.

2 Setting

2.1 Notation

This paper deals with comparisons of $d \times n$ distribution matrices, depicting the relative frequency distributions of $d$ groups (indexed by rows) across $n$ disjoint classes (indexed by columns), such that $n \geq 2$ and $d \geq 1$ for $n$ and $d$ natural numbers. The set of distribution matrices with $d$ rows, representing the data, is:

$$\mathcal{M}_d := \left\{ A = (a_1,\dots,a_j,\dots,a_n) : a_j \in [0,1]^d,\ \sum_{j=1}^{n} a_{ij} = 1\ \ \forall i \right\}.$$

A distribution matrix $A \in \mathcal{M}_d$ represents a collection of $d$ discrete relative frequency distributions of groups lying in the unit simplex $\Delta^{n_A}$, where $n_A$ denotes the number of classes of $A$.6 For $A \in \mathcal{M}_d$, the cumulative distribution matrix $\overrightarrow{A} \in \mathbb{R}_+^{d \times n_A}$ is constructed by sequentially cumulating the classes of $A$. The column $k$ of $\overrightarrow{A}$, for all $k = 1,\dots,n_A$, is therefore $\overrightarrow{a}_k := \sum_{j=1}^{k} a_j$.

We use column vectors and denote $e_n := (1,\dots,1)^t$ and $0_n := (0,\dots,0)^t$. We refer to the set of transformation matrices using $\Pi_n \in \mathcal{P}_n$ to denote an element of the set of all $n \times n$ permutation matrices, while we use $\mathcal{R}_{n,m}$ to denote the set of all $n \times m$ row stochastic matrices whose rows lie in $\Delta^m$. The element $X \in \mathcal{R}_{n,m}$ can be interpreted as a migration matrix whose entry $x_{ij}$ is the probability of the population in class $i$ in the distribution of origin to “migrate” to class $j$ in the distribution of destination.
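As an illustration of the notation above, the following minimal Python sketch checks the defining conditions of a distribution matrix and builds its cumulative counterpart by cumulating columns. The function names (`is_distribution_matrix`, `cumulate`), the numerical tolerance and the data are assumptions introduced here for illustration; they do not come from the paper.

```python
from itertools import accumulate

def is_distribution_matrix(A, tol=1e-9):
    """True if every entry lies in [0, 1] and every row (group) sums to 1."""
    return all(
        all(-tol <= x <= 1 + tol for x in row) and abs(sum(row) - 1.0) <= tol
        for row in A
    )

def cumulate(A):
    """Cumulative distribution matrix: column k is the sum of columns 1..k of A."""
    return [list(accumulate(row)) for row in A]

# Illustrative data (hypothetical): d = 2 groups across n = 3 classes.
A = [[0.5, 0.25, 0.25],
     [0.25, 0.25, 0.5]]
A_cum = cumulate(A)
```

Because each row of a distribution matrix sums to one, the last column of the cumulative matrix is the unit vector $e_d$, which is the end point of every path built from these columns.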
The set of row stochastic matrices such that $m = n$ is denoted by $\mathcal{R}_n$, while $\mathcal{D}_n \subseteq \mathcal{R}_n$ contains the doubly stochastic matrices whose rows and columns lie in $\Delta^n$.7

6 Andreoli and Zoli (2012) discuss the general case where the elements of each row represent the absolute frequencies distribution of groups across classes, so that the condition $\sum_j a_{ij} = 1$ does not necessarily hold.

7 Note that $\mathcal{R}_{d,n} \subseteq \mathcal{M}_d$ because both sets indicate row stochastic matrices, where $\mathcal{M}_d$ sets a restriction only on the number of rows. We keep the two sets separate to highlight the distinction between the data that have to be compared ($\mathcal{M}_d$) and the transformations applied to them ($\mathcal{R}_{d,n}$).

2.2 Partial orders of dissimilarity

The models for complete similarity and dissimilarity in Gini (1914) can be transposed in convenient matrix notation. The perfect similarity matrix $S$ represents the case in which all groups’ distribution functions coincide across classes, and can all be represented by the same row vector $s^t \in \Delta^n$. The perfect dissimilarity matrix $D$ formalizes the situation of maximal dissimilarity between groups’ distributions represented by row vectors $d_1^t \in \Delta^{n_1},\dots,d_d^t \in \Delta^{n_d}$ (which may differ in the number of classes occupied), occurring when each class is occupied by at most one group and the $d$ distributions do not overlap across classes. In compact form:

$$S := \begin{pmatrix} s^t \\ \vdots \\ s^t \end{pmatrix} \quad \text{and} \quad D := \begin{pmatrix} d_1^t & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_d^t \end{pmatrix}.$$

A dissimilarity ordering is a complete and transitive binary relation $\preccurlyeq$ on the set $\mathcal{M}_d$ with symmetric part $\sim$ that ranks $B \preccurlyeq A$ whenever the distributions of the groups in $B$ are at most as dissimilar as those in $A$.8 We will characterize the partial orders originating from the intersection of these dissimilarity orderings (see Donaldson and Weymark 1998).

8 For any $A, B, C \in \mathcal{M}_d$ the relation $\preccurlyeq$ is transitive if $C \preccurlyeq B$ and $B \preccurlyeq A$ imply $C \preccurlyeq A$, and complete if either $A \preccurlyeq B$ or $B \preccurlyeq A$ or both. Moreover, $B \sim A$ if and only if $B \preccurlyeq A$ and $A \preccurlyeq B$.

Any dissimilarity order should rank $S \preccurlyeq A \preccurlyeq D$ for any $A \in \mathcal{M}_d$. We show that a very basic and intuitive axiomatic structure makes it possible to characterize the ranking either when classes are freely permutable or, alternatively, when they are ordered. These orders will be related to multivariate majorization orders (Marshall et al. 2011) and to empirical criteria.

2.2.1 Multivariate majorization partial orders

Multivariate majorization theory suggests elementary algebraic transformations of data that involve row stochastic, column stochastic or bistochastic matrices. In this paper we obtain results that are related to uniform and matrix majorization.

Definition 1 (Multivariate Majorization) Given two matrices $A, B \in \mathcal{M}_d$:

(i) $B$ is uniformly majorized by $A$ ($B \preccurlyeq^U A$) provided that $n_A = n_B = n$ and there exists a doubly stochastic matrix $X \in \mathcal{D}_n$ such that $B = A \cdot X$.

(ii) $B$ is (matrix) majorized by $A$ ($B \preccurlyeq^R A$) provided there exists a row stochastic matrix $X \in \mathcal{R}_{n_A,n_B}$ such that $B = A \cdot X$.

Uniform majorization has been extensively discussed in Marshall et al. (2011), where it is shown that $B \preccurlyeq^U A$ is equivalent to $f(B) \leq f(A)$ for all Schur-convex functions $f$. Uniform majorization imposes strong restrictions. In fact, one matrix is uniformly majorized by another only if not just the size of the rows but also the size of the columns of the two matrices coincide. This is an appealing criterion for inequality analysis, where one wants to eliminate all forms of distributional heterogeneity. For instance, univariate inequality comparisons $b \preccurlyeq^U a$ with $a, b \in \mathcal{M}_1$ correspond to dominance of distribution $b$ over $a$ in terms of Lorenz curves. This model might not, however, be suitable for other applications.9

Matrix majorization is weaker than uniform majorization. It is obtained via multiplication of row stochastic matrices, therefore it preserves the total dimension of each group.
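To make part (ii) of the definition concrete, the sketch below builds a row stochastic matrix $X$ that merges two classes and forms $B = A \cdot X$, so that $B$ is matrix majorized by $A$ by construction. The helper names (`matmul`, `is_row_stochastic`, `merge_matrix`) and the data are hypothetical, chosen only for illustration; the merge matrix is one simple instance of a row stochastic transformation, not a general test.

```python
def matmul(A, X):
    """Plain matrix product A . X for matrices stored as lists of rows."""
    return [[sum(a * x for a, x in zip(row, col)) for col in zip(*X)] for row in A]

def is_row_stochastic(X, tol=1e-9):
    """True if all entries are nonnegative and every row sums to 1."""
    return all(
        all(x >= -tol for x in row) and abs(sum(row) - 1.0) <= tol for row in X
    )

def merge_matrix(n, i, j):
    """n x (n-1) zero/one row stochastic matrix that sends classes i and j
    (0-based, i < j) to a single class and keeps the other classes apart."""
    X = []
    for k in range(n):
        target = i if k == j else (k if k < j else k - 1)
        X.append([1.0 if c == target else 0.0 for c in range(n - 1)])
    return X

# Illustrative data (hypothetical): d = 2 groups across n = 3 classes.
A = [[0.5, 0.25, 0.25],
     [0.25, 0.25, 0.5]]
X = merge_matrix(3, 0, 1)  # merge the first two classes
B = matmul(A, X)           # B = A . X, again a distribution matrix
```

The sketch only verifies a given $X$. Deciding whether some row stochastic $X$ with $B = A \cdot X$ exists for arbitrary $A$ and $B$ is a linear feasibility problem, which is not attempted here.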
Matrix majorization ranks $S$ and $D$ as the two polar cases, disregarding the fact that the distributions in the perfect similarity and dissimilarity matrices reflect high or low degrees of within group distributional heterogeneity, thus focusing on the dissimilarities between groups’ distributions. Matrix majorization has already been investigated (under different names) in other fields such as linear algebra and majorization orders (Dahl 1999, Hasani and Radjabalipour 2007), in inequality analysis (see Chapter 14 in Marshall et al. 2011), in the comparison of statistical experiments (Blackwell 1953, Torgersen 1992) or in segregation analysis (Frankel and Volij 2011). Majorization orders provide a suitable model for comparing distribution matrices, but they are not directly testable: there could in fact exist more than one matrix that verifies the majorization relation, and in general it cannot be identified from the data. It is however sufficient to know that such a matrix exists to infer the majorization relation.

2.2.2 Geometric partial orders based on polytopes inclusion

The statistical theory offers many empirical criteria based upon comparisons of curves or geometric bodies for comparing and ranking (sets of) distributions. For instance, the Lorenz curve (Marshall et al. 2011) is a plot of the income shares versus the fractional rank of individuals in a population, arranged by increasing income. The plot of the cumulative shares of two groups across classes originates the segregation curve (Duncan and Duncan 1955, Hutchens 1991), when classes are ordered by increasing concentration of one group over the other, or it originates a concentration curve (Mahalanobis 1960) when the order of the classes is exogenously provided.
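For two groups, the construction of the segregation curve described above can be sketched as follows: classes are sorted by the relative presence of one group, and the cumulative shares of the two groups are then paired. The function name `segregation_curve`, the choice of which group goes on which axis, the treatment of empty classes and the data are all assumptions made for illustration.

```python
from itertools import accumulate

def segregation_curve(g1, g2):
    """Points (cumulative share of g2, cumulative share of g1), with classes
    sorted by increasing relative presence of g1."""
    classes = [(a, b) for a, b in zip(g1, g2) if a + b > 0]  # drop empty classes
    # a / (a + b) is increasing in the ratio a / b and avoids division by zero.
    classes.sort(key=lambda ab: ab[0] / (ab[0] + ab[1]))
    xs = [0.0] + list(accumulate(b for _, b in classes))
    ys = [0.0] + list(accumulate(a for a, _ in classes))
    return list(zip(xs, ys))

# Illustrative data (hypothetical): two groups across four classes.
g1 = [0.5, 0.25, 0.0, 0.25]
g2 = [0.25, 0.25, 0.25, 0.25]
points = segregation_curve(g1, g2)
```

Since classes are sorted by increasing slope, the resulting piecewise-linear curve is convex and lies on or below the diagonal. Skipping the sort and keeping the classes in their given order would instead trace a concentration curve in the sense used above.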
The interdistributional Lorenz curve (Butler and McDonald 1987) gathers together robust inequality and concentration analysis considering inequality patterns among two groups at a time, while the discrimination curve (Le Breton et al. 2012) is the concentration curve of two probability distributions that can be ordered according to first order stochastic dominance.

9 In this setting, a school district where all schools have the same racial composition and the same size is regarded as less segregated than a school district where races are evenly distributed across schools of different size.

These curves make bivariate comparisons possible, but they can hardly be used to assess multi-group dissimilarity without breaking down the global dissimilarity assessments into pairwise comparisons. We provide general multivariate geometric tests that extend the analysis of segregation, discrimination, (interdistributional) inequality and concentration to more than two groups. Our tests are verified by checking the inclusion of multidimensional geometric bodies. Inclusion induces partial orders, in the sense that whenever inclusion is not verified, dissimilarity comparisons are inconclusive. Here we illustrate in depth the main tools that we will use for testing dominance. They are Zonotopes for the permutable case and Path Polytopes for the ordered case.

For any $A \in \mathcal{M}_d$, the Monotone Path $MP^*(A) \subseteq [0,1]^d$ is an arrangement of $n_A$ line segments starting from the origin of the positive orthant and sequentially connecting the points with coordinates given by the columns of $\overrightarrow{A}$. The vertices of $MP^*(A)$ are ordered monotonically with respect to the indices of the columns of matrix $A$, such that $v_j \in MP^*(A)$ if and only if $v_j = \overrightarrow{a}_j$ for all $j$, $v_0 = 0_d$ and $v_{n_A} = A \cdot e_{n_A} = e_d$.10

If the classes of $A$ are ordered, then it is possible to construct at most one Monotone Path associated with $A$, while if the classes are permutable it is possible to construct $n_A!$ alternative Monotone Paths out of the same distribution matrix $A$.

For non-ordered distribution matrices, the Zonotope $Z(A) \subseteq [0,1]^d$ is the set corresponding to the convex hull of all the $n_A!$ Monotone Paths generated by permuting the columns of $A$. A convenient formulation of the Zonotope set is as follows:11

$$Z(A) := \left\{ z := (z_1,\dots,z_d)^t : z = \sum_{j=1}^{n_A} \theta_j a_j,\ \theta_j \in [0,1]\ \ \forall j = 1,\dots,n_A \right\}.$$

The (maximum) Dissimilarity Zonotope $Z_D(A)$ of $A$ is a $d$-dimensional hypercube connecting the origin of $\mathbb{R}_+^d$ and the vertex with coordinates $e_d$. Its diagonal is the Similarity Zonotope $Z_S(A)$. Given the representation of the Zonotope, it is not difficult to see that $Z_D(A) = Z(D)$ and $Z_S(A) = Z(S)$ for every $A \in \mathcal{M}_d$, so that all distribution matrices that display some degree of dissimilarity originate Zonotope representations that lie in $Z_D$ and share the same reference diagonal $Z_S$.

The latter property makes the Zonotopes inclusion order well defined, so that the relation $Z(B) \subseteq Z(A)$ always indicates that the set of distributions in $B$ is closer to similarity than is the set of distributions in $A$.

10 See Shephard (1974) and Ziegler (1995) for a definition of the f-monotone path and its applications to the study of Zonotopes.

11 The Zonotope $Z(A) \subseteq \mathbb{R}_+^d$ is a centrally symmetric convex body defined by the Minkowski sum of a finite number of closed line segments connecting the points generated by the columns of $A$ with the origin (for an extensive treatment, see McMullen 1971).

Considering the ordered case, the Path Polytope $Z^*(A) \subseteq [0,1]^d$ is an expansion of the unidimensional ordered set $MP^*$ in the $d$-variate space. It consists of the convex hull of the permutations of $MP^*(A)$ with respect to the diagonal of $Z(D)$ and it is formalized as follows:

$$Z^*(A) := \left\{ z^* := (z_1^*,\dots,z_d^*)^t : z^* \in \mathrm{conv}\{\Pi_d \cdot p^* \mid \Pi_d \in \mathcal{P}_d\},\ p^* \in MP^*(A) \right\}.$$

The Dissimilarity Path Polytope and the Similarity Path Polytope of $Z^*(A)$ coincide with $Z_D(A)$ and $Z_S(A)$, respectively. The inclusion $Z^*(B) \subseteq Z^*(A)$ indicates an alternative perspective for assessing that the set of distributions depicted in $B$ is closer to similarity than is the set in $A$. The inclusion test imposes a sequential dominance condition for all points $z^*$ belonging to the convex hull created from $p^* \in MP^*(A)$ which lie on the same hyperplane supporting the standard simplex $\Delta^d$, properly scaled by a factor $\lambda \in [0,d]$.12

12 The hyperplane supporting the simplex has slope $e_d$. Since we make use of distribution matrices satisfying $A \cdot e_n = e_d$, the value associated to the hyperplane crossing the Path Polytope in the point $A \cdot e_n$ is equal to $e_d^t \cdot e_d = d$. We derive a procedure to test the Path Polytopes inclusion that exploits this feature.

A simple example clarifies the distinction between Zonotopes and Path Polytopes. Matrix $A \in \mathcal{M}_2$ collects the data on the distribution of males (first row) and females (second row) across four classes:

$$A = \begin{pmatrix} 0.4 & 0.1 & 0 & 0.5 \\ 0.1 & 0.4 & 0.3 & 0.2 \end{pmatrix}.$$

The Zonotope of matrix $A$ is delimited by the grey area in figure 1(a). Each column of $A$ is a vector in the two dimensional space (we draw a small symbol associated to each vector). Consider the case where classes are interpreted as occupations (and therefore are non-ordered). Matrix $A$ may well represent a segregated distribution of sexes across occupations. The set $Z(A)$ is therefore the area between the segregation curve, corresponding to its lower bound, and the dual of the segregation curve obtained as the upper bound of the figure.

The Path Polytope of matrix $A$ (figure 1(b)) corresponds to the grey area between the Monotone Path (solid line) and its symmetric projection (dashed line) with respect to the diagonal. If classes are interpreted as ordered non-overlapping income intervals, then matrix $A$ may well represent a gender based discrimination pattern and the Path Polytope corresponds to the area between the discrimination curve (the lower boundary of the Path Polytope) and its dual discrimination curve.

[Figure 1: The Zonotope and Path Polytope (with the monotone path in solid line). Panel (a): $Z(A)$. Panel (b): $Z^*(A)$. Axes: Male and Female, each on the unit interval, with origin O.]

3 Characterization of the dissimilarity orders with permutable classes

Which operations are meaningful for transforming the set of distributions represented by matrix $A$ into the set represented by matrix $B$ such that $B \preccurlyeq A$? In univariate inequality comparisons, this question has been addressed in the well known theorem by Hardy et al. (1934): transforming one income distribution into another through a sequence of Pigou-Dalton (rich to poor) transfers unambiguously reduces inequality. We seek a similar characterization via transformations of matrix $A$ toward $B$ and possibly $S$, that are expressed as axioms for the dissimilarity order $\preccurlyeq$.

3.1 Axioms

The first axiom defines an anonymity property for classes. It requires that the names of the classes are not taken into account in dissimilarity comparisons.

Axiom IPC (Independence from Permutations of Classes) For any $A, B \in \mathcal{M}_d$ with $n_A = n_B = n$, if $B = A \cdot \Pi_n$ for a permutation matrix $\Pi_n \in \mathcal{P}_n$ then $B \sim A$.

One direct implication of IPC is that by cumulating frequencies across classes, one cannot derive any additional information that can be exploited in the dissimilarity comparison. Hence, admitting IPC means restricting attention to a specific type of problem where classes cannot be meaningfully ordered. This clarifies why we treat the case of