A maximum entropy theorem with applications to the measurement of biodiversity

Tom Leinster∗

arXiv:0910.0906v4 [cs.IT] 13 Jan 2011

∗School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QW, UK; [email protected]. Supported by an EPSRC Advanced Research Fellowship.

Abstract. This is a preliminary article stating and proving a new maximum entropy theorem. The entropies that we consider can be used as measures of biodiversity. In that context, the question is: for a given collection of species, which frequency distribution(s) maximize the diversity? The theorem provides the answer. The chief surprise is that although we are dealing with not just a single entropy, but a one-parameter family of entropies, there is a single distribution maximizing all of them simultaneously.

Contents

1 Statement of the problem
2 Preparatory results
3 The main theorem
4 Corollaries and examples

This article is preliminary. Little motivation or context is given, and the proofs are probably not optimal. I hope to write this up in a more explanatory and polished way in due course.

1 Statement of the problem

Basic definitions
Fix an integer $n \ge 1$ throughout. A similarity matrix is an $n \times n$ symmetric matrix $Z$ with entries in the interval $[0,1]$, such that $Z_{ii} = 1$ for all $i$. A (probability) distribution is an $n$-tuple $p = (p_1, \dots, p_n)$ with $p_i \ge 0$ for all $i$ and $\sum_{i=1}^n p_i = 1$.

Given a similarity matrix $Z$ and a distribution $p$, thought of as a column vector, we may form the matrix product $Zp$ (also a column vector), and we denote by $(Zp)_i$ its $i$th entry.

Lemma 1.1 Let $Z$ be a similarity matrix and $p$ a distribution. Then $p_i \le (Zp)_i \le 1$ for all $i \in \{1,\dots,n\}$. In particular, if $p_i > 0$ then $(Zp)_i > 0$.

Proof We have
$$(Zp)_i = \sum_{j=1}^n Z_{ij} p_j = p_i + \sum_{j \ne i} Z_{ij} p_j \ge p_i$$
and $(Zp)_i \le \sum_{j=1}^n 1 \cdot p_j = 1$. □

Let $Z$ be a similarity matrix and let $q \in [0,\infty)$. The function $H^Z_q$ is defined on distributions $p$ by
$$H^Z_q(p) =
\begin{cases}
\dfrac{1}{q-1}\Bigl(1 - \sum_{i:\, p_i > 0} p_i (Zp)_i^{\,q-1}\Bigr) & \text{if } q \ne 1\\[1ex]
-\sum_{i:\, p_i > 0} p_i \log (Zp)_i & \text{if } q = 1.
\end{cases}$$
Lemma 1.1 guarantees that these definitions are valid. The definition in the case $q = 1$ is explained by the fact that $H^Z_1(p) = \lim_{q \to 1} H^Z_q(p)$ (easily shown using l'Hôpital's rule). We call $H^Z_q$ the entropy of order $q$.
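These formulas are directly computable. As an illustration (ours, not part of the original article), here is a minimal sketch in Python with NumPy; the function name `entropy` and the example matrix are our own choices, and the $q = 1$ branch mirrors the limiting case in the definition above.

```python
import numpy as np

def entropy(Z, p, q):
    """Entropy of order q, H^Z_q(p), for a similarity matrix Z and
    a distribution p (NumPy arrays); sums run over i with p_i > 0."""
    Zp = (Z @ p)[p > 0]
    pp = p[p > 0]
    if q == 1:
        return -np.sum(pp * np.log(Zp))
    return (1 - np.sum(pp * Zp ** (q - 1))) / (q - 1)

# A 3-species example (values ours): species 1 and 2 are somewhat
# similar, species 3 is completely dissimilar to both.
Z = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([0.25, 0.25, 0.5])

for q in [0, 1, 2]:
    print(f"H_{q} =", entropy(Z, p, q))

# The q = 1 case is the limit of the general formula as q -> 1:
assert abs(entropy(Z, p, 1) - entropy(Z, p, 1 + 1e-8)) < 1e-6
```

The final assertion gives a numerical glimpse of the l'Hôpital fact: the value at $q = 1$ agrees with the value of the general formula at $q$ slightly above 1.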
Notes on the literature
The entropies $H^Z_q$ were introduced in this generality by Ricotta and Szeidl in 2006, as an index of the diversity of an ecological community [RS]. Think of $n$ as the number of species, $Z_{ij}$ as indicating the similarity of the $i$th and $j$th species, and $p_i$ as the relative abundance of the $i$th species. Ricotta and Szeidl used not similarities $Z_{ij}$ but dissimilarities or 'distances' $d_{ij}$; the formulas above become equivalent to theirs on putting $Z_{ij} = 1 - d_{ij}$.

The case $Z = I$ goes back further. Something very similar to $H^I_q$, using logarithms to base 2 rather than base $e$, appeared in information theory in 1967, in a paper of Havrda and Charvát [HC]. Later, the entropies $H^I_q$ were discovered in statistical ecology, in a 1982 paper of Patil and Taillie [PT]. Finally, they were rediscovered in physics, in a 1988 paper of Tsallis [Tsa].

Still in the case $Z = I$, certain values of $q$ give famous quantities. The entropy $H^I_1$ is Shannon entropy (except that Shannon used logarithms to base 2). The entropy $H^I_2$ is known in ecology as the Simpson or Gini–Simpson index; it is the probability that two individuals chosen at random are of different species.

For general $Z$, the entropy of order 2 is known as Rao's quadratic entropy [Rao]. It is usually stated in terms of the matrix with $(i,j)$-entry $1 - Z_{ij}$, that is, the matrix of dissimilarities mentioned above.

One way to obtain a similarity matrix is to start with a finite metric space $\{a_1, \dots, a_n\}$ and put $Z_{ij} = e^{-d(a_i, a_j)}$. Matrices of this kind are investigated in depth in [Lei2] and other papers cited therein. Here, metric spaces will only appear in two examples (4.4 and 4.5).

The maximum entropy problem
Let $Z$ be a similarity matrix and let $q \in [0,\infty)$. The maximum entropy problem is this:

    For which distribution(s) $p$ is $H^Z_q(p)$ maximal, and what is the maximum value?

The solution is given in Theorem 3.2. The terms used in the statement of the theorem will be defined shortly. However, the following striking fact can be stated immediately:

    There is a distribution maximizing $H^Z_q$ for all $q$ simultaneously.

So even though the entropies of different orders rank distributions differently, there is a distribution that is maximal for all of them. For example, this fully explains the numerical coincidence noted in the Results section of [AKB].

Restatement in terms of diversity
Let $Z$ be a similarity matrix. For each $q \in [0,\infty)$, define a function $D^Z_q$ on distributions $p$ by
$$D^Z_q(p) =
\begin{cases}
\bigl(1 - (q-1) H^Z_q(p)\bigr)^{\frac{1}{1-q}} & \text{if } q \ne 1\\
e^{H^Z_1(p)} & \text{if } q = 1
\end{cases}
=
\begin{cases}
\Bigl(\sum_{i:\, p_i > 0} p_i (Zp)_i^{\,q-1}\Bigr)^{\frac{1}{1-q}} & \text{if } q \ne 1\\[1ex]
\prod_{i:\, p_i > 0} (Zp)_i^{-p_i} & \text{if } q = 1.
\end{cases}$$
We call $D^Z_q$ the diversity of order $q$. As for entropy, it is easily shown that $D^Z_1(p) = \lim_{q \to 1} D^Z_q(p)$.

These diversities were introduced informally in [Lei1], and are explained and developed in [LC]. The case $Z = I$ is well known in several fields: in information theory, $\log D^I_q$ is called the Rényi entropy of order $q$ [Rén]; in ecology, $D^I_q$ is called the Hill number of order $q$ [Hill]; and in economics, $1/D^I_q$ is the Hannah–Kay measure of concentration [HK].

The transformation between $H^Z_q$ and $D^Z_q$ is invertible and order-preserving (increasing). Hence the maximum entropy problem is equivalent to the maximum diversity problem:

    For which distribution(s) $p$ is $D^Z_q(p)$ maximal, and what is the maximum value?

The solution is given in Theorem 3.1. It will be more convenient mathematically to work with diversity rather than entropy. Thus, we prove results about diversity and deduce results about entropy.

When stated in terms of diversity, a further striking aspect of the solution becomes apparent:

    There is a distribution maximizing $D^Z_q$ for all $q$ simultaneously. The maximum value of $D^Z_q$ is the same for all $q$.

So every similarity matrix has an unambiguous 'maximum diversity', the maximum value of $D^Z_q$ for any $q$.

A similarity matrix may have more than one maximizing distribution, but the collection of maximizing distributions is independent of $q > 0$. In other words, a distribution that maximizes $D^Z_q$ for some $q$ actually maximizes $D^Z_q$ for all $q$ (Corollary 4.1).

The diversities $D^Z_q$ are closely related to generalized means [HLP], also called power means. Given a finite set $I$, positive real numbers $(x_i)_{i \in I}$, positive real numbers $(p_i)_{i \in I}$ such that $\sum p_i = 1$, and $t \in \mathbb{R}$, the generalized mean of $(x_i)_{i \in I}$, weighted by $(p_i)_{i \in I}$, of order $t$, is
$$\begin{cases}
\Bigl(\sum_{i \in I} p_i x_i^t\Bigr)^{1/t} & \text{if } t \ne 0\\[1ex]
\prod_{i \in I} x_i^{p_i} & \text{if } t = 0.
\end{cases}$$
For example, if $p_i = p_j$ for all $i, j \in I$ then the generalized means of orders 1, 0 and $-1$ are the arithmetic, geometric and harmonic means, respectively.

Given a similarity matrix $Z$ and a distribution $p$, take $I = \{i \in \{1,\dots,n\} \mid p_i > 0\}$. Then $1/D^Z_q(p)$ is the generalized mean of $((Zp)_i)_{i \in I}$, weighted by $(p_i)_{i \in I}$, of order $q - 1$.
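This correspondence with power means is easy to verify numerically. The following sketch (ours; Python with NumPy, with hypothetical names `diversity` and `power_mean`) checks that $1/D^Z_q(p)$ is the generalized mean of order $q-1$ of the $(Zp)_i$, weighted by the $p_i$, and that $D^Z_q(p)$ decreases in $q$, anticipating Lemma 1.2(ii) below.

```python
import numpy as np

def diversity(Z, p, q):
    """Diversity of order q, D^Z_q(p)."""
    Zp = (Z @ p)[p > 0]
    pp = p[p > 0]
    if q == 1:
        return np.prod(Zp ** -pp)
    return np.sum(pp * Zp ** (q - 1)) ** (1 / (1 - q))

def power_mean(x, weights, t):
    """Generalized mean of order t of x, weighted by weights."""
    if t == 0:
        return np.prod(x ** weights)
    return np.sum(weights * x ** t) ** (1 / t)

Z = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([0.25, 0.25, 0.5])
Zp, pp = (Z @ p)[p > 0], p[p > 0]

# 1/D^Z_q(p) is the generalized mean of the (Zp)_i, weighted by the
# p_i, of order q - 1:
for q in [0, 0.5, 1, 2, 3]:
    assert abs(1 / diversity(Z, p, q) - power_mean(Zp, pp, q - 1)) < 1e-12

# Anticipating Lemma 1.2(ii): here the (Zp)_i are not all equal on the
# support of p, so D^Z_q(p) is strictly decreasing in q.
ds = [diversity(Z, p, q) for q in [0, 0.5, 1, 2, 5, 10]]
assert all(a > b for a, b in zip(ds, ds[1:]))
```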
We deduce the following lemma.

Lemma 1.2 Let $Z$ be a similarity matrix and $p$ a distribution. Then:

i. $D^Z_q(p)$ is continuous in $q \in [0,\infty)$;

ii. if $(Zp)_i = (Zp)_j$ for all $i, j$ such that $p_i, p_j > 0$ then $D^Z_q(p)$ is constant over $q \in [0,\infty)$; otherwise, $D^Z_q(p)$ is strictly decreasing in $q \in [0,\infty)$;

iii. $\lim_{q \to \infty} D^Z_q(p) = 1 / \max_{i:\, p_i > 0} (Zp)_i$.

Proof All of these assertions follow from standard results on generalized means. Continuity is clear except perhaps at $q = 1$, where it follows from Theorem 3 of [HLP]. Part (ii) follows from Theorem 16 of [HLP], and part (iii) from Theorem 4. □

In the light of this, we define
$$D^Z_\infty(p) = 1 \Big/ \max_{i:\, p_i > 0} (Zp)_i.$$
There is no useful definition of $H^Z_\infty$, since $\lim_{q \to \infty} H^Z_q(p) = 0$ for all $Z$ and $p$.

2 Preparatory results

Here we make some definitions and prove some lemmas in preparation for solving the maximum diversity and entropy problems. Some of these definitions and lemmas can also be found in [Lei2] and [LW].

Convention: for the rest of this work, unlabelled summations $\sum$ are understood to be over all $i \in \{1,\dots,n\}$ such that $p_i > 0$.

Weightings and magnitude

Definition 2.1 Let $Z$ be a similarity matrix. A weighting on $Z$ is a column vector $w \in \mathbb{R}^n$ such that
$$Zw = \begin{pmatrix} 1\\ \vdots\\ 1 \end{pmatrix}.$$
A weighting $w$ is non-negative if $w_i \ge 0$ for all $i$, and positive if $w_i > 0$ for all $i$.

Lemma 2.2 Let $w$ and $x$ be weightings on $Z$. Then $\sum_{i=1}^n w_i = \sum_{i=1}^n x_i$.

Proof Write $u$ for the column vector $(1 \cdots 1)^t$, where $(-)^t$ means transpose. Then
$$\sum_{i=1}^n w_i = u^t w = (Zx)^t w = x^t (Zw) = x^t u = \sum_{i=1}^n x_i,$$
using symmetry of $Z$. □

Definition 2.3 Let $Z$ be a similarity matrix on which there exists at least one weighting. Its magnitude is $|Z| = \sum_{i=1}^n w_i$, for any weighting $w$ on $Z$.

For example, if $Z$ is invertible then there is a unique weighting $w$ on $Z$, and $w_i$ is the sum of the $i$th row of $Z^{-1}$. So then
$$|Z| = \sum_{i,j=1}^n (Z^{-1})_{ij},$$
the sum of all $n^2$ entries of $Z^{-1}$. This formula also appears in [SP], [Shi] and [POP], for closely related reasons to do with diversity and its maximization.

Weight distributions

Definition 2.4 Let $Z$ be a similarity matrix. A weight distribution for $Z$ is a distribution $p$ such that $(Zp)_1 = \cdots = (Zp)_n$.

Lemma 2.5 Let $Z$ be a similarity matrix.

i. If $Z$ admits a non-negative weighting then $|Z| > 0$.

ii. If $w$ is a non-negative weighting on $Z$ then $w/|Z|$ is a weight distribution for $Z$, and this defines a one-to-one correspondence between non-negative weightings and weight distributions.

iii. If $Z$ admits a weight distribution then $Z$ admits a weighting and $|Z| > 0$.

iv. If $p$ is a weight distribution for $Z$ then $(Zp)_i = 1/|Z|$ for all $i$.

Proof

i. Let $w$ be a non-negative weighting. Certainly $|Z| = \sum_{i=1}^n w_i \ge 0$. Since we are assuming that $n \ge 1$, the vector $0$ is not a weighting, so $w_i > 0$ for some $i$. Hence $|Z| > 0$.

ii. The first part is clear. To see that this defines a one-to-one correspondence, take a weight distribution $p$, writing $(Zp)_i = K$ for all $i$. Since $\sum p_i = 1$, we have $p_i > 0$ for some $i$, and then $K = (Zp)_i > 0$ by Lemma 1.1. The vector $w = p/K$ is then a non-negative weighting. The two processes (passing from a non-negative weighting to a weight distribution, and vice versa) are easily shown to be mutually inverse.

iii. Follows from the previous parts.

iv. Follows from the previous parts. □

The first connection between magnitude and diversity is this:

Lemma 2.6 Let $Z$ be a similarity matrix and $p$ a weight distribution for $Z$. Then $D^Z_q(p) = |Z|$ for all $q \in [0,\infty]$.

Proof By continuity, it is enough to prove this for $q \ne 1, \infty$. In that case, using Lemma 2.5(iv),
$$D^Z_q(p) = \Bigl(\sum p_i (Zp)_i^{\,q-1}\Bigr)^{\frac{1}{1-q}} = \Bigl(\sum p_i |Z|^{1-q}\Bigr)^{\frac{1}{1-q}} = |Z|,$$
as required. □
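For invertible $Z$, the weighting, magnitude and weight distribution are all computable by linear algebra. The sketch below (ours; Python with NumPy; `diversity` is the same hypothetical helper as in the earlier sketch) finds the unique weighting by solving $Zw = (1,\dots,1)^t$ and confirms Lemma 2.5(iv) and Lemma 2.6 on a small example.

```python
import numpy as np

def diversity(Z, p, q):
    """D^Z_q(p), as in the earlier sketch."""
    Zp, pp = (Z @ p)[p > 0], p[p > 0]
    if q == 1:
        return np.prod(Zp ** -pp)
    return np.sum(pp * Zp ** (q - 1)) ** (1 / (1 - q))

Z = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Z is invertible, so it has a unique weighting: the solution of
# Zw = (1, ..., 1)^t.
w = np.linalg.solve(Z, np.ones(len(Z)))
magnitude = w.sum()                                     # |Z|
assert abs(magnitude - np.linalg.inv(Z).sum()) < 1e-12  # sum of entries of Z^{-1}

# The weighting is non-negative, so w/|Z| is a weight distribution
# (Lemma 2.5(ii)).
assert (w >= 0).all()
p = w / magnitude
assert np.allclose(Z @ p, 1 / magnitude)                # Lemma 2.5(iv)

# Lemma 2.6: a weight distribution has diversity |Z| in every order q.
for q in [0, 1, 2, 10]:
    assert abs(diversity(Z, p, q) - magnitude) < 1e-12
```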
Invariant distributions

Definition 2.7 Let $Z$ be a similarity matrix. A distribution $p$ is invariant if $D^Z_q(p) = D^Z_{q'}(p)$ for all $q, q' \in [0,\infty]$.

Soon we will classify the invariant distributions. To do so, we need some more notation and a lemma.

Given a similarity matrix $Z$ and a subset $B \subseteq \{1,\dots,n\}$, let $Z_B$ be the matrix $Z$ restricted to $B$, so that $(Z_B)_{ij} = Z_{ij}$ ($i, j \in B$). If $B$ has $m$ elements then $Z_B$ is an $m \times m$ matrix, but it will be more convenient to index the rows and columns of $Z_B$ by the elements of $B$ themselves than by $1,\dots,m$.

We will also need to consider distributions on subsets of $\{1,\dots,n\}$. A distribution on $B \subseteq \{1,\dots,n\}$ is said to be invariant, a weight distribution, etc., if it is invariant, a weight distribution, etc., with respect to $Z_B$. Similarly, we will sometimes speak of 'weightings on $B$', meaning weightings on $Z_B$. Distributions are understood to be on $\{1,\dots,n\}$ unless specified otherwise.

Lemma 2.8 Let $Z$ be a similarity matrix, let $B \subseteq \{1,\dots,n\}$, and let $r$ be a distribution on $B$. Write $p$ for the distribution obtained by extending $r$ by zero. Then $D^{Z_B}_q(r) = D^Z_q(p)$ for all $q \in [0,\infty]$. In particular, $r$ is invariant if and only if $p$ is.

Proof For $i \in B$ we have $r_i = p_i$ and $(Z_B r)_i = (Zp)_i$. The result follows immediately from the definition of diversity of order $q$. □

By Lemma 2.6, any weight distribution is invariant, and by Lemma 2.8, any extension by zero of a weight distribution is also invariant. We will prove that these are all the invariant distributions there are.

For a distribution $p$ we write $\mathrm{supp}(p) = \{i \in \{1,\dots,n\} \mid p_i > 0\}$, the support of $p$.

Let $Z$ be a similarity matrix. Given $\emptyset \ne B \subseteq \{1,\dots,n\}$ and a non-negative weighting $w$ on $Z_B$, let $\tilde{w}$ be the distribution obtained by first taking the weight distribution $w/|Z_B|$ on $B$, then extending by zero to $\{1,\dots,n\}$.

Proposition 2.9 Let $Z$ be a similarity matrix and $p$ a distribution. The following are equivalent:

i. $p$ is invariant;

ii. $(Zp)_i = (Zp)_j$ for all $i, j \in \mathrm{supp}(p)$;

iii. $p$ is the extension by zero of a weight distribution on a nonempty subset of $\{1,\dots,n\}$;

iv. $p = \tilde{w}$ for some non-negative weighting $w$ on some nonempty subset of $\{1,\dots,n\}$.

Proof (i⇒ii): Follows from Lemma 1.2.

(ii⇒iii): Suppose that (ii) holds, and write $B = \mathrm{supp}(p)$. The distribution $p$ on $\{1,\dots,n\}$ restricts to a distribution $r$ on $B$. This $r$ is a weight distribution on $B$, since for all $i \in B$ we have $(Z_B r)_i = (Zp)_i$, which by (ii) is constant over $i \in B$. Clearly $p$ is the extension by zero of $r$.

(iii⇒i): Suppose that $p$ is the extension by zero of a weight distribution $r$ on a nonempty subset $B \subseteq \{1,\dots,n\}$. Then for all $q \in [0,\infty]$,
$$D^Z_q(p) = D^{Z_B}_q(r) = |Z_B|$$
by Lemmas 2.8 and 2.6 respectively; hence $p$ is invariant.

(iii⇔iv): Follows from Lemma 2.5. □

There is at least one invariant distribution on any given similarity matrix. For we may choose $B$ to be a one-element subset, which has a unique non-negative weighting $w = (1)$, and this gives the invariant distribution $\tilde{w} = (0,\dots,0,1,0,\dots,0)$.
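Condition (ii) of Proposition 2.9 gives a finite test for invariance. The sketch below (ours; Python with NumPy) constructs $\tilde{w}$ for a two-element subset $B$, verifies that $(Zp)_i$ is constant on the support, and spot-checks that the diversities of orders 0 and 2 both equal $|Z_B|$, as the proof of (iii)⇒(i) predicts.

```python
import numpy as np

Z = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.2],
              [0.2, 0.2, 1.0]])

# Take B = {0, 1}, find the weighting on Z_B, and form w~:
B = [0, 1]
ZB = Z[np.ix_(B, B)]
w = np.linalg.solve(ZB, np.ones(len(B)))  # weighting on Z_B (non-negative here)
p = np.zeros(len(Z))
p[B] = w / w.sum()                        # weight distribution on B, extended by zero

# Criterion (ii): (Zp)_i is constant over the support of p.
Zp = Z @ p
supp = p > 0
assert np.allclose(Zp[supp], Zp[supp][0])

# Hence p is invariant, with common diversity value |Z_B|.
# Spot-check orders 0 and 2 directly from the definition of D^Z_q:
d0 = np.sum(p[supp] / Zp[supp])           # D^Z_0(p)
d2 = 1 / np.sum(p[supp] * Zp[supp])       # D^Z_2(p)
assert np.allclose([d0, d2], w.sum())     # both equal |Z_B|
```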
Maximizing distributions

Definition 2.10 Let $Z$ be a similarity matrix. Given $q \in [0,\infty]$, a distribution $p$ is $q$-maximizing if $D^Z_q(p) \ge D^Z_q(p')$ for all distributions $p'$. A distribution is maximizing if it is $q$-maximizing for all $q \in [0,\infty]$.

It makes no difference to the definition of 'maximizing' if we omit $q = \infty$; nor does it make a difference to either definition if we replace diversity $D^Z_q$ by entropy $H^Z_q$.

We will eventually show that every similarity matrix has a maximizing distribution.

Lemma 2.11 Let $Z$ be a similarity matrix and $p$ an invariant distribution. Then $p$ is 0-maximizing if and only if it is maximizing.

Proof Suppose that $p$ is 0-maximizing. Then for all $q \in [0,\infty]$ and all distributions $p'$,
$$D^Z_q(p) = D^Z_0(p) \ge D^Z_0(p') \ge D^Z_q(p'),$$
using invariance in the first step and Lemma 1.2 in the last. The converse is immediate, since maximizing means $q$-maximizing for all $q$, including $q = 0$. □

Lemma 2.12 Let $Z$ be a similarity matrix and $B \subseteq \{1,\dots,n\}$. Suppose that $\sup_r D^{Z_B}_0(r) \ge \sup_p D^Z_0(p)$, where the first supremum is over distributions $r$ on $B$ and the second is over distributions $p$ on $\{1,\dots,n\}$. Suppose also that $Z_B$ admits an invariant maximizing distribution. Then so does $Z$.

(In fact, $\sup_r D^{Z_B}_0(r) \le \sup_p D^Z_0(p)$ in any case, by Lemma 2.8. So the '$\ge$' in the statement of the present lemma could equivalently be replaced by '$=$'.)

Proof Let $r$ be an invariant maximizing distribution on $Z_B$. Define a distribution $p$ on $\{1,\dots,n\}$ by extending $r$ by zero. By Lemma 2.8, $p$ is invariant. Using Lemma 2.8 again,
$$D^Z_0(p) = D^{Z_B}_0(r) = \sup_{r'} D^{Z_B}_0(r') \ge \sup_{p'} D^Z_0(p'),$$
so $p$ is 0-maximizing. Then by Lemma 2.11, $p$ is maximizing. □

Decomposition

Let $Z$ be a similarity matrix. Subsets $B$ and $B'$ of $\{1,\dots,n\}$ are complementary (for $Z$) if $B \cup B' = \{1,\dots,n\}$, $B \cap B' = \emptyset$, and $Z_{ii'} = 0$ for all $i \in B$ and $i' \in B'$. For example, there exist nonempty complementary subsets if $Z$ can be expressed as a nontrivial block sum
$$\begin{pmatrix} X & 0\\ 0 & X' \end{pmatrix}.$$

Given a distribution $p$ and a subset $B \subseteq \{1,\dots,n\}$ such that $p_i > 0$ for some $i \in B$, let $p|_B$ be the distribution on $B$ defined by
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j}.$$

Lemma 2.13 Let $Z$ be a similarity matrix, and let $B$ and $B'$ be nonempty complementary subsets of $\{1,\dots,n\}$. Then:

i. For any weightings $v$ on $Z_B$ and $v'$ on $Z_{B'}$, there is a weighting $w$ on $Z$ defined by
$$w_i = \begin{cases} v_i & \text{if } i \in B\\ v'_i & \text{if } i \in B'. \end{cases}$$

ii. For any invariant distributions $r$ on $B$ and $r'$ on $B'$, there exists an invariant distribution $p$ on $\{1,\dots,n\}$ such that $p|_B = r$ and $p|_{B'} = r'$.

Proof

i. For $i \in B$, we have
$$(Zw)_i = \sum_{j \in B} Z_{ij} v_j + \sum_{j \in B'} Z_{ij} v'_j = \sum_{j \in B} (Z_B)_{ij} v_j = (Z_B v)_i = 1.$$
Similarly, $(Zw)_i = 1$ for all $i \in B'$. So $w$ is a weighting.

ii. By Proposition 2.9, $r = \tilde{v}$ for some non-negative weighting $v$ on some nonempty subset $C \subseteq B$. Similarly, $r' = \tilde{v}'$ for some non-negative weighting $v'$ on some nonempty $C' \subseteq B'$. By (i), there is a non-negative weighting $w$ on the nonempty set $C \cup C'$ defined by
$$w_i = \begin{cases} v_i & \text{if } i \in C\\ v'_i & \text{if } i \in C'. \end{cases}$$
Let $p = \tilde{w}$, a distribution on $\{1,\dots,n\}$, which is invariant by Proposition 2.9. For $i \in C$ we have
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j} = \frac{w_i / |Z_{C \cup C'}|}{\sum_{j \in C} w_j / |Z_{C \cup C'}|} = \frac{w_i}{\sum_{j \in C} w_j} = \frac{v_i}{\sum_{j \in C} v_j} = r_i.$$
For $i \in B \setminus C$ we have
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j} = 0 = r_i.$$
Hence $p|_B = r$, and similarly $p|_{B'} = r'$. □

Lemma 2.14 Let $Z$ be a similarity matrix, let $B$ and $B'$ be complementary subsets of $\{1,\dots,n\}$, and let $p$ be a distribution on $\{1,\dots,n\}$ such that $p_i > 0$ for some $i \in B$ and $p_i > 0$ for some $i \in B'$. Then
$$D^Z_0(p) = D^{Z_B}_0(p|_B) + D^{Z_{B'}}_0(p|_{B'}).$$

Proof By definition,
$$D^Z_0(p) = \sum \frac{p_i}{(Zp)_i} = \sum_{i \in B:\, p_i > 0} \frac{p_i}{(Zp)_i} + \sum_{i \in B':\, p_i > 0} \frac{p_i}{(Zp)_i}.$$
Now for $i \in B$,
$$p_i = \Bigl(\sum_{j \in B} p_j\Bigr) (p|_B)_i$$
by definition of $p|_B$, and
$$(Zp)_i = \sum_{j \in B} Z_{ij} p_j + \sum_{j \in B'} Z_{ij} p_j = \sum_{j \in B} (Z_B)_{ij} p_j = \Bigl(\sum_{j \in B} p_j\Bigr) (Z_B\, p|_B)_i.$$
Similar equations hold for $B'$, so
$$D^Z_0(p) = \sum_{i \in B:\, p_i > 0} \frac{(p|_B)_i}{(Z_B\, p|_B)_i} + \sum_{i \in B':\, p_i > 0} \frac{(p|_{B'})_i}{(Z_{B'}\, p|_{B'})_i} = D^{Z_B}_0(p|_B) + D^{Z_{B'}}_0(p|_{B'}),$$
as required. □

Proposition 2.15 Let $Z$ be a similarity matrix and let $B$ and $B'$ be nonempty complementary subsets of $\{1,\dots,n\}$. Suppose that $Z_B$ and $Z_{B'}$ each admit an invariant maximizing distribution. Then so does $Z$.
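Lemma 2.14 is easy to check numerically on a block sum. The sketch below (ours; Python with NumPy; the helper `d0` computes the order-0 diversity from its definition) splits a block-diagonal similarity matrix into complementary subsets $B$ and $B'$ and verifies the additivity $D^Z_0(p) = D^{Z_B}_0(p|_B) + D^{Z_{B'}}_0(p|_{B'})$.

```python
import numpy as np

def d0(Z, p):
    """Order-0 diversity: D^Z_0(p) = sum over supp(p) of p_i / (Zp)_i."""
    Zp = Z @ p
    i = p > 0
    return np.sum(p[i] / Zp[i])

# A nontrivial block sum: B = {0, 1} and B' = {2, 3} are complementary.
Z = np.array([[1.0, 0.3, 0.0, 0.0],
              [0.3, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])
p = np.array([0.1, 0.3, 0.4, 0.2])
B, Bp = [0, 1], [2, 3]

def restricted(idx):
    """p|_B: the restriction of p to the index set idx, renormalized."""
    return p[idx] / p[idx].sum()

lhs = d0(Z, p)
rhs = d0(Z[np.ix_(B, B)], restricted(B)) + d0(Z[np.ix_(Bp, Bp)], restricted(Bp))
assert abs(lhs - rhs) < 1e-12   # Lemma 2.14
```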
