7 Proof of an entropy conjecture of 1 0 Leighton and Moitra 2 r a Hu¨seyin Acan ∗† Pat Devlin ∗‡ M [email protected] [email protected] 0 Jeff Kahn ∗‡ 1 [email protected] ] O C . h Abstract t a We prove the following conjecture of Leighton and Moitra. Let T m be a tournament on [n] and Sn the set of permutations of [n]. For an [ arc uv of T, let Auv ={σ ∈Sn : σ(u)<σ(v)}. 2 Theorem. For a fixed ε > 0, if P is a probability distribution on Sn 1v such that P(Auv) > 1/2+ε for every arc uv of T, then the binary 2 entropy of P is at most (1 ϑε)log2n! for some (fixed) positive ϑε. − 3 When T is transitive the theorem is due to Leighton and Moitra; for 4 0 this case we give a short proof with a better ϑε. . 1 0 1 Introduction 7 1 : Inwhatfollows weuselog forlog andH()forbinaryentropy. Thepurpose v 2 · i of this note is to prove the following natural statement, which was conjec- X tured by Tom Leighton and Ankur Moitra [6] (and told to the third author r a by Moitra in 2008). Theorem 1. Let T be a tournament on [n] and σ a random (not necessarily uniform) permutation of [n] satisfying: for each arc uv of T, P(σ(u) < σ(v)) > 1/2+ε. (1) AMS 2010 subject classification: 05C20, 05D40, 94A17, 06A07 Key words and phrases: entropy,permutations, tournaments, regularity ∗Departmentof Mathematics, Rutgers University †Supportedby National Science Foundation Fellowship (Award No. 1502650). ‡Supportedby NSFgrant DMS1501962. 1 Then H(σ) (1 ϑ)logn!, (2) ≤ − where ϑ > 0 depends only on ε. (We will usually think of permutations as bijections σ : [n] [n]). The → original motivation for Leighton and Moitra came mostly from questions about sorting partially ordered sets; see [6] for more on this. For the special case of transitive T, Theorem 1 was proved in [6] with ϑ = Cε4. Note that for a typical (a.k.a. random) T, the conjecture’s hy- ε pothesis is unachievable, since, as shown long ago by Erd˝os and Moon [2], noσ agrees withT onmorethana(1/2+o(1))-fraction ofitsarcs. Infact, it seems naturaltoexpectthattransitive tournamentsaretheworst instances, being the ones for which the hypothesized agreement is easiest to achieve. From this standpoint, what we do here may be considered somewhat unsat- isfactory, as our ϑ’s are quite a bit worse than those in [6]. For transitive T it’s easy to see [6, Claim 4.14] that one can’t take ϑ greater than 2ε, which seems likely to be close to the truth. We make some progress on this, giving a surprisingly simple proof of the following improvement of [6]. Theorem 2. For T, P, σ as Theorem 1 with T transitive, H(σ) (1 ε2/8)nlogn. ≤ − TheproofofTheorem1isgiven inSection 3followingbriefpreliminaries in Section 2. The underlying idea is similar to that of [6], which in turn was based on the beautiful tournament ranking bound of W. Fernandez de la Vega [1]; see Section 3 (end of “Sketch”) for an indication of the relation to [6]. Theorem 2 is proved in Section 4. 2 Preliminaries Usage In what follows we assume n is large enough to support our arguments and pretend all large numbers are integers. As usual G[X] is the subgraph of G induced by X; we use G[X,Y] for the bipartite subgraph induced (in the obvious sense) by disjoint X and Y. For a digraph D, D[X] and D[X,Y] are used analogously. For both graphs and digraphs, we use for number of edges (or arcs). |·| 2 Also as usual, the density of a pair (X,Y) of disjoint subsets of V(G) is d(X,Y) = d (X,Y) = G[X,Y]/(X Y ), and we extend this to bipartite G | | | || | digraphs D in which at most one of D (X Y), D (Y X) is nonempty. (3) ∩ × ∩ × For a digraph D, Dr is the digraph gotten from D by reversing its arcs. Write S for the set of permutations of [n]. For σ S , we use T n n σ ∈ for the corresponding (transitive) tournament on [n] (that is, uv T iff σ ∈ σ(u) <σ(v)) and for a digraph D (on [n]) define fit(σ,D) = D T Dr T σ σ | ∩ |−| ∩ | (e.g. when D is a tournament, this is a measure of the quality of σ as a ranking of D). Regularity Here we need just Szemer´edi’s basic notion [7] of a regular pair and a very weak version (Lemma 3)of his Regularity Lemma. As usualabipartite graph H on disjoint X Y is δ-regular if ∪ d (X′,Y′) d (X,Y) < δ H H | − | whenever X′ X, Y′ Y, X′ > δ X and Y′ > δ Y , and we extend this ⊆ ⊆ | | | | | | | | in the obvious way to the situation in (3). It is easy to see that if a bigraph H is δ-regular then its bipartite complement is as well; this implies that for a tournament T on [n] and X, Y disjoint subsets of [n], T (X Y) is δ-regular if and only if T (Y X) is. (4) ∩ × ∩ × The following statement should perhaps be considered folklore, though similar results were proved by Ja´nos Komlo´s, circa 1991 (see [5, Sec. 7.3]). Lemma 3. For each δ > 0 there is a β >2−δ−O(1) such that for any bigraph H on X Y with X , Y n, there is a δ-regular pair (X′,Y′) with X′ ∪ | | | | ≥ ⊆ X,Y′ Y and each of X′ , Y′ at least βn. ⊆ | | | | Corollary 4. For each δ > 0, β as in Lemma 3 and digraph G = (V,E), there is a partition L R W of V such that E (L R) is δ-regular and ∪ ∪ ∩ × min L , R β V /2. {| | | |} ≥ | | Proof. Let X Y be an (arbitrary) equipartition of V and apply Lemma 3 ∪ to the undirected graph H underlying the digraph G (X Y). ∩ × 3 3 Proof of Theorem 1 We now assume that σ drawn from the probability distribution P on S n satisfies (1) and try to show (2) (with ϑ TBA). We use E for expectation w.r.t. P and µ for uniform distribution on S . n Sketch and connection with [6] We will produce S ,...,S T with S L R for some disjoint 1 m i i i ⊆ ⊆ × L ,R [n], satisfying: i i ⊆ (i) with S := min L , R , S = Ω(nlogn) (where the implied i i i P i k k {| | | |} k k constant depends on ε); (ii) each S is δ-regular (with δ = δ TBA); i ε (iii) for all i < j, either (L R ) (L R ) = ∅ or L R is contained i i j j j j ∪ ∩ ∪ ∪ in one of L ,R (note this implies the S ’s are disjoint). i i i Let A = fit(σ,S ) > εS and Q = S : A occurs = Ω(nlogn) . i i i P i i { | |} { {k k } } The main points are then: (a) P(Q) is bounded below by a positive function of ε. (This is just (i) together with a couple applications of Markov’s Inequality.) (b) Regularity of S implies µ(A ) exp[ Ω( S )]. i i i ≤ − k k (c) Under (iii), for any I [m], ⊆ µ( A )< exp[ Ω( S )] ∩i∈I i −Pi∈I k ik (a weak version of independence of the A ’s under µ). i And these points easily combine to give (2) (see (6) and (8)). For the transitive case in [6] most of this argument is unnecessary; in particular, regularity disappears and there is a natural decomposition of T into S ’s: Supposing T = ab : a < b and (for simplicity) n = 2k, we may i { } take the S ’s to be the sets L R with (L ,R ) running over pairs i i i i i × ([(2s 2)2−jn+1,(2s 1)2−jn],[(2s 1)2−jn+1,2s2−jn]), (5) − − − with j [k] and s [2j−1]. (As mentioned earlier, this decomposition of ∈ ∈ the (identity) permutation (1,...,n) also provides the framework for [1].) 4 After some translation, our argument (really, a fairly small subset thereof) then specializes to essentially what’s done in [6]. Set δ = .03ε and let β behalf theβ of Lemma3and Corollary 4. We use the corollary to find a rooted tree each of whose internal nodes has degree T (number of children) 2 or 3, together with disjoint subsets S ,S ,...,S of 1 2 m (the arc set of) T, corresponding to the internal nodes of . The nodes of T will be subsets of [n] (so the size, U , of a node U is its size as a set). T | | To construct , start with root V = [n] and repeat the following for 1 T k = 1,... until each unprocessed node has size less than (say) t := √n. Let V be an unprocessed node of size at least t and apply Corollary 4 to k T[V ] to produce a partition V = L R W , with L , R > β V and k k k k k k k k ∪ ∪ | | | | | | S := T (L R ) δ-regular of density at least 1/2. (Note (4) says we can k k k ∩ × reverse the roles of L and R if the density of T (L R ) is less than k k k k ∩ × 1/2.) Add L ,R ,W to as the children of V and mark V “processed.” k k k k k T (Note the V ’s are the internal nodes of ; nodes of size less then t are not k T processedandareautomatically leaves. Notealsothatthereisnorestriction on W and that, for k > 1, V is equal to one of L , R , W for some i< k.) k k i i i | | Let m be the number of internal nodes of (the final tree). Note that T the leaves of have size at most t and that the S ’s satisfy (ii) and (iii) of i T the proof sketch; that they also satisfy (i) is shown by the next lemma. Set Λ = m V ; Pi=1| i| this quantity will play a central role in what follows. Lemma 5. Λ 1nlog n ≥ 2 3 Proof. This will follow easily from the next general (presumably known) observation, for which we assume is a tree satisfying: T the nodes of are subsets of S, an s-set which is also the root of ; • T T the children of each internal node U of form a partition of U with • T at most b blocks; the leaves of are U ,...,U , with U = u t (any t) and depth d . 1 r i i i • T | | ≤ Lemma 6. With the setup above, u d slog (s/t). P i i b ≥ (Of course this is exact if is the complete b-ary tree of depth d and all T leaves have size 2−bs). 5 Proof. Recall that the relative entropy between probability distributions p and q on [r] is D(p q) = Xpilog(qi/pi) 0 k ≤ (the inequality given by the concavity of the logarithm). We apply this with p = u /s and q the probability that the ordinary random walk down the i i i tree ends at u . In particular q b−di, which, with nonpositivity of D(p q) i i ≥ k and the assumption u t, gives i ≤ X(ui/s)dilogb X(ui/s)log(1/qi) ≥ X(ui/s)log(s/ui) log(s/t). ≥ ≥ The lemma follows. This gives Lemma 5 since V = U d(U), with U ranging over P| i| PU | | leaves of (and d() again denoting depth). T · Lemma 7. The number m of internal nodes of is less than n. T Proof. A straightforward induction shows that the number of leaves of a rooted tree is 1+ (b(w) 1), where w ranges over internal nodes and b P − denotes number of children. The lemma follows since here the number of leaves is at most n (actually at most 3√n) and each d(w) is at least 2. Recalling that A = σ S : fit(σ,S ) εS and that E refers to P, i n i i { ∈ ≥ | |} we have E[fit(σ,S )] 2εS , which with i i ≥ | | E[fit(σ,S )] P(A )S +(1 P(A ))εS (P(A )+ε)S i i i i i i i ≤ | | − | | ≤ | | gives P(A ) ε (essentially Markov’s Inequality applied to S fit(σ,S )). i i i ≥ | |− Set ξ = V 1 and ξ = ξ , and let Q be the event ξ εΛ/2 . i | i| Ai Pi i { ≥ } Then E[ξ ] = V P(A ) εV , implying E[ξ] = E[ξ ] εΛ, and (since i i i i P i ξ V ) ξ Λ|; s|o using≥Ma|rk|ov’s Inequality as above giv≥es P(Q) ε/2. i i ≤| | ≤ ≥ Thus, with σ chosen from S according to P, we have n H(σ) 1+(1 P(Q))logn!+P(Q)log Q ≤ − | | = 1+logn!+P(Q)logµ(Q) 1+logn!+(ε/2)logµ(Q) (6) ≤ (recall µ is the uniform measure on S ). n Let = I [m]: V εΛ/2 J { ⊆ Pi∈I| i|≥ } 6 and, for I , let A = A . Set I i∈I i ∈ J ∩ b = ε2δβ3/33 (7) (see (12) for the reason for the choice of b). We will show, for each I , ∈ J µ(A ) e−bεΛ/2, (8) I ≤ which implies logµ(Q)= logµ( A ) log (bεΛloge)/2 n (bεΛloge)/2, I∈J I ∪ ≤ |J|− ≤ − thesecondinequalityfollowingfrom 2m togetherwithLemma7. With |J| ≤ c = ε3δβ3/150 < (bεlog e)/4, this bounds (for large n) the r.h.s. of (6) by 3 (1 εc/2)logn!, − which proves Theorem 1 with ϑ = ε4δβ3/300 = exp[ ε−O(1)]. − The rest of our discussion is devoted to the proof of (8). For a digraph D L R with L,R disjoint subsets of V, say a pair (X,Y) of disjoint ⊆ × subsets of [n] with X = L , Y = R is safe for D if | | | | | | | | fit(τ,D) < εL R /4 (9) | || | for every bijection τ : L R X Y with τ(L) = X (where fit(τ,D) has ∪ → ∪ the obvious meaning). We also say σ S is safe for D if (σ(L),σ(R)) n ∈ is. Note that since S has density at least 1/2 in L R , the σ’s in A are i i i i × unsafe for S . i Lemma 8. Assume the above setup with L + R = l and L = γl, and set | | | | | | λ = 2δ and ζ = εδγ(1 γ)/4. Let I I be the natural partition of 1 r − ∪···∪ X Y into intervals of size λl. If D is δ-regular and ∪ X I = (γλ ζ)l j [r], (10) j | ∩ | ± ∀ ∈ then (X,Y) is safe for D. (OfcourseanintervalofZ = i < < i isoneofthesets i ,...,i .) 1 u s s+t { ··· } { } Proof. Forτ asinthelineafter(9),letL = L τ−1(I )andR = R τ−1(I ) j j j j ∩ ∩ (j [r]). Then ∈ fit(τ,D) X D (Li Rj) D (Lj Ri) +γ(1 γ)λl2. (11) | | ≤ || ∩ × |−| ∩ × || − 1≤i<j≤r 7 Here the last term is an upper bound on the contribution of pairs contained in the I ’s: if L = γ I = γ λl (so R = (1 γ )λl and γ = γ/λ), j j j j j j j P j | | | | | | − then γ (1 γ ) γ ( γ )2/r = (γ γ2)/λ P j j P j P j − ≤ − − gives L R = γ (1 γ )λ2l2 γ(1 γ)λl2. P j j P j j | || | − ≤ − Ontheotherhand,regularityand(10)(whichimplies L > δ L (= δγl) i | | | | since γλ ζ > γδ, and similarly R > δ R ) give, for all i = j, i − | | | | 6 D (L R ) = (d δ)L R , i j i j | ∩ × | ± | || | where d is the density of D. Combining this with (10) bounds each of the summands in (11) by [(d+δ)(γλ+ζ)((1 γ)λ+ζ) (d δ)(γλ ζ)((1 γ)λ ζ)]l2 − − − − − − = 2[λζd+δ(γ(1 γ)λ2+ζ2)]l2 − and the r.h.s. of (11) by 2 r [λζd+δ(γ(1 γ)λ2+ζ2)]+γ(1 γ)λ l2 < εγ(1 γ)l2/4. (cid:8) (cid:0)2(cid:1) − − (cid:9) − (The main term on the l.h.s. is theone with λζd, which, since r−1 = λ = 2δ, is less than half ther.h.s. Thesecond and thirdterms are much smaller (the second since δ is much smaller than ε).) Corollary 9. For D and parameters as in Lemma 8, and σ uniform from S , n Pr(σ is unsafe for D)< 2rexp[ 2ζ2l/λ]. − Proof. Let (X,Y) = (σ(L),σ(R)). Once we’ve chosen X Y (determining ∪ I ,...,I ), 2exp[ 2ζ2l/λ] is the usual Hoeffding bound [3, Eq. (2.3)] on the 1 r − probability that X violates (10) for a given j. (The bound may be more familiar when elements of X Y are in X independently, but also applies ∪ to the hypergeometric r.v. X I ; see e.g. [4, Thm. 2.10 and (2.12)].) j | ∩ | Proof of (8). Let B = σ S : σ is unsafe for S i n i { ∈ } and B = B . Then A B (as noted above) and (therefore) A B . I i∈I i i i I I ∩ ⊆ ⊆ Moreover—perhaps the central point—the B ’s are independent, since B i i depends only on the relative positions of σ(L ) and σ(R ) within σ(V ). i i i 8 Ontheotherhand,Corollary9,appliedwithD = S (soL = L ,R = R , i i i l = L + R and γ = L /l (β,1 β)) gives i i i | | | | | | ∈ − Pr(B ) < 2rexp[ 2ζ2l/λ]< 2rexp[ ε2δβ2l/64] i − − < 2rexp[ ε2δβ3 V /32] < e−b|Vi|. (12) i − | | (Recall b was defined in (7); since we assume V is large (V > t = √n), i i | | | | the choice leaves a little room to absorb the 2r.) And of course (12) and the independence of the B ’s give (8). i 4 Back to the transitive case Theorem 2 is an easy consequence of the next observation. Lemma 10. Let Y a random m-subset of [2m] satisfying E (a,b) : a < b,a [2m] Y,b Y > (1 +ε)m2. (13) |{ ∈ \ ∈ }| 2 Then H(Y) < (1 ε2/8)2m. − To get Theorem 2 from this, let T = ab : a < b and, for simplicity, { } n = 2k, and decompose T = (L R ) as in (5). For each i, say with S i i × L (= R ) = m , let Y [2m ] consist of the indices of positions within i i i i i | | | | ⊆ σ(L R ) occupied by σ(R ); that is, if σ(L R ) = j < < j , i ∪ i i i ∪ i { 1 ··· 2mi} then Y = l : j σ(R ) . Then Lemma 10 (its hypothesis provided by i l i { ∈ } (1)) gives H(Y ) (1 ε2/8)2m ; i i ≤ − so, since σ is determined by the Y ’s, we have i H(σ) H(Y ) (1 ε2/8) (2m )= (1 ε2/8)nlogn. P i P i ≤ ≤ − − Remark. Note that the Ω(ε2) of Theorem 2 is the best one can do without more fully exploiting (1) (that is, beyond (13) for the (L ,R ,Y )’s, which is i i i all we are using). Proof of Lemma 10. For a [2m], set P(a Y)= 1/2+δ . Then a ∈ ∈ H(Y) H(1/2+δ ) (1 2δ2) ≤ Pa a ≤ Pa − a (where the 2 could actually be 2loge); so it is enough to show δ2 ε2m/8. P a ≥ 9 For a given m-subset Y of [2m], we have f(Y) := (a,b) :a < b,a [2m] Y,b Y |{ ∈ \ ∈ }| = (b 1) m = b m+1 . Pb∈Y − −(cid:0)2(cid:1) Pb∈Y −(cid:0) 2 (cid:1) (the first sum counts pairs (a,b) with a < b and b Y, and m is the ∈ (cid:0)2(cid:1) number of such pairs with a also in Y); so we have (1 +ε)m2 < Ef(Y)= (1 +δ )b m+1 = δ b+m2/2, 2 P 2 b −(cid:0) 2 (cid:1) P b implying δ b > εm2. Combining this with 2m δ δ b, we have P b Pδb>0 b ≥ P b δ > εm/2 and then, using Cauchy-Schwarz, Pδb>0 b δ2 δ2 1 (εm/2)2 = ε2m/8. P b ≥ Pδb>0 b ≥ 2m References [1] W. Fernandez de la Vega, On the maximal cardinality of a consistent set of arcs in a random tournament, J. Comb. Th. Series B (1983), 328-332. [2] P. Erd˝os and J. Moon, On sets of consistent arcs in a tournament, Canad. Math. Bull. 8 (1965), 269-271. [3] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statistical Assoc. 58 (1963), 13-30. [4] S. Janson, T. L uczak and A. Rucin´ski, Random Graphs, Wiley, New York, 2000. [5] J. Komlo´s and M. Simonovits, Szemer´edi’s regularity lemma and its applications in graph theory, Combinatorics, Paul Erdo˝s is eighty, Vol. 2 (Keszthely, 1993), 295-352, Bolyai Soc. Math. Stud. 2, Ja´nos Bolyai Math. Soc., Budapest, 1996. [6] T. Leighton and A. Moitra, On Entropy and Extensions of Posets, manuscript 2011. http://people.csail.mit.edu/moitra/docs/poset.pdf. [7] E. Szemer´edi, Regular Partitions of Graphs, pp. 399-401 in Probl´emes Combinatoires et Th´eorie des Graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976), Paris: E´ditions du Centre National de la Recherche Scientifique (CNRS), 1978. 10