Perfect and separating Hash families: new bounds via the algorithmic cluster expansion local lemma

Aldo Procacci and Remy Sanchis

Departamento de Matemática UFMG 30161-970 - Belo Horizonte - MG - Brazil
e-mails: [email protected], [email protected]

arXiv:1601.05389v2 [math.CO] 9 Dec 2016

Abstract

We present new lower bounds for the size of perfect and separating hash families ensuring their existence. Such new bounds are based on the algorithmic cluster expansion improved version of the Lovász Local Lemma, which also implies that the Moser-Tardos algorithm finds such hash families in polynomial time.

Keywords: Hash families, algorithmic Lovász Local Lemma, hard-core lattice gas.

MSC numbers: 05D40, 68W20, 82B20, 94A60.

1 Introduction and results

In this initial section we briefly review the state of the art of the Lovász Local Lemma, a powerful tool in the framework of the probabilistic method in combinatorics, focusing specifically on the recent cluster expansion improvement of the Moser-Tardos algorithmic version of the Lemma. We then recall the main results in the literature concerning Perfect Hash Families and Separating Hash Families. Finally we present the results of the paper.

1.1 Lovász Local Lemma: state of the art

The Lovász Local Lemma (LLL) was originally formulated by Erdős and Lovász in [8] and since then it has turned out to be one of the most powerful tools in the framework of the probabilistic method in combinatorics to prove the existence of combinatorial objects with certain desirable properties. The philosophy of the Lemma is basically to consider a collection of "bad" events in some suitably defined probability space whose occurrence, even of just one of them, prevents the existence of a certain "good" event (i.e. the combinatorial object under analysis). Then the Lemma provides a sufficient condition which, once satisfied, guarantees that there is a strictly positive probability that none of the bad events occurs (so that the good event exists).
Such a sufficient condition can be inferred from the so-called dependency graph of the collection of events. We recall that a dependency graph for a collection of random events $\mathcal{B}$ is a (simple and undirected) graph $G$ with vertex set $\mathcal{B}$ such that each event $B\in\mathcal{B}$ is independent of the σ-algebra generated by the collection of events $\mathcal{B}\setminus\Gamma^*_G(B)$, where $\Gamma^*_G(B)=\Gamma_G(B)\cup\{B\}$, with $\Gamma_G(B)$ denoting the neighborhood of $B$ in $G$, i.e. the set of vertices of $G$ which are connected to the vertex $B$ by an edge of $G$.

The connection between the LLL and the cluster expansion of the abstract polymer gas, implicit in an old paper by Shearer [15], has been sharply pointed out in [14] by Scott and Sokal, who also showed that the LLL (with dependency graph $G$) can be viewed as a reformulation of the Dobrushin criterion [7] for the convergence of the cluster expansion of the hard-core lattice gas (on the same graph $G$). In a later paper [9] Fernández and Procacci improved the Dobrushin criterion, and this has then been used straightforwardly by Bissacot et al. in [3] to obtain a corresponding improved cluster expansion version of the LLL (CLLL for short). This new version of the LLL has already been applied to get new bounds on several graph coloring problems (see [12] and [5]).

Like the original Lovász Local Lemma by Erdős-Lovász, the improved cluster expansion version by Bissacot et al. given in [3] is "non-constructive", in the sense that it claims the existence of a certain event without explicitly exhibiting it. Nevertheless, an algorithmic version of the CLLL, based on a breakthrough paper by Moser and Tardos [11], has been recently provided in [13] and [1].

1.1.1 Moser Tardos setting (general case)

In the Moser Tardos setting all events in the collection $\mathcal{B}$ depend on a finite family $\mathcal{V}$ of mutually independent random variables, with $\Omega$ being the sample space determined by these variables, so that an outcome $\omega\in\Omega$ is just a random evaluation of all variables of the family $\mathcal{V}$.
Each event $B\in\mathcal{B}$ is supposed to depend only on some subset of the variables $\mathcal{V}$, denoted by $\mathrm{vbl}(B)$. Since variables in $\mathcal{V}$ are assumed to be mutually independent, any two events $B,B'\in\mathcal{B}$ such that $\mathrm{vbl}(B)\cap\mathrm{vbl}(B')=\emptyset$ are necessarily independent. Therefore the family $\mathcal{B}$ has a natural dependency graph, i.e. the graph $G$ with vertex set $\mathcal{B}$ and edge set constituted by the pairs $\{B,B'\}\subset\mathcal{B}$ such that $\mathrm{vbl}(B)\cap\mathrm{vbl}(B')\neq\emptyset$. In this setting Moser and Tardos define the following random algorithm, whose output, when (and if) it stops, is an evaluation of the variables of the family $\mathcal{V}$ (i.e. an outcome $\omega\in\Omega$) which avoids all the events in the collection $\mathcal{B}$.

MT-Algorithm (general case).
- Step 0: Sample all random variables in the family $\mathcal{V}$. Let $\omega_0\in\Omega$ be the output.
For $k\ge 1$
- Step k:
  a) Take $\omega_{k-1}\in\Omega$ and check all bad events in the family $\mathcal{B}$.
  b) i) If some bad event occurs, choose one, say $B$, and resample its variables $\mathrm{vbl}(B)$ leaving unchanged the remaining variables.
     ii) If no bad event occurs, stop the algorithm.
  Let $\omega_k\in\Omega$ be the output.

We are now in a position to state the algorithmic version of the CLLL, which will be the basic tool to get our results on perfect and separating hash families. We recall that an independent set in a graph $G$ is a set of vertices of $G$ no two of which are connected by an edge of $G$.

Theorem 1.1 [Algorithmic CLLL] Given a finite set $\mathcal{V}$ of mutually independent random variables, let $\mathcal{B}$ be a finite set of events determined by these variables with natural dependency graph $G$. Let $\mu=\{\mu_B\}_{B\in\mathcal{B}}$ be a sequence of real numbers in $[0,+\infty)$. If, for each $B\in\mathcal{B}$,

$$\mathrm{Prob}(B)\ \le\ \frac{\mu_B}{\displaystyle\sum_{\substack{Y\subseteq\Gamma^*_G(B)\\ Y\ \text{independent in}\ G}}\ \prod_{B'\in Y}\mu_{B'}}$$

then the MT-algorithm reaches an assignment of values of the variables $\mathcal{V}$ such that none of the events in $\mathcal{B}$ occurs. Moreover the expected total number of resampling steps made by the MT-algorithm to reach this assignment is at most $\sum_{B\in\mathcal{B}}\mu_B$.

The proof of Theorem 1.1 can be found in [13] and [1].

1.2 Perfect Hash Families and Separating Hash Families

Given a finite set $U$ we denote by $|U|$ its cardinality.
Given an integer $k$, we write $[k]=\{1,2,\dots,k\}$ for short. A collection of sets $\{W_1,\dots,W_k\}$ such that $W_i\cap W_j=\emptyset$ for all $\{i,j\}\subset[k]$ will be called hereafter a "disjoint family".

Let $n,w$ be integers such that $2\le w\le n$. We denote by $P_w([n])$ the set of all subsets of $[n]$ with cardinality $w$. Given integers $s,w_1,w_2,\dots,w_s$ such that $\sum_{i=1}^{s}w_i=w$, we denote by $P^*_w([n])$ the set whose elements are the disjoint families $S=\{W_1,\dots,W_s\}$ such that $W_i\subset[n]$ and $|W_i|=w_i$ for $i=1,\dots,s$.

Let $A$ be an $N\times n$ matrix. Given $W\in P_w([n])$ we denote by $A|_W$ the $N\times w$ matrix formed by the $w$ columns of the matrix $A$ with indices in $W$. Analogously, given a disjoint family $S=\{W_1,\dots,W_s\}\in P^*_w([n])$, we denote by $A|_S$ the $N\times w$ matrix formed by the $w$ columns of the matrix $A$ with indices in $W_1\cup\dots\cup W_s$.

Perfect hash family. Let $X$ and $Y$ be finite sets with cardinality $|X|=n$ and $|Y|=m$. Let $w\in\mathbb{N}$ be such that $2\le w\le n$. Then a perfect hash family of size $N$ is a sequence $f_1,\dots,f_N$ of functions from $X$ to $Y$ such that for any subset $W\subset X$ with cardinality $|W|=w$ there exists $i\in\{1,\dots,N\}$ such that $f_i$ is injective when restricted to $W$. Such a perfect hash family will be denoted by PHF$(N;n,m,w)$.

A perfect hash family PHF$(N;n,m,w)$ is usually viewed as a matrix $A$ with $N$ rows and $n$ columns, with entries in the set of integers $[m]\equiv\{1,2,\dots,m\}$, such that for any set $W\in P_w([n])$ the $N\times w$ matrix $A|_W$ formed by the $w$ columns of the matrix $A$ with indices in $W$ has at least one line with distinct entries.

Separating hash family. Given finite sets $X$ and $Y$ with cardinality $|X|=n$ and $|Y|=m$ and integers $w_1,\dots,w_s$ such that $w=w_1+\dots+w_s\le n$, a separating hash family of size $N$ is a sequence $f_1,\dots,f_N$ of functions from $X$ to $Y$ such that for all disjoint families of subsets $\{W_1,\dots,W_s\}$ of $X$ with $|W_j|=w_j$ ($j=1,\dots,s$), there exists $i\in\{1,\dots,N\}$ such that $\{f_i(W_1),\dots,f_i(W_s)\}$ is a disjoint family of subsets of $Y$.
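The two definitions above can be checked mechanically on small matrices. The following Python sketch is only an illustration under stated assumptions: the function names `is_phf`, `is_shf` and `disjoint_families` are hypothetical (they do not come from the paper), the matrix is a list of rows, each row plays the role of one function $f_i$, and columns are indexed from 0. The check is by exhaustive enumeration, so it is practical only for small $n$ and $w$.

```python
from itertools import combinations


def is_phf(A, w):
    """True if every set of w columns has a row whose entries there are pairwise distinct."""
    n = len(A[0])
    return all(
        any(len({row[c] for c in W}) == w for row in A)
        for W in combinations(range(n), w)
    )


def disjoint_families(columns, sizes):
    """Yield tuples (W_1, ..., W_s) of pairwise disjoint column sets with |W_j| = sizes[j].

    Families are produced as ordered tuples, so a family may appear more than
    once when some sizes coincide; this is harmless for an existence check.
    """
    if not sizes:
        yield ()
        return
    for W in combinations(columns, sizes[0]):
        rest = [c for c in columns if c not in W]
        for tail in disjoint_families(rest, sizes[1:]):
            yield (W,) + tail


def is_shf(A, sizes):
    """True if every disjoint family with the given part sizes is separated by some row."""
    n = len(A[0])

    def separates(row, family):
        # The row separates the family if the images of the parts are pairwise disjoint.
        images = [{row[c] for c in W} for W in family]
        return all(a.isdisjoint(b) for a, b in combinations(images, 2))

    return all(
        any(separates(row, family) for row in A)
        for family in disjoint_families(list(range(n)), list(sizes))
    )


A = [[1, 1, 2, 2],
     [1, 2, 1, 2]]
print(is_phf(A, 2), is_shf(A, (2, 1)))
```

In this example the matrix is a PHF$(2;4,2,2)$, since any two columns differ in some row, but it fails to separate families with part sizes $(2,1)$: for $W_1=\{0,3\}$, $W_2=\{1\}$ both rows reuse a symbol across the two parts.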
A separating hash family SHF$(N;n,m,\{w_1,\dots,w_s\})$ can be viewed as a matrix $A$ with $N$ rows and $n$ columns, with entries in the set of integers $[m]$, such that for any disjoint family $S=\{W_1,\dots,W_s\}\in P^*_w([n])$ the $N\times w$ matrix $A|_S$ formed by the $w$ columns of the matrix $A$ with indices in $W_1\cup\dots\cup W_s$ has at least one line which "separates $A|_{W_1},\dots,A|_{W_s}$", i.e., for any unordered pair $\{r,r'\}\subset[s]$, the entries of this line belonging to $A|_{W_r}$ are different from the entries of the same line belonging to $A|_{W_{r'}}$.

The probabilistic method has already been used several times in the past to address the problem of the existence of perfect and separating hash families. In particular, lower bounds for $N$ ensuring the existence of a perfect hash family with fixed values $n,m,w$ were first obtained by Mehlhorn in [10] using standard techniques of the probabilistic method. The Lovász Local Lemma was subsequently used by Blackburn [4] to improve the Mehlhorn bound. In the same year, another technique in the framework of the probabilistic method, the so-called expurgation method, was used to get alternative bounds for perfect hash families [16]. Later the Lovász Local Lemma was also used in [6] to get similar bounds for separating hash families. In the same paper [6] the authors also outlined a comparison between the LLL and the expurgation method for perfect hash families, suggesting that the expurgation method yields better bounds than the LLL. In a related paper [17] an alternative technique still based on the expurgation method was used to obtain new lower bounds for $N$, for fixed values $n,m,\{w_1,w_2\}$, guaranteeing the existence of separating hash families. We finally mention that there have also been several results regarding upper bounds for $N$ ensuring the non-existence of perfect and separating hash families (see, e.g., [2] and references therein).

1.3 Results

We conclude this introductory section by presenting our main results, which consist of new bounds for perfect hash families and separating hash families.
Our first result concerns a lower bound for perfect hash families.

Theorem 1.2 Let $N,n,m$ be integers and let $w$ be an integer such that $2\le w\le n$. Then there exists a perfect hash family PHF$(N;n,m,w)$ as soon as

$$N\ \ge\ \frac{\ln[\varphi'_{w,n}(\tau)]+(w-1)\ln(n-w)-\ln (w-1)!}{\ln(m^w)-\ln\left(m^w-w!\binom{m}{w}\right)} \tag{1.1}$$

where $\tau$ is the first positive solution of the equation $\varphi_{w,n}(x)-x\varphi'_{w,n}(x)=0$ and

$$\varphi_{w,n}(x)=1+\sum_{k=1}^{\lfloor n/w\rfloor\wedge w}\binom{w}{k}\,\tilde\Gamma_k(w,n)\,x^k \tag{1.2}$$

with

$$\tilde\Gamma_k(w,n)=\sum_{j=0}^{w-k}\binom{w-k}{j}\left[\frac{w}{n-w}\right]^{j}\prod_{\ell=1}^{k(w-1)-j-1}\left(1-\frac{\ell}{n-w}\right)\times\sum_{\substack{i_1+\cdots+i_k=j\\ i_s\ge 0}}\frac{j!}{i_1!\cdots i_k!}\prod_{\ell=1}^{k}\left[\frac{1}{i_\ell+1}\prod_{s=1}^{i_\ell}\left(1-\frac{s}{w}\right)\right] \tag{1.3}$$

Moreover the MT-algorithm (described in Sec. 1.3.1 below) finds such a perfect hash family PHF$(N;n,m,w)$ in an expected time which is polynomial in the input parameters $N$, $n$ and $m$ for any fixed $w$.

The second result concerns a similar lower bound for separating hash families. To state this result we need to introduce the following definition. Given a multi-set $w_1,\dots,w_s$ of integers such that $w_1+\dots+w_s=w$, we denote by $m_p$ the multiplicity of the integer $p\in\{1,2,\dots,w\}$ in the multi-set $w_1,\dots,w_s$, i.e. $m_p=\sum_{i=1}^{s}\mathbb{1}_{\{w_i=p\}}$.

Theorem 1.3 Let $N,n,m$ be integers and let $w$ be an integer such that $2\le w\le n$. Let $s\ge 2$ and let $\{w_1,\dots,w_s\}$ be a family of integers such that $w_1+\dots+w_s=w$. Then there exists a separating hash family SHF$(N;n,m,\{w_1,\dots,w_s\})$ as soon as

$$N\ \ge\ \frac{\ln[\varphi'_{w,n}(\tau)]+(w-1)\ln(n-w)-\ln (w-1)!+\ln(m_w)}{\ln\left(\frac{1}{q}\right)} \tag{1.4}$$

where $\varphi'_{w,n}(\tau)$ is the same number introduced in Theorem 1.2,

$$m_w=\frac{1}{\prod_{p=1}^{w}m_p!}\ \frac{w!}{w_1!\cdots w_s!} \tag{1.5}$$

and

$$q=1-\frac{\pi_{G_s}(m)}{m^w}$$

with $\pi_{G_s}(m)$ being the chromatic polynomial of the complete $s$-partite graph $G_s$ whose parts have $w_1,\dots,w_s$ vertices. Moreover the MT-algorithm (described in Sec. 1.3.1 below) finds such a separating hash family SHF$(N;n,m,\{w_1,\dots,w_s\})$ in an expected time which is polynomial in the input parameters $N$, $n$ and $m$ for any fixed $w$.
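To make the algorithmic claim of Theorem 1.2 concrete, here is a minimal Python sketch of a Moser-Tardos style resampling loop for perfect hash families. It is an illustration under assumptions, not the authors' implementation: the names `bad_event_occurs` and `moser_tardos_phf`, the `max_steps` safety budget, and the rule "resample the first violated column set in lexicographic order" are all choices made here (the theorem does not prescribe which violated set to pick). A column set $W$ is treated as violated when no row of the matrix has pairwise distinct entries on the columns in $W$.

```python
import random
from itertools import combinations


def bad_event_occurs(A, W):
    """True if every row of A, restricted to the columns in W, repeats some entry."""
    return all(len({row[c] for c in W}) < len(W) for row in A)


def moser_tardos_phf(N, n, m, w, rng=None, max_steps=100_000):
    """Resample violated column sets until the matrix is a PHF(N; n, m, w).

    Returns the matrix and the number of resampling steps performed.
    max_steps is a safety budget added for this sketch only.
    """
    rng = rng or random.Random()
    # Step 0: sample all N*n entries uniformly from [m] = {1, ..., m}.
    A = [[rng.randint(1, m) for _ in range(n)] for _ in range(N)]
    for step in range(max_steps):
        # Step k a): look for a violated column set (first one found, an arbitrary rule).
        bad = next(
            (W for W in combinations(range(n), w) if bad_event_occurs(A, W)),
            None,
        )
        if bad is None:
            return A, step  # no bad event occurs: stop
        # Step k b): resample only the entries in the columns of the violated set.
        for row in A:
            for c in bad:
                row[c] = rng.randint(1, m)
    raise RuntimeError("step budget exhausted before reaching a PHF")


A, steps = moser_tardos_phf(N=8, n=6, m=5, w=3, rng=random.Random(0))
print(steps)
```

The parameters in the usage line are small and comfortably feasible, so the loop is expected to stop after very few resamplings; they are not meant to sit near the bound (1.1).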
As claimed in the abstract, we will use the algorithmic version of the CLLL, i.e., Theorem 1.1, to prove Theorems 1.2 and 1.3. Let us thus conclude this section by describing how to adapt the Moser-Tardos setting and the MT-algorithm to the case of perfect and separating hash families.

1.3.1 Moser Tardos setting for perfect [separating] hash families

In the present case of hash families, the finite family $\mathcal{V}$ of the Moser-Tardos setting is constituted by a set of $Nn$ mutually independent random variables taking values in the set $[m]$ according to the uniform distribution and representing the possible entries of an $N\times n$ matrix. The sample space generated by the family $\mathcal{V}$ is thus $\Omega=[m]^{N\times n}$ and an outcome in $\Omega$ is an $N\times n$ matrix $A$.

The bad events. For each $W\in P_w([n])$ [for each $S=\{W_1,\dots,W_s\}\in P^*_w([n])$], let $E_W$ be the event such that in every line of $A|_W$ at least two entries are equal [let $E_S$ be the event such that for every line of $A|_S=A|_{\cup_{r=1}^{s}W_r}$ there is a pair $\{r,r'\}\subset[s]$ such that two entries of this line, one in $A|_{W_r}$ and the other in $A|_{W_{r'}}$, are equal]. We have thus a family $\mathcal{W}\equiv\{E_W\}_{W\in P_w([n])}$ [a family $\mathcal{S}\equiv\{E_S\}_{S\in P^*_w([n])}$] of bad events containing $\binom{n}{w}$ members [containing $\binom{n}{w}m_w$ members, with $m_w$ defined in (1.5)]. If $A$ is a sampled matrix such that no bad event of the family $\mathcal{W}$ [$\mathcal{S}$] occurs, then for every $W\in P_w([n])$ [for every $S\in P^*_w([n])$] at least one line of $A|_W$ [of $A|_S$] has distinct entries [separates $S=\{W_1,\dots,W_s\}$], that is to say $A$ is a PHF$(N;n,m,w)$ [SHF$(N;n,m,\{w_1,\dots,w_s\})$].

We are now in a position to outline the MT-algorithm for PHF [SHF].

MT-Algorithm (for PHF [for SHF]).
- Step 0: Pick an evaluation of all $Nn$ variables of the family $\mathcal{V}$ (the matrix entries). Let $A_0$ be the output matrix.
For $k\ge 1$
- Step k:
  a) Take the matrix $A_{k-1}$ and check all bad events of the family $\mathcal{W}$ [of the family $\mathcal{S}$].
  b) i) If some bad event occurs, choose one, say $E_W$ [$E_S$], and take a new random evaluation of the entries of $A_{k-1}|_W$ [of the entries of $A_{k-1}|_S$] leaving unchanged the remaining entries of $A_{k-1}$.
     ii) If no bad event occurs, stop the algorithm.
  Let $A_k$ be the output matrix.

Note that when the algorithm stops the output matrix is a PHF$(N;n,m,w)$ [the output matrix is a SHF$(N;n,m,\{w_1,\dots,w_s\})$]. We will see in the next section that this algorithm stops after an expected number of steps at most $\binom{n}{w}$.

The rest of the paper is organized as follows. In Section 2 we give the proofs of Theorems 1.2 and 1.3. Finally, in Section 3 we discuss some comparisons with previous bounds given in the literature.

2 Proofs of Theorems 1.2 and 1.3

2.1 Proof of Theorem 1.2

Let us apply Theorem 1.1 to the family of events $\mathcal{W}=\{E_W\}_{W\in P_w([n])}$ introduced in the previous section (Sec. 1.3.1). Clearly two events $E_W,E_{W'}\in\mathcal{W}$ are independent if $W\cap W'=\emptyset$. Therefore the dependency graph $G$ of the family of events $\{E_W\}_{W\in P_w([n])}$ can be identified with the graph whose vertices are the elements $W$ of $P_w([n])$, i.e. the subsets $W$ of $[n]$ with cardinality $w$, and two vertices $W$ and $W'$ of $G$ are connected by an edge of the dependency graph $G$ if and only if $W\cap W'\neq\emptyset$. Thus the neighborhood of $W$ in $G$, including $W$ itself, is the set

$$\Gamma^*_G(W)=\{W' :\ W'\in P_w([n])\ \text{and}\ W'\cap W\neq\emptyset\}$$

The probability of an event $E_W$ is

$$P(E_W)=\frac{[m^w-m(m-1)\cdots(m-w+1)]^N}{m^{wN}}=\frac{\left[m^w-w!\binom{m}{w}\right]^N}{m^{wN}}$$

and, according to Theorem 1.1, the MT-algorithm (as it was described in Section 1.3.1) finds a perfect hash family PHF$(N;n,m,w)$ if, for some $\mu=\{\mu_W\}_{W\in P_w([n])}$ with $\mu_W>0$,

$$P(E_W)=\frac{\left[m^w-w!\binom{m}{w}\right]^N}{m^{wN}}\ \le\ \frac{\mu_W}{\displaystyle\sum_{\substack{Y\subseteq\Gamma^*_G(W)\\ Y\ \text{independent in}\ G}}\ \prod_{W'\in Y}\mu_{W'}} \tag{2.1}$$

We set $\mu_W=\mu$ for all $W\in P_w([n])$ so that

$$\sum_{\substack{Y\subseteq\Gamma^*_G(W)\\ Y\ \text{independent in}\ G}}\ \prod_{W'\in Y}\mu_{W'}=\sum_{\substack{Y\subseteq\Gamma^*_G(W)\\ Y\ \text{independent in}\ G}}\mu^{|Y|}=1+\sum_{k=1}^{\lfloor n/w\rfloor\wedge w}\Gamma_k(w,n)\,\mu^k$$

where

$$\Gamma_k(w,n)=\sum_{\substack{Y\subset\Gamma^*_G(W):\ |Y|=k\\ Y\ \text{independent in}\ G}}1 \tag{2.2}$$

Hence (2.1) rewrites

$$P(E_W)=\frac{\left[m^w-w!\binom{m}{w}\right]^N}{m^{wN}}\ \le\ \frac{\mu}{1+\displaystyle\sum_{k=1}^{\lfloor n/w\rfloor\wedge w}\Gamma_k(w,n)\,\mu^k} \tag{2.3}$$

Let us now calculate explicitly the number $\Gamma_k(w,n)$ defined in (2.2). We have:

$$
\begin{aligned}
\Gamma_k(w,n)
&=\frac{1}{k!}\sum_{\substack{i_0+i_1+\cdots+i_k=w\\ i_s\ge 1,\ s\ge 1}}\frac{w!}{i_0!\,i_1!\cdots i_k!}\binom{n-w}{w-i_1}\binom{n-w-(w-i_1)}{w-i_2}\cdots\binom{n-w-(w-i_1)-\cdots-(w-i_{k-1})}{w-i_k}\\
&=\frac{1}{k!}\left[\frac{1}{w!}\right]^{k}\sum_{i_0=0}^{w-k}\frac{w!}{i_0!}\sum_{\substack{i_1+\cdots+i_k=w-i_0\\ i_s\ge 1}}\ \prod_{l=1}^{k}\binom{w}{i_l}\ \frac{(n-w)!}{(n-kw-i_0)!}\\
&=\binom{w}{k}\left[\frac{(n-w)^{w-1}}{w!}\right]^{k}\sum_{j=0}^{w-k}\binom{w-k}{j}\frac{\prod_{\ell=1}^{k(w-1)-j-1}\left(1-\frac{\ell}{n-w}\right)}{(n-w)^{j}}\times j!\sum_{\substack{i_1+\cdots+i_k=j\\ i_s\ge 0}}\ \prod_{l=1}^{k}\binom{w}{i_l+1}\\
&=\binom{w}{k}\left[\frac{(n-w)^{w-1}}{w!}\right]^{k}\sum_{j=0}^{w-k}\binom{w-k}{j}\frac{\prod_{\ell=1}^{k(w-1)-j-1}\left(1-\frac{\ell}{n-w}\right)}{(n-w)^{j}}\times\sum_{\substack{i_1+\cdots+i_k=j\\ i_s\ge 0}}\frac{j!}{i_1!\cdots i_k!}\prod_{l=1}^{k}\left[\frac{w!}{(i_l+1)\,(w-i_l-1)!}\right]\\
&=\binom{w}{k}\left[\frac{(n-w)^{w-1}}{(w-1)!}\right]^{k}\sum_{j=0}^{w-k}\binom{w-k}{j}\left[\frac{w}{n-w}\right]^{j}\prod_{\ell=1}^{k(w-1)-j-1}\left(1-\frac{\ell}{n-w}\right)\times\sum_{\substack{i_1+\cdots+i_k=j\\ i_s\ge 0}}\frac{j!}{i_1!\cdots i_k!}\prod_{l=1}^{k}\left[\frac{\prod_{s=1}^{i_l}\left(1-\frac{s}{w}\right)}{i_l+1}\right].
\end{aligned}
$$

I.e. we get

$$\Gamma_k(w,n)=\binom{w}{k}\left[\frac{(n-w)^{w-1}}{(w-1)!}\right]^{k}\tilde\Gamma_k(w,n), \tag{2.4}$$

with $\tilde\Gamma_k(w,n)$ given by (1.3). Therefore, setting $\alpha=\frac{(n-w)^{w-1}}{(w-1)!}\,\mu$, the condition (2.3) becomes

$$\frac{(n-w)^{w-1}}{(w-1)!}\left[\frac{m^w-w!\binom{m}{w}}{m^w}\right]^{N}\ \le\ \max_{\alpha>0}\frac{\alpha}{\varphi_{w,n}(\alpha)}=\frac{1}{\varphi'_{w,n}(\tau)} \tag{2.5}$$

where

$$\varphi_{w,n}(\alpha)=1+\sum_{k=1}^{\lfloor n/w\rfloor\wedge w}\binom{w}{k}\,\tilde\Gamma_k(w,n)\,\alpha^k$$

and $\tau$ is the first positive solution of the equation $\varphi_{w,n}(x)-x\varphi'_{w,n}(x)=0$. Taking the logarithm on both sides of (2.5), condition (2.3) is thus implied by the following inequality

$$N\ \ge\ \frac{A_n(w)}{D_m(w)} \tag{2.6}$$

where

$$A_n(w)=\ln[\varphi'_{w,n}(\tau)]+(w-1)\ln(n-w)-\ln (w-1)! \tag{2.7}$$

and

$$D_m(w)=\ln(m^w)-\ln\left(m^w-w!\binom{m}{w}\right) \tag{2.8}$$

In conclusion, once (2.6) is satisfied then (2.1) is also satisfied and therefore, according to Theorem 1.1, a PHF$(N;n,m,w)$ exists and the MT-algorithm (as described in Section 1.3.1) finds it in an expected number of steps

$$\sum_{W\in P_w([n])}\mu=|P_w([n])|\,\mu=\binom{n}{w}\mu\ \le\ \binom{n}{w}$$

where the last inequality follows from the fact that the optimum $\mu$ which maximizes the r.h.s. of (2.3) is surely less than one.

Now, at each step $k\ge 1$ of the MT-algorithm described in Section 1.3.1, in order to check in item a) whether or not a bad event of the family $\{E_W\}_{W\in P_w([n])}$ occurs, we need to consider all the $N$ lines of (at worst) all the $\binom{n}{w}$ matrices $A|_W$ with $W\in P_w([n])$, and for each line of a given matrix $A|_W$ we need to compare all pairs of entries of the line to check whether they are equal or not. This is done in (at most) $N\binom{w}{2}\binom{n}{w}$ operations.

This concludes the proof of Theorem 1.2.

2.2 Proof of Theorem 1.3

We first recall that, for a fixed sequence of integers $w_1,w_2,\dots,w_s$ such that $w=w_1+\dots+w_s\le n$, $P^*_w([n])$ is the set whose elements are the disjoint families $S=\{W_1,\dots,W_s\}$ of subsets of $[n]$ with cardinality $w_1,\dots,w_s$ respectively, and $A|_S$ is the $N\times w$ matrix formed by the $w$ columns of the matrix $A$ with indices in $\bigcup_{i=1}^{s}W_i$. Given two disjoint families $S=\{W_1,\dots,W_s\}$ and $S'=\{W'_1,\dots,W'_s\}$, we also write for short $S\cap S'\doteq\left(\bigcup_{i=1}^{s}W_i\right)\cap\left(\bigcup_{i=1}^{s}W'_i\right)$.

Let us apply Theorem 1.1 to the family of events $\mathcal{S}=\{E_S\}_{S\in P^*_w([n])}$ introduced in Section 1.3.1. For $S=\{W_1,\dots,W_s\}\in P^*_w([n])$, the probability of the event $E_S$ is given by

$$P(E_S)=\prod_{i=1}^{N}P_i(S)$$

where $P_i(S)$ is the probability that the line $i$ of the matrix $A|_S$ does not separate $S$.
To calculate $P_i(S)$, call a line "favorable" if it fails to separate $S$ and "unfavorable" otherwise, and observe that

$$P_i(S)=\frac{\#\,\text{favorable cases}}{\#\,\text{all cases}}=1-\frac{\#\,\text{unfavorable cases}}{\#\,\text{all cases}}$$

Setting $w=w_1+w_2+\dots+w_s$ we have that

$$\#\,\text{all cases}=m^w$$

To count the number of unfavorable cases, consider the complete $s$-partite graph $G_s$ with vertex set $W$ and independent sets of vertices $W_1,\dots,W_s$. Then

$$\#\,\text{unfavorable cases}=\#\,\text{proper colorings of } G_s \text{ with } m \text{ colors}=\pi_{G_s}(m)$$

where $\pi_{G_s}(m)$ is the chromatic polynomial of the graph $G_s$. Thus

$$P_i(S)=1-\frac{\pi_{G_s}(m)}{m^w}\doteq q$$

and therefore

$$P(E_S)=q^N$$

As before, two events $E_S,E_{S'}\in\mathcal{S}$ are independent if $S\cap S'=\emptyset$. Therefore the dependency graph for the family of events $\mathcal{S}=\{E_S\}_{S\in P^*_w([n])}$ can be identified with the graph $G$ with vertex set $P^*_w([n])$ such that two vertices $S=\{W_1,\dots,W_s\}$ and $S'=\{W'_1,\dots,W'_s\}$ are connected by an edge of $G$ if and only if $S\cap S'\neq\emptyset$ (where we recall that $S\cap S'\doteq\left(\bigcup_{i=1}^{s}W_i\right)\cap\left(\bigcup_{i=1}^{s}W'_i\right)$). This implies that the neighborhood $\Gamma^*_G(S)$ of a vertex $S$ of $G$ is given by

$$\Gamma^*_G(S)=\{S' :\ S'\in P^*_w([n])\ \text{and}\ S'\cap S\neq\emptyset\}$$

By Theorem 1.1, the Moser-Tardos algorithm (as described in Sec. 1.3.1) finds a separating hash family SHF$(N;n,m,\{w_1,\dots,w_s\})$ if the following condition is satisfied: there exists $\nu>0$ such that

$$q^N\ \le\ \frac{\nu}{\displaystyle\sum_{\substack{Y\subseteq\Gamma^*_G(S)\\ Y\ \text{independent in}\ G}}\nu^{|Y|}} \tag{2.9}$$

Note that, as we did in the previous section, we have set $\mu_S=\nu$ for all $S\in P^*_w([n])$. The denominator of the r.h.s. of (2.9) can be evaluated similarly to the case of perfect hash families. Indeed, given a disjoint family $S=\{W_1,\dots,W_s\}$, the neighborhood of $S$ in $G$ is formed by all vertices $S'=\{W'_1,\dots,W'_s\}$ such that $\left(\bigcup_{r=1}^{s}W_r\right)\cap\left(\bigcup_{r=1}^{s}W'_r\right)\neq\emptyset$. The only thing that changes with respect to the calculations done for the case of perfect hash families is that now, for a fixed set of columns $W$ with cardinality $w=w_1+\dots+w_s$, the number of different disjoint families $S=\{W_1,\dots,W_s\}$ such that $W=\bigcup_{r=1}^{s}W_r$ and $|W_r|=w_r$ for $r=1,\dots,s$ is given by the quantity $m_w$ defined in (1.5).
Therefore we have

$$\sum_{\substack{Y\subseteq\Gamma^*_G(S)\\ Y\ \text{independent in}\ G}}\nu^{|Y|}=1+\sum_{k=1}^{\lfloor n/w\rfloor\wedge w}\Gamma_k(w,n)\,(m_w\,\nu)^k$$
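The two SHF-specific quantities used in this proof, the multiplicity factor $m_w$ of (1.5) and the probability $q$, are easy to evaluate for small parameters. The Python sketch below is an illustration only: the function names are hypothetical, and `proper_colorings` counts the proper colorings of the complete $s$-partite graph by brute force over all $m^w$ rows rather than through a closed formula for the chromatic polynomial, so it is meant for small $m$ and $w$.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def m_w(sizes):
    """Number of disjoint families with part sizes `sizes` on a fixed set of w columns, eq. (1.5)."""
    w = sum(sizes)
    ordered = factorial(w) // prod(factorial(s) for s in sizes)
    # Parts of equal size are interchangeable: divide by the product of m_p!.
    symmetry = prod(factorial(c) for c in Counter(sizes).values())
    return ordered // symmetry


def proper_colorings(sizes, m):
    """Count rows in [m]^w whose entries in different parts are all different.

    This equals the chromatic polynomial of the complete s-partite graph
    with the given part sizes, evaluated at m (computed by enumeration).
    """
    bounds = [sum(sizes[:r]) for r in range(len(sizes) + 1)]
    count = 0
    for row in product(range(m), repeat=sum(sizes)):
        parts = [set(row[bounds[r]:bounds[r + 1]]) for r in range(len(sizes))]
        if all(
            parts[a].isdisjoint(parts[b])
            for a in range(len(parts))
            for b in range(a + 1, len(parts))
        ):
            count += 1
    return count


def q(sizes, m):
    """Probability that a uniformly random row fails to separate a fixed family."""
    return 1 - proper_colorings(sizes, m) / m ** sum(sizes)


print(m_w((2, 1)), proper_colorings((2, 1), 3), q((2, 1), 3))
```

For part sizes $(2,1)$ and $m=3$ the graph $G_s$ is a path on three vertices, whose chromatic polynomial $m(m-1)^2$ gives 12 proper colorings out of 27 rows, so $q=15/27$; the three ways of splitting a 3-set into a pair and a singleton give $m_w=3$.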