FROM THE LITTLEWOOD-OFFORD PROBLEM TO THE CIRCULAR LAW: UNIVERSALITY OF THE SPECTRAL DISTRIBUTION OF RANDOM MATRICES

TERENCE TAO AND VAN VU

arXiv:0810.2994v3 [math.PR] 1 Jan 2009

Abstract. The famous circular law asserts that if $M_n$ is an $n \times n$ matrix with iid complex entries of mean zero and unit variance, then the empirical spectral distribution (ESD) of the normalized matrix $\frac{1}{\sqrt{n}} M_n$ converges both in probability and almost surely to the uniform distribution on the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$. After a long sequence of partial results that verified this law under additional assumptions on the distribution of the entries, the circular law is now known to be true for arbitrary distributions with mean zero and unit variance. In this survey we describe some of the key ingredients used in the establishment of the circular law at this level of generality, in particular recent advances in understanding the Littlewood-Offord problem and its inverse.

1991 Mathematics Subject Classification. 15A52, 60G50.
T. Tao is supported by NSF grant CCF-0649473 and a grant from the MacArthur Foundation. V. Vu is supported by NSF Career Grant 0635606.

1. ESD of random matrices

For an $n \times n$ matrix $A_n$ with complex entries, let
$$\mu_{A_n} := \frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i}$$
be the empirical spectral distribution (ESD) of its eigenvalues $\lambda_i \in \mathbb{C}$, $i = 1, \dots, n$ (counting multiplicity), thus for instance
$$\mu_{A_n}(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\}) = \frac{1}{n} \left|\{1 \le i \le n : \operatorname{Re} \lambda_i \le s; \operatorname{Im} \lambda_i \le t\}\right|$$
for any $s, t \in \mathbb{R}$ (we use $|A|$ to denote the cardinality of a finite set $A$), and
$$\int f \, d\mu_{A_n} = \frac{1}{n} \sum_{i=1}^{n} f(\lambda_i)$$
for any continuous compactly supported $f$. Clearly, $\mu_{A_n}$ is a discrete probability measure on $\mathbb{C}$.

A fundamental problem in the theory of random matrices is to compute the limiting distribution of the ESD $\mu_{A_n}$ of a sequence of random matrices $A_n$ with sizes tending to infinity [34, 4].
In what follows, we consider normalized random matrices of the form $A_n = \frac{1}{\sqrt{n}} M_n$, where $M_n = (x_{ij})_{1 \le i,j \le n}$ has entries that are iid random variables $x_{ij} \equiv x$. Such matrices have been studied at least as far back as Wishart [58] (see [34, 4] for more discussion).

One of the first limiting distribution results is the famous semi-circle law of Wigner [57]. Motivated by research in nuclear physics, Wigner studied Hermitian random matrices with (upper triangular) entries being iid random variables with mean zero and variance one. In the Hermitian case, of course, the ESD is supported on the real line $\mathbb{R}$. He proved that the expected ESD of a normalized $n \times n$ Hermitian matrix $\frac{1}{\sqrt{n}} M_n$, where $M_n = (x_{ij})_{1 \le i,j \le n}$ has iid gaussian entries $x_{ij} \equiv N(0,1)$, converges in the sense of probability measures (see footnote 1) to the semi-circle distribution
$$\frac{1}{2\pi} 1_{[-2,2]}(x) \sqrt{4 - x^2} \, dx \tag{1}$$
on the real line, where $1_E$ denotes the indicator function of a set $E$.

Theorem 1.1 (Semi-circular law for the Gaussian ensemble). [57] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid gaussian variables with mean 0 and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}} M_n$ converges in the sense of probability measures to the semi-circle law (1).

Henceforth we shall say that a sequence $\mu_n$ of random probability measures converges strongly to a deterministic probability measure $\mu$ if, with probability one, $\mu_n$ converges in the sense of probability measures to $\mu$. We also say that $\mu_n$ converges weakly to $\mu$ if for every continuous compactly supported $f$, $\int f \, d\mu_n$ converges in probability to $\int f \, d\mu$, thus $P(|\int f \, d\mu_n - \int f \, d\mu| > \varepsilon) \to 0$ as $n \to \infty$ for each $\varepsilon > 0$. Of course, strong convergence implies weak convergence; thus for instance in Theorem 1.1, $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges both weakly and strongly to the semicircle law.
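The semi-circle law (1) can be probed numerically without computing any eigenvalues, since $\frac{1}{n} \operatorname{trace}(A_n^k) = \int x^k \, d\mu_{A_n}$ and the second and fourth moments of the density (1) are 1 and 2 (the first two Catalan numbers). The following is a minimal sketch under stated assumptions rather than anything taken from the paper: it uses a real symmetric matrix with iid $N(0,1)$ entries on and above the diagonal, and the function names, the matrix size 80 and the seed are all illustrative choices.

```python
import random


def symmetric_gaussian_matrix(n, rng):
    """Real symmetric matrix whose entries on and above the diagonal are iid N(0, 1)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0.0, 1.0)
    return m


def esd_moments(m):
    """Second and fourth moments of the ESD of A = M / sqrt(n), for symmetric M.

    Uses (1/n) tr(A^2) = (1/n) sum_ij A_ij^2 and
    (1/n) tr(A^4) = (1/n) sum_ij ((A^2)_ij)^2, both valid because A is symmetric.
    """
    n = len(m)
    scale = n ** -0.5
    a = [[x * scale for x in row] for row in m]
    second = sum(x * x for row in a for x in row) / n
    fourth = 0.0
    for row_i in a:
        for row_j in a:
            # (A^2)_ij is the dot product of rows i and j, since A is symmetric.
            entry = sum(p * q for p, q in zip(row_i, row_j))
            fourth += entry * entry
    return second, fourth / n


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
second, fourth = esd_moments(symmetric_gaussian_matrix(80, rng))
print(f"second moment: {second:.3f} (semi-circle value: 1)")
print(f"fourth moment: {fourth:.3f} (semi-circle value: 2)")
```

At this small size the printed values should only be close to 1 and 2, not equal to them; the agreement is expected to improve as the matrix size grows.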
Wigner also proved similar results for various other distributions, such as the Bernoulli distribution (in which each $x_{ij}$ equals +1 with probability 1/2 and −1 with probability 1/2). His work has been extended and strengthened in several aspects [1, 2, 36]. The most general form was proved by Pastur [36]:

Theorem 1.2 (Semi-circular law). [36] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}} M_n$ converges (in both the strong and weak senses) to the semi-circle law.

Footnote 1. We say that a collection $\mu_n$ of probability measures converges to a limit $\mu$ if one has $\int f \, d\mu_n \to \int f \, d\mu$ for every continuous compactly supported function $f$, or equivalently if $\mu_n(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\})$ converges to $\mu(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\})$ for all $s, t$.

The situation with non-Hermitian matrices is much more complicated, due to the presence of pseudospectrum (see footnote 2) that can potentially make the ESD quite unstable with respect to perturbations. The non-Hermitian variant of this theorem, the Circular Law Conjecture, was raised in the 1950s (see Chapter 10 of [4] or the introduction of [3]).

Conjecture 1.3 (Circular law). Let $M_n$ be the $n \times n$ random matrix whose entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}} M_n$ converges (in both the strong and weak senses) to the uniform distribution $\mu := \frac{1}{\pi} 1_{|z| \le 1} \, dz$ on the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$.

The numerical evidence for this conjecture is extremely strong (see e.g. Figure 1). However, there are significant difficulties in establishing this conjecture rigorously, not least of which is the fact that the main techniques used to handle Hermitian matrices (such as moment methods and truncation) cannot be applied to the non-Hermitian model (see [4, Chapter 10] for a detailed discussion). Nevertheless, the conjecture has been intensively worked on for many decades.
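One commonly cited obstruction to the moment method in the non-Hermitian setting can be seen in a small experiment. For any square matrix, $\frac{1}{n} \operatorname{trace}(A_n^k) = \int z^k \, d\mu_{A_n}$, but for $k \ge 1$ these moments vanish for every rotation-invariant measure on $\mathbb{C}$, so they are the same for the uniform measure on the unit disk as for a point mass at the origin. The sketch below is an illustration under stated assumptions, not a step of any proof in the paper: it uses a Bernoulli matrix, and the function names, size 200 and seed are illustrative choices.

```python
import random


def bernoulli_matrix(n, rng):
    """n x n matrix with iid entries equal to +1 or -1, each with probability 1/2."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]


def trace_moments(m):
    """Return (1/n) tr(A) and (1/n) tr(A^2) for A = M / sqrt(n)."""
    n = len(m)
    # (1/n) tr(A) = tr(M) / n^{3/2}
    first = sum(m[i][i] for i in range(n)) / n ** 1.5
    # (1/n) tr(A^2) = tr(M^2) / n^2, and tr(M^2) = sum_ij M_ij M_ji
    second = sum(m[i][j] * m[j][i] for i in range(n) for j in range(n)) / n ** 2
    return first, second


rng = random.Random(1)  # fixed seed, chosen arbitrarily for reproducibility
first, second = trace_moments(bernoulli_matrix(200, rng))
print(f"(1/n) tr A   = {first:.4f}")
print(f"(1/n) tr A^2 = {second:.4f}")
```

Both printed values should be small, which is consistent with the circular law but carries no information about the radial profile of the limiting distribution.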
The circular law was verified for the complex gaussian distribution in [34] and the real gaussian distribution in [12]. An approach to attack the general case was introduced in [18], leading to a resolution of the strong circular law for continuous distributions with bounded sixth moment in [3]. The sixth moment hypothesis in [3] was lowered to $(2+\eta)$th moment for any $\eta > 0$ in [4]. The removal of the hypothesis of continuous distribution required some new ideas. In [21] the weak circular law for (possibly discrete) distributions with subgaussian moment was established, with the subgaussian condition relaxed to a fourth moment condition in [35] (see also [19] for an earlier result of similar nature), and then to $(2+\eta)$th moment in [22]. Shortly before this last result, the strong circular law assuming $(2+\eta)$th moment was established in [54]. Finally, in a recent paper [55], the authors proved this conjecture (in both strong and weak forms) in full generality. In fact, we obtained this result as a consequence of a more general theorem, presented in the next section.

Footnote 2. Informally, we say that a complex number $z$ lies in the pseudospectrum of a square matrix $A$ if $(A - zI)^{-1}$ is large (or undefined). If $z$ lies in the pseudospectrum, then small perturbations of $A$ can potentially cause $z$ to fall into the spectrum of $A$, even if it is initially far away from this spectrum. Thus, whenever one has pseudospectrum far away from the actual spectrum, the actual distribution of eigenvalues can depend very sensitively (in the worst case) on the coefficients of $A$. Of course, our matrices are random rather than worst-case, and so we expect the most dangerous effects of pseudospectrum to be avoided; but this of course requires some analytical effort to establish, and deterministic techniques (e.g. truncation) should be used with extreme caution, since they are likely to break down in the worst case.

2. Universality

An easy case of Conjecture 1.3 is when the entries $x_{ij}$ of $M_n$ are iid complex gaussian.
In this case there is the following precise formula for the joint density function of the eigenvalues, due to Ginibre [17] (see also [34], [25] for more discussion of this formula):
$$p(\lambda_1, \dots, \lambda_n) = c_n \prod_{i<j} |\lambda_i - \lambda_j|^2 \prod_{i=1}^{n} e^{-n|\lambda_i|^2}. \tag{2}$$
From here one can verify the conjecture in this case by a direct calculation. This was first done by Mehta and also Silverstein in the 1960s:

Theorem 2.1 (Circular law for Gaussian matrices). [34] Let $M_n$ be an $n \times n$ random matrix whose entries are iid complex gaussian variables with mean 0 and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}} M_n$ tends to the circular law.

A similar result for the real gaussian ensemble was established in [12]. These methods rely heavily on the strong symmetry properties of such ensembles (in particular, the invariance of such ensembles with respect to large matrix groups such as $O(n)$ or $U(n)$) in order to perform explicit algebraic computations, and do not extend directly to more combinatorial ensembles, such as the Bernoulli ensemble.

The above mentioned results and conjectures can be viewed as examples of a general phenomenon in probability and mathematical physics, namely, that global information about a large random system (such as limiting distributions) does not depend on the particular distribution of the particles. This is often referred to as the universality phenomenon (see e.g. [9]). The most famous example of this phenomenon is perhaps the central limit theorem.

In view of the universality phenomenon, one can see that Conjecture 1.3 generalizes Theorem 2.1 in the same way that Theorem 1.2 generalizes Theorem 1.1.

[Figure 1: two panels, labelled Bernoulli and Gaussian.]
Figure 1. Eigenvalue plots of two randomly generated 5000 by 5000 matrices. On the left, each entry was an iid Bernoulli random variable, taking the values +1 and −1 each with probability 1/2. On the right, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. (These two distributions were shifted by adding the identity matrix, thus the circles are centered at (1,0) rather than at the origin.)

A demonstration of the circular law for the Bernoulli and the Gaussian case appears in Figure 1 (see footnote 3).

The universality phenomenon seems to hold even for more general models of random matrices, as demonstrated by Figure 2 and Figure 3. This evidence suggests that the asymptotic shape of the ESD depends only on the mean and the variance of each entry in the matrix. As mentioned earlier, the main result of [55] (building on a large number of previous results) gives a rigorous proof of this phenomenon in full generality.

For any matrix $A$, we define the Frobenius norm (or Hilbert-Schmidt norm) $\|A\|_F$ by the formula $\|A\|_F := \operatorname{trace}(AA^*)^{1/2} = \operatorname{trace}(A^*A)^{1/2}$.

Theorem 2.2 (Universality principle). Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n = (x_{ij})_{1 \le i,j \le n}$ and $Y_n := (y_{ij})_{1 \le i,j \le n}$ be $n \times n$ random matrices whose entries $x_{ij}$, $y_{ij}$ are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying
$$\sup_n \frac{1}{n^2} \|M_n\|_F^2 < \infty. \tag{3}$$
Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges weakly to zero. If furthermore we make the additional hypothesis that the ESDs
$$\mu_{(\frac{1}{\sqrt{n}} M_n - zI)(\frac{1}{\sqrt{n}} M_n - zI)^*} \tag{4}$$
converge in the sense of probability measures to a limit for almost every $z$, then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges strongly to zero.

Footnote 3. We thank Phillip Wood for creating the figures in this paper.

[Figure 2: four panels in two rows, with columns labelled Bernoulli and Gaussian.]
Figure 2. Eigenvalue plots of randomly generated $n$ by $n$ matrices of the form $D_n + M_n$, where $n = 5000$. In the left column, each entry of $M_n$ was an iid Bernoulli random variable, taking the values +1 and −1 each with probability 1/2, and in the right column, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. In the first row, $D_n$ is the deterministic matrix diag(1,1,...,1,2.5,2.5,...,2.5), and in the second row $D_n$ is the deterministic matrix diag(1,1,...,1,2.8,2.8,...,2.8) (in each case, the first $n/2$ diagonal entries are 1's, and the remaining entries are 2.5 or 2.8 as specified).

This theorem reduces the computing of the limiting distribution to the case where one can assume (see footnote 4) that the entries have Gaussian (or any special) distribution. Combining this theorem (in the case $M_n = 0$) with Theorem 2.1, we conclude

Corollary 2.3. The circular law (Conjecture 1.3) holds in both the weak and strong sense.

It is useful to notice that Theorem 2.2 still holds even when the limiting distributions do not exist.

Footnote 4. Some related ideas also appear in [19]. In the context of the central limit theorem, the idea of replacing arbitrary iid ensembles by Gaussian ones goes back to Lindeberg [31], and is sometimes known as the Lindeberg invariance principle; see [11] for further discussion, and a formulation of this principle for Hermitian random matrices.

[Figure 3: two panels, labelled Bernoulli and Gaussian.]
Figure 3. Eigenvalue plots of two randomly generated 5000 by 5000 matrices of the form $A + B M_n B$, where $A$ and $B$ are diagonal matrices having $n/2$ entries with the value 1 followed by $n/2$ entries with the value 5 (for D) and the value 2 (for X). On the left, each entry of $M_n$ was an iid Bernoulli random variable, taking the values +1 and −1 each with probability 1/2. On the right, each entry of $M_n$ was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$.

The proof of Theorem 2.2 relies on several surprising connections between seemingly remote areas of mathematics that have been discovered in the last few years. The goal of this article is to give the reader an overview of these connections and through them a sketch of the proof of Theorem 2.2. The first area we shall visit is combinatorics.

3. Combinatorics

As we shall discuss later, one of the primary difficulties in controlling the ESD of a non-Hermitian matrix $A_n = \frac{1}{\sqrt{n}} M_n$ is the presence of pseudospectrum: complex numbers $z$ for which the resolvent $(A_n - zI)^{-1} = (\frac{1}{\sqrt{n}} M_n - zI)^{-1}$ exists but is extremely large. It is therefore of importance to obtain bounds on this resolvent, which leads one to understand for which vectors $v \in \mathbb{C}^n$ is $(A_n - zI)v$ likely to be small. Expanding out the vector $(A_n - zI)v$, one encounters expressions such as $\xi_1 v_1 + \dots + \xi_n v_n$, where $v_1, \dots, v_n \in \mathbb{C}$ are fixed and $\xi_1, \dots, \xi_n$ are iid random variables. The problem of understanding the distribution of such random sums is known as the Littlewood-Offord problem, and we now pause to discuss this problem further.

3.1. The Littlewood-Offord problem.

Let $v = \{v_1, \dots, v_n\}$ be a set of $n$ integers and let $\xi_1, \dots, \xi_n$ be i.i.d. random Bernoulli variables. Define $S := \sum_{i=1}^{n} \xi_i v_i$ and $p_v(a) := P(S = a)$ and $p_v := \sup_{a \in \mathbb{Z}} p_v(a)$.

In their study of random polynomials, Littlewood and Offord [32] raised the question of bounding $p_v$. They showed that if the $v_i$ are non-zero, then $p_v = O(\frac{\log n}{\sqrt{n}})$. Very soon after, Erdős [13], using Sperner's lemma, gave a beautiful combinatorial proof for the following refinement.
Theorem 3.2. Let $v_1, \dots, v_n$ be non-zero numbers and $\xi_i$ be i.i.d. Bernoulli random variables. Then (see footnote 5)
$$p_v \le \frac{\binom{n}{\lfloor n/2 \rfloor}}{2^n} = O\left(\frac{1}{\sqrt{n}}\right).$$

Notice that the bound is sharp, as can be seen from the example $v := \{1, \dots, 1\}$, in which case $S$ has a binomial distribution. Many mathematicians realized that while the classical bound in Theorem 3.2 is sharp as stated, it can be improved significantly under additional assumptions on $v$. For instance, Erdős and Moser [14] showed that if the $v_i$ are distinct, then
$$p_v = O(n^{-3/2} \ln n).$$
They conjectured that the logarithmic term is not necessary and this was confirmed by Sárközy and Szemerédi [42]. Again, the bound is sharp (up to a constant factor), as can be seen by taking $v_1, \dots, v_n$ to be a proper arithmetic progression such as $1, \dots, n$. Stanley [41] gave a different proof that also classified the extremal cases.

A general picture was given by Halász, who showed, among other things, that if one forbids more and more additive structure (see footnote 6) in the $v_i$, then one gets better and better bounds on $p_v$. One corollary of his results (see [24] or [48, Chapter 9]) is the following.

Theorem 3.3. Consider $v = \{v_1, \dots, v_n\}$. Let $R_k$ be the number of solutions to the equation
$$\varepsilon_1 v_{i_1} + \dots + \varepsilon_{2k} v_{i_{2k}} = 0$$
where $\varepsilon_i \in \{-1, 1\}$ and $i_1, \dots, i_{2k} \in \{1, 2, \dots, n\}$. Then
$$p_v = O_k(n^{-2k-1/2} R_k).$$

Footnote 5. We use the usual asymptotic notation in this paper, thus $X = O(Y)$, $Y = \Omega(X)$, $X \ll Y$, or $Y \gg X$ denotes an estimate of the form $|X| \le CY$ where $C$ does not depend on $n$ (but may depend on other parameters). We also let $X = o(Y)$ denote the bound $|X| \le c(n) Y$, where $c(n) \to 0$ as $n \to \infty$.

Footnote 6. Intuitively, this is because the less additive structure one has in the $v_i$, the more likely the sums $S$ are to be distinct from each other. In the most extreme case, if the $v_i$ are linearly independent over the rationals $\mathbb{Q}$, then the $2^n$ sums $S$ are all distinct, and so $p_v = 1/2^n$ in this case.

Remark 3.4. Several variants of Theorem 3.2 can be found in [27, 30, 16, 28] and the references therein. The connection between the Littlewood-Offord problem and random matrices was first made in [26], in connection with the question of determining how likely a random Bernoulli matrix was to be singular. The paper [26] in fact inspired much of the work of the authors described in this survey.

3.5. The inverse Littlewood-Offord problem.

Motivated by inverse theorems from additive combinatorics, in particular Freiman's theorem (see [15], [48, Chapter 5]) and a variant for random sums in [53, Theorem 5.2] (inspired by earlier work in [26]), the authors [49] brought a different view to the problem. Instead of trying to improve the bound further by imposing new assumptions, we aim to provide the full picture by finding the underlying reason for the probability $p_v$ to be large (e.g. larger than $n^{-A}$ for some fixed $A$).

Notice that the (multi)-set $v$ has $2^n$ subsums, and $p_v \ge n^{-C}$ means that at least $2^n / n^C$ among these take the same value. This suggests that there should be very strong additive structure in the set. In order to determine this structure, one can study examples of $v$ where $p_v$ is large. For a set $A$, we denote by $lA$ the set $lA := \{a_1 + \dots + a_l \mid a_i \in A\}$. A natural example is the following.

Example 3.6. Let $I = [-N, N]$ and $v_1, \dots, v_n$ be elements of $I$. Since $S \in nI$, by the pigeonhole principle, $p_v \ge \frac{1}{|nI|} = \Omega(\frac{1}{nN})$. In fact, a short consideration yields a better bound. Notice that with probability at least .99, we have $S \in 10\sqrt{n} I$, thus again by the pigeonhole principle, we have $p_v = \Omega(\frac{1}{\sqrt{n} N})$. If we set $N = n^C$ for some constant $C$, then
$$p_v = \Omega\left(\frac{1}{n^{C+1/2}}\right). \tag{5}$$

The next, and more general, construction comes from additive combinatorics. A very important concept in this area is that of a generalized arithmetic progression (GAP).
A set $Q$ of numbers is a GAP of rank $d$ if it can be expressed in the form
$$Q = \{a_0 + x_1 a_1 + \dots + x_d a_d \mid M_i \le x_i \le M_i' \text{ for all } 1 \le i \le d\}$$
for some $a_0, \dots, a_d, M_1, \dots, M_d, M_1', \dots, M_d'$.

It is convenient to think of $Q$ as the image of an integer box $B := \{(x_1, \dots, x_d) \in \mathbb{Z}^d \mid M_i \le x_i \le M_i'\}$ under the linear map
$$\Phi : (x_1, \dots, x_d) \mapsto a_0 + x_1 a_1 + \dots + x_d a_d.$$
The numbers $a_i$ are the generators of $Q$, and $\operatorname{Vol}(Q) := |B|$ is the volume of $Q$. We say that $Q$ is proper if this map is one to one, or equivalently if $|Q| = \operatorname{Vol}(Q)$. For non-proper GAPs, we of course have $|Q| < \operatorname{Vol}(Q)$.

Example 3.7. Let $Q$ be a proper GAP of rank $d$ and volume $V$. Let $v_1, \dots, v_n$ be (not necessarily distinct) elements of $Q$. The random variable $S = \sum_{i=1}^{n} \xi_i v_i$ takes values in the GAP $nQ$. Since $|nQ| \le \operatorname{Vol}(nB) = n^d V$, the pigeonhole principle implies that $p_v = \Omega(\frac{1}{n^d V})$. In fact, using the same idea as in the previous example, one can improve the bound to $\Omega(\frac{1}{n^{d/2} V})$. If we set $V = n^C$ for some constant $C$, then
$$p_v = \Omega\left(\frac{1}{n^{C+d/2}}\right). \tag{6}$$

The above examples show that if the elements of $v$ belong to a proper GAP with small rank and small cardinality then $p_v$ is large. A few years ago, the authors [49] showed that this is essentially the only reason:

Theorem 3.8 (Weak inverse theorem). [49] Let $C, \epsilon > 0$ be arbitrary constants. There are constants $d$ and $C'$ depending on $C$ and $\epsilon$ such that the following holds. Assume that $v = \{v_1, \dots, v_n\}$ is a multiset of integers satisfying $p_v \ge n^{-C}$. Then there is a GAP $Q$ of rank at most $d$ and volume at most $n^{C'}$ which contains all but at most $n^{1-\epsilon}$ elements of $v$ (counting multiplicity).

Remark 3.9. The presence of the small set of exceptional elements is not completely avoidable. For instance, one can add $o(\log n)$ completely arbitrary elements to $v$ and only decrease $p_v$ by a factor of $n^{-o(1)}$ at worst. Nonetheless we expect the number of such elements to be less than what is given by the results here.
The reason we call Theorem 3.8 weak is the fact that the dependence between the parameters is not optimal and does not yet reflect the relations in (5) and (6). Recently, we were able to modify the approach to obtain an almost optimal result.
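For small $n$ the concentration probability $p_v$ of Section 3.1 can be computed exactly, which makes the bounds above easy to check by hand. The following is a minimal sketch, assuming $\pm 1$ Bernoulli signs as in the text; the function name `concentration_probability`, the choice $n = 10$ and the sample list `v_interval` are illustrative assumptions and do not come from the paper. It tabulates the distribution of $S$ by adding one $\xi_i v_i$ at a time.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def concentration_probability(v):
    """Exact p_v = max_a P(xi_1 v_1 + ... + xi_n v_n = a) for iid signs xi_i = +1 or -1."""
    counts = {0: 1}  # counts[s] = number of sign patterns so far with partial sum s
    for step in v:
        nxt = defaultdict(int)
        for s, c in counts.items():
            nxt[s + step] += c
            nxt[s - step] += c
        counts = nxt
    return Fraction(max(counts.values()), 2 ** len(v))


n = 10

# Extremal example for Theorem 3.2: all v_i equal to 1.
p_ones = concentration_probability([1] * n)
erdos_bound = Fraction(comb(n, n // 2), 2 ** n)

# Distinct v_i forming the arithmetic progression 1, ..., n.
p_progression = concentration_probability(list(range(1, n + 1)))

# Example 3.6 with N = 7: any integers in [-N, N] give p_v >= 1 / |nI| = 1 / (2nN + 1).
N = 7
v_interval = [3, -5, 2, 7, -1, 4, 6, -2, 5, 1]  # arbitrary illustrative values in [-7, 7]
p_interval = concentration_probability(v_interval)
pigeonhole_bound = Fraction(1, 2 * n * N + 1)

print("all ones:    ", p_ones, "(bound of Theorem 3.2:", erdos_bound, ")")
print("progression: ", p_progression)
print("interval:    ", p_interval, ">=", pigeonhole_bound)
```

The number of distinct partial sums is at most $2(|v_1| + \dots + |v_n|) + 1$, so this tabulation is cheap when the $v_i$ lie in a short interval or a small GAP, which is exactly the structured regime the inverse theorem describes.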
