Universality in the bulk of the spectrum for complex sample covariance matrices

S. Péché*

January 5, 2011

Abstract

We consider complex sample covariance matrices $M_N = \frac{1}{N}YY^*$ where $Y$ is an $N\times p$ random matrix with i.i.d. entries $Y_{ij}$, $1\le i\le N$, $1\le j\le p$, with distribution $F$. Under some regularity and decay assumptions on $F$, we prove universality of some local eigenvalue statistics in the bulk of the spectrum in the limit where $N\to\infty$ and $\lim_{N\to\infty}p/N=\gamma$ for any real number $\gamma\in(0,\infty)$.

1 Introduction

1.1 Model and result

This paper is concerned with universal properties of large complex sample covariance matrices in the bulk of the spectrum. We consider $N\times p$ random matrices $Y=(Y_{ij})$ where the $Y_{ij}$'s are i.i.d. random variables with some probability distribution $F$. Let then $M_N$ be the sample covariance matrix:
$$M_N=\frac{1}{N}YY^*.$$
In the whole paper we assume that $p\ge N$ and that
$$\exists\,\gamma\in[1,\infty)\ \text{such that}\ p/N\to\gamma\ \text{as}\ N\to\infty.$$
The case where $p<N$ can be deduced from the above setting, using the fact that $YY^*$ and $Y^*Y$ have the same non-zero eigenvalues. In the sequel we call $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N$ the ordered eigenvalues of $M_N$ and $\pi_N:=\frac{1}{N}\sum_{i=1}^N\delta_{\lambda_i}$ its spectral measure. Assuming that $F$ has a finite variance $\sigma^2$, Marčenko and Pastur ([19], see also [21]) have shown that $\pi_N$ almost surely converges as $N\to\infty$ to the so-called Marčenko-Pastur distribution $\rho^{MP}_{\gamma,\sigma^2}$. This probability distribution depends on $\sigma^2$ only and not on any other detail (higher moments, e.g.) of $F$: in this sense it is universal. It is defined by its density with respect to the Lebesgue measure:
$$\frac{d\rho^{MP}_{\gamma,\sigma^2}(x)}{dx}=\frac{\sqrt{(u_+-x)(x-u_-)}}{2\pi x\sigma^2}\,\mathbf{1}_{x\in[u_-,u_+]},\qquad(1.1)$$
where $u_+=\sigma^2(1+\sqrt{\gamma})^2$ and $u_-=\sigma^2(1-\sqrt{\gamma})^2$.

It has been conjectured (see [20] for instance) that, in the large-$N$ limit, some finer properties of the spectrum are also universal.

*Institut Fourier, 100 Rue des Maths, BP 74, F-38402 St Martin d'Hères. E-mail: [email protected]
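As a purely numerical illustration (not part of the paper's argument), one can check the Marčenko-Pastur law (1.1) on a single sampled matrix: the spectrum concentrates on $[u_-,u_+]$ and the mean eigenvalue is close to $\gamma\sigma^2$. All names, sizes, the seed, and tolerances below are ad hoc choices of ours.

```python
# Sanity check of the Marchenko-Pastur law (1.1); illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 400                                  # gamma = p / N = 2
gamma, sigma2 = p / N, 1.0
u_plus = sigma2 * (1 + np.sqrt(gamma)) ** 2      # right edge of the support
u_minus = sigma2 * (1 - np.sqrt(gamma)) ** 2     # left edge of the support

# Complex i.i.d. entries with E Y = 0 and E |Y|^2 = sigma^2 = 1.
Y = (rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))) / np.sqrt(2)
M = Y @ Y.conj().T / N
eigs = np.linalg.eigvalsh(M)

# Spectrum essentially inside [u_-, u_+]; mean eigenvalue ~ gamma * sigma^2.
print(eigs.min(), eigs.max(), eigs.mean())
```

For these parameters $u_-\approx0.17$ and $u_+\approx5.83$, and already at $N=200$ the empirical spectrum sits inside a small enlargement of $[u_-,u_+]$.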
For instance, the spacing between nearest-neighbor eigenvalues in the vicinity of a point $u\in(u_-,u_+)$ is expected to be universal, under the sole assumption that the variance of $F$ is finite. The spacing is actually believed to be "more universal" than the limiting Marčenko-Pastur distribution, in the sense that it is expected to be the same as for Hermitian Wigner matrices. To investigate such local properties of the spectrum, we introduce the so-called local eigenvalue statistics in the bulk of the spectrum. Given a symmetric function $f\in L^\infty(\mathbb R^m)$ ($m$ fixed) with compact support, a point $u\in[u_-,u_+]$, and a scaling factor $\rho_N$, we define the local eigenvalue statistic $S_N^{(m)}(f,u,\rho_N)$ by
$$S_N^{(m)}(f,u,\rho_N)=\sum_{i_1,\dots,i_m}f\big(\rho_N(\lambda_{i_1}-u),\dots,\rho_N(\lambda_{i_m}-u)\big),\qquad(1.2)$$
where the sum is over all distinct indices from $\{1,\dots,N\}$. When $u$ is in the bulk of the spectrum, that is $u\in(u_-,u_+)$, the natural choice for the scaling factor is $\rho_N=N\rho^{MP}_{\gamma,\sigma^2}(u)$, which should give the scale of the spacing between nearest-neighbor eigenvalues in the vicinity of $u$.

We here prove universality of some local linear statistics in the bulk of the spectrum for a wide class of complex sample covariance matrices. We follow the approach used by [15] (and a series of papers [13], [12], [14]), where universality in the bulk for Wigner matrices is proved. We now define the class of matrices under consideration in this paper. Let $\mu$ be the real Gaussian distribution with mean 0 and variance $1/2$. Let $F$ be a complex probability distribution whose real and imaginary parts are of the form
$$\nu(dx)=e^{-V(x)}\mu(dx),\qquad(1.3)$$
for some real function $V$ satisfying the following assumptions:

- $V\in\mathcal C^6$ and there exists an integer $k\ge1$ such that
$$\sum_{j=1}^{6}|V^{(j)}(x)|\le C_*(1+x^2)^k;\qquad(1.4)$$

- there exist $\delta_1,C'>0$ such that $\forall x\in\mathbb R$,
$$\nu(x)\le C'e^{-\delta_1|x|^2}.\qquad(1.5)$$
This assumption can actually be relaxed, as explained in Remark 1.2 below, to consider distributions $\nu$ with sub-exponential decay only;

- $F$ is normalized so that
$$\int x\,d\nu(x)=0\quad\text{and}\quad\int|x|^2\,d\nu(x)=1/2.\qquad(1.6)$$
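That $\rho_N=N\rho^{MP}_{\gamma,\sigma^2}(u)$ is the right microscopic scale can be checked numerically: a window of length $L/\rho_N$ around a bulk point $u$ should contain about $L$ eigenvalues on average. The following sketch is our own illustration (all parameters ad hoc), using $\gamma=1$, $\sigma^2=1$, where the support is $[0,4]$.

```python
# Check of the bulk scaling rho_N = N * rho_MP(u); illustration only.
import numpy as np

rng = np.random.default_rng(1)
N = p = 400                                        # gamma = 1, support [0, 4]
u, L = 1.0, 10.0
rho_mp = np.sqrt((4.0 - u) * u) / (2 * np.pi * u)  # density (1.1) at u
rho_N = N * rho_mp

counts = []
for _ in range(20):
    Y = (rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))) / np.sqrt(2)
    eigs = np.linalg.eigvalsh(Y @ Y.conj().T / N)
    # number of eigenvalues in a window of total length L / rho_N around u
    counts.append(np.sum(np.abs(eigs - u) <= L / (2 * rho_N)))
avg_count = np.mean(counts)   # should be close to L = 10
```

The eigenvalue counting fluctuations in such a window are only of logarithmic order, so even 20 samples give an average very close to $L$.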
In the sequel we consider sample covariance matrices $M_N=\frac1N YY^*$ where $Y=(Y_{ij})$ is an $N\times p$ random matrix such that:
$$Y_{ij},\ 1\le i\le N,\ 1\le j\le p,\ \text{are i.i.d. random variables with distribution }F.\qquad(1.7)$$
One shall remark that the condition (1.6) can always be achieved by rescaling and does not impact the generality of our next results.

We now give our two main results. Let $\epsilon>0$ small be given.

Theorem 1.1. Assume that $F$ satisfies (1.3) to (1.6). Let also $u\in(u_-+\epsilon,u_+-\epsilon)$ be a point in the bulk of the spectrum and $\rho_N=N\rho^{MP}_{\gamma,1}(u)$. Then
$$\lim_{N\to\infty}\mathbb E\,S_N^{(2)}(f,u,\rho_N)=\int_{\mathbb R^2}f(x,y)\Big(1-\Big(\frac{\sin\pi(x-y)}{\pi(x-y)}\Big)^2\Big)\,dx\,dy.$$

Remark 1.1. It is also possible to prove universality of local eigenvalue statistics for $m=3$ using the approach developed hereafter. Nevertheless, to consider higher integers $m$, one needs to increase the regularity of $V$ (see Remark 1.1 in [15]).

We can also prove that the spacing distribution close to a point $u$ in the interior of the support of Marčenko-Pastur's law is universal. Let $s\ge0$ and $(t_N)$ be a sequence such that $\lim_{N\to\infty}t_N=+\infty$ and $\lim_{N\to\infty}t_N N^{\beta-1}=0$ for some $\beta>0$. Let $u\in(u_-+\epsilon,u_+-\epsilon)$ be given and $\rho_N=N\rho^{MP}_{\gamma,1}(u)$. Define then the "spacing function" of the eigenvalues by
$$S_N(s,u):=\frac{1}{2t_N}\,\#\Big\{1\le j\le N-1:\ \lambda_{j+1}-\lambda_j\le\frac{s}{\rho_N},\ |\lambda_j-u|\le\frac{t_N}{\rho_N}\Big\}.\qquad(1.8)$$
Intuitively, the expectation of the spacing function is the probability, knowing that there exists an eigenvalue in the interval $[u-t_N/\rho_N,u+t_N/\rho_N]$, of finding its nearest neighbor within a distance $s/\rho_N$. Finally, we define
$$p(s)=\frac{d^2}{ds^2}\det(I-K)_{L^2(0,s)},\quad\text{where}\quad K(x,y):=\frac{\sin\pi(x-y)}{\pi(x-y)}.\qquad(1.9)$$

Theorem 1.2. Assume that $F$ satisfies (1.3) to (1.6). Let also $u\in(u_-+\epsilon,u_+-\epsilon)$ be a point in the bulk of the spectrum. Then
$$\lim_{N\to\infty}\mathbb E\,S_N(s,u)=\int_0^s p(w)\,dw.\qquad(1.10)$$

Remark 1.2.
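The Fredholm determinant $\det(I-K)_{L^2(0,s)}$ appearing in (1.9) is the probability that the sine process has no point in $(0,s)$, and it can be evaluated numerically by the standard Nyström discretization with Gauss-Legendre quadrature. The sketch below is our own illustration (the function name and discretization size are ad hoc); a convenient check is the small-$s$ expansion $\det(I-K)_{L^2(0,s)}=1-s+O(s^4)$.

```python
# Nystrom evaluation of det(I - K) on L^2(0, s) for the sine kernel of (1.9).
import numpy as np

def gap_probability(s, n=60):
    """det(I - K)_{L^2(0, s)} via n-point Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * s * (nodes + 1.0)        # map [-1, 1] -> [0, s]
    w = 0.5 * s * weights
    # np.sinc(t) = sin(pi t) / (pi t), exactly the kernel K of (1.9)
    K = np.sinc(x[:, None] - x[None, :])
    A = np.sqrt(w)[:, None] * K * np.sqrt(w)[None, :]
    return np.linalg.det(np.eye(n) - A)

print(gap_probability(0.1))   # close to 0.9, matching 1 - s + O(s^4)
```

Differentiating this quantity twice in $s$ (numerically) then gives the limiting spacing density $p(s)$ of Theorem 1.2.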
Using truncation and centralization techniques, it is possible to prove both Theorem 1.1 and Theorem 1.2 when assumption (1.5) is replaced by the weaker assumption
$$\exists\,C_1,C_2>0\ \text{such that}\ \nu(x)\le C_1e^{-C_2|x|}.$$
This extension is examined in full detail in Section 5 of [15] for Wigner random matrices and readily extends to sample covariance matrices.

The first proofs of universality in the bulk of the spectrum of large random matrices were obtained for the so-called invariant ensembles ([7], [8], [9]). The proofs rely on the fact that the joint eigenvalue density of such ensembles can be computed explicitly, and the asymptotic local eigenvalue statistics can then be determined. A breakthrough in the proof of the conjecture was obtained in [18] (following the ideas of [5] and [6]), proving universality in the bulk for the so-called Dyson Brownian motion model ([11]). This then made it possible to extend universality results to a wide class of non-invariant Hermitian random matrix ensembles, the so-called Gauss divisible ensembles (see Subsection 1.2 for the definition). [4] then obtained the same universality result for Gauss divisible complex sample covariance matrices. Very recently, such universality results have been greatly improved by [23] and [15] for Hermitian Wigner random matrices. Both papers remove the Gauss divisibility assumption and only assume sufficient decay of the entries of the Wigner random matrices. The approach of [23] is to make a Taylor expansion of local eigenvalue statistics of Wigner matrices. The core of the proof is then to show that these statistics depend, in the large-$N$ limit, only on the first four moments of the entries of the Wigner matrix. Proving that any Wigner matrix can be matched to a Gauss divisible matrix with the same first four moments then yields a very general universality result.
On the other hand, the approach of [15] is to show that any Wigner matrix (under suitable decay of the entries) can be sufficiently well approximated by a Gauss divisible random matrix, so that bulk universality follows. We also refer the reader to [16], where the two approaches are combined. While writing this paper, a proof of universality in the bulk of the spectrum of large sample covariance matrices was obtained in [24] in the sole case where $p-N=O(N^{4483})$, but with much milder assumptions on the decay of the distribution $\nu$. Therein $\nu$ satisfies $\int|x|^{C_o}\,d\nu(x)<\infty$ for some sufficiently large $C_o$, in place of our assumptions (1.4) and (1.5). Their approach is based on the ideas developed in [23]. Our paper closely follows the ideas developed in [15]. We give an overview of the proof in the next subsection.

1.2 The idea of the proof

Following the pioneering work of [18] and [5], [4] have shown universality of local eigenvalue statistics in the bulk of the spectrum for complex sample covariance matrices when the distribution of the sample is Gauss divisible. We recall that a complex distribution $\mu_G$ is Gauss divisible if there exist a complex probability distribution $P$ and a non-trivial complex Gaussian distribution $G$ such that $\mu_G=P\star G$. Equivalently, [4] consider random matrices of the form
$$\widetilde M_N=\frac1N(W+aX)(W+aX)^*,\qquad(1.11)$$
where $W$ and $X$ are independent $N\times p$ complex random matrices, both with i.i.d. entries. The $W_{ij}$'s are $P$-distributed and the $X_{ij}$'s are complex standard Gaussian random variables. In the above context $a$ is a real number independent of $N$.
The proof of [18] and [4] relies on three main steps:
- conditionally on $H=\frac1N WW^*$, the eigenvalue process induced by $\widetilde M_N$ defines a so-called determinantal random point field;
- the corresponding correlation kernel can be expressed as a double integral in the complex plane, depending on $H$ through its sole spectral measure $\mu_N$;
- under suitable assumptions on $W$, and thanks to concentration results for the spectral measure $\mu_N$ established by [2], [3] and [17], the asymptotic analysis of the correlation kernel (and local statistics) can be performed.

The parameter $a$ is to be seen as the "order of the Gaussian regularization" of $P$. In principle this result would yield a full proof of the universality conjecture if one could let $a$ approach (and be smaller than) $1/\sqrt N$. Unfortunately this idea fails whatever sharp concentration results can be established for $\mu_N$: the asymptotic analysis is not stable at this scale.

A breakthrough to overcome this difficulty was obtained in [15]. In [13], [12], [14], concentration results are deeply sharpened so as to be able to consider a Gaussian regularization of order $a\gg\frac1{\sqrt N}$. Given an arbitrary (non Gauss divisible) distribution $F$, the main point is however to show that one can find a Gauss divisible distribution approximating $F$ sufficiently well so that one can deduce universality in the bulk for a sample covariance matrix with i.i.d. $F$-distributed entries. This is the main step achieved in [15], which we now briefly expose.

Consider the Ornstein-Uhlenbeck (OU) process
$$L=\frac14\frac{\partial^2}{\partial x^2}-\frac x2\frac{\partial}{\partial x},\qquad\partial_t u=Lu.$$
Given a complex probability distribution $\mu$, let then $\widetilde X_t$ be the $N\times p$ matrix process with initial distribution $\mu^{\otimes Np}$ and whose entries (real and imaginary parts) evolve independently according to the OU process. Then $\widetilde X_t$ is distributed as
$$\widetilde X_t\;\mapsto\;e^{-t/2}\Big(\hat H+(e^t-1)^{1/2}X\Big),$$
where $\hat H$ has i.i.d. entries with distribution $\mu$ and $X$ has standard complex Gaussian entries.
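The representation $e^{-t/2}\big(\hat H+(e^t-1)^{1/2}X\big)$ can be checked on moments: the Gaussian law $\mathcal N(0,1/2)$ is invariant for $L$, so if the initial entries have mean $0$ and variance $1/2$ (as imposed by (1.6)), both moments are preserved for all $t$. The sketch below is our own illustration (names, seed and the uniform initial law are ad hoc).

```python
# Moment check of the OU dynamics e^{-t/2} (H_hat + (e^t - 1)^{1/2} X):
# N(0, 1/2) is invariant for L = (1/4) d^2/dx^2 - (x/2) d/dx, so entries with
# mean 0 and variance 1/2 keep mean 0 and variance 1/2 at every time t.
import numpy as np

rng = np.random.default_rng(2)
n, t = 1_000_000, 0.7
# Non-Gaussian initial law with mean 0, variance 1/2 (uniform on [-a, a]).
h = rng.uniform(-np.sqrt(1.5), np.sqrt(1.5), size=n)
x = rng.standard_normal(n) * np.sqrt(0.5)          # N(0, 1/2) Gaussian noise
evolved = np.exp(-t / 2) * (h + np.sqrt(np.expm1(t)) * x)
print(evolved.mean(), evolved.var())               # stays near (0, 1/2)
```

Indeed $e^{-t}\big(\tfrac12+(e^t-1)\tfrac12\big)=\tfrac12$ for every $t$, which is what the empirical variance reproduces.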
Let $e^{t\mathcal L}:=(e^{tL})^{\otimes Np}$ denote the dynamics of the OU process $L$ for all the matrix elements. Here, for small $t$, one should think that $t\simeq a^2$. [15] prove that there exists a Gauss divisible distribution $\mu_G^t$ approximating $F$ in the total variation norm in a sufficiently good way. Roughly speaking, let
$$\mu_G^t=e^{tL}(1-tL+t^2L^2/2)F^c,\qquad\widetilde\mu=(1-tL+t^2L^2/2)F^c,$$
where $F^c$ is obtained from $F$ after truncation and centralization. Intuitively, it is reasonable to expect that $\mu_G^t$ is a good approximation of $F$ on the scale $t\ll1$. For the sake of completeness we here recall their result (Proposition 2.1).

Let $\theta$ be a smooth cutoff function satisfying $\theta(x)=1$ if $|x|\le1$ and $\theta(x)=0$ for $|x|\ge2$.

Proposition 1.1. Let $V$ satisfy (1.4), (1.5) for some $k$ and (1.6). Let $\lambda>0$ be sufficiently small and $t=N^{\lambda-1}$. Let $c_N,d_N$ be real numbers so that $v\,d\mu$ defines a centered probability density if the function $v$ is given by:
$$v(x):=e^{-V_c(x)},\qquad V_c(x):=V(x)\,\theta\big((x-c_N)N^{-\lambda/4k}\big)+d_N.$$
Let then $\mathcal L:=L^{\otimes Np}$, $f_{N,p}:=(e^{-V})^{\otimes Np}$, and $f_{c,N,p}:=v^{\otimes Np}$.

1. There exist constants $C>0$, $c>0$ depending on $k$ and $\lambda$ such that
$$\int|f_{c,N,p}-f_{N,p}|\,d\mu^{\otimes Np}\le Ce^{-c(Np)^{c/2}}.$$

2. $g_t:=(1-tL+t^2L^2/2)v$ is a probability density with respect to $d\mu$. Setting $G_t:=[g_t]^{\otimes Np}$, there exists a constant $C$ depending on $\lambda$ and the constants $C_*$ and $\delta_1$ defined in (1.4) and (1.5) such that
$$\int\frac{|e^{t\mathcal L}G_t-f_{c,N,p}|^2}{e^{t\mathcal L}G_t}\,d\mu^{\otimes Np}\le CNp\,t^{6-\lambda}\le CN^{-4+8\lambda}.$$

The idea is then to prove Theorem 1.1 and Theorem 1.2 for Gauss divisible ensembles with small parameter $a\sim N^{(\lambda-1)/2}$ for some $\lambda>0$. Then, using Proposition 1.1 and following the ideas of [15], Section 4, one can extend universality of the local eigenvalue statistics $S_N^{(m)}(f,u,\rho_N)$ with $m=2,3$, and that of the spacing function, to sample covariance matrices satisfying (1.7).

The paper is organized as follows. In Section 2 we study eigenvalue statistics in the bulk of the spectrum for Gauss divisible sample covariance matrices.
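The power $t^6$ in part 2 can be read off from the fact that $1-tL+t^2L^2/2$ is the second-order truncation of $e^{-tL}$. The following display is our own heuristic sketch (not the paper's proof):

```latex
e^{tL}\Bigl(1 - tL + \tfrac{t^{2}L^{2}}{2}\Bigr)
  = e^{tL}\Bigl(e^{-tL} + \tfrac{t^{3}L^{3}}{6} + O(t^{4})\Bigr)
  = 1 + \tfrac{t^{3}}{6}\, e^{tL} L^{3} + O(t^{4}).
```

So the error is of order $t^3$ per matrix entry; squaring it in the $\chi^2$-type distance of part 2 and summing over the $Np$ independent entries gives a bound of order $Np\,t^6$, up to the small loss $t^{-\lambda}$.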
To this aim, we first recall some properties of the deformed Wishart ensemble; this is the conditional distribution of $\widetilde M_N$ knowing $W$. Such an ensemble is in particular known to be determinantal, as we recall. We then establish some improved convergence rates for the spectral measure of sample covariance matrices whose entries have a sub-Gaussian tail. These concentration results then allow us to compute the asymptotic correlation functions as $N\to\infty$ in the regime where $a\to0$, $a\gg\frac1{\sqrt N}$. We then prove Theorem 1.2 and Theorem 1.1 in Section 3. In the whole paper, we use $C$ and $c$ to denote constants whose value may vary from line to line.

2 The Gauss divisible ensemble

In this section we establish some universality results for the following Gauss divisible ensemble: let
$$\widetilde M_N=\frac1N(W+aX)(W+aX)^*\qquad(2.1)$$
be a Hermitian $N\times N$ random matrix where

- $(H_0)$ $X$ and $W$ are independent $N\times p$ random matrices, where $p\ge N$ and there exists $\gamma\ge1$ so that $\lim_{N\to\infty}p/N=\gamma$;
- $(H_1)$ $a=a_N=\sqrt{\frac{N^\lambda}N}$, where $\lambda>0$ is a (small) real number;
- $(H_2)$ $X$ is an $N\times p$ random matrix with complex standard Gaussian entries: $\mathrm{Re}\,X_{ij},\,\mathrm{Im}\,X_{ij}\sim\mathcal N(0,1/2)$, $\forall\,1\le i\le N$, $1\le j\le p$;
- $(H_3)$ $W=(W_{ij})$, $1\le i\le N$, $1\le j\le p$, is a complex random matrix such that $\mathrm{Re}\,W_{ij}$, $\mathrm{Im}\,W_{ij}$, $1\le i\le N$, $1\le j\le p$, are i.i.d. and satisfy:
  (A1) there exists a constant $\delta_o>0$ such that $\mathbb E\,e^{\delta_o|W_{ij}|^2}<\infty$, $\forall i,j$.

Without loss of generality, we also make the assumption that
$$\mathbb E|W_{ij}|^2=1/4.\qquad(2.2)$$
Note that (2.2) can always be achieved by rescaling $\widetilde M_N$. From now on, we denote by $y_1\ge y_2\ge\cdots\ge y_N$ the ordered eigenvalues of $H=H_N=WW^*/N$ and let $\mu_N=\frac1N\sum_{i=1}^N\delta_{y_i}$ be its spectral measure. Under assumption (2.2), it is known that $\mu_N$ almost surely converges to the Marčenko-Pastur distribution with parameters $\gamma$ and $1/4$, $\rho^{MP}_{\gamma,1/4}=\rho^{MP}_\gamma$, whose density function is given by (1.1) with $\sigma^2=1/4$. When $\gamma=1$, we simply denote $\rho^{MP}_1$ by $\rho^{MP}$ for short.
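Assembling the ensemble (2.1) under $(H_0)$-$(H_3)$ is straightforward; the sketch below is our own illustration (Gaussian $W$ for simplicity, ad hoc sizes and seed). It checks that $\widetilde M_N$ is Hermitian and that, under the normalization (2.2), the mean eigenvalue of $H=WW^*/N$ is close to $\gamma/4$, the mean of $\rho^{MP}_{\gamma,1/4}$.

```python
# Construction of the Gauss divisible ensemble (2.1) under (H_0)-(H_3); sketch.
import numpy as np

rng = np.random.default_rng(3)
N, p, lam = 300, 450, 0.2                    # gamma = 1.5; lambda as in (H_1)
a = np.sqrt(N**lam / N)                      # a_N = (N^lambda / N)^{1/2}

# (H_3)/(2.2): Re W, Im W i.i.d. with E|W_ij|^2 = 1/4 (Gaussian here).
W = (rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))) / np.sqrt(8)
# (H_2): standard complex Gaussian X with Re X, Im X ~ N(0, 1/2).
X = (rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))) / np.sqrt(2)

M = (W + a * X) @ (W + a * X).conj().T / N   # the matrix (2.1)
H = W @ W.conj().T / N

print(np.allclose(M, M.conj().T), np.linalg.eigvalsh(H).mean())
```

Here $\mathbb E|W_{ij}|^2=1/4$ forces the real and imaginary parts to have variance $1/8$ each, hence the division by $\sqrt8$.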
2.1 Correlation functions

The sample covariance Gauss divisible ensemble has a nice mathematical structure that we are going to make use of in the sequel: the conditional distribution of $\widetilde M_N$ with respect to $W$ is the so-called complex deformed Wishart ensemble. Such an ensemble has been widely studied in random matrix theory, as it induces a determinantal random point field. We recall some results used in [4] (see references therein) that will be needed in the sequel. Let then $P_N^H(\lambda_1,\lambda_2,\dots,\lambda_N)$ be the joint eigenvalue distribution induced by the conditional distribution of $\widetilde M_N$ w.r.t. $H$. Then (see Section 3 in [4]), $P_N^H$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb R_+^N$. Its density $f_N^H$ is given by:
$$f_N^H(x_1,x_2,\dots,x_N)=\frac{V(x)}{V(y)}\det\Big(\frac1S\,e^{-\frac{y_i+x_j}S}\,I_\nu\Big(\frac{2\sqrt{y_ix_j}}S\Big)\Big(\frac{x_j}{y_i}\Big)^{\nu/2}\Big)_{i,j=1}^N,\qquad(2.3)$$
where $V(x):=\prod_{1\le i<j\le N}(x_i-x_j)$, $\nu=p-N$, $I_\nu$ is the modified Bessel function of the first kind, and $S=a^2/N$.

The main tools to study local eigenvalue statistics are the so-called eigenvalue correlation functions (see Section 3). They are defined for any integer $1\le m\le N$ by
$$R_N^{(m)}(x_1,x_2,\dots,x_m;H)=\frac{N!}{(N-m)!}\int_{\mathbb R_+^{N-m}}f_N^H(x_1,x_2,\dots,x_m,\lambda_{m+1},\dots,\lambda_N)\prod_{i=m+1}^N d\lambda_i.$$
Then, for any integer $1\le m\le N$, one has that
$$R_N^{(m)}(x_1,x_2,\dots,x_m;H)=\det\big(K_N(x_i,x_j;H)\big)_{i,j=1}^m,$$
for some correlation kernel $K_N(u,v;H)$. This gives the determinantal random point field structure. Furthermore, $K_N(u,v;H)$ is given by
$$K_N(u,v;H):=\frac1{i^2\pi^2S^2}\int_{i\mathbb R+A}dw\int_\Gamma dz\;e^{\frac{w^2-z^2}S}\,K_\nu\Big(\frac{2w\sqrt v}S\Big)\,I_\nu\Big(\frac{2z\sqrt u}S\Big)\Big(\frac wz\Big)^\nu\frac{w+z}{w-z}\,\prod_{i=1}^N\frac{w^2-y_i}{z^2-y_i},\qquad(2.4)$$
where $K_\nu$ is the modified Bessel function of the second kind, $\Gamma$ is a contour oriented counterclockwise enclosing the points $\pm y_i^{1/2}$, $i=1,\dots,N$, and $A$ is large enough so that the two contours $\Gamma$ and $\Upsilon=i\mathbb R+A$ do not cross each other.

Figure 1: The two contours $\Upsilon$ and $\Gamma$ defining the correlation kernel $K_N(u,v;H)$.
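The asymptotic analysis below replaces the Bessel functions in (2.4) by their large-argument expansions, $I_\nu(x)\sim e^x/\sqrt{2\pi x}$ and $K_\nu(x)\sim\sqrt{\pi/(2x)}\,e^{-x}$ (the standard expansions; the paper's formula (B.1) is in an appendix not reproduced here). For integer $\nu$ this can be checked numerically from the classical integral representation $I_n(x)=\frac1\pi\int_0^\pi e^{x\cos\theta}\cos(n\theta)\,d\theta$; the sketch below is our own illustration.

```python
# Numerical check of the leading-order Bessel asymptotics used on (2.4):
# I_n(x) ~ e^x / sqrt(2 pi x) as x -> infinity (integer n only).
import numpy as np

def bessel_i(n, x, m=4000):
    """I_n(x) via (1/pi) * int_0^pi exp(x cos t) cos(n t) dt (trapezoid rule)."""
    t = np.linspace(0.0, np.pi, m)
    f = np.exp(x * np.cos(t)) * np.cos(n * t)
    return (0.5 * (f[0] + f[-1]) + f[1:-1].sum()) * (t[1] - t[0]) / np.pi

x = 50.0
leading = np.exp(x) / np.sqrt(2 * np.pi * x)
print(bessel_i(0, x) / leading, bessel_i(2, x) / leading)  # both tend to 1
```

The ratio deviates from 1 by the first correction $(4\nu^2-1)/(8x)$, which is why the replacement is only legitimate when the argument $2z\sqrt u/S$ is large, i.e. away from the imaginary axis.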
Due to the determinantal structure, correlation functions are not impacted by conjugation of the correlation kernel. In particular, for any $b\in\mathbb R$ and any integer $m\ge1$,
$$\det\big(K_N(u_i,u_j;H)\big)_{i,j=1}^m=\det\Big(K_N(u_i,u_j;H)\,e^{\frac{2b(\sqrt{u_j}-\sqrt{u_i})}S}\Big)_{i,j=1}^m.$$
In the sequel we consider the conjugation for some $b$ that will be defined in the asymptotic analysis. We set
$$K_N^b(u_i,u_j;H):=K_N(u_i,u_j;H)\,e^{2b(\sqrt{u_j}-\sqrt{u_i})/S}.\qquad(2.5)$$
The correlation kernel $K_N^b(u,v;H)$ only depends on $H$ through its spectral measure $\mu_N$. As in the case where $a$ is fixed (independent of $N$), the idea is to use the convergence of $\mu_N$ to the Marčenko-Pastur distribution. In particular, one would like to make the replacement (outside a suitably negligible set)
$$\prod_{i=1}^N(w^2-y_i)=\exp\Big\{N(1+o(1))\int\ln(w^2-y)\,d\rho^{MP}_\gamma(y)\Big\}.$$
In the next subsection, we prove that this replacement can be made in some sense provided $\mathrm{Im}(w^2)$ is not too small.

2.2 Concentration results

This subsection is devoted to the proof of the following Proposition 2.1, which makes precise the rate of convergence of $\mu_N$ to $\rho^{MP}_\gamma$. Before stating this Proposition we need a few definitions and notations. Given a complex number $z=u+i\eta$, $u\in\mathbb R$, $\eta>0$, we define the Stieltjes transform of $\mu_N$ by
$$m_N(z):=\int\frac1{\lambda-z}\,d\mu_N(\lambda),\qquad(2.6)$$
and the Stieltjes transform of the limiting Marčenko-Pastur distribution by
$$m_{MP}(z):=\int\frac1{\lambda-z}\,d\rho^{MP}_\gamma(\lambda).\qquad(2.7)$$
Let $\epsilon>0$ be given (small enough).

Proposition 2.1. Let $z=u+i\eta$ for some $u\in[u_-+\epsilon,u_+-\epsilon]$ and $\eta>0$. Then, there exist a constant $c_1$ and $C_o,C>0$, $c>0$ depending on $\epsilon$ only such that $\forall\,\delta<c_1\epsilon$,
$$\mathbb P\Big(\sup_{u\in[u_-+\epsilon,u_+-\epsilon]}\big|m_N(z)-m_{MP}(z)\big|\ge\delta+C_o\frac\gamma{N\eta}\Big)\le Ce^{-c\delta\sqrt{N\eta}},$$
for any $(\ln N)^4/N\le\eta\le1$. Furthermore, given $\eta_N\ge(\ln N)^4/N$, there exist constants $c>0$, $C>0$ and $K_o$ such that $\forall\,\kappa\ge K_o$,
$$\mathbb P\Big(\sup_{x>(\epsilon/200)^2,\,|y|\ge\eta_N}|m_N(x+iy)|\ge\kappa\Big)\le Ce^{-c\sqrt\kappa\,N\eta_N}.$$
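The closeness of $m_N$ (2.6) to $m_{MP}$ (2.7) at a fixed spectral parameter is easy to observe numerically; the sketch below is our own illustration (ad hoc sizes, seed and tolerances), computing $m_{MP}$ by naive quadrature of the Marčenko-Pastur density with $\sigma^2=1/4$ as in (2.2).

```python
# Compare the empirical Stieltjes transform (2.6) with m_MP (2.7); sketch only.
import numpy as np

rng = np.random.default_rng(4)
N, p, sigma2 = 400, 800, 0.25                 # gamma = 2, normalization (2.2)
gamma = p / N
u_m = sigma2 * (1 - np.sqrt(gamma)) ** 2
u_p = sigma2 * (1 + np.sqrt(gamma)) ** 2

W = (rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))) * np.sqrt(sigma2 / 2)
mu_N = np.linalg.eigvalsh(W @ W.conj().T / N)  # spectral measure of H

z = 0.5 * (u_m + u_p) + 0.5j                   # bulk point, eta = 1/2
m_N = np.mean(1.0 / (mu_N - z))

lam = np.linspace(u_m, u_p, 200_001)[1:-1]     # quadrature grid for (2.7)
rho = np.sqrt((u_p - lam) * (lam - u_m)) / (2 * np.pi * lam * sigma2)
m_MP = np.sum(rho / (lam - z)) * (lam[1] - lam[0])
print(abs(m_N - m_MP))                         # small already for N = 400
```

At this macroscopic scale $\eta=1/2$ the two transforms agree up to fluctuations of order $1/(N\eta)$; the content of Proposition 2.1 is that the agreement persists down to the much smaller scales $\eta\gtrsim(\ln N)^4/N$.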
Proof of Proposition 2.1: To ease the reading, the proof is postponed to Appendix A. The proof is actually a modification of that of Theorems 3.1 and 4.1 in [14], where the Stieltjes transform of Hermitian Wigner matrices is considered. We thus simply indicate the main changes.

2.3 Asymptotics of the correlation kernel when $\nu=p-N$ is bounded

The aim of this subsection is to prove the following Proposition. Let $\epsilon>0$ be given (small), independent of $N$.

Proposition 2.2. Let $u_*\in[u_-+\epsilon,u_+-\epsilon]=[\epsilon,1-\epsilon]$ and $m\ge1$ be given. Consider a sequence $u=u_N$ such that $N^{1-\lambda}|u-u_*|=o(1)$. Then, there exist $b\in\mathbb R$, a set $\Omega_N$, and positive constants $C,c$ such that:
- the complement of $\Omega_N$ is negligible: $\mathbb P(\Omega_N^c)\le Ce^{-cN^{\lambda/4}}$;
- on $\Omega_N$ one has that
$$\lim_{N\to\infty}\det\Big(\frac1{N\rho^{MP}(u)}K_N^b\Big(u+\frac{x_i}{N\rho^{MP}(u)},u+\frac{x_j}{N\rho^{MP}(u)};H\Big)\Big)_{i,j=1}^m=\det\Big(\frac{\sin(\pi(x_i-x_j))}{\pi(x_i-x_j)}\Big)_{i,j=1}^m.$$

Remark 2.1. In Remark 2.3 we explain how to extend Proposition 2.2 to the case where $|u-u_*|=o(N^{-c})$ with $c>0$ arbitrarily small. This extension is needed for the proof of Theorem 1.2.

This subsection is devoted to the proof of Proposition 2.2. The proof is divided into two parts: first we obtain a new expression for the correlation kernel, which then allows us to derive its asymptotics by a saddle point argument.

2.3.1 Rewriting the kernel

Our strategy consists of two parts: we first replace the Bessel functions with their asymptotic expansion, and we then remove the singularity $1/(w-z)$ in the correlation kernel, as it will be proved to prevent a direct saddle point analysis. As this part is highly technical and needs a few notations, we summarize our main result in Lemma 2.3, stated at the end of this subsection. The reader might skip this part and is simply referred to the above cited Lemma.

Our first task is to replace in (2.4) the Bessel functions with their asymptotic expansion given in Appendix B, formula (B.1). This replacement can be made, up to a negligible error, provided the contour $\Gamma$ is cut in a small neighborhood of the imaginary axis: see Figure 2. In the sequel (see Figure 2) we call $x_o^\pm$ (resp. $x_1^\pm$) the endpoints of $\Gamma_c$ with $\mathrm{Re}(x_o)<0$ (resp. $\mathrm{Re}(x_1)>0$) with positive/negative imaginary part surrounding the imaginary axis: $x_o^\pm$ and $x_1^\pm$ will be chosen in Subsection 2.3.2. We also call $\Gamma_1$ (resp. $\Gamma_2$) the part of $\Gamma_c$ lying to the right (resp. left) of $i\mathbb R$. Then $\Gamma_1$ is the image of $\Gamma_2$ by $z\mapsto-z$, and $\Gamma_c=\Gamma_1\cup\Gamma_2$. One obtains the following Lemma.

Lemma 2.1. For any $b\in\mathbb R$, one has that
$$K_N^b(u,v;H)=(1+o(1))\Big(K_N^{1,b}(u,v;H)+K_N^{2,b}(u,v;H)\Big),\qquad(2.8)$$
where
This replacement canbe made, up to a negligible error, provided the contour Γ is cut in a small neighborhood of the imaginary axis: see Figure 2. In the sequel (see Figure 2) we call x ±o (resp. x±1) the endpoints of Γc with Re(xo) < 0 (resp. Re(x1) > 0) with positive/negative imaginary part surrounding the imaginary axis: x±o and x±1 will be chosen in subsection 2.3.2. We also call Γ (resp. Γ ) the part of Γ 1 2 c lying to the right (resp. left) of iR. Then Γ is the image of Γ by z z and 1 2 7→− Γ =Γ Γ . One obtains the following Lemma. c 1 2 ∪ Lemma 2.1. For any b R, one has that ∈ Kb (u,v;H)=(1+o(1)) K1,b(u,v;H)+K2,b(u,v;H) , (2.8) N N N (cid:16) (cid:17) where 2b(√v √u) KN1,b(u,v;H) = e4i2πS−2S ZiR+AdwZΓ1dze−z2S+w2e−2wS√ve2zS√u 1 N w2 y w ν w+z i − , √w√z(uv)1/4 z2 y z w z i Yi=1 − (cid:16) (cid:17) − 10