New asymptotic results in principal component analysis

Vladimir Koltchinskii* and Karim Lounici†
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160
e-mail: [email protected]; [email protected]

arXiv:1601.01457v1 [math.ST] 7 Jan 2016

* Supported in part by NSF Grants DMS-1509739, DMS-1207808, CCF-1523768 and CCF-1415498.
† Supported in part by Simons Grant 315477 and NSF CAREER Grant DMS-1454515.

Abstract: Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space $\mathbb H$ with covariance operator $\Sigma := \mathbb E(X\otimes X)$. Let $\Sigma = \sum_{r\ge 1}\mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1 > \mu_2 > \dots$ and the corresponding spectral projectors $P_1, P_2, \dots$. Given a sample $X_1,\dots,X_n$ of size $n$ of i.i.d. copies of $X$, the sample covariance operator is defined as $\hat\Sigma_n := n^{-1}\sum_{j=1}^n X_j\otimes X_j$. The main goal of principal component analysis is to estimate the spectral projectors $P_1, P_2, \dots$ by their empirical counterparts $\hat P_1, \hat P_2, \dots$ properly defined in terms of the spectral decomposition of the sample covariance operator $\hat\Sigma_n$. The aim of this paper is to study asymptotic distributions of important statistics related to this problem, in particular, of the statistic $\|\hat P_r - P_r\|_2^2$, where $\|\cdot\|_2^2$ is the squared Hilbert–Schmidt norm. This is done in a "high-complexity" asymptotic framework in which the so called effective rank $r(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_\infty}$ ($\mathrm{tr}(\cdot)$ being the trace and $\|\cdot\|_\infty$ being the operator norm) of the true covariance $\Sigma$ is becoming large simultaneously with the sample size $n$, but $r(\Sigma) = o(n)$ as $n\to\infty$. In this setting, we prove that, in the case of a one-dimensional spectral projector $P_r$, the properly centered and normalized statistic $\|\hat P_r - P_r\|_2^2$ with data-dependent centering and normalization converges in distribution to a Cauchy type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.

AMS 2000 subject classifications: Primary 62H12.
Keywords and phrases: Sample covariance, Spectral projectors, Effective rank, Principal Component Analysis, Concentration inequalities, Asymptotic distribution, Perturbation theory.

1. Introduction

Let $X, X_1, \dots, X_n, \dots$ be i.i.d. random variables sampled from a Gaussian distribution in a separable Hilbert space $\mathbb H$ with zero mean and covariance operator $\Sigma := \mathbb E(X\otimes X)$, and let $\hat\Sigma = \hat\Sigma_n := n^{-1}\sum_{j=1}^n X_j\otimes X_j$ denote the sample covariance operator based on $(X_1,\dots,X_n)$.¹ We will be interested in asymptotic properties of several statistics related to spectral projectors of the sample covariance $\hat\Sigma$ (empirical spectral projectors) that could be potentially useful in principal component analysis (PCA) and its infinite dimensional versions such as functional PCA (see, e.g., [18]) or kernel PCA in machine learning (see, e.g., [19], [4]).

¹ Given $u, v\in\mathbb H$, the tensor product $u\otimes v$ is a rank one linear operator defined as $(u\otimes v)x = u\langle v, x\rangle$, $x\in\mathbb H$.

In the classical setting of a finite-dimensional space $\mathbb H = \mathbb R^p$ of a fixed dimension $p$, the large sample asymptotics of spectral characteristics of the sample covariance were studied by Anderson [1], who derived the joint asymptotic distribution of the sample eigenvalues and the associated sample eigenvectors (see also Theorem 13.5.1 in [2]). Later on, similar results were established in the infinite-dimensional case (see, e.g., [6]). Such an extension is rather straightforward provided that the "complexity of the problem," characterized by such parameters as the trace $\mathrm{tr}(\Sigma)$ of the covariance operator $\Sigma$, remains fixed when the sample size $n$ tends to infinity.
In the high-dimensional setting, when the dimension $p$ of the space grows simultaneously with the sample size $n$, the problem has been primarily studied for so called spiked covariance models introduced by Johnstone and co-authors (see, e.g., [8]). In this case, the covariance $\Sigma$ has a special structure, namely,
\[
\Sigma = \sum_{j=1}^m \lambda_j^2 (\theta_j\otimes\theta_j) + \sigma^2 I_p,
\]
where $m < p$, $\theta_1,\dots,\theta_m$ are orthonormal vectors ("principal components"), $\lambda_1^2 > \dots > \lambda_m^2 > 0$, $\sigma^2 > 0$ and $I_p$ is the $p\times p$ identity matrix. This means that the observation $X$ can be represented as²
\[
X = \sum_{j=1}^m \lambda_j \xi_j \theta_j + \sigma\sum_{j=1}^p \eta_j \theta_j,
\]
where $\xi_j, \eta_j, j\ge 1$ are i.i.d. standard normal random variables. Thus, $X$ can be viewed as an observation of a "signal" $\sum_{j=1}^m \lambda_j\xi_j\theta_j$, consisting of $m$ "spikes", in an independent Gaussian white noise. For such models, an elegant asymptotic theory has been developed based on the achievements of random matrix theory (see, e.g., the results of Paul [16] on asymptotics of eigenvectors of sample covariance in spiked covariance models and references therein). The most interesting results were obtained in the case when $\frac{p}{n}\to c$ for some constant $c\in(0,+\infty)$. In this case, however, the eigenvectors of the sample covariance $\hat\Sigma_n$ fail to be consistent estimators of the eigenvectors of the true covariance $\Sigma$ (see Johnstone and Lu [8]) and this difficulty could not be overcome without further assumptions on the true eigenvectors such as, for instance, their sparsity. This led to the development of various approaches to "sparse PCA" (see, e.g., [7, 13, 15, 17, 21, 3] and references therein).

² Assuming that the orthonormal vectors $\theta_1,\dots,\theta_m$ are extended to an orthonormal basis $\theta_1,\dots,\theta_p$ of $\mathbb R^p$.

In this paper, we follow a somewhat different path. It is well known that to ensure consistency of empirical spectral projectors as statistical estimators of spectral projectors of the true covariance $\Sigma$ one has to establish convergence of $\hat\Sigma$ to $\Sigma$ in the operator norm. In what follows, $\|\cdot\|_\infty$ will denote the operator norm (for bounded operators in $\mathbb H$), $\|\cdot\|_2$ will denote the Hilbert–Schmidt norm and $\|\cdot\|_1$ will denote the nuclear norm. We also use the notation $\mathrm{tr}(\Sigma)$ for the trace of $\Sigma$ and set
\[
r(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_\infty}.
\]
The last quantity is always dominated by the rank of the operator $\Sigma$ and it is sometimes referred to as its effective rank. It was pointed out by Vershynin [20] that the effective rank could be used to provide non-asymptotic upper bounds on the size of the operator norm $\|\hat\Sigma-\Sigma\|_\infty$ with rather weak (logarithmic) dependence on the dimension, and this approach was later used in the statistical literature (see [5, 14]). In our paper [10], we proved that in the Gaussian case the size of the operator norm $\|\hat\Sigma-\Sigma\|_\infty$ can be completely characterized in terms of the effective rank $r(\Sigma)$ of the true covariance $\Sigma$ and its operator norm $\|\Sigma\|_\infty$, and that the resulting non-asymptotic bounds are dimension-free (see Theorems 1 and 2 below). This shows that $\hat\Sigma$ is an operator norm consistent estimator of $\Sigma$ provided that $r(\Sigma) = o(n)$, which makes the effective rank $r(\Sigma)$ an important complexity parameter of the covariance estimation problem. This also provides a dimension-free framework for such problems and allows one to study them in a "high-complexity" case (that is, when the effective rank $r(\Sigma)$ could be large) without imposing any structural assumptions on the true covariance such as, for instance, spiked covariance models [8]. This approach has been developed in some detail in our recent papers [10], [11], [12].
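To make the notion of effective rank concrete, the following minimal numerical sketch (Python/NumPy; it is an illustrative addition and not part of the original text, and the dimension, number of spikes and spike sizes are arbitrary choices) computes $r(\Sigma)$ for a spiked covariance of the form discussed above and contrasts it with the rank and the dimension:

```python
import numpy as np

def effective_rank(Sigma: np.ndarray) -> float:
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_inf for a symmetric PSD matrix."""
    eigvals = np.linalg.eigvalsh(Sigma)
    return float(eigvals.sum() / eigvals.max())

# Spiked covariance Sigma = sum_j lambda_j^2 (theta_j x theta_j) + sigma^2 I_p with m spikes.
# All numerical values below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
p, m = 500, 3
lambda2, sigma2 = 5.0, 0.1
Theta = np.linalg.qr(rng.standard_normal((p, m)))[0]   # orthonormal "principal components"
Sigma = lambda2 * Theta @ Theta.T + sigma2 * np.eye(p)

print(effective_rank(Sigma))            # approx (m*lambda2 + p*sigma2)/(lambda2 + sigma2) ~ 12.7
print(np.linalg.matrix_rank(Sigma), p)  # the rank equals the dimension p = 500 here
```

In this example the effective rank stays small (of the order of a dozen) even though the matrix has full rank, which is exactly the regime in which the dimension-free bounds discussed above are informative.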
The current paper continues this line of work by studying the asymptotic behavior of several important statistics under the assumptions that both $n\to\infty$ and $r(\Sigma)\to\infty$, $r(\Sigma) = o(n)$. This includes statistical estimators of the bias of spectral projectors of $\hat\Sigma$ (empirical spectral projectors) as well as of their squared Hilbert–Schmidt norm errors, with a goal to develop "studentized versions" of these statistics that could be (in principle) used for statistical inference. Before stating our main results, we provide in the next section a review of the results of papers [10], [11], [12] that will be extensively used in what follows.

Throughout the paper, we write $A\lesssim B$ iff $A\le CB$ for some absolute constant $C>0$ ($A, B\ge 0$). $A\gtrsim B$ is equivalent to $B\lesssim A$ and $A\asymp B$ is equivalent to $A\lesssim B$ and $A\gtrsim B$. Sometimes, the signs $\lesssim$, $\gtrsim$ and $\asymp$ could be provided with subscripts: for instance, $A\lesssim_\gamma B$ means that $A\le CB$ with a constant $C$ that could depend on $\gamma$.

2. Effective rank and concentration of empirical spectral projectors: a review of recent results

The following recent result (see [10]) provides a complete characterization of the quantity $\mathbb E\|\hat\Sigma-\Sigma\|_\infty$ in terms of the operator norm $\|\Sigma\|_\infty$ and the effective rank $r(\Sigma)$ in the case of i.i.d. mean zero Gaussian observations.

Theorem 1. The following bound holds:
\[
\mathbb E\|\hat\Sigma-\Sigma\|_\infty \asymp \|\Sigma\|_\infty\Big[\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\Big]. \tag{2.1}
\]

In paper [10], it is also complemented by a concentration inequality for $\|\hat\Sigma-\Sigma\|_\infty$ around its expectation:

Theorem 2. There exists a constant $C>0$ such that for all $t\ge 1$ with probability at least $1-e^{-t}$,
\[
\Big|\|\hat\Sigma-\Sigma\|_\infty - \mathbb E\|\hat\Sigma-\Sigma\|_\infty\Big| \le C\|\Sigma\|_\infty\Big[\Big(\sqrt{\frac{r(\Sigma)}{n}}\vee 1\Big)\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big]. \tag{2.2}
\]

It follows from (2.1) and (2.2) that with some constant $C>0$ and with probability at least $1-e^{-t}$
\[
\|\hat\Sigma-\Sigma\|_\infty \le C\|\Sigma\|_\infty\Big[\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big], \tag{2.3}
\]
which, in turn, implies that for all $p\ge 1$
\[
\mathbb E^{1/p}\|\hat\Sigma-\Sigma\|_\infty^p \asymp_p \|\Sigma\|_\infty\Big[\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\Big]. \tag{2.4}
\]

These results showed that the sample covariance $\hat\Sigma$ is an operator norm consistent estimator of $\Sigma$ even in the cases when the effective rank $r(\Sigma)$ becomes large as $n\to\infty$, but $r(\Sigma) = o(n)$ and $\|\Sigma\|_\infty$ remains bounded. Thus, it becomes of interest to study the behavior of spectral projectors of the sample covariance $\hat\Sigma$ (that are of crucial importance in PCA) in such an asymptotic framework. This program has been partially implemented in papers [11], [12]. To state the main results of these papers (used in what follows), we will introduce some further definitions and notations.

Let $\Sigma = \sum_{r\ge 1}\mu_r P_r$ be the spectral representation of the covariance operator $\Sigma$ with distinct non zero eigenvalues $\mu_r, r\ge 1$ (arranged in decreasing order) and the corresponding spectral projectors $P_r, r\ge 1$. Clearly, $P_r$ are finite rank projectors with $\mathrm{rank}(P_r) =: m_r$ being the multiplicity of the corresponding eigenvalue $\mu_r$. Let $\sigma(\Sigma)$ be the spectrum of $\Sigma$. Denote by $\bar g_r$ the distance from the eigenvalue $\mu_r$ to the rest of the spectrum $\sigma(\Sigma)\setminus\{\mu_r\}$ (the $r$-th "spectral gap"). It will also be convenient to consider the non zero eigenvalues $\sigma_j(\Sigma), j\ge 1$ of $\Sigma$ arranged in nonincreasing order and repeated with their multiplicities (in the case when the number of non zero eigenvalues is finite, we extend this sequence by zeroes). With this notation, let $\Delta_r := \{j: \sigma_j(\Sigma) = \mu_r\}$, $r\ge 1$, and denote by $\hat P_r$ the orthogonal projector onto the linear span of the eigenspaces of $\hat\Sigma$ corresponding to its eigenvalues $\{\sigma_j(\hat\Sigma): j\in\Delta_r\}$.
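For intuition, the next minimal sketch (again an illustrative addition rather than part of the paper; it uses an arbitrary finite-dimensional Gaussian model and 0-based indices) shows how the empirical spectral projector $\hat P_r$ can be assembled from the eigendecomposition of $\hat\Sigma$ once the index set $\Delta_r$ is fixed:

```python
import numpy as np

def empirical_spectral_projector(Sigma_hat: np.ndarray, Delta_r: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the span of the eigenvectors of Sigma_hat whose
    eigenvalues sigma_j(Sigma_hat), j in Delta_r, are matched (by rank) to mu_r.
    Eigenvalues are sorted in nonincreasing order, as in the paper (0-based here)."""
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)   # ascending order
    order = np.argsort(eigvals)[::-1]              # nonincreasing order
    V = eigvecs[:, order[Delta_r]]
    return V @ V.T

# Example: mu_1 = 3 with multiplicity 2, so Delta_1 = {0, 1} in 0-based indexing.
rng = np.random.default_rng(1)
p, n = 50, 2000
Sigma = np.diag(np.concatenate([[3.0, 3.0], np.full(p - 2, 0.5)]))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n                            # sample covariance operator

P1_hat = empirical_spectral_projector(Sigma_hat, np.arange(2))
P1 = np.diag(np.concatenate([[1.0, 1.0], np.zeros(p - 2)]))   # true projector P_1
print(np.linalg.norm(P1_hat - P1))                 # Hilbert-Schmidt error ||P1_hat - P1||_2
```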
It easily follows from a well known inequality due to Weyl that
\[
\sup_{j\ge 1}|\sigma_j(\hat\Sigma)-\sigma_j(\Sigma)| \le \|\hat\Sigma-\Sigma\|_\infty.
\]
If $\|\hat\Sigma-\Sigma\|_\infty < \frac{\bar g_r}{2}$, this immediately implies that the eigenvalues $\{\sigma_j(\hat\Sigma): j\in\Delta_r\}$ form a "cluster" that belongs to the interval $\big(\mu_r-\frac{\bar g_r}{2}, \mu_r+\frac{\bar g_r}{2}\big)$ and that is separated from the rest of the spectrum of $\hat\Sigma$ in the sense that $\sigma_j(\hat\Sigma)\notin\big(\mu_r-\frac{\bar g_r}{2}, \mu_r+\frac{\bar g_r}{2}\big)$ for all $j\notin\Delta_r$. In this case, $\hat P_r$ becomes a natural estimator of $P_r$. It could be viewed as a random perturbation of $P_r$ and the following result, closely related to basic facts of perturbation theory (see [9]), could be found in [11] (see Lemmas 1 and 2 there).

Lemma 1. Let $E := \hat\Sigma-\Sigma$. The following bound holds:
\[
\|\hat P_r - P_r\|_\infty \le \frac{4\|E\|_\infty}{\bar g_r}. \tag{2.5}
\]
Moreover, denote
\[
C_r := \sum_{s\ne r}\frac{1}{\mu_r-\mu_s}P_s.
\]
Then
\[
\hat P_r - P_r = L_r(E) + S_r(E), \tag{2.6}
\]
where
\[
L_r(E) := C_r E P_r + P_r E C_r \tag{2.7}
\]
and
\[
\|S_r(E)\|_\infty \le 14\Big(\frac{\|E\|_\infty}{\bar g_r}\Big)^2. \tag{2.8}
\]

Remark 1. In the case when $0$ is an eigenvalue of $\Sigma$, it is convenient to extend the sum in the definition of the operator $C_r$ to $s=\infty$ with $\mu_\infty = 0$ (see, for instance, the proof of Lemma 5). Note, however, that $P_\infty\Sigma = \Sigma P_\infty = 0$ and $P_\infty\hat\Sigma = \hat\Sigma P_\infty = 0$. Thus, this additional term in the definition of $C_r$ does not have any impact on $L_r(E)$ (and on the parameters $A_r(\Sigma), B_r(\Sigma)$ introduced below).

This result essentially shows that the difference $\hat P_r - P_r$ can be represented as a sum of two terms, a linear term with respect to $E = \hat\Sigma-\Sigma$ denoted by $L_r(E)$ and the remainder term $S_r(E)$ for which bound (2.8) (quadratic with respect to $\|E\|_\infty$) holds. The linear term $L_r(E)$ could be further represented as a sum of i.i.d. mean zero random operators:
\[
L_r(E) = n^{-1}\sum_{j=1}^n\big(P_r X_j\otimes C_r X_j + C_r X_j\otimes P_r X_j\big),
\]
which easily implies simple concentration bounds and asymptotic normality results for this term. On the other hand, it follows from Theorems 1 and 2 that with probability at least $1-e^{-t}$
\[
\|S_r(E)\|_\infty \lesssim \frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\Big[\frac{r(\Sigma)}{n}\vee\Big(\frac{r(\Sigma)}{n}\Big)^2\vee\frac{t}{n}\vee\Big(\frac{t}{n}\Big)^2\Big],
\]
implying that $\|S_r(E)\|_\infty = o_P(1)$ under the assumption $r(\Sigma) = o(n)$ and $\|S_r(E)\|_\infty = o_P(n^{-1/2})$ under the assumption $r(\Sigma) = o(n^{1/2})$ (in both cases, provided that $\frac{\|\Sigma\|_\infty}{\bar g_r}$ remains bounded). A bound on the remainder term $S_r(E)$ of the order $o_P(n^{-1/2})$ makes this term negligible if the linear term $L_r(E)$ converges to zero with the rate $O_P(n^{-1/2})$ (the standard rate of the central limit theorem). A more subtle analysis of bilinear forms $\langle S_r(E)u, v\rangle$, $u, v\in\mathbb H$, given in [11] showed that the bilinear forms concentrate around their expectations at a rate $o_P(n^{-1/2})$ provided that $r(\Sigma) = o(n)$ (which is much weaker than the assumption $r(\Sigma) = o(n^{1/2})$ needed for the operator norm $\|S_r(E)\|_\infty$ to be of the order $o_P(n^{-1/2})$). More precisely, the following result was proved for the operator
\[
R_r := R_r(E) := S_r(E) - \mathbb E S_r(E) = \hat P_r - P_r - \mathbb E(\hat P_r - P_r) - L_r(E)
\]
(see Theorem 3 in [11]):

Theorem 3. Suppose that, for some $\gamma\in(0,1)$,
\[
\mathbb E\|\hat\Sigma-\Sigma\|_\infty \le (1-\gamma)\frac{\bar g_r}{2}. \tag{2.9}
\]
Then, there exists a constant $D_\gamma > 0$ such that, for all $u, v\in\mathbb H$ and for all $t\ge 1$, the following bound holds with probability at least $1-e^{-t}$:
\[
|\langle R_r u, v\rangle| \le D_\gamma\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\Big(\sqrt{\frac{r(\Sigma)}{n}}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big)\sqrt{\frac{t}{n}}\,\|u\|\|v\|. \tag{2.10}
\]
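The first order expansion of Lemma 1 is easy to verify numerically. The following sketch (an illustrative addition, not part of the paper; the diagonal covariance, dimension and sample size are arbitrary, and the top eigenvalue has multiplicity one) compares $\hat P_r - P_r$ with the linear term $L_r(E) = C_r E P_r + P_r E C_r$ and checks that the remainder is of the order $(\|E\|_\infty/\bar g_r)^2$, as suggested by bound (2.8):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 5000
mu = np.concatenate([[4.0], np.linspace(1.0, 0.1, p - 1)])   # distinct eigenvalues, mu_1 = 4
Sigma = np.diag(mu)
g_1 = mu[0] - mu[1]                                          # spectral gap of mu_1

# Spectral projector P_1 and operator C_1 = sum_{s != 1} P_s / (mu_1 - mu_s)
# (written in the standard basis since Sigma is diagonal; indices are 0-based).
P_1 = np.zeros((p, p)); P_1[0, 0] = 1.0
C_1 = np.diag(np.concatenate([[0.0], 1.0 / (mu[0] - mu[1:])]))

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n
E = Sigma_hat - Sigma

L_1 = C_1 @ E @ P_1 + P_1 @ E @ C_1                          # linear term L_1(E)
theta_hat = np.linalg.eigh(Sigma_hat)[1][:, -1]              # top empirical eigenvector
P_1_hat = np.outer(theta_hat, theta_hat)

print(np.linalg.norm(P_1_hat - P_1 - L_1, 2))                # remainder ||S_1(E)||_inf
print((np.linalg.norm(E, 2) / g_1) ** 2)                     # (||E||_inf / g_1)^2, its predicted order
```

The diagonal covariance is used only so that $P_1$ and $C_1$ have explicit matrix forms; nothing in the decomposition itself depends on this choice.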
Condition (2.9) (along with the concentration bound of Theorem 2) essentially guarantees that $\|\hat\Sigma-\Sigma\|_\infty < \frac{\bar g_r}{2}$ with a high probability, which makes the empirical spectral projector $\hat P_r$ a small random perturbation of the true spectral projector $P_r$ and allows us to use the tools of perturbation theory. Theorem 3 easily implies the following concentration bound for bilinear forms $\langle\hat P_r u, v\rangle$:

Corollary 1. Under the assumption of Theorem 3, with some constants $D, D_\gamma > 0$, for all $u, v\in\mathbb H$ and for all $t\ge 1$ with probability at least $1-e^{-t}$,
\[
\Big|\big\langle(\hat P_r - \mathbb E\hat P_r)u, v\big\rangle\Big| \le D\frac{\|\Sigma\|_\infty}{\bar g_r}\sqrt{\frac{t}{n}}\,\|u\|\|v\| + D_\gamma\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\Big(\sqrt{\frac{r(\Sigma)}{n}}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big)\sqrt{\frac{t}{n}}\,\|u\|\|v\|. \tag{2.11}
\]

Moreover, it is easy to see that if both $u$ and $v$ are either in the eigenspace of $\Sigma$ corresponding to the eigenvalue $\mu_r$, or in the orthogonal complement of this eigenspace, then the first term in the right hand side of bound (2.11) could be dropped and the bound reduces to its second term. In addition to this, in [11], the asymptotic normality of bilinear forms $\langle(\hat P_r - \mathbb E\hat P_r)u, v\rangle$, $u, v\in\mathbb H$ was also proved in an asymptotic framework where $n\to\infty$ and $r(\Sigma) = o(n)$.

Another important question studied in [11] concerns the structure of the bias $\mathbb E\hat P_r - P_r$ of the empirical spectral projector $\hat P_r$. Namely, it was proved that the bias can be represented as the sum of two terms, the main term $P_r(\mathbb E\hat P_r - P_r)P_r$ being "aligned" with the projector $P_r$ and the remainder $T_r$ being of a smaller order in the operator norm (provided that $r(\Sigma) = o(n)$). More specifically, the following result was proved (see Theorem 4 in [11]).

Theorem 4. Suppose that, for some $\gamma\in(0,1)$, condition (2.9) holds. Then, there exists a constant $D_\gamma > 0$ such that
\[
\mathbb E\hat P_r - P_r = P_r(\mathbb E\hat P_r - P_r)P_r + T_r
\]
with $P_r T_r P_r = 0$ and
\[
\|T_r\|_\infty \le D_\gamma\, m_r\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\sqrt{\frac{r(\Sigma)}{n}}\frac{1}{\sqrt n}. \tag{2.12}
\]

In the case when $m_r = \mathrm{rank}(P_r) = 1$ (so, $\mu_r$ is an eigenvalue of $\Sigma$ of multiplicity $1$), the structure of the bias becomes especially simple. Let $P_r = \theta_r\otimes\theta_r$, where $\theta_r$ is a unit norm eigenvector of $\Sigma$ corresponding to $\mu_r$. Then it is easy to see that
\[
\mathbb E\hat P_r - P_r = b_r P_r + T_r \tag{2.13}
\]
with $b_r = \langle(\mathbb E\hat P_r - P_r)\theta_r, \theta_r\rangle$ and $T_r$ defined in Theorem 4. Moreover,
\[
b_r = \langle\mathbb E\hat P_r - P_r, \theta_r\otimes\theta_r\rangle = \mathbb E\langle\hat\theta_r, \theta_r\rangle^2 - 1,
\]
implying that $b_r\in[-1, 0]$. Thus, the parameter $b_r$ is an important characteristic of the bias of the empirical spectral projector $\hat P_r$. It was shown in [11] that, under the assumption $r(\Sigma)\lesssim n$,
\[
|b_r| \lesssim \frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\frac{r(\Sigma)}{n}. \tag{2.14}
\]
Note that this upper bound is larger than the upper bound (2.12) on the remainder $\|T_r\|_\infty$ by a factor $\sqrt{r(\Sigma)}$.

Let now $\hat P_r = \hat\theta_r\otimes\hat\theta_r$ with a unit norm eigenvector $\hat\theta_r$ of $\hat\Sigma$. Since the vectors $\hat\theta_r, \theta_r$ are defined only up to their signs, assume without loss of generality that $\langle\hat\theta_r, \theta_r\rangle\ge 0$. The following result, proved in [11] (see Theorem 6), shows that the linear forms $\langle\hat\theta_r, u\rangle$ have "Bernstein type" concentration around $\sqrt{1+b_r}\,\langle\theta_r, u\rangle$ with deviations of the order $O_P(n^{-1/2})$.

Theorem 5. Suppose that condition (2.9) holds for some $\gamma\in(0,1)$ and also that
\[
1 + b_r \ge \frac{\gamma}{2}. \tag{2.15}
\]
Then, there exists a constant $C_\gamma > 0$ such that for all $t\ge 1$ with probability at least $1-e^{-t}$
\[
\Big|\big\langle\hat\theta_r - \sqrt{1+b_r}\,\theta_r, u\big\rangle\Big| \le C_\gamma\frac{\|\Sigma\|_\infty}{\bar g_r}\Big(\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big)\|u\|.
\]
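The bias parameter $b_r$ and the shrinkage factor $\sqrt{1+b_r}$ appearing in Theorem 5 can be visualized by simulation. The sketch below (an illustrative addition; the covariance, sample size and number of Monte Carlo replications are arbitrary choices) estimates $b_1 = \mathbb E\langle\hat\theta_1,\theta_1\rangle^2 - 1$ for a top eigenvalue of multiplicity one and compares the average of $\langle\hat\theta_1,\theta_1\rangle$ with $\sqrt{1+b_1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 200, 400, 300
Sigma = np.diag(np.concatenate([[2.0], np.full(p - 1, 0.5)]))   # mu_1 = 2, spectral gap 1.5
theta_1 = np.eye(p)[:, 0]                                       # true top eigenvector

inner = np.empty(reps)
for k in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    theta_hat = np.linalg.eigh(X.T @ X / n)[1][:, -1]            # top empirical eigenvector
    inner[k] = abs(theta_hat @ theta_1)                          # sign fixed so that <theta_hat, theta_1> >= 0

b_1 = np.mean(inner ** 2) - 1.0                                  # Monte Carlo estimate of b_1 in [-1, 0]
print(b_1)                                                       # negative: the empirical eigenvector loses alignment
print(np.mean(inner), np.sqrt(1.0 + b_1))                        # <theta_1_hat, theta_1> concentrates near sqrt(1 + b_1)
```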
Thus, if one constructs a proper estimator of the bias parameter $b_r$, it would be possible to improve a "naive estimator" $\langle\hat\theta_r, u\rangle$ of the linear form $\langle\theta_r, u\rangle$ by reducing its bias. A version of such an estimator based on the double sample $X_1,\dots,X_n,\tilde X_1,\dots,\tilde X_n$ of i.i.d. copies of $X$ was suggested in [11]. If $\tilde\Sigma = \tilde\Sigma_n$ denotes the sample covariance based on $\tilde X_1,\dots,\tilde X_n$ (the second subsample) and $\tilde P_r = \tilde\theta_r\otimes\tilde\theta_r$ denotes the corresponding empirical spectral projector (estimator of $P_r$), then the estimator $\hat b_r$ of the bias parameter $b_r$ is defined as follows:
\[
\hat b_r := \langle\hat\theta_r, \tilde\theta_r\rangle - 1,
\]
where the signs of $\hat\theta_r, \tilde\theta_r$ are chosen so that $\langle\hat\theta_r, \tilde\theta_r\rangle\ge 0$. Based on the estimator $\hat b_r$, one can also define a bias corrected estimator $\check\theta_r := \frac{\hat\theta_r}{\sqrt{1+\hat b_r}}$ (which is not necessarily a unit vector) and prove the following result, showing that $\langle\check\theta_r, u\rangle$ is a $\sqrt n$-consistent estimator of $\langle\theta_r, u\rangle$ (at least in the case when $r(\Sigma)\le cn$ for a sufficiently small constant $c$):

Proposition 1. Under the assumptions and notations of Theorem 5, for some constant $C_\gamma > 0$ with probability at least $1-e^{-t}$,
\[
|\hat b_r - b_r| \le C_\gamma\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\Big(\sqrt{\frac{r(\Sigma)}{n}}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big)\Big(\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big), \tag{2.16}
\]
and, for all $u\in\mathbb H$, with the same probability
\[
\big|\langle\check\theta_r - \theta_r, u\rangle\big| \le C_\gamma\frac{\|\Sigma\|_\infty}{\bar g_r}\Big(\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Big)\|u\|. \tag{2.17}
\]
In addition to this, asymptotic normality of $\langle\check\theta_r, u\rangle$ was also proved in [11] under the assumption that $r(\Sigma) = o(n)$.

Finally, we will discuss the results on normal approximation of the (squared) Hilbert–Schmidt norms $\|\hat P_r - P_r\|_2^2$ for an empirical spectral projector $\hat P_r$ obtained in [12]. It was shown in this paper that, in the case when $r(\Sigma) = o(n)$, the size of the expectation $\mathbb E\|\hat P_r - P_r\|_2^2$ could be characterized by the quantity $A_r(\Sigma) := 2\,\mathrm{tr}(P_r\Sigma P_r)\,\mathrm{tr}(C_r\Sigma C_r)$ (which, under mild assumptions, is of the same order as $r(\Sigma)$):
\[
\mathbb E\|\hat P_r - P_r\|_2^2 = (1+o(1))\frac{A_r(\Sigma)}{n}.
\]
A similar parameter characterizing the size of the variance $\mathrm{Var}(\|\hat P_r - P_r\|_2^2)$ is defined as $B_r(\Sigma) := 2\sqrt 2\,\|P_r\Sigma P_r\|_2\|C_r\Sigma C_r\|_2$. Namely, the following result holds (Theorem 7 in [12]):

Theorem 6. Suppose condition (2.9) holds for some $\gamma\in(0,1)$. Then the following bound holds with some constant $C_\gamma > 0$:
\[
\Big|\frac{n}{B_r(\Sigma)}\mathrm{Var}^{1/2}\big(\|\hat P_r - P_r\|_2^2\big) - 1\Big| \le C_\gamma\, m_r\frac{\|\Sigma\|_\infty^3}{\bar g_r^3}\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n} + \frac{m_r+1}{n}. \tag{2.18}
\]
If $\frac{\|\Sigma\|_\infty}{\bar g_r}$ and $m_r$ are bounded and $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\to 0$, this implies that
\[
\mathrm{Var}\big(\|\hat P_r - P_r\|_2^2\big) = (1+o(1))\frac{B_r^2(\Sigma)}{n^2}.
\]

The main result of [12] is the following normal approximation bounds for $\|\hat P_r - P_r\|_2^2$:

Theorem 7. Suppose that, for some constants $c_1, c_2 > 0$, $m_r\le c_1$ and $\|\Sigma\|_\infty\le c_2\bar g_r$. Suppose also condition (2.9) holds with some $\gamma\in(0,1)$. Then, the following bounds hold with some constant $C>0$ depending only on $\gamma, c_1, c_2$:
\[
\sup_{x\in\mathbb R}\Big|\mathbb P\Big\{\frac{n}{B_r(\Sigma)}\Big(\|\hat P_r - P_r\|_2^2 - \mathbb E\|\hat P_r - P_r\|_2^2\Big)\le x\Big\} - \Phi(x)\Big|
\le C\Big[\frac{1}{B_r(\Sigma)} + \frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\sqrt{\log\Big(\frac{B_r(\Sigma)\sqrt n}{r(\Sigma)}\vee 2\Big)}\Big] \tag{2.19}
\]
and
\[
\sup_{x\in\mathbb R}\Big|\mathbb P\Big\{\frac{\|\hat P_r - P_r\|_2^2 - \mathbb E\|\hat P_r - P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)}\le x\Big\} - \Phi(x)\Big|
\le C\Big[\frac{1}{B_r(\Sigma)} + \frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\sqrt{\log\Big(\frac{B_r(\Sigma)\sqrt n}{r(\Sigma)}\vee 2\Big)}\Big], \tag{2.20}
\]
where $\Phi(x)$ denotes the distribution function of a standard normal random variable.
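As a rough numerical companion to Theorem 7 (an illustrative addition: the expectation $\mathbb E\|\hat P_r - P_r\|_2^2$ is replaced by its Monte Carlo average, and all dimensions, eigenvalues and replication counts below are arbitrary), the quantities $A_r(\Sigma)$ and $B_r(\Sigma)$ can be computed explicitly for a diagonal covariance and compared with the empirical mean and standard deviation of $\|\hat P_r - P_r\|_2^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 200, 800, 300
mu_1, mu_rest = 4.0, 1.0
Sigma = np.diag(np.concatenate([[mu_1], np.full(p - 1, mu_rest)]))

P_1 = np.zeros((p, p)); P_1[0, 0] = 1.0
C_1 = np.diag(np.concatenate([[0.0], np.full(p - 1, 1.0 / (mu_1 - mu_rest))]))

A_1 = 2.0 * np.trace(P_1 @ Sigma @ P_1) * np.trace(C_1 @ Sigma @ C_1)
B_1 = 2.0 * np.sqrt(2.0) * np.linalg.norm(P_1 @ Sigma @ P_1) * np.linalg.norm(C_1 @ Sigma @ C_1)

errs = np.empty(reps)
for k in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    v = np.linalg.eigh(X.T @ X / n)[1][:, -1]                    # top empirical eigenvector
    errs[k] = np.linalg.norm(np.outer(v, v) - P_1) ** 2          # ||P_1_hat - P_1||_2^2

print(errs.mean(), A_1 / n)               # E||P_1_hat - P_1||_2^2 is close to A_1(Sigma)/n
print(errs.std(), B_1 / n)                # its standard deviation is close to B_1(Sigma)/n
z = n * (errs - errs.mean()) / B_1        # normalized statistic of (2.19), empirically centered
print(z.mean(), z.std())                  # roughly standard normal in this regime
```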
These bounds show that asymptotic normality of the properly normalized statistic $\|\hat P_r - P_r\|_2^2$ holds provided that $n\to\infty$, $B_r(\Sigma)\to\infty$ and $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\to 0$. In the case of $p$-dimensional spiked covariance models (with a fixed number of spikes), these conditions boil down to $n\to\infty$, $p\to\infty$ and $p = o(n)$.

3. Main results

We start this section with introducing a precise asymptotic framework in which $r(\Sigma)\to\infty$ as $n\to\infty$. It is assumed that an observation $X = X^{(n)}$ is sampled from a Gaussian distribution in $\mathbb H$ with mean zero and covariance $\Sigma = \Sigma^{(n)}$. The data consists of $n$ i.i.d. copies of $X^{(n)}$: $X_1 = X_1^{(n)},\dots,X_n = X_n^{(n)}$, and the sample covariance $\hat\Sigma_n$ is based on $(X_1^{(n)},\dots,X_n^{(n)})$. As before, $\mu_r^{(n)}, r\ge 1$ denote the distinct nonzero eigenvalues of $\Sigma^{(n)}$ arranged in decreasing order and $P_r^{(n)}, r\ge 1$ the corresponding spectral projectors. Let $\Delta_r^{(n)} := \{j: \sigma_j(\Sigma^{(n)}) = \mu_r^{(n)}\}$ and let $\hat P_r^{(n)}$ be the orthogonal projector on the linear span of the eigenspaces corresponding to the eigenvalues $\{\sigma_j(\hat\Sigma_n), j\in\Delta_r^{(n)}\}$.

The goal is to estimate the spectral projector $P^{(n)} = P_{r_n}^{(n)}$ corresponding to the eigenvalue $\mu^{(n)} = \mu_{r_n}^{(n)}$ of $\Sigma^{(n)}$ with multiplicity $m^{(n)} = m_{r_n}^{(n)}$ and with spectral gap $\bar g^{(n)} = \bar g_{r_n}^{(n)}$. Define $C^{(n)} = C_{r_n}^{(n)} := \sum_{s\ne r_n}\frac{1}{\mu_{r_n}^{(n)}-\mu_s^{(n)}}P_s^{(n)}$ and let
\[
B_n := B_{r_n}(\Sigma^{(n)}) := 2\sqrt 2\,\|C^{(n)}\Sigma^{(n)}C^{(n)}\|_2\|P^{(n)}\Sigma^{(n)}P^{(n)}\|_2.
\]

Assumption 1. Suppose the following conditions hold:
\[
\sup_{n\ge 1} m^{(n)} < +\infty; \tag{3.1}
\]
\[
\sup_{n\ge 1}\frac{\|\Sigma^{(n)}\|_\infty}{\bar g^{(n)}} < +\infty; \tag{3.2}
\]
\[
B_n\to\infty\ \text{as}\ n\to\infty; \tag{3.3}
\]
\[
\frac{r(\Sigma^{(n)})}{B_n\sqrt n}\to 0\ \text{as}\ n\to\infty. \tag{3.4}
\]

Assumption 1 easily implies that $r(\Sigma^{(n)})\to\infty$, $r(\Sigma^{(n)}) = o(n)$ as $n\to\infty$. Also, under mild additional conditions, $B_n\asymp\|\Sigma^{(n)}\|_2$.

The following fact is an immediate consequence of bound (2.18) and Theorem 7.

Proposition 2. Under Assumption 1,
\[
\mathrm{Var}\big(\|\hat P^{(n)} - P^{(n)}\|_2^2\big) = \Big(\frac{B_n}{n}\Big)^2(1+o(1)).
\]
In addition,
\[
\frac{n\Big(\|\hat P^{(n)} - P^{(n)}\|_2^2 - \mathbb E\|\hat P^{(n)} - P^{(n)}\|_2^2\Big)}{B_n}\ \xrightarrow{d}\ Z \tag{3.5}
\]
and
\[
\frac{\|\hat P^{(n)} - P^{(n)}\|_2^2 - \mathbb E\|\hat P^{(n)} - P^{(n)}\|_2^2}{\sqrt{\mathrm{Var}\big(\|\hat P^{(n)} - P^{(n)}\|_2^2\big)}}\ \xrightarrow{d}\ Z\quad\text{as}\ n\to\infty, \tag{3.6}
\]
$Z$ being a standard normal random variable.

Our main goal is to develop a version of these asymptotic results for the squared Hilbert–Schmidt norm error $\|\hat P^{(n)} - P^{(n)}\|_2^2$ of the empirical spectral projector $\hat P^{(n)}$ with a data driven normalization that, in principle, could lead to constructing confidence sets and statistical tests for spectral projectors of the covariance operator under Assumption 1. This will be done only in the case when the target
