
Central limit theorem related to MDR-method


Alexander Bulinski¹,²

In many medical and biological investigations, including genetics, it is typical to handle high dimensional data which can be viewed as a set of values of some factors and a binary response variable. For instance, the response variable can describe the state of a patient's health and one often assumes that it depends only on some part of factors. An important problem is to determine collections of significant factors. In this regard we turn to the MDR-method introduced by M. Ritchie and coauthors. Our recent paper provided the necessary and sufficient conditions for strong consistency of estimates of the prediction error employing the K-fold cross-validation and an arbitrary penalty function. Here we introduce the regularized versions of the mentioned estimates and prove for them the multidimensional CLT. Statistical variants of the CLT involving self-normalization are discussed as well.

Keywords and phrases: binary response variable, significant factors, penalty function, cross-validation, MDR-method, SLLN for arrays, strong consistency, regularized estimates, multidimensional CLT, self-normalization.

AMS classification: 60F05; 60F15; 62P10.

arXiv:1301.6609v1 [math.PR] 28 Jan 2013

1 Introduction

High dimensional data arise naturally in a number of experiments. Very often such data are viewed as the values of some factors $X_1,\dots,X_n$ and the corresponding response variable $Y$. For example, in medical studies such a response variable $Y$ can describe the health state (e.g., $Y=1$ or $Y=-1$ mean "sick" or "healthy") and $X_1,\dots,X_m$ and $X_{m+1},\dots,X_n$ are genetic and non-genetic factors, respectively. Usually $X_i$ ($1\le i\le m$) characterizes a single nucleotide polymorphism (SNP), i.e. a certain change of nucleotide bases adenine, cytosine, thymine and guanine (these genetic notions can be found, e.g., in [2]) in a specified segment of DNA molecule.
In this case one considers $X_i$ with three values, for instance, 0, 1 and 2 (see, e.g., [4]). It is convenient to suppose that other $X_i$ ($m+1\le i\le n$) take values in $\{0,1,2\}$ as well. For example, the range of blood pressure can be partitioned into zones of low, normal and high values. However, further we will suppose that all factors take values in an arbitrary finite set. The binary response variable can also appear in pharmacological experiments where $Y=1$ means that the medicament is efficient and $Y=-1$ otherwise.

A challenging problem is to find the genetic and non-genetic (or environmental) factors which could increase the risk of complex diseases such as diabetes, myocardial infarction and others. Now most specialists share the paradigm that, in contrast to a simple disease (such as sickle anemia), certain combinations of the "damages" of the DNA molecule could be responsible for provoking the complex disease whereas the single mutations need not have dangerous effects (see, e.g., [15]). The important research domain called genome-wide association studies (GWAS) inspires development of new methods for handling large arrays of biostatistical data. Here we will continue our treatment of the multifactor dimensionality reduction (MDR) method introduced by M. Ritchie et al. [13]. The idea of this method goes back to the Michalski algorithm. A comprehensive survey concerning the MDR method is provided in [14]; on subsequent modifications and applications see, e.g., [5], [7] – [12], [17] and [18]. Other complementary methods applied in GWAS are discussed, e.g., in [4], where one can find further references.

¹ Faculty of Mathematics and Mechanics, Lomonosov Moscow State University, Moscow 119991, Russia. E-mail: [email protected]
² The work is partially supported by RFBR grant 13-01-00612.
In [3] the basis for application of the MDR-method was proposed when one uses an arbitrary penalty function to describe the prediction error of the binary response variable by means of a function in factors. The goal of the present paper is to establish the new multidimensional central limit theorem (CLT) for statistics which permit to justify the optimal choice of a subcollection of the explanatory variables.

2 Auxiliary results

Let $X=(X_1,\dots,X_n)$ be a random vector with components $X_i:\Omega\to\{0,1,\dots,q\}$ where $i=1,\dots,n$ ($q$, $n$ are positive integers). Thus, $X$ takes values in $\mathbb{X}=\{0,1,\dots,q\}^n$. Introduce a random (response) variable $Y:\Omega\to\{-1,1\}$, non-random function $f:\mathbb{X}\to\{-1,1\}$ and a penalty function $\psi:\{-1,1\}\to\mathbb{R}_+$ (the trivial case $\psi\equiv 0$ is excluded). The quality of approximation of $Y$ by $f(X)$ is defined as follows

$$\mathrm{Err}(f) := \mathsf{E}\,|Y-f(X)|\,\psi(Y). \tag{1}$$

Set $M=\{x\in\mathbb{X}: \mathsf{P}(X=x)>0\}$ and

$$F(x) = \psi(-1)\,\mathsf{P}(Y=-1\mid X=x) - \psi(1)\,\mathsf{P}(Y=1\mid X=x),\quad x\in M.$$

It is not difficult to show (see [3]) that the collection of optimal functions, i.e. all functions $f:\mathbb{X}\to\{-1,1\}$ which are solutions of the problem $\mathrm{Err}(f)\to\inf$, has the form

$$f = I\{A\} - I\{\overline{A}\},\quad A\in\mathcal{A}, \tag{2}$$

where $I\{A\}$ stands for an indicator of $A$ ($I\{\varnothing\}:=0$) and $\mathcal{A}$ consists of sets

$$A = \{x\in M: F(x)<0\}\cup B\cup C.$$

Here $B$ is an arbitrary subset of $\{x\in M: F(x)=0\}$ and $C$ is any subset of $\overline{M}:=\mathbb{X}\setminus M$. If we take $A^*=\{x\in M: F(x)<0\}$, then $A^*$ has the minimal cardinality among all sets belonging to $\mathcal{A}$. In view of the relation $\psi(-1)+\psi(1)\ne 0$ we have

$$A^* = \{x\in M: \mathsf{P}(Y=1\mid X=x)>\gamma(\psi)\},\quad \gamma(\psi) := \frac{\psi(-1)}{\psi(-1)+\psi(1)}. \tag{3}$$

If $\psi(1)=0$ then $A^*=\varnothing$. If $\psi(1)\ne 0$ and $\psi(-1)/\psi(1)=a$ where $a\in\mathbb{R}_+$ then $A^*=\{x\in M: \mathsf{P}(Y=1\mid X=x)>a/(1+a)\}$. Note that we can rewrite (1) as follows

$$\mathrm{Err}(f) = 2\sum_{y\in\{-1,1\}}\psi(y)\,\mathsf{P}(Y=y,\ f(X)\ne y).$$

The value $\mathrm{Err}(f)$ is unknown as we do not know the law of a random vector $(X,Y)$.
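The objects introduced in this section can be checked on a small example. The following Python sketch is only an illustration under stated assumptions: the joint law `PMF` of $(X,Y)$ (a single factor with values 0, 1, 2) and the constant penalty `PSI` are hypothetical, and the function names are ours. It evaluates $\mathrm{Err}(f)$ through the rewritten form of (1), builds $A^*$ from (3), and compares the result with a brute-force minimum over all functions $f$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical toy law of (X, Y): keys are (x, y) with x a 1-tuple, values are probabilities.
PMF = {
    ((0,), 1): Fr(1, 20), ((0,), -1): Fr(5, 20),
    ((1,), 1): Fr(3, 20), ((1,), -1): Fr(3, 20),
    ((2,), 1): Fr(6, 20), ((2,), -1): Fr(2, 20),
}
PSI = {1: Fr(1), -1: Fr(1)}  # assumed penalty function: psi(1) = psi(-1) = 1


def gamma(psi):
    """Threshold gamma(psi) = psi(-1) / (psi(-1) + psi(1)) from (3)."""
    return psi[-1] / (psi[-1] + psi[1])


def err(f, pmf, psi):
    """Err(f) = 2 * sum_y psi(y) * P(Y = y, f(X) != y)."""
    return 2 * sum(psi[y] * p for (x, y), p in pmf.items() if f(x) != y)


def optimal_set(pmf, psi):
    """A* = {x in M : P(Y = 1 | X = x) > gamma(psi)}."""
    p_x = {}
    for (x, _), p in pmf.items():
        p_x[x] = p_x.get(x, 0) + p
    g = gamma(psi)
    return {x for x, p in p_x.items() if p > 0 and pmf.get((x, 1), 0) / p > g}


A_STAR = optimal_set(PMF, PSI)


def f_star(x):
    """Optimal function (2) with A = A*."""
    return 1 if x in A_STAR else -1


# Brute force over all 2^3 functions f: {0,1,2} -> {-1,1}.
POINTS = sorted({x for (x, _) in PMF})
BEST = min(
    err(lambda x, lab=dict(zip(POINTS, labels)): lab[x], PMF, PSI)
    for labels in product((-1, 1), repeat=len(POINTS))
)
```

In this toy law the point $x=1$ has $F(x)=0$, so it belongs to the set from which $B$ is chosen: assigning it either label gives the same error, while $A^*$ leaves it out.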
Thus, statistical inference on the quality of approximation of $Y$ by means of $f(X)$ is based on the estimate of $\mathrm{Err}(f)$.

Let $\xi^1,\xi^2,\dots$ be i.i.d. random vectors with the same law as a vector $(X,Y)$. For $N\in\mathbb{N}$ set $\xi_N=\{\xi^1,\dots,\xi^N\}$. To approximate $\mathrm{Err}(f)$, as $N\to\infty$, we will use a prediction algorithm. It involves a function $f_{PA}=f_{PA}(x,\xi_N)$ with values in $\{-1,1\}$ which is defined for $x\in\mathbb{X}$ and $\xi_N$. In fact we use a family of functions $f_{PA}(x,v_m)$ defined for $x\in\mathbb{X}$ and $v_m\in\mathbb{V}_m$ where $\mathbb{V}_m:=(\mathbb{X}\times\{-1,1\})^m$, $m\in\mathbb{N}$, $m\le N$. To simplify the notation we write $f_{PA}(x,v_m)$ instead of $f^m_{PA}(x,v_m)$. For $S\subset\{1,\dots,N\}$ ("$\subset$" means non-strict inclusion "$\subseteq$") put $\xi_N(S)=\{\xi^j, j\in S\}$ and $\overline{S}:=\{1,\dots,N\}\setminus S$. For $K\in\mathbb{N}$ ($K>1$) introduce a partition of $\{1,\dots,N\}$ formed by subsets

$$S_k(N) = \{(k-1)[N/K]+1,\dots,k[N/K]\,I\{k<K\}+N\,I\{k=K\}\},\quad k=1,\dots,K,$$

here $[b]$ is the integer part of a number $b\in\mathbb{R}$. Generalizing [4] we can construct an estimate of $\mathrm{Err}(f)$ using a sample $\xi_N$, a prediction algorithm with $f_{PA}$ and K-fold cross-validation where $K\in\mathbb{N}$, $K>1$ (on cross-validation see, e.g., [1]). Namely, let

$$\widehat{\mathrm{Err}}_K(f_{PA},\xi_N) := 2\sum_{y\in\{-1,1\}}\frac{1}{K}\sum_{k=1}^{K}\sum_{j\in S_k(N)}\frac{\hat\psi(y,S_k(N))\,I\{Y^j=y,\ f_{PA}(X^j,\xi_N(\overline{S_k(N)}))\ne y\}}{\sharp S_k(N)}. \tag{4}$$

For each $k=1,\dots,K$, random variables $\hat\psi(y,S_k(N))$ denote strongly consistent estimates (as $N\to\infty$) of $\psi(y)$, $y\in\{-1,1\}$, constructed from data $\{Y^j, j\in S_k(N)\}$, and $\sharp S$ stands for a finite set $S$ cardinality. We call $\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)$ an estimated prediction error.

The following theorem giving a criterion of validity of the relation

$$\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)\to\mathrm{Err}(f)\ \text{a.s.},\quad N\to\infty, \tag{5}$$

was established in [3] (further on a sum over empty set is equal to 0 as usual).

Theorem 1. Let $f_{PA}$ define a prediction algorithm for a function $f:\mathbb{X}\to\{-1,1\}$. Assume that there exists such set $U\subset\mathbb{X}$ that for each $x\in U$ and any $k=1,\dots,K$ one has

$$f_{PA}(x,\xi_N(\overline{S_k(N)}))\to f(x)\ \text{a.s.},\quad N\to\infty. \tag{6}$$

Then (5) is valid if and only if, for $N\to\infty$,

$$\sum_{k=1}^{K}\Big(\sum_{x\in\mathbb{X}^+} I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))=-1\}\,L(x) - \sum_{x\in\mathbb{X}^-} I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))=1\}\,L(x)\Big)\to 0\ \text{a.s.} \tag{7}$$

Here $\mathbb{X}^+:=(\mathbb{X}\setminus U)\cap\{x\in M: f(x)=1\}$, $\mathbb{X}^-:=(\mathbb{X}\setminus U)\cap\{x\in M: f(x)=-1\}$ and

$$L(x) = \psi(1)\,\mathsf{P}(X=x,Y=1) - \psi(-1)\,\mathsf{P}(X=x,Y=-1),\quad x\in\mathbb{X}.$$

The sense of this result is the following. It shows that one has to demand condition (7) outside the set $U$ (i.e. outside the set where $f_{PA}$ provides the a.s. approximation of $f$) to obtain (5).

Corollary 1 ([3]). Let, for a function $f:\mathbb{X}\to\{-1,1\}$, a prediction algorithm be defined by $f_{PA}$. Suppose that there exists a set $U\subset\mathbb{X}$ such that for each $x\in U$ and any $k=1,\dots,K$ relation (6) is true. If

$$L(x)=0\quad\text{for}\quad x\in(\mathbb{X}\setminus U)\cap M$$

then (5) is satisfied.

Note also that Remark 4 from [3] explains why the choice of a penalty function proposed by Velez et al. [17]:

$$\psi(y) = c\,(\mathsf{P}(Y=y))^{-1},\quad y\in\{-1,1\},\ c>0, \tag{8}$$

is natural. Further discussion and examples can be found in [3].

3 Main results and proofs

In many situations it is reasonable to suppose that the response variable $Y$ depends only on subcollection $X_{k_1},\dots,X_{k_r}$ of the explanatory variables, $\{k_1,\dots,k_r\}$ being a subset of $\{1,\dots,n\}$. It means that for any $x\in M$

$$\mathsf{P}(Y=1\mid X_1=x_1,\dots,X_n=x_n) = \mathsf{P}(Y=1\mid X_{k_1}=x_{k_1},\dots,X_{k_r}=x_{k_r}). \tag{9}$$

In the framework of the complex disease analysis it is natural to assume that only part of the risk factors could provoke this disease and the impact of others can be neglected. Any collection $\{k_1,\dots,k_r\}$ implying (9) is called significant. Evidently if $\{k_1,\dots,k_r\}$ is significant then any collection $\{m_1,\dots,m_i\}$ such that $\{k_1,\dots,k_r\}\subset\{m_1,\dots,m_i\}$ is significant as well. For a set $D\subset\mathbb{X}$ let $\pi_{k_1,\dots,k_r}D:=\{u=(x_{k_1},\dots,x_{k_r}): x=(x_1,\dots,x_n)\in D\}$. For $B\subset\mathbb{X}_r$ where $\mathbb{X}_r:=\{0,1,\dots,q\}^r$ define in $\mathbb{X}=\mathbb{X}_n$ a cylinder

$$C_{k_1,\dots,k_r}(B) := \{x=(x_1,\dots,x_n)\in\mathbb{X}: (x_{k_1},\dots,x_{k_r})\in B\}.$$
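As an aside, the K-fold construction behind the estimate (4) can be made concrete with a short Python sketch. It is a minimal illustration, not the implementation used in [3] or [4]: the function names `fold` and `estimated_error`, and the callables `predict` and `psi_hat` supplied by the caller, are our assumptions. The sample is a list of pairs $(X^j,Y^j)$, each block $S_k(N)$ serves as the test part, and its complement as the training part.

```python
def fold(N, K, k):
    """Block S_k(N): {(k-1)[N/K]+1, ..., k[N/K]} for k < K; the K-th block runs up to N."""
    if not (1 < K <= N and 1 <= k <= K):
        raise ValueError("need 1 < K <= N and 1 <= k <= K")
    size = N // K  # integer part [N/K]
    first = (k - 1) * size + 1
    last = k * size if k < K else N
    return list(range(first, last + 1))


def estimated_error(sample, K, predict, psi_hat):
    """Estimated prediction error (4).

    sample  : list of (x, y) pairs with y in {-1, 1}
    predict : predict(x, train) -> -1 or 1, plays the role of f_PA (caller supplied)
    psi_hat : psi_hat(y, block) -> number, estimate of psi(y) from the block S_k(N)
    """
    N = len(sample)
    total = 0.0
    for k in range(1, K + 1):
        idx = fold(N, K, k)
        inside = set(idx)
        block = [sample[j - 1] for j in idx]  # observations with j in S_k(N)
        train = [sample[j - 1] for j in range(1, N + 1) if j not in inside]  # complement
        for y in (-1, 1):
            errors = sum(1 for x, yj in block if yj == y and predict(x, train) != y)
            total += psi_hat(y, block) * errors / len(block)
    return 2.0 * total / K
```

For example, with $N=10$ and $K=3$ the blocks are $\{1,2,3\}$, $\{4,5,6\}$ and $\{7,8,9,10\}$, so the last block absorbs the remainder of the division of $N$ by $K$.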
For $B=\{u\}$ where $u=(u_1,\dots,u_r)\in\mathbb{X}_r$ we write $C_{k_1,\dots,k_r}(u)$ instead of $C_{k_1,\dots,k_r}(\{u\})$. Obviously

$$\mathsf{P}(Y=1\mid X_{k_1}=x_{k_1},\dots,X_{k_r}=x_{k_r}) \equiv \mathsf{P}(Y=1\mid X\in C_{k_1,\dots,k_r}(u)),$$

here

$$u=\pi_{k_1,\dots,k_r}\{x\},\quad\text{i.e.}\quad u_i=x_{k_i},\ i=1,\dots,r. \tag{10}$$

For $C\subset\mathbb{X}$, $N\in\mathbb{N}$ and $W_N\subset\{1,\dots,N\}$ set

$$\hat{\mathsf{P}}_{W_N}(Y=1\mid X\in C) := \frac{\sum_{j\in W_N} I\{Y^j=1,\ X^j\in C\}}{\sum_{j\in W_N} I\{X^j\in C\}}. \tag{11}$$

When $C=\mathbb{X}$ we write simply $\hat{\mathsf{P}}_{W_N}(Y=1)$ in (11). According to the strong law of large numbers for arrays (SLLNA), see, e.g., [16], for any $C\subset\mathbb{X}$ with $\mathsf{P}(X\in C)>0$

$$\hat{\mathsf{P}}_{W_N}(Y=1\mid X\in C)\to\mathsf{P}(Y=1\mid X\in C)\ \text{a.s.},\quad \sharp W_N\to\infty,\ N\to\infty.$$

If (9) is valid then the optimal function $f^*$ defined by (2) with $A=A^*$ introduced in (3) has the form

$$f^{k_1,\dots,k_r}(x) = \begin{cases} 1, & \text{if } \mathsf{P}(Y=1\mid X\in C_{k_1,\dots,k_r}(u))>\gamma(\psi) \text{ and } x\in M,\\ -1, & \text{otherwise}, \end{cases} \tag{12}$$

here $u$ and $x$ satisfy (10) ($\mathsf{P}(X\in C_{k_1,\dots,k_r}(u))\ge\mathsf{P}(X=x)>0$ as $x\in M$). Hence, for each significant $\{k_1,\dots,k_r\}\subset\{1,\dots,n\}$ and any $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$ one has

$$\mathrm{Err}(f^{k_1,\dots,k_r})\le\mathrm{Err}(f^{m_1,\dots,m_r}). \tag{13}$$

For arbitrary $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$, $x\in\mathbb{X}$, $u=\pi_{m_1,\dots,m_r}\{x\}$ and a penalty function $\psi$ we consider the prediction algorithm with a function $\hat f^{m_1,\dots,m_r}_{PA}$ such that

$$\hat f^{m_1,\dots,m_r}_{PA}(x,\xi_N(W_N)) = \begin{cases} 1, & \hat{\mathsf{P}}_{W_N}(Y=1\mid X\in C_{m_1,\dots,m_r}(u))>\hat\gamma_{W_N}(\psi),\ x\in M,\\ -1, & \text{otherwise}, \end{cases} \tag{14}$$

here $\hat\gamma_{W_N}(\psi)$ is a strongly consistent estimate of $\gamma(\psi)$ constructed by means of $\xi_N(W_N)$. Introduce

$$U := \{x\in M: \mathsf{P}(Y=1\mid X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r})\ne\gamma(\psi)\}. \tag{15}$$

Using Corollary 1 (and in view of Examples 1 and 2 of [3]) we conclude that for any $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$

$$\widehat{\mathrm{Err}}_K(\hat f^{m_1,\dots,m_r}_{PA},\xi_N)\to\mathrm{Err}(f^{m_1,\dots,m_r})\ \text{a.s.},\quad N\to\infty. \tag{16}$$

For each $\varepsilon>0$, any significant collection $\{k_1,\dots,k_r\}\subset\{1,\dots,n\}$ and arbitrary set $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$ due to relations (13) and (16) one has

$$\widehat{\mathrm{Err}}_K(\hat f^{k_1,\dots,k_r}_{PA},\xi_N)\le\widehat{\mathrm{Err}}_K(\hat f^{m_1,\dots,m_r}_{PA},\xi_N)+\varepsilon\ \text{a.s.} \tag{17}$$

when $N$ is large enough.
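The plug-in rule (14) can be sketched in Python as follows. This is a hedged illustration rather than a reference implementation: we assume the penalty (8), for which (3) gives $\gamma(\psi)=\mathsf{P}(Y=1)$, and we take the empirical frequency of $Y=1$ in the training part as one admissible choice of $\hat\gamma_{W_N}(\psi)$. The names `p_hat` and `mdr_predict` are ours, factor indices are zero-based, and the convention $0/0:=0$ is used for cells not present in the training data (the condition $x\in M$ itself cannot be verified from a sample).

```python
def p_hat(train, idx, u):
    """Empirical P(Y = 1 | X in C_idx(u)) as in (11), with the convention 0/0 := 0.

    train : list of (x, y) pairs, x a tuple of factor values, y in {-1, 1}
    idx   : tuple of zero-based factor indices (m_1, ..., m_r)
    u     : tuple of values of the selected factors
    """
    hits = [y for x, y in train if tuple(x[i] for i in idx) == u]
    if not hits:
        return 0.0
    return sum(1 for y in hits if y == 1) / len(hits)


def mdr_predict(x, train, idx):
    """Plug-in rule (14) with gamma estimated by the empirical frequency of Y = 1.

    Assumption: penalty (8), so that gamma(psi) = P(Y = 1).
    """
    if not train:
        return -1
    u = tuple(x[i] for i in idx)  # projection u = pi_idx{x}, see (10)
    gamma_hat = sum(1 for _, y in train if y == 1) / len(train)
    return 1 if p_hat(train, idx, u) > gamma_hat else -1
```

With the training sample `[((0, 1), 1), ((0, 0), 1), ((1, 1), -1), ((1, 0), -1)]` the first factor separates the two classes, so `mdr_predict((0, 1), train, (0,))` gives 1, whereas for the second factor alone the conditional frequency equals the threshold and the rule gives -1.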
Thus, for a given $r=1,\dots,n-1$, according to (17) we come to the following conclusion. It is natural to choose among factors $X_1,\dots,X_n$ a collection $X_{k_1},\dots,X_{k_r}$ leading to the smallest estimated prediction error $\widehat{\mathrm{Err}}_K(\hat f^{k_1,\dots,k_r}_{PA},\xi_N)$. After that it is desirable to apply the permutation tests (see, e.g., [4] and [6]) for validation of the prediction power of selected factors. We do not tackle here the choice of $r$; some recommendations can be found in [14]. Note also in passing that a nontrivial problem is to estimate the importance of various collections of factors, see, e.g., [15].

Remark 1. It is essential that for each $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$ we have strongly consistent estimates of $\mathrm{Err}(f^{m_1,\dots,m_r})$. So to compare these estimates we can use the subset of $\Omega$ having probability one. If we had only the convergence in probability instead of a.s. convergence in (16) then to compare different $\widehat{\mathrm{Err}}_K(\hat f^{m_1,\dots,m_r}_{PA},\xi_N)$ one should take into account the Bonferroni corrections for all subsets $\{m_1,\dots,m_r\}$ of $\{1,\dots,n\}$.

Further on we consider a function $\psi$ having the form (8). In view of (3) w.l.g. we can assume that $c=1$ in (8). In this case $\gamma(\psi)=\mathsf{P}(Y=1)$. Introduce events

$$A_{N,k}(y) = \{Y^j=-y,\ j\in S_k(N)\},\quad N\in\mathbb{N},\ k=1,\dots,K,\ y\in\{-1,1\},$$

and random variables

$$\hat\psi_{N,k}(y) := \frac{I\{\overline{A_{N,k}(y)}\}}{\hat{\mathsf{P}}_{S_k(N)}(Y=y)},$$

where we write $\hat\psi_{N,k}(y)$ instead of $\hat\psi(y,S_k(N))$. Trivial cases $\mathsf{P}(Y=y)\in\{0,1\}$ are excluded and we formally set $0/0:=0$. Then

$$\hat\psi_{N,k}(y)-\psi(y) = \frac{\mathsf{P}(Y=y)-\hat{\mathsf{P}}_{S_k(N)}(Y=y)}{\hat{\mathsf{P}}_{S_k(N)}(Y=y)\,\mathsf{P}(Y=y)}\,I\{\overline{A_{N,k}(y)}\} - \frac{1}{\mathsf{P}(Y=y)}\,I\{A_{N,k}(y)\}. \tag{18}$$

Clearly,

$$I\{A_{N,k}(y)\}\to 0\ \text{a.s.},\quad N\to\infty, \tag{19}$$

and the following relation is true

$$\frac{I\{\overline{A_{N,k}(y)}\}}{\hat{\mathsf{P}}_{S_k(N)}(Y=y)}\to\frac{1}{\mathsf{P}(Y=y)}\ \text{a.s.},\quad N\to\infty. \tag{20}$$

Therefore, by virtue of (18) – (20) we have that for $y\in\{-1,1\}$ and $k=1,\dots,K$

$$\hat\psi_{N,k}(y)-\psi(y)\to 0\ \text{a.s.},\quad N\to\infty. \tag{21}$$

Let $\{m_1,\dots,m_r\}\subset\{1,\dots,n\}$.
We define the functions which can be viewed as the regularized versions of the estimates $\hat f^{m_1,\dots,m_r}_{PA}$ of $f^{m_1,\dots,m_r}$ (see (14) and (12)). Namely, for $W_N\subset\{1,\dots,N\}$, $N\in\mathbb{N}$, and $\varepsilon=(\varepsilon_N)_{N\in\mathbb{N}}$ where non-random positive $\varepsilon_N\to 0$, as $N\to\infty$, put

$$\hat f^{m_1,\dots,m_r}_{PA,\varepsilon}(x,\xi_N(W_N)) = \begin{cases} 1, & \hat{\mathsf{P}}_{W_N}(Y=1\mid X\in C_{m_1,\dots,m_r}(u))>\hat\gamma_{W_N}(\psi)+\varepsilon_N,\ x\in M,\\ -1, & \text{otherwise}, \end{cases}$$

where $u=\pi_{m_1,\dots,m_r}\{x\}$. Regularization of $\hat f^{m_1,\dots,m_r}_{PA}$ means that instead of the threshold $\hat\gamma_{W_N}(\psi)$ we use $\hat\gamma_{W_N}(\psi)+\varepsilon_N$.

Take now $U$ appearing in (15). Applying Corollary 1 once again (and in view of Examples 1 and 2 of [3]) we can claim that the statements which are analogous to (16) and (17) are valid for the regularized versions of the estimates introduced above. Now we turn to the principal results, namely, central limit theorems.

Theorem 2. Let $\varepsilon_N\to 0$ and $N^{1/2}\varepsilon_N\to\infty$ as $N\to\infty$. Then, for each $K\in\mathbb{N}$, any subset $\{m_1,\dots,m_r\}$ of $\{1,\dots,n\}$, the corresponding function $f=f^{m_1,\dots,m_r}$ and prediction algorithm defined by $f_{PA}=\hat f^{m_1,\dots,m_r}_{PA,\varepsilon}$, the following relation holds:

$$\sqrt{N}\,(\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)-\mathrm{Err}(f))\xrightarrow{\ law\ } Z\sim N(0,\sigma^2),\quad N\to\infty, \tag{22}$$

where $\sigma^2$ is variance of the random variable

$$V = 2\sum_{y\in\{-1,1\}}\frac{I\{Y=y\}}{\mathsf{P}(Y=y)}\,\big(I\{f(X)\ne y\}-\mathsf{P}(f(X)\ne y\mid Y=y)\big). \tag{23}$$

Proof. For a fixed $K\in\mathbb{N}$ and any $N\in\mathbb{N}$ set

$$T_N(f) := \frac{2}{K}\sum_{k=1}^{K}\sum_{y\in\{-1,1\}}\psi(y)\,\frac{1}{\sharp S_k(N)}\sum_{j\in S_k(N)} I\{Y^j=y,\ f(X^j)\ne y\},$$

$$\hat T_N(f) := \frac{2}{K}\sum_{k=1}^{K}\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,\frac{1}{\sharp S_k(N)}\sum_{j\in S_k(N)} I\{Y^j=y,\ f(X^j)\ne y\}.$$

One has

$$\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)-\mathrm{Err}(f) = (\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)-\hat T_N(f)) + (\hat T_N(f)-T_N(f)) + (T_N(f)-\mathrm{Err}(f)). \tag{24}$$

First of all we show that

$$\sqrt{N}\,(\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)-\hat T_N(f))\xrightarrow{\ \mathsf{P}\ } 0,\quad N\to\infty. \tag{25}$$

For $x\in\mathbb{X}$, $y\in\{-1,1\}$, $k=1,\dots,K$ and $N\in\mathbb{N}$ introduce

$$F_{N,k}(x,y) := I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))\ne y\} - I\{f(x)\ne y\}.$$

Then

$$\widehat{\mathrm{Err}}_K(f_{PA},\xi_N)-\hat T_N(f) = \frac{2}{K}\sum_{k=1}^{K}\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,\frac{1}{\sharp S_k(N)}\sum_{j\in S_k(N)} I\{Y^j=y\}\,F_{N,k}(X^j,y). \tag{26}$$

We define the random variables

$$B_{N,k}(y) := \frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{Y^j=y\}\,F_{N,k}(X^j,y)$$

and verify that for each $k=1,\dots,K$

$$\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,B_{N,k}(y)\xrightarrow{\ \mathsf{P}\ } 0,\quad N\to\infty. \tag{27}$$

Clearly (27) implies (25) in view of (26) as $\sharp S_k(N)=[N/K]$ for $k=1,\dots,K-1$ and $[N/K]\le\sharp S_K(N)<[N/K]+K$. Write $B_{N,k}(y)=B^{(1)}_{N,k}(y)+B^{(2)}_{N,k}(y)$ where

$$B^{(1)}_{N,k}(y) = \frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{X^j\in U\}\,I\{Y^j=y\}\,F_{N,k}(X^j,y),$$

$$B^{(2)}_{N,k}(y) = \frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{X^j\notin U\}\,I\{Y^j=y\}\,F_{N,k}(X^j,y).$$

Obviously

$$|B^{(1)}_{N,k}(y)|\le\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{x\in U}\sum_{j\in S_k(N)}\big|I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))\ne y\}-I\{f(x)\ne y\}\big|.$$

Functions $f_{PA}$ and $f$ take values in the set $\{-1,1\}$. Thus, for any $x\in U$ (where $U$ is defined in (15)), $k=1,\dots,K$ and almost all $\omega\in\Omega$ relation (6) ensures the existence of an integer $N_0(x,k,\omega)$ such that $f_{PA}(x,\xi_N(\overline{S_k(N)}))=f(x)$ for $N\ge N_0(x,k,\omega)$. Hence $B^{(1)}_{N,k}(y)=0$ for any $y$ belonging to $\{-1,1\}$, each $k=1,\dots,K$ and almost all $\omega\in\Omega$ when $N\ge N_{0,k}(\omega)=\max_{x\in U}N_0(x,k,\omega)$. Evidently, $N_{0,k}<\infty$ a.s., because $\sharp U<\infty$. We obtain that

$$\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,B^{(1)}_{N,k}(y)\to 0\ \text{a.s.},\quad N\to\infty. \tag{28}$$

If $U=\mathbb{X}$ then $B^{(2)}_{N,k}(y)=0$ for all $N$, $k$ and $y$ under consideration. Consequently, (27) is valid and thus, for $U=\mathbb{X}$, relation (25) holds. Let now $U\ne\mathbb{X}$. Then for $k=1,\dots,K$ and $N\in\mathbb{N}$ one has

$$\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,B^{(2)}_{N,k}(y) = \sum_{x\in\mathbb{X}_+}\sum_{y\in\{-1,1\}}H_{N,k}(x,y) + \sum_{x\in\mathbb{X}_-}\sum_{y\in\{-1,1\}}H_{N,k}(x,y),$$

here $\mathbb{X}_+=(\mathbb{X}\setminus U)\cap\{x\in\mathbb{X}: f(x)=1\}$, $\mathbb{X}_-=(\mathbb{X}\setminus U)\cap\{x\in\mathbb{X}: f(x)=-1\}$ and

$$H_{N,k}(x,y) := \frac{\hat\psi_{N,k}(y)}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{A^j(x,y)\}\,\big(I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))\ne y\}-I\{f(x)\ne y\}\big)$$

where $A^j(x,y)=\{X^j=x,\ Y^j=y\}$. The definition of $U$ yields that $\mathbb{X}_+=\varnothing$ and

$$\mathbb{X}_- = \overline{M}\cup\{x\in M: \mathsf{P}(Y=1\mid X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r})=\gamma(\psi)\}.$$

Set

$$\hat R^j_{N,k}(x) = I\{X^j=x\}\,\big(\hat\psi_{N,k}(1)\,I\{Y^j=1\}-\hat\psi_{N,k}(-1)\,I\{Y^j=-1\}\big).$$

It is easily seen that

$$\sum_{x\in\mathbb{X}_-}\sum_{y\in\{-1,1\}}H_{N,k}(x,y) = -\sum_{x\in\mathbb{X}_-} I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))=1\}\sum_{j\in S_k(N)}\frac{\hat R^j_{N,k}(x)}{\sqrt{\sharp S_k(N)}}.$$
Note that $\hat R^j_{N,k}(x)=0$ a.s. for all $x\in\overline{M}$, $k=1,\dots,K$, $j=1,\dots,N$ and $N\in\mathbb{N}$. Let us prove that, for any $x\in M\cap\mathbb{X}_-$ and $k=1,\dots,K$,

$$I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))=1\}\xrightarrow{\ \mathsf{P}\ } 0,\quad N\to\infty. \tag{29}$$

For any $\nu\in(0,1)$ and $x\in M\cap\mathbb{X}_-$ we have

$$\mathsf{P}\big(I\{f_{PA}(x,\xi_N(\overline{S_k(N)}))=1\}>\nu\big) = \mathsf{P}\Big(\hat{\mathsf{P}}_{\overline{S_k(N)}}(Y=1\mid X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r})>\hat\gamma_{\overline{S_k(N)}}(\psi)+\varepsilon_N\Big)$$

(for $\nu\ge 1$ the probability on the left-hand side is zero). Now we show that, for $k=1,\dots,K$, this probability tends to 0 as $N\to\infty$. For $W_N\subset\{1,\dots,N\}$ and $x\in M\cap\mathbb{X}_-$, put

$$\Delta_N(W_N,x) := \mathsf{P}\bigg(\frac{\frac{1}{\sharp W_N}\sum_{j\in W_N}\eta^j}{\frac{1}{\sharp W_N}\sum_{j\in W_N}\zeta^j}>\hat\gamma_{W_N}(\psi)+\varepsilon_N\bigg)$$

where $\eta^j=I\{Y^j=1,\ X^j_{m_1}=x_{m_1},\dots,X^j_{m_r}=x_{m_r}\}$, $\zeta^j=I\{X^j_{m_1}=x_{m_1},\dots,X^j_{m_r}=x_{m_r}\}$, $j=1,\dots,N$. Set $p=\mathsf{P}(X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r})$. It follows that, for any $\alpha_N>0$,

$$\begin{aligned}\Delta_N(W_N,x)\le{}&\mathsf{P}\bigg(\frac{\sum_{j\in W_N}\eta^j}{\sum_{j\in W_N}\zeta^j}>\hat\gamma_{W_N}(\psi)+\varepsilon_N,\ \Big|\frac{1}{\sharp W_N}\sum_{j\in W_N}\zeta^j-p\Big|<\alpha_N,\ |\hat\gamma_{W_N}(\psi)-\gamma(\psi)|<\alpha_N\bigg)\\ &+\mathsf{P}\bigg(\Big|\frac{1}{\sharp W_N}\sum_{j\in W_N}\zeta^j-p\Big|\ge\alpha_N\bigg)+\mathsf{P}\bigg(\Big|\frac{1}{\sharp W_N}\sum_{j\in W_N}I\{Y^j=1\}-\mathsf{P}(Y=1)\Big|\ge\alpha_N\bigg).\end{aligned} \tag{30}$$

Due to the Hoeffding inequality

$$\mathsf{P}\bigg(\Big|\frac{1}{\sharp W_N}\sum_{j\in W_N}\zeta^j-p\Big|\ge\alpha_N\bigg)\le 2\exp\{-2\,\sharp W_N\,\alpha_N^2\} =: \delta_N(W_N,\alpha_N).$$

We have an analogous estimate for the last summand in (30). Consequently, taking into account that $p>0$ we see that for all $N$ large enough

$$\Delta_N(W_N,x)\le\mathsf{P}\bigg(\frac{1}{\sharp W_N}\sum_{j\in W_N}\eta^j>(p-\alpha_N)(\gamma(\psi)-\alpha_N+\varepsilon_N)\bigg)+2\delta_N(W_N,\alpha_N).$$

Whenever $x\in M\cap\mathbb{X}_-$ one has

$$\mathsf{P}(Y=1,\ X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r}) = \mathsf{P}(Y=1)\,\mathsf{P}(X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r}),$$

therefore

$$\Delta_N(W_N,x)\le\mathsf{P}\bigg(\sum_{j\in W_N}\frac{\eta^j-\mathsf{E}\eta^j}{\sqrt{\sharp W_N}}>\sqrt{\sharp W_N}\,\big(p\,\varepsilon_N-\alpha_N(\gamma(\psi)+p-\alpha_N+\varepsilon_N)\big)\bigg)+2\delta_N(W_N,\alpha_N).$$

The CLT holds for an array $\{\eta^j, j\in W_N, N\in\mathbb{N}\}$ consisting of i.i.d. random variables, thus

$$\frac{1}{\sqrt{\sharp W_N}}\sum_{j\in W_N}(\eta^j-\mathsf{E}\eta^j)\xrightarrow{\ law\ } Z\sim N(0,\sigma_0^2),$$

here $\sigma_0^2=\mathrm{var}\,I\{Y=1,\ X_{m_1}=x_{m_1},\dots,X_{m_r}=x_{m_r}\}$. Hence $\Delta_N(W_N,x)\to 0$ if, for some $\alpha_N>0$,

$$\alpha_N\sqrt{\sharp W_N}\to\infty,\quad \varepsilon_N\sqrt{\sharp W_N}\to\infty,\quad \alpha_N/\varepsilon_N\to 0\quad\text{as}\ N\to\infty. \tag{31}$$

Take $W_N=\overline{S_k(N)}$ with $k=1,\dots,K$. Then $\sharp\overline{S_k(N)}\ge(K-1)[N/K]$ for $k=1,\dots,K$ and we conclude that (31) is satisfied when $\varepsilon_N N^{1/2}\to\infty$ as $N\to\infty$ if we choose a sequence $(\alpha_N)_{N\in\mathbb{N}}$ in appropriate way. So, relation (29) is established.

Let

$$R_j(x) = I\{X^j=x\}\,\big(\psi(1)\,I\{Y^j=1\}-\psi(-1)\,I\{Y^j=-1\}\big),\quad x\in\mathbb{X},\ j\in\mathbb{N}.$$

For all $x\in M\cap\mathbb{X}_-$ one has

$$\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)}\hat R^j_{N,k}(x) = \frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)}R_j(x) + \sum_{j\in S_k(N)} I\{X^j=x\}\,\frac{(\hat\psi_{N,k}(1)-\psi(1))\,I\{Y^j=1\}-(\hat\psi_{N,k}(-1)-\psi(-1))\,I\{Y^j=-1\}}{\sqrt{\sharp S_k(N)}}.$$

Note that $\mathsf{E}R_j(x)=0$ for all $j\in\mathbb{N}$ and $x\in\mathbb{X}_-$. The CLT for an array of i.i.d. random variables $\{R_j(x), j\in S_k(N), N\in\mathbb{N}\}$ provides that

$$\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)}R_j(x)\xrightarrow{\ law\ } Z_1\sim N(0,\sigma_1^2(x)),\quad N\to\infty,$$

where $\sigma_1^2(x)=\mathrm{var}\big(I\{X=x\}(\psi(1)\,I\{Y=1\}-\psi(-1)\,I\{Y=-1\})\big)$, $x\in\mathbb{X}_-$. For each $y\in\{-1,1\}$,

$$\begin{aligned}(\hat\psi_{N,k}(y)-\psi(y))\,\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{X^j=x\}\,I\{Y^j=y\} ={}&(\hat\psi_{N,k}(y)-\psi(y))\,\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)}\big(I\{X^j=x\}\,I\{Y^j=y\}-\mathsf{E}\,I\{X^j=x\}\,I\{Y^j=y\}\big)\\ &+(\hat\psi_{N,k}(y)-\psi(y))\,\sqrt{\sharp S_k(N)}\,\mathsf{P}(X=x,Y=y).\end{aligned}$$

Due to the CLT

$$\sum_{j\in S_k(N)}\frac{I\{X^j=x\}\,I\{Y^j=y\}-\mathsf{E}\,I\{X^j=x\}\,I\{Y^j=y\}}{\sqrt{\sharp S_k(N)}}\xrightarrow{\ law\ } Z_2\sim N(0,\sigma_2^2(x,y))$$

as $N\to\infty$, where $\sigma_2^2(x,y)=\mathrm{var}\,I\{X^j=x,\ Y^j=y\}$. In view of (21) we have

$$\frac{\hat\psi_{N,k}(y)-\psi(y)}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)}\big(I\{X^j=x\}\,I\{Y^j=y\}-\mathsf{E}\,I\{X^j=x\}\,I\{Y^j=y\}\big)\xrightarrow{\ \mathsf{P}\ } 0$$

as $N\to\infty$. Now we apply (18) – (20) once again to conclude that

$$(\hat\psi_{N,k}(y)-\psi(y))\,\sqrt{\sharp S_k(N)}\xrightarrow{\ law\ } Z_3\sim N(0,\sigma_3^2(y)),\quad N\to\infty,$$

with $\sigma_3^2(y)=\mathsf{P}(Y=-y)\,(\mathsf{P}(Y=y))^{-3}$. Thus,

$$\sum_{y\in\{-1,1\}}\hat\psi_{N,k}(y)\,B^{(2)}_{N,k}(y)\xrightarrow{\ \mathsf{P}\ } 0,\quad N\to\infty. \tag{32}$$

Taking into account (28) and (32) we come to (27) and consequently to (25). Now we turn to the study of $\hat T_N(f)-T_N(f)$ appearing in (24).
One has

$$\sqrt{N}\,(\hat T_N(f)-T_N(f)) = \frac{2\sqrt{N}}{K}\sum_{k=1}^{K}\sum_{y\in\{-1,1\}}(\hat\psi_{N,k}(y)-\psi(y))\,\frac{1}{\sharp S_k(N)}\sum_{j\in S_k(N)} I\{Y^j=y,\ f(X^j)\ne y\}.$$

Put $Z^j=I\{Y^j=y,\ f(X^j)\ne y\}$, $j=1,\dots,N$. For each $k=1,\dots,K$

$$\sum_{y\in\{-1,1\}}(\hat\psi_{N,k}(y)-\psi(y))\,\frac{1}{\sqrt{\sharp S_k(N)}}\sum_{j\in S_k(N)} I\{Y^j=y,\ f(X^j)\ne y\} =$$
