Submitted to the Annals of Statistics

GAUSSIAN APPROXIMATION FOR THE SUP-NORM OF HIGH-DIMENSIONAL MATRIX-VARIATE U-STATISTICS AND ITS APPLICATIONS∗

By Xiaohui Chen‡†

University of Illinois at Urbana-Champaign‡

arXiv:1602.00199v3 [math.ST] 30 Sep 2016

This paper studies the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. We propose a two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution. Specifically, subject to mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also propose a practical Gaussian wild bootstrap method to approximate the quantiles of the maxima of centered U-statistics and prove its asymptotic validity. The wild bootstrap is demonstrated on statistical applications for high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution. In addition, we show that even for subgaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.

1. Introduction. Let $X_1,\dots,X_n$ be a sample of independent and identically distributed (iid) random vectors in $\mathbb{R}^p$ with the distribution function $F$.
Let $\mathbb{B}$ be a separable Banach space equipped with the norm $\|\cdot\|$ and let $h:\mathbb{R}^p\times\mathbb{R}^p\to\mathbb{B}$ be a $\mathbb{B}$-valued measurable and symmetric kernel function such that $h(x_1,x_2)=h(x_2,x_1)$ for all $x_1,x_2\in\mathbb{R}^p$ and $\mathbb{E}\|h(X_1,X_2)\|<\infty$. Consider the U-statistic of order two

$$U=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}h(X_i,X_j). \tag{1}$$

∗First version: January 30, 2016. This version: October 4, 2016.
†Research partially supported by NSF grant DMS-1404891 and UIUC Research Board Award RB15004.

imsart-aos ver. 2014/01/08 file: ustat-June-24-2016.tex date: October 4, 2016

The main focus of this paper is to study the asymptotic behavior of the random variable $\|U-\mathbb{E}U\|$ in the high-dimensional setup when $p:=p(n)\to\infty$. Since the introduction of U-statistics by Hoeffding [28], their limit theorems have been extensively studied in the classical asymptotic setup where $n$ diverges and $p$ is fixed [29, 26, 51, 2, 57, 24, 30, 31]. Recently, due to the explosive growth of data, regularized estimation and dimension reduction of high-dimensional data have attracted a lot of research attention, for example in covariance matrix estimation [6, 7, 22, 14], graphical models [20, 56, 9], discriminant analysis [38], and factor models [23, 34], among many others. These problems all involve the consistent estimation of the expectation $\mathbb{E}[h(X,X')]$ of a U-statistic of order two, where $X$ and $X'$ are two independent random vectors in $\mathbb{R}^p$ with the distribution $F$. Below are two examples.

Example 1.1. The sample covariance matrix $\hat S=(n-1)^{-1}\sum_{i=1}^n(X_i-\bar X)(X_i-\bar X)^\top$, where $\bar X=n^{-1}\sum_{i=1}^n X_i$ is the sample mean vector, is an unbiased estimator of the covariance matrix $\Sigma=\mathrm{Cov}(X)$. Then $\hat S$ is a matrix-valued U-statistic of the form (1) with the unbounded kernel

$$h(x_1,x_2)=\frac{1}{2}(x_1-x_2)(x_1-x_2)^\top \quad\text{for } x_1,x_2\in\mathbb{R}^p. \tag{2}$$

Example 1.2. The covariance matrix $\Sigma$ quantifies the linear dependency in $X=(X_1,\dots,X_p)^\top$.
The rank correlation is another measure of nonlinear dependency in a random vector. For $m,k=1,\dots,p$, $(X_m,X_k)$ and $(X'_m,X'_k)$ are said to be concordant if $(X_m-X'_m)(X_k-X'_k)>0$. Let

$$h_{mk}(x_1,x_2)=2\cdot\mathbb{1}\{(x_{1m}-x_{2m})(x_{1k}-x_{2k})>0\} \tag{3}$$

and $h(x_1,x_2)=\{h_{mk}(x_1,x_2)\}_{m,k=1}^p$. Kendall's tau rank correlation coefficient matrix $T=\{\tau_{mk}\}_{m,k=1}^p$ can be written as a U-statistic with the bounded kernel $h_{mk}$ in (3):

$$\tau_{mk}=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}h_{mk}(X_i,X_j)-1.$$

Then $(\tau_{mk}+1)/2$ is an unbiased estimator of $\mathbb{P}((X_m-X'_m)(X_k-X'_k)>0)$, i.e. the probability that $(X_m,X_k)$ and $(X'_m,X'_k)$ are concordant.

In this paper, we are interested in the following central questions: how does the dimension impact the asymptotic behavior of U-statistics, and how can we make statistical inference when $p\to\infty$? The motivation of this paper comes from the estimation and inference problems for large covariance matrices and their related functionals [42, 56, 49, 46, 55, 10, 7, 14, 15]. To establish rates of convergence for the regularized estimators or to study $\ell^\infty$-norm Gaussian approximations in high dimensions, a key issue is to characterize the supremum norm of $U-\mathbb{E}U$. Therefore, as the primary concern of the current paper, we shall consider $\mathbb{B}=\mathbb{R}^{p\times p}$ and $\|h\|=\max_{1\le m,k\le p}|h_{mk}|$.

Our first main contribution is to provide a Gaussian approximation scheme for high-dimensional non-degenerate U-statistics under the sup-norm.
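To make Examples 1.1 and 1.2 concrete, the following Python/NumPy sketch (our own illustration, not code from the paper) evaluates the order-two U-statistic (1) by brute force over all pairs $i<j$. The function names `u_statistic`, `cov_kernel` and `concordance_kernel` are hypothetical, the standard normal data are an assumption chosen so that $\mathbb{E}U=\Sigma=I_p$ is known, and the $O(n^2p^2)$ loop favors readability over speed. The sketch checks that kernel (2) reproduces the unbiased sample covariance $\hat S$, forms Kendall's tau from kernel (3), and evaluates the sup-norm $\|U-\mathbb{E}U\|$.

```python
import itertools
import numpy as np

def u_statistic(X, kernel):
    """Order-two U-statistic (1): average of kernel(X_i, X_j) over all pairs i < j."""
    n = X.shape[0]
    total = sum(kernel(X[i], X[j]) for i, j in itertools.combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

def cov_kernel(x1, x2):
    """Kernel (2): h(x1, x2) = (x1 - x2)(x1 - x2)^T / 2."""
    d = x1 - x2
    return 0.5 * np.outer(d, d)

def concordance_kernel(x1, x2):
    """Kernel (3): h_mk(x1, x2) = 2 * 1{(x1m - x2m)(x1k - x2k) > 0}."""
    d = x1 - x2
    return 2.0 * (np.outer(d, d) > 0)

# Hypothetical data: n = 30 iid N(0, I_p) vectors with p = 4, so E U = Sigma = I_p.
rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))

S_u = u_statistic(X, cov_kernel)                  # equals the sample covariance S-hat
tau = u_statistic(X, concordance_kernel) - 1.0    # Kendall's tau matrix T
sup_norm = np.max(np.abs(S_u - np.eye(p)))        # ||U - EU|| = max_{m,k} |U_mk - EU_mk|
```

For continuous data without ties, the diagonal of `tau` equals one and each off-diagonal entry coincides with the usual (concordant minus discordant pairs) form of Kendall's tau.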
Unlike the central limit theorem (CLT) type results for the maxima of sums of iid random vectors [16], which are directly approximated by Gaussian counterparts with matching first and second moments, approximating the sup-norm of U-statistics is more subtle because of their dependence and nonlinearity. Here, we propose a two-step Gaussian approximation method in Section 2. In the first step, we approximate the U-statistic by the leading linear component in its Hoeffding decomposition (a.k.a. the Hájek projection); in the second step, the linear term is further approximated by Gaussian random vectors. To approximate the distribution of the sup-norm of U-statistics by a linear form, a maximal moment inequality is developed to control the nonlinear and canonical, i.e. completely degenerate, remainder term. The linear projection is then handled by recent developments of Gaussian approximation in high dimensions [16, 59, 58]. An explicit rate of convergence of the Gaussian approximation for high-dimensional U-statistics is established for unbounded kernels subject to sub-exponential and uniform polynomial moment conditions. Specifically, under either moment condition, we show that U-statistics attain the same polynomially decaying (in the sample size) convergence rate as the Gaussian approximation for the maxima of sums of iid random vectors, and the validity of the Gaussian approximation is proved for a high-dimensional scaling limit where $p$ can be much larger than $n$.

The second contribution of this paper is to propose a Gaussian wild bootstrap procedure for approximating the quantiles of $\sqrt{n}\|U-\mathbb{E}U\|$. Since the (unobserved) linear projection terms of the centered U-statistic depend on the unknown underlying data distribution $F$ and there is a nonlinear remainder term, we use an additional estimation step beyond the Gaussian approximation. Here, we employ the idea of decoupling and estimate the linear projection on an independent dataset.
Validity of the Gaussian wild bootstrap is established under the same set of assumptions as the Gaussian approximation results. One important feature of the Gaussian approximation and the bootstrap procedure is that no structural assumptions on the distribution $F$ are required and strong dependence in $F$ is allowed, which in fact helps the Gaussian and bootstrap approximations. In Section 4, we demonstrate the capability of the proposed bootstrap method on a number of important high-dimensional problems, including data-dependent tuning parameter selection in the thresholded covariance matrix estimator and simultaneous inference for the covariance and Kendall's tau rank correlation matrices. Two additional applications, to the estimation of the sparse precision matrix and of sparse linear functionals, are given in the Supplemental Materials (SM). In those problems, we show that Gaussian-like convergence rates can be achieved for non-Gaussian data with heavy tails. For the sparse covariance matrix estimation problem, we also show that the thresholded estimator with the tuning parameter selected by the bootstrap procedure can attain potentially much tighter performance bounds than the minimax estimator with a universal threshold that ignores the dependency in $F$ [7, 14, 11].

To establish the Gaussian approximation result and the validity of the bootstrap method, we have to bound the expected sup-norm of the second-order canonical term in the Hoeffding decomposition of the U-statistic and establish its non-asymptotic maximal moment inequalities.
An alternative, simpler data splitting approach, which reduces the U-statistic to sums of iid random matrices, can give the exact rate for bounding the moments in the non-degenerate case [52, 40, 32, 21]. Nonetheless, the reduction to iid summands via data splitting does not exploit the complete degeneracy structure of the canonical term, and it does not lead to the convergence result in the Gaussian approximation for non-degenerate U-statistics; see Section 5.1 for details. In addition, unlike the Hoeffding decomposition approach, the data splitting approximation is not asymptotically tight in distribution, and therefore it is less useful for making inference on high-dimensional U-statistics.

Notations and definitions. For a vector $x$, we use $|x|_1=\sum_j|x_j|$, $|x|:=|x|_2=(\sum_j x_j^2)^{1/2}$, and $|x|_\infty=\max_j|x_j|$ to denote its entry-wise $\ell^1$, $\ell^2$, and $\ell^\infty$ norms, respectively. For a matrix $M$, we use $|M|_F=(\sum_{i,j}M_{ij}^2)^{1/2}$ and $\|M\|_2=\max_{|a|=1}|Ma|$ to denote its Frobenius and spectral norms, respectively. Denote $a\vee b=\max(a,b)$ and $a\wedge b=\min(a,b)$. We shall use $K,K_0,K_1,\dots$ to denote positive finite absolute constants, and $C,C',C_0,C_1,\dots$ and $c,c',c_0,c_1,\dots$ to denote positive finite constants whose values do not depend on $n$ and $p$ and may vary at different places. We write $a\lesssim b$ if $a\le Cb$ for some constant $C>0$, and $a\asymp b$ if $a\lesssim b$ and $b\lesssim a$. For a random variable $X$, we write $\|X\|_q=(\mathbb{E}|X|^q)^{1/q}$ for $q>0$. We use $\|h\|=\max_{1\le m,k\le p}|h_{mk}|$ and $\|h\|_\infty=\sup_{x_1,x_2\in\mathbb{R}^p}\|h(x_1,x_2)\|$. Throughout the paper, we write $X_1^n=(X_1,\dots,X_n)$ and let $X$ and $X'$ be two independent random vectors in $\mathbb{R}^p$ with the distribution $F$, which are independent of $X_1^n$.
We write $\mathbb{E}h=\mathbb{E}[h(X,X')]$ and $\mathbb{E}g=\mathbb{E}g(X)$, where $g(x)=\mathbb{E}[h(x,X')]-\mathbb{E}h$ and $x\in\mathbb{R}^p$. For a matrix-valued kernel $h:\mathbb{R}^p\times\mathbb{R}^p\to\mathbb{R}^{p\times p}$, we say that: (i) $h$ is non-degenerate w.r.t. $F$ if $\mathrm{Var}(g_{mk}(X))>0$ for all $m,k=1,\dots,p$; (ii) $h$ is canonical or completely degenerate w.r.t. $F$ if $\mathbb{E}[h_{mk}(x_1,X')]=\mathbb{E}[h_{mk}(X,x_2)]=\mathbb{E}[h_{mk}(X,X')]=0$ for all $m,k=1,\dots,p$ and for all $x_1,x_2\in\mathbb{R}^p$. Without loss of generality, we shall assume throughout the paper that $p\ge 3$ and that the matrix $h=\{h_{mk}\}_{m,k=1}^p$ is symmetric, i.e. $h_{mk}=h_{km}$.

2. Gaussian approximation. In this section, we study the Gaussian approximation for $\max_{1\le m,k\le p}(U_{mk}-\mathbb{E}U_{mk})$ in (1), or equivalently the approximation for the sup-norm of the centered U-statistic by considering both $U-\mathbb{E}U$ and $-U+\mathbb{E}U$. If the $X_i$'s are non-Gaussian, a seemingly intuitive method would be to generate Gaussian random vectors $Y_i$ by matching the first and second moments of $X_i$, i.e. to approximate $U$ by $U'=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}h(Y_i,Y_j)$. However, empirical evidence suggests that this may not be a good approximation, and theoretically it seems that the nonlinearity of $U$ and $U'$ makes the approximation statistically invalid. To illustrate this point, we simulate $n=200$ iid observations from the $p$-variate elliptic t-distribution in (66) of the SM with mean zero and $\nu=8$ degrees of freedom. We consider the covariance matrix kernel (2) as an example. For $p=40$, the P-P plot of the empirical cdfs of the sup-norm of the centered covariance matrices computed from the $X_i$ and from the $Y_i$ over 5000 simulations is shown in Figure 1 (left). A closer inspection reveals that $U$ is an approximately linear statistic and that the linear projection part in its Hoeffding decomposition is the leading term; this suggests how to correct the bias and motivates us to propose a two-step approximation method. Let

$$g(x_1)=\mathbb{E}h(x_1,X')-\mathbb{E}h, \tag{4}$$

$$f(x_1,x_2)=h(x_1,x_2)-\mathbb{E}h(x_1,X')-\mathbb{E}h(X,x_2)+\mathbb{E}h. \tag{5}$$
Clearly, $f$ is a $\mathbb{B}$-valued symmetric kernel of order two that is canonical w.r.t. the distribution $F$. Then the Hoeffding decomposition of the kernel $h$ is given by

$$h(x_1,x_2)=f(x_1,x_2)+g(x_1)+g(x_2)+\mathbb{E}h, \tag{6}$$

from which we have

$$U-\mathbb{E}U=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}f(X_i,X_j)+\frac{2}{n}\sum_{i=1}^n g(X_i).$$

On the right-hand side of the last expression, the second term (a.k.a. the Hájek projection) is expected to be the leading term and the first term to be negligible under the sup-norm. Therefore, we can reasonably expect that

$$\frac{\sqrt{n}}{2}(U-\mathbb{E}U)\approx\frac{1}{\sqrt{n}}\sum_{i=1}^n g(X_i),$$

where the latter can be further approximated by $n^{-1/2}\sum_{i=1}^n Z_i$ for iid Gaussian random vectors $Z_i\sim N(0,\Gamma_g)$, with $\Gamma_g$ the positive-definite covariance matrix of $g(X_i)$; cf. [16]. Denote $p'=p(p+1)/2$. Here, we slightly abuse notation and write $\tilde g_i=\mathrm{vech}(g(X_i))$ for the half-vectorization, by columns, of the lower triangular part of $g(X_i)$. Therefore $\Gamma_g=\mathrm{Cov}(\tilde g_i)$ is the $p'\times p'$ covariance matrix indexed by $((j,k),(m,l))$ such that $j\ge k$ and $m\ge l$. Similarly, we shall use $Z_i$ to denote either the $p\times p$ matrix or its $p'\times 1$ half-vectorized version. For the previous elliptic t-distribution example, we plot the empirical cdfs of $\max_{1\le m,k\le p}(\hat S_{mk}-\sigma_{mk})/2$ against $\max_{1\le m,k\le p}n^{-1}\sum_{i=1}^n Z_{i,mk}$. Figure 1 (right) shows a much better approximation using the leading term in the Hoeffding decomposition.

Fig 1. P-P plots of the sup-norm approximation for the centered sample covariance matrix $U-\mathbb{E}U$ with the kernel (2) by $U'-\mathbb{E}U'$ (left panel, X vs. Y) and by the leading term in the Hoeffding decomposition of $U-\mathbb{E}U$ (right panel, X vs. Z).
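For the covariance kernel (2), the terms in (4) and (5) have closed forms when $X$ has mean zero and covariance $\Sigma$ (an assumption we add for this illustration): $\mathbb{E}h(x,X')=(xx^\top+\Sigma)/2$ and $\mathbb{E}h=\Sigma$, so $g(x)=(xx^\top-\Sigma)/2$ and $f(x_1,x_2)=-(x_1x_2^\top+x_2x_1^\top)/2$, which is canonical because $\mathbb{E}X=0$. The sketch below (ours, with hypothetical names and simulated Gaussian data) checks numerically that $U-\mathbb{E}U$ splits exactly into the canonical part and the Hájek part, as in the display following (6).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)                       # assumed population covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # assumed mean-zero data

def g(x):
    # Hajek projection term (4) for kernel (2) when E X = 0 and Cov(X) = Sigma.
    return 0.5 * (np.outer(x, x) - Sigma)

def f(x1, x2):
    # Canonical term (5) for kernel (2); reduces to a bilinear form under E X = 0.
    return -0.5 * (np.outer(x1, x2) + np.outer(x2, x1))

U = np.cov(X, rowvar=False)   # sample covariance = U-statistic with kernel (2)
EU = Sigma                    # E U = E h = Sigma
pairs = itertools.combinations(range(n), 2)
canonical = 2.0 / (n * (n - 1)) * sum(f(X[i], X[j]) for i, j in pairs)
hajek = 2.0 / n * sum(g(x) for x in X)

lhs = U - EU
rhs = canonical + hajek                  # Hoeffding decomposition of U - EU
sup_canonical = np.max(np.abs(canonical))
sup_hajek = np.max(np.abs(hajek))
```

The identity `lhs == rhs` holds exactly (up to rounding) for every sample; comparing `sup_canonical` with `sup_hajek` illustrates the heuristic above that the canonical part is of smaller order under the sup-norm, although a single draw proves nothing.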
Let $T=\sqrt{n}(U-\mathbb{E}U)/2$, $L=n^{-1/2}\sum_{i=1}^n g(X_i)$, $W=n^{-1/2}(n-1)^{-1}\sum_{1\le i<j\le n}f(X_i,X_j)$, and $Z=n^{-1/2}\sum_{i=1}^n Z_i$, where the $Z_i$ are iid $N(0,\Gamma_g)$; by the Hoeffding decomposition, $T=L+W$. Denote $\bar T_0=\max_{m,k}T_{mk}$, $\bar L_0=\max_{m,k}L_{mk}$, and $\bar Z_0=\max_{m,k}Z_{mk}$. Let

$$\rho(\bar T_0,\bar Z_0)=\sup_{t\in\mathbb{R}}|\mathbb{P}(\bar T_0\le t)-\mathbb{P}(\bar Z_0\le t)|$$

be the Kolmogorov distance between $\bar T_0$ and $\bar Z_0$. Let $B_n\ge 1$ be a sequence of real numbers possibly tending to infinity. We consider two types of conditions on the kernel moments. First, we establish the explicit convergence rate for kernels with sub-exponential moments, e.g. the $\varepsilon$-contaminated normal distribution (65) in the SM.

Theorem 2.1 (Gaussian approximation for centered U-statistics: sub-exponential kernel). Let $U$ be a non-degenerate U-statistic of order two. Assume that there exist constants $C_1,C_2\in(0,\infty)$ and $K\in(0,1)$ such that

(GA.1) Kernel moment: $\mathbb{E}g_{mk}^2\ge C_1$ and
$$\max_{\ell=0,1,2}\mathbb{E}(|h_{mk}|^{2+\ell}/B_n^{\ell})\vee\mathbb{E}[\exp(|h_{mk}|/B_n)]\le 2 \tag{7}$$
for all $1\le m,k\le p$;

(GA.2) Scaling limit:
$$\frac{B_n^2\log^7(pn)}{n}\le C_2n^{-K}. \tag{8}$$

Then there exists a constant $C>0$ depending only on $C_1,C_2$ such that
$$\rho(\bar T_0,\bar Z_0)\le Cn^{-K/8}. \tag{9}$$

The assumptions in Theorem 2.1 have meaningful interpretations. (GA.1) ensures the non-degeneracy of the Gaussian approximation and that truncation does not lose too much information, owing to the sub-exponential tails. (GA.2) describes the high-dimensional scaling limit within which the Gaussian approximation is valid. In the high-dimensional context, the dimension $p$ grows with the sample size $n$ and the distribution function $F$ also depends on $n$. Therefore, $B_n$ is allowed to increase with $n$. Theorem 2.1 shows that the approximation error in the Kolmogorov distance converges to zero even if $p$ is much larger than $n$, and no structural assumptions on $F$ are required.
In particular, Theorem 2.1 applies to kernels with sub-exponential distributions such that $\|h_{mk}\|_q\le Cq$ for all $q\ge 1$, in which case $B_n=O(1)$ and the dimension $p$ is allowed to grow subexponentially in the sample size $n$, i.e. $p=O(\exp(n^{(1-K)/7}))$. Condition (GA.1) also covers bounded kernels $\|h\|_\infty\le B_n$, where $B_n$ may increase with $n$.

Remark 1. Theorem 2.1 shows that the Gaussian approximation for centered non-degenerate U-statistics is asymptotically valid under the high-dimensional scaling limit (GA.2), which involves only a polynomial factor of $\log p$. However, the sup-norm convergence rate $n^{-K/8}$ obtained in (9) is slower than $n^{-1/2}$. Similar observations have been made in the existing literature on Berry-Esseen type bounds [47, 3] for normalized sums of iid random vectors $X_i\in\mathbb{R}^p$ with mean zero and identity covariance matrix. [47] showed that the normalized sample mean $\sqrt{n}\bar X$ is asymptotically normal if $p=o(\sqrt{n})$, and [3] showed that

$$\sup_{A\in\mathcal{A}}|\mathbb{P}(\sqrt{n}\bar X\in A)-\mathbb{P}(Z\in A)|\le Kp^{1/4}\mathbb{E}|X_1|^3/n^{1/2},$$

where $\mathcal{A}$ is the class of all convex subsets of $\mathbb{R}^p$, $Z\sim N(0,\mathrm{Id}_p)$, and $K>0$ is an absolute constant. In either case, the dependence of the CLT rate on the dimension $p$ is polynomial ($p/n^{1/2}$ and $p^{7/4}/n^{1/2}$, respectively). [16] considered the Gaussian approximation for $\max_{j\le p}\sqrt{n}\bar X_j$ and obtained the rate $n^{-c}$ for some (unspecified) exponent $c>0$. Following the proofs of Theorem 2.1 in the current paper and of Theorem 2.2 and Corollary 2.1 in [16], we can show that $c$ is allowed to take the value $K/8$. Therefore, the effect of the terms in the Hoeffding decomposition of higher order than the Hájek projection (onto a linear subspace) vanishes in the Gaussian approximation. A similar observation is made for the uniform polynomial moment kernels; cf. Theorem 2.2.
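The bound (9) is stated in the Kolmogorov distance $\rho(\bar T_0,\bar Z_0)$. As a rough Monte Carlo illustration (our own, not the paper's experiment, which used the elliptic t-distribution with $n=200$, $p=40$ and 5000 replications), the sketch below compares $\bar T_0$ for the covariance kernel (2) with $\bar Z_0$ under an assumed Gaussian $F$ with Toeplitz covariance $\sigma_{jk}=0.5^{|j-k|}$. In that Gaussian case, Isserlis' theorem gives $\mathrm{Cov}(X_jX_k,X_mX_l)=\sigma_{jm}\sigma_{kl}+\sigma_{jl}\sigma_{km}$, so $\Gamma_g$ for $g(x)=(xx^\top-\Sigma)/2$ is available in closed form, and $Z=n^{-1/2}\sum_i Z_i$ is exactly $N(0,\Gamma_g)$. The function names are hypothetical; the estimate `rho_hat` also contains Monte Carlo noise of order $\text{reps}^{-1/2}$.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """sup_t |F_a(t) - F_b(t)| between the empirical cdfs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])   # both cdfs are step functions; the sup is at a jump
    fa = np.searchsorted(a, grid, side="right") / a.size
    fb = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(fa - fb)))

def gaussian_gamma_g(Sigma):
    """Gamma_g = Cov(vech(g(X))) for kernel (2) and X ~ N(0, Sigma), via Isserlis."""
    p = Sigma.shape[0]
    idx = [(j, k) for k in range(p) for j in range(k, p)]   # vech order: by columns, j >= k
    G = np.empty((len(idx), len(idx)))
    for a, (j, k) in enumerate(idx):
        for b, (m, l) in enumerate(idx):
            G[a, b] = 0.25 * (Sigma[j, m] * Sigma[k, l] + Sigma[j, l] * Sigma[k, m])
    return G

rng = np.random.default_rng(2)
n, p, reps = 100, 5, 2000
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # assumed Toeplitz

# Monte Carlo draws of T0_bar = max_{m,k} sqrt(n) (S_mk - sigma_mk) / 2.
T0_bar = np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    T0_bar[r] = np.max(np.sqrt(n) * (np.cov(X, rowvar=False) - Sigma) / 2)

# Z is exactly N(0, Gamma_g), so its maximum can be drawn directly.
Gamma_g = gaussian_gamma_g(Sigma)
Z0_bar = rng.multivariate_normal(np.zeros(Gamma_g.shape[0]), Gamma_g, size=reps).max(axis=1)

rho_hat = kolmogorov_distance(T0_bar, Z0_bar)
```

Replacing the Gaussian draws of `X` by a heavier-tailed distribution would require estimating $\Gamma_g$ instead of using the Isserlis formula.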
For multivariate symmetric statistics of order two, to the best of our knowledge, the Gaussian approximation result (9) with an explicit convergence rate is new. When $p$ is fixed, the rate of convergence and the Edgeworth expansion of such statistics can be found in [5, 25, 4]. In those papers, assuming the Cramér condition on $g(X_1)$ and suitable moment conditions on $h(X_1,X_2)$, the Edgeworth expansion of U-statistics was established for the univariate case ($p=1$) with remainder $o(n^{-1/2})$ or $O(n^{-1})$ [5, 4] and for the multivariate case ($p>1$ fixed) with remainder $o(n^{-1/2})$ [25]. In the latter work [25], it is unclear how the constant in the error bound depends on the dimensionality parameter $p$. In contrast, our Theorem 2.1 allows $p$ to be larger than $n$, yielding CLT type results in much higher dimensions.

Next, we consider kernels with uniform polynomial moments (up to the fourth order), e.g. the elliptical t-distribution (66) in the SM.

Theorem 2.2 (Gaussian approximation for centered U-statistics: kernel with uniform polynomial moment). Let $U$ be a non-degenerate U-statistic of order two. Assume that there exist constants $C_1,C_2\in(0,\infty)$ and $K\in(0,1)$ such that

(GA.1') Kernel moment: $\mathbb{E}g_{mk}^2\ge C_1$ and
$$\max_{\ell=0,1,2}\mathbb{E}(|h_{mk}|^{2+\ell}/B_n^{\ell})\vee\mathbb{E}[(\|h\|/B_n)^4]\le 1 \tag{10}$$
for all $1\le m,k\le p$;

(GA.2') Scaling limit:
$$\frac{B_n^4\log^7(pn)}{n}\le C_2n^{-K}. \tag{11}$$

Then there exists a constant $C>0$ depending only on $C_1,C_2$ such that (9) holds.

Theorems 2.1 and 2.2 allow us to approximate the quantiles of $\bar T_0$ by those of $\bar Z_0$, given knowledge of $\Gamma_g$. In practice, the covariance matrix $\Gamma_g$ and the Hájek projection terms $g(X_i)$, $i=1,\dots,n$, depend on the data distribution $F$, which is unknown. Thus, the quantiles of $\bar Z_0$ need to be estimated in real applications.
However, we shall see in Section 3 that Theorems 2.1 and 2.2 can still be used to derive a feasible resampling-based method to approximate the quantiles of the Gaussian maximum $\bar Z_0$, and therefore of $\bar T_0$.

3. Wild bootstrap. The main purpose of this section is to approximate the quantiles of $\bar T_0$. Let $X'_1,\dots,X'_n$ be an independent copy of the observed $X_1,\dots,X_n$; we call this the training data. Such data can always be obtained by half-sampling or data splitting of the original data. Therefore, we assume that the sample size of the total data $\{X_1,\dots,X_n,X'_1,\dots,X'_n\}$ is $2n$. Since $g(X_i)$, $i=1,\dots,n$, are unknown, we construct estimators for them. Let $\hat g_i:=\hat g_i(X_1,\dots,X_n,X'_1,\dots,X'_n)$ be an estimator of $g(X_i)$ using the original and training data. Recall that $g(x)=\mathbb{E}h(x,X'_j)-\mathbb{E}h(X'_i,X'_j)$ for any fixed $x\in\mathbb{R}^p$ and $i\ne j$, which can be viewed as the population version with respect to the second variable $X'_j$. Therefore, we build an empirical version as our estimator of $g(X_i)$. Specifically, we consider

$$\hat g_i=\frac{1}{n}\sum_{j=1}^n h(X_i,X'_j)-\binom{n}{2}^{-1}\sum_{1\le j<l\le n}h(X'_j,X'_l). \tag{12}$$

Conditional on $X_i$, $\hat g_i$ is an unbiased estimator of $g(X_i)$. It is interesting to view $\hat g_i$ as a decoupled estimator of $g(X_i)$. Let

$$\hat L_0^*=\max_{1\le m,k\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^n\hat g_{i,mk}e_i \quad\text{and}\quad \bar L_0^*=\max_{1\le m,k\le p}\frac{1}{\sqrt{n}}\sum_{i=1}^n g_{mk}(X_i)e_i,$$

where $e_1,e_2,\dots$ are iid standard Gaussian random variables that are also independent of $X_1,\dots,X_n,X'_1,\dots,X'_n$. Then $\hat L_0^*$ and $\bar L_0^*$ are bootstrapped versions of $\bar L_0$. Denote the conditional quantiles of $\hat L_0^*$ and $\bar L_0^*$ given the data $X_1,\dots,X_n,X'_1,\dots,X'_n$ by

$$a_{\hat L_0^*}(\alpha)=\inf\{t\in\mathbb{R}:\mathbb{P}_e(\hat L_0^*\le t)\ge\alpha\},$$

$$a_{\bar L_0^*}(\alpha)=\inf\{t\in\mathbb{R}:\mathbb{P}_e(\bar L_0^*\le t)\ge\alpha\},$$

where $\mathbb{P}_e$ is the probability taken w.r.t. $e_1,\dots,e_n$.
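The decoupled estimator (12) and the conditional quantile $a_{\hat L_0^*}(\alpha)$ can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the kernel is the covariance kernel (2), the training copy comes from splitting one simulated sample in half, the helper names `g_hat` and `wild_bootstrap_quantile` and the number of multiplier draws `B` are hypothetical, and $\mathbb{P}_e$ is replaced by the empirical distribution of `B` draws of $(e_1,\dots,e_n)$. The empirical version of $\inf\{t:\mathbb{P}_e(\hat L_0^*\le t)\ge\alpha\}$ uses `np.quantile` with `method="inverted_cdf"` (NumPy 1.22 or later).

```python
import itertools
import numpy as np

def cov_kernel(x1, x2):
    """Kernel (2): h(x1, x2) = (x1 - x2)(x1 - x2)^T / 2."""
    d = x1 - x2
    return 0.5 * np.outer(d, d)

def g_hat(X, X_train, kernel):
    """Decoupled estimator (12) of g(X_i), using the independent training copy X_train."""
    n = X_train.shape[0]
    # Second term of (12): U-statistic of the training data, an estimate of E h.
    Eh_hat = sum(kernel(X_train[j], X_train[l])
                 for j, l in itertools.combinations(range(n), 2)) / (n * (n - 1) / 2)
    # First term of (12): average of h(X_i, X'_j) over the training data.
    return np.stack([np.mean([kernel(x, xp) for xp in X_train], axis=0) - Eh_hat for x in X])

def wild_bootstrap_quantile(G, alpha, B=2000, rng=None):
    """Monte Carlo approximation of a_{hat L*_0}(alpha) from B Gaussian multiplier draws."""
    rng = np.random.default_rng() if rng is None else rng
    n = G.shape[0]
    E = rng.standard_normal((B, n))                       # B draws of (e_1, ..., e_n)
    S = np.tensordot(E, G, axes=(1, 0)) / np.sqrt(n)      # shape (B, p, p)
    L_star = S.reshape(B, -1).max(axis=1)                 # hat L*_0 for each draw
    return np.quantile(L_star, alpha, method="inverted_cdf")

# Hypothetical data: 2n = 120 observations split into original and training halves.
rng = np.random.default_rng(3)
X_all = rng.standard_normal((120, 4))
X, X_train = X_all[:60], X_all[60:]
G = g_hat(X, X_train, cov_kernel)
q95 = wild_bootstrap_quantile(G, 0.95, rng=np.random.default_rng(4))
q50 = wild_bootstrap_quantile(G, 0.50, rng=np.random.default_rng(4))
```

Reusing the same multiplier seed for both calls makes the two quantiles come from the same bootstrap sample, so `q50 <= q95` always holds; in an application, `q95` would serve as the critical value for $\bar T_0$ whose validity is the subject of Theorem 3.1.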
Now, we can compute the conditional quantile $a_{\hat L_0^*}(\alpha)$ by the Gaussian wild bootstrap method. Specifically, $a_{\hat L_0^*}(\alpha)$ can be numerically approximated by resampling the Gaussian multipliers $e_1,\dots,e_n$, and we wish to use $a_{\hat L_0^*}(\alpha)$ to approximate the quantiles of $\bar T_0$.

Theorem 3.1 (Asymptotic validity of Gaussian wild bootstrap for centered U-statistics). Let $U$ be a non-degenerate U-statistic of order two.

(i) (Subexponential kernel) If (GA.1) and (GA.2) hold for some constants $C_1,C_2\in(0,\infty)$ and $K\in(0,1)$, then there exists a constant $C>0$ depending only on $C_1,C_2$ such that for all $\alpha\in(0,1)$
$$|\mathbb{P}(\bar T_0\le a_{\hat L_0^*}(\alpha))-\alpha|\le Cn^{-K/8}. \tag{13}$$

(ii) (Uniform polynomial kernel) If (GA.1') and (GA.2') hold for some constants $C_1,C_2\in(0,\infty)$ and $K\in(0,1)$, then there exists a constant $C>0$ depending only on $C_1,C_2$ such that for all $\alpha\in(0,1)$
$$|\mathbb{P}(\bar T_0\le a_{\hat L_0^*}(\alpha))-\alpha|\le Cn^{-K/12}. \tag{14}$$

Remark 2. By Theorem 3.1, the convergence rate of the wild bootstrap for subexponential kernels is the same as that of the Gaussian approximation (Theorem 2.1), while it is slower for kernels with uniform polynomial moments of order four (Theorem 2.2). The major error in the latter case is due to the estimation of $\bar L_0^*$ by $\hat L_0^*$ in the wild bootstrap. Under (GA.1') and (GA.2'), the approximation error of $\hat L_0^*$ for $\bar L_0^*$ is of order $O(n^{-1/4}B_n(\log(np))^{1/2})$; see Lemma C.7 in the SM. This differs from the previous work [16], which does not need this extra estimation step for $g(X_i)$, $i=1,\dots,n$, since only sums of iid random vectors $n^{-1/2}\sum_{i=1}^n X_i$ are involved. Therefore, for sums of iid random vectors, the wild bootstrap can attain the rate $n^{-K/8}$ for both subexponential and uniform polynomial moment (of order four) observations. However, with better moment conditions on the U-statistic kernel, the rate $n^{-K/8}$

