Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes

Juan A. Cuesta-Albertos1, Eduardo García-Portugués2,4, Manuel Febrero-Bande3, Wenceslao González-Manteiga3

1 Department of Mathematics, Statistics and Computer Science, University of Cantabria (Spain).
2 Department of Statistics, Carlos III University of Madrid (Spain).
3 Department of Statistics, Mathematical Analysis and Operations Research, University of Santiago de Compostela (Spain).
4 Corresponding author. e-mail: [email protected].

arXiv:1701.08363v2 [stat.ME] 24 Mar 2017

Abstract

We consider marked empirical processes, indexed by a randomly projected functional covariate, to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-$n$ convergence rates and circumvent the curse of dimensionality. The weak convergence of the process is obtained conditionally on a random direction, whilst the almost sure equivalence between the testing for significance expressed on the original and on the projected functional covariate is proved. The computation of the test in practice involves calibration by wild bootstrap resampling and the combination of several p-values, arising from different projections, by means of the false discovery rate method. The finite sample properties of the test are illustrated in a simulation study for a variety of linear models, underlying processes and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.

Keywords: Empirical process; Functional data; Functional linear model; Functional principal components; Goodness-of-fit; Random projections.

1 Introduction

Since Karl Pearson introduced the term "goodness-of-fit" at the beginning of the twentieth century, an enormous number of papers have been devoted to this topic: first concentrated on fitting a model for one distribution function and later, especially after the papers of Bickel and Rosenblatt (1973) and Durbin (1973), on more general models related to the regression function. The literature is vast, and we refer to González-Manteiga and Crujeiras (2013) for an updated review of the topic.

The ideas of goodness-of-fit for density and distribution were naturally extended in the nineties of the last century to regression models. Considering, as a reference, a regression model with random design $Y = m(X) + \varepsilon$, the goal is to test

$$H_0 : m \in \mathcal{M}_\Theta = \{m_\theta : \theta \in \Theta \subset \mathbb{R}^q\} \quad \text{vs.} \quad H_1 : m \notin \mathcal{M}_\Theta$$

in an omnibus way from a sample $\{(X_i, Y_i)\}_{i=1}^n$ of $(X, Y)$. Here $m(x) = \mathrm{E}[Y \mid X = x]$ is the regression function of $Y$ over $X$ and $\varepsilon$ is a centred random error such that $\mathrm{E}[\varepsilon \mid X] = 0$.

Following the ideas on smoothing for testing about the density function (Bickel and Rosenblatt, 1973), the usual pilot estimator for $m$ was a nonparametric one, for example, the Nadaraya-Watson estimator (Nadaraya (1964), Watson (1964)): $\hat{m}_h(x) = \sum_{i=1}^n W_{ni}(x) Y_i$, with $W_{ni}(x) = K\big((x - X_i)/h\big) \big/ \sum_{j=1}^n K\big((x - X_j)/h\big)$, where $K$ is a kernel function and $h$ is a bandwidth parameter. Other possible weights, such as the ones from local linear estimation, k-nearest neighbours or splines, were also used in different studies.
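As a point of reference for the functional extensions discussed below, here is a minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel. It is our own illustration (the kernel choice, the function name and the toy data are assumptions, not part of the paper's software):

```python
import numpy as np

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate m_hat(x) = sum_i W_ni(x) Y_i with Gaussian kernel K."""
    K = np.exp(-0.5 * ((x[:, None] - X[None, :]) / h) ** 2)  # K((x - X_i)/h)
    W = K / K.sum(axis=1, keepdims=True)                     # weights W_ni(x)
    return W @ Y                                             # sum_i W_ni(x) Y_i

# Toy example: scalar covariate, smooth regression function plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, 200)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(200)
grid = np.linspace(0, 1, 5)
print(nadaraya_watson(grid, X, Y, h=0.05))
```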
Using these kinds of pilot estimators, statistical tests were given by $T_n = d\big(\hat{m}, m_{\hat{\theta}}\big)$, with $d$ some functional distance and $\hat{\theta}$ an estimator of $\theta$ such that $\sqrt{n}(\hat{\theta} - \theta) = O_{\mathbb{P}}(1)$ under $H_0$. In an alternative way, following the paper by Durbin (1973) for testing about the distribution, the pilot estimator in the regression case was given by $I_n(x) = n^{-1}\sum_{i=1}^n Y_i 1_{\{X_i \le x\}}$, the empirical estimate of the integrated regression function $I(x) = \mathrm{E}\big[Y 1_{\{X \le x\}}\big]$. Härdle and Mammen (1993), using $\hat{m}_h$, and Stute (1997), using $I_n$, are key references for these two approaches in the literature, which were only the beginning of more than two hundred papers published in the last two decades (González-Manteiga and Crujeiras, 2013).

More recently, there has been interest in testing for a possible structure in a regression setting with functional covariates:

$$Y = m(X) + \varepsilon, \qquad (1)$$

where $X$ is now a random element in a functional space, for example the Hilbert space $H = L^2[0,1]$, and $Y$ is a scalar response. This is the context of "Functional Data Analysis", which has received increasing attention in the last decade (see, for example, Ramsay and Silverman (2005), Ferraty and Vieu (2006) and Horváth and Kokoszka (2012)), especially motivated by the practical needs of analysing data generated from high-resolution measuring devices.

A very simple null hypothesis $H_0$ considered in the literature for the model (1) is $H_0 : m(X) = c$, where $c \in \mathbb{R}$ is a fixed constant: the testing of the significance of the covariate $X$ over $Y$. Following some of the ideas from Ferraty and Vieu (2006) on considering pseudometrics for performing smoothing with functional data, the test by Härdle and Mammen (1993) was adapted by Delsol et al. (2011a):

$$T_n = \int \big(\hat{m}_h(x) - \bar{Y}\big)^2 \omega(x) \, \mathrm{d}P_X(x), \qquad \hat{m}_h(x) = \sum_{i=1}^n K\left(\frac{d(x, X_i)}{h}\right) Y_i \bigg/ \sum_{j=1}^n K\left(\frac{d(x, X_j)}{h}\right),$$

with $d$ a functional pseudometric, $K$ a kernel function adapted to this situation, $h$ a bandwidth parameter, $\omega$ a weight function and $P_X$ the probability measure induced by $X$ over the functional space of the covariate. The testing of $H_0$ has also been considered by Cardot et al. (2003) or, more recently, Hilgert et al. (2013), not in an omnibus way but inside a Functional Linear Model (FLM): $m(X) = \langle X, \rho\rangle$, where $\langle\cdot,\cdot\rangle$ represents the inner product in $H$ and $\rho \in H$ is the FLM parameter. For both approaches, omnibus or not, there have also been some recent papers considering the case of functional response; see, for example, Cardot et al. (2007), Chiou and Müller (2007), Kokoszka et al. (2008) and Bücher et al. (2011).

The generalization of the previous functional hypothesis to the general case

$$H_0 : m \in \mathcal{M}_{\mathcal{P}} = \{m_\rho : \rho \in \mathcal{P}\} \quad \text{vs.} \quad H_1 : m \notin \mathcal{M}_{\mathcal{P}}, \qquad (2)$$

where $\mathcal{P}$ can be either of finite or infinite dimension, has been the focus of very few papers, especially in the context of omnibus goodness-of-fit tests. In Delsol et al. (2011b) a discussion is given, without theoretical results, of the extension of the checking to a more complex null hypothesis such as a FLM. Only one paper is known to us in which the FLM hypothesis is analysed with theoretical results.
In Patilea et al. (2012), motivated by the smoothing test statistic considered by Zheng (1996) for finite dimensional covariates, a test based on

$$T_{n,h} = \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} \big(Y_i - \hat{m}_{H_0}(X_i)\big)\big(Y_j - \hat{m}_{H_0}(X_j)\big) \frac{1}{h} K\left(\frac{F_{n,h}(\langle X_i, h\rangle) - F_{n,h}(\langle X_j, h\rangle)}{h}\right)$$

is employed for checking the null hypothesis of linearity, with $\hat{m}_{H_0}(X) = \langle X, \hat{\rho}\rangle$, $\hat{\rho}$ a suitable estimator of $\rho$ and $F_{n,h}$ the empirical distribution function of $\{\langle X_i, h\rangle\}_{i=1}^n$. In the same spirit, Lavergne and Patilea (2008) gave a test for the finite dimensional context and Patilea et al. (2016) for functional response. From a different perspective, and motivated by the test of Escanciano (2006) for finite dimensional predictors, in García-Portugués et al. (2014) a test was constructed from the marked empirical process $I_{n,h}(x) = \frac{1}{n}\sum_{i=1}^n Y_i 1_{\{\langle X_i, h\rangle \le x\}}$, with $x \in \mathbb{R}$ and $h \in H$. This approach circumvents the technical difficulties that a marked empirical process indexed by $x \in H$, a possible functional extension of the process in Stute (1997), would represent.

In this paper we consider marked empirical processes indexed by random projections of the functional covariate. The motivation stems from the almost sure (a.s.) characterization of the null hypothesis (2) via a projected hypothesis that arises from the conditional expectation on the projected functional covariate. This allows, conditionally on a randomly chosen $h$, the study of the weak convergence of the process $I_{n,h}(x)$ for hypothesis testing of infinite dimension. As a by-product, we obtain root-$n$ goodness-of-fit tests that evade the curse of dimensionality and, contrary to smoothing-based tests, do not rely on a tuning parameter. In particular, we focus on the testing of the aforementioned hypothesis of functional linearity where, contrary to the finite dimensional situation, the functional estimator has a non-trivial effect on the limiting process and requires a careful regularization. The test statistics are built by a continuous functional (Kolmogorov-Smirnov or Cramér-von Mises) over the empirical process and are effectively calibrated by a wild bootstrap on the residuals. To account for a higher power and less influence from $h$, we consider a number $K$ (not to be confused with a kernel function) of different random projections and merge the resulting p-values into a final p-value by means of the False Discovery Rate (FDR) of Benjamini and Yekutieli (2001). The empirical analysis reports a competitive performance of the test in practice, with a low impact of the choice of $K$ above a certain bound, and an expedient computational complexity of $O(n)$ that yields notable speed improvements over García-Portugués et al. (2014).
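To fix ideas on the projection device, the following sketch illustrates the basic operation the tests build upon: projecting a discretized functional sample onto a random direction $h$. It is our own illustration under simple assumptions (a uniform grid on $[0,1]$, Brownian-motion trajectories for both $X$ and $h$, trapezoidal quadrature for the inner product); it is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 101)   # discretization grid of [0, 1]
n = 50

# Toy functional sample: Brownian motions (cumulative sums of Gaussian increments)
X = np.cumsum(rng.standard_normal((n, t.size)) * np.sqrt(t[1] - t[0]), axis=1)

# Random direction h drawn from a (non-degenerate) Gaussian measure on H
h = np.cumsum(rng.standard_normal(t.size) * np.sqrt(t[1] - t[0]))

# Projected covariate X^h = <X, h>, approximated by the trapezoidal rule
Xh = np.trapz(X * h, t, axis=1)
print(Xh[:5])
```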
The rest of the paper is organized as follows. The characterization of the null hypothesis through the projected predictor is addressed in Section 2, together with an application to the testing of the null hypothesis $H_0 : m = m_0$ (Subsection 2.1). Section 3 is devoted to testing the composite hypothesis $H_0 : m \in \{\langle\cdot,\rho\rangle : \rho \in H\}$. To that aim, the regularized estimator for $\rho$ of Cardot et al. (2007), $\hat{\rho}$, is reviewed in Subsection 3.1. The pointwise asymptotic distribution of the projected process is studied in Subsection 3.2, whereas Subsection 3.3 gives its weak convergence. Section 4 describes the implementation of the test and other practicalities. Section 5 illustrates the finite sample properties of the test by a simulation study and some real data applications. Some final comments and possible extensions are given in Section 6. Appendix A presents the main proofs, whereas the supplementary material contains the auxiliary lemmas and further results for the simulation study.

Some general setting and notation are introduced now. The random variable (r.v.) $X$ belongs to a separable Hilbert space $H$ endowed with the inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|$. The space $H$ is a general real Hilbert space but for simplicity can be regarded as $H = L^2[0,1]$. $Y$ and $X$ are assumed to be centred r.v.'s, and $\varepsilon$ is a centred r.v., independent from $X$, with variance $\sigma_\varepsilon^2$. The independence between $\varepsilon$ and $X$ is a technical assumption required for proving Lemmas A.4 and A.5, while for the rest of the paper it suffices that $\mathrm{E}[\varepsilon \mid X] = 0$. Given the $H$-valued r.v. $X$ and $h \in H$, we denote by $X^h = \langle X, h\rangle$ the projection of $X$ onto the direction $h$. Bold letters are used for vectors either in $H$ (mainly) or in $\mathbb{R}^p$, their kind being clearly determined by the context. Capital letters represent r.v.'s defined on the same probability space $(\Omega, \sigma, \nu)$. Weak convergence is denoted by $\rightsquigarrow$ and $D(\mathbb{R})$ represents the Skorohod space of càdlàg functions defined on $\mathbb{R}$. Finally, we shall implicitly assume that the null hypotheses stated hold a.s.

2 Hypothesis projection

The pillar of the goodness-of-fit tests we present is the a.s. characterization of the null hypothesis (2), re-expressed as $H_0 : \mathrm{E}[Y - m_\rho(X) \mid X] = 0$ for some $\rho \in \mathcal{P}$, by means of the associated projected hypothesis on $h \in H$, defined as $H_0^h : \mathrm{E}\big[Y - m_\rho(X) \mid X^h\big] = 0$. We identify $Y - m_\rho(X)$ with $Y$ for the sake of simplicity in notation.

We give in this section two necessary and sufficient conditions, based on the projections of $X$, for $\mathrm{E}[Y \mid X] = 0$ to hold a.s. The first condition only requires the integrability of $Y$, but the condition needs to be satisfied for every direction $h$.

Proposition 2.1. Assume that $\mathrm{E}[|Y|] < \infty$. Then, $\mathrm{E}[Y \mid X] = 0$ a.s. $\iff$ $\mathrm{E}\big[Y \mid X^h\big] = 0$ a.s. for every $h \in H$.

The second condition, more adequate for application, somehow generalizes Proposition 2.1, as it only needs to be satisfied for a randomly chosen $h$. In exchange, it holds only under some additional conditions on the moments of $X$ and $Y$. Before stating it we need some preliminary results, the first one being included here for the sake of completeness.

Lemma 2.2 (Theorem 4.1 in Cuesta-Albertos et al. (2007)). Let $\mu$ be a non-degenerate Gaussian measure on $H$, let $X_1, X_2$ be two $H$-valued r.v.'s and denote by $X_1 \sim X_2$ that they are identically distributed. Assume that:

(a) $m_k := \int \|X_1\|^k \, \mathrm{d}\nu < \infty$, for all $k \ge 1$, and $\sum_{k=1}^\infty m_k^{-1/k} = \infty$.

(b) The set $\{h \in H : X_1^h \sim X_2^h\}$ is of positive $\mu$-measure.

Then $X_1 \sim X_2$.

Remark 2.2.1. It is not strictly needed that $\mu$ be a Gaussian distribution in Lemma 2.2; this can be replaced by assuming a certain smoothness condition on $\mu$ (see, for instance, Theorem 2.5 and Example 2.6 in Cuesta-Albertos et al. (2007)).

Remark 2.2.2. Assumption (a) in Lemma 2.2 is not of technical nature. According to Theorem 3.6 in Cuesta-Albertos et al. (2007), it becomes apparent that a similar condition is required. This assumption is satisfied if the tails of $P_{X_1}$ are light enough or if $X_1$ has a finite moment generating function in a neighbourhood of zero.

Lemma 2.3. If $\mathrm{E}\big[Y^2\big] < \infty$ and $X$ satisfies (a) in Lemma 2.2, then $l_k := \mathrm{E}\big[\|X\|^k |Y|\big] < \infty$, for all $k \ge 1$, and $\sum_{k=1}^\infty l_k^{-1/k} = \infty$.
The second condition, and the most important result in this section, is given as follows.

Theorem 2.4. Let $\mu$ be a non-degenerate Gaussian measure on $H$. Assume that $X$ satisfies (a) in Lemma 2.2 and that $\mathrm{E}\big[Y^2\big] < \infty$. Then, $\mathrm{E}[Y \mid X] = 0$ a.s. $\iff$ $\mathcal{H}_0 := \big\{h \in H : \mathrm{E}\big[Y \mid X^h\big] = 0 \text{ a.s.}\big\}$ has positive $\mu$-measure.

Corollary 2.5. Under the assumptions of the previous theorem, $\mathrm{E}[Y \mid X] = 0$ a.s. $\iff$ $\mu(\mathcal{H}_0) = 1$.

According to this corollary, if we are interested in testing the simple null hypothesis $H_0 : \mathrm{E}[Y \mid X] = 0$, we can do it as follows: i) select at random, with $\mu$, a direction $h \in H$; ii) conditionally on $h$, test the projected null hypothesis $H_0^h : \mathrm{E}\big[Y \mid X^h\big] = 0$. The rationale is simple yet powerful: if $H_0$ holds, then $H_0^h$ also holds; if $H_0$ fails, then $H_0^h$ also fails $\mu$-a.s. In this case, with probability one we have chosen a direction $h$ for which $H_0^h$ fails. Of course, the main advantage of testing $H_0^h$ over testing $H_0$ directly is that in $H_0^h$ the conditioning r.v. is real, which simplifies the problem substantially.

2.1 Testing a simple null hypothesis

An immediate application of Corollary 2.5 is the testing of the simple null hypothesis $H_0 : m = m_0$ via the empirical process of Stute (1997). Recall that other testing alternatives can be considered on the projected covariate due to the $\mu$-a.s. characterization. We refer to González-Manteiga and Crujeiras (2013) for a review of alternatives.

For a random sample $\{(X_i, Y_i)\}_{i=1}^n$ from $(X, Y)$, we can consider the empirical process of the regression conditioned on the direction $h$,

$$R_{n,h}(x) := n^{1/2} I_{n,h}(x) = n^{-1/2} \sum_{i=1}^n 1_{\{X_i^h \le x\}} Y_i, \quad x \in \mathbb{R},$$

and then the following result is trivially satisfied from Theorem 1.1 in Stute (1997).

Corollary 2.6. Under $H_0^h$ and $\mathrm{E}\big[Y^2\big] < \infty$, $R_{n,h} \rightsquigarrow G_1$ in $D(\mathbb{R})$, with $G_1$ a Gaussian process with zero mean and covariance function $K_1(s,t) := \int_{-\infty}^{s \wedge t} \mathrm{Var}\big[Y \mid X^h = u\big] \, \mathrm{d}F_h(u)$, where $F_h$ is the distribution function of $X^h$.

Different statistics for the testing of $H_0^h$ can be built from continuous functionals on $R_{n,h}(x)$. We shall cover this in more detail in Section 3.

Example 2.7. Consider the FLM $Y = \langle X, \rho\rangle + \varepsilon$ in $H = L^2[0,1]$, with $X$ a Gaussian process with associated Karhunen-Loève expansion (4) and $\varepsilon$ independent from $X$. Then $X^h$ and $X^\rho$ are centred Gaussians with variances $\sigma_h^2$ and $\sigma_\rho^2$, respectively, and $\mathrm{Cov}\big[X^h, X^\rho\big] = \sum_{j=1}^\infty h_j \rho_j \lambda_j$, with $h_j = \langle h, e_j\rangle$, $\rho_j = \langle \rho, e_j\rangle$. Hence,

$$K_1(s,t) = \int_{-\infty}^{s \wedge t} \left(\frac{\sigma_\rho^2 \sigma_h^2 - \big(\sum_{j=1}^\infty h_j \rho_j \lambda_j\big)^2}{\sigma_h^2} + \sigma_\varepsilon^2\right) \varphi(u/\sigma_h)/\sigma_h \, \mathrm{d}u = \left(\frac{\sigma_\rho^2 \sigma_h^2 - \big(\sum_{j=1}^\infty h_j \rho_j \lambda_j\big)^2}{\sigma_h^2} + \sigma_\varepsilon^2\right) \Phi\big((s \wedge t)/\sigma_h\big),$$

where $\varphi$ and $\Phi$ are the density and distribution functions of a $\mathcal{N}(0,1)$, respectively.
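A minimal sketch of $R_{n,h}$ and its Kolmogorov-Smirnov functional under the simple null hypothesis follows. It is our own illustration (it assumes the projections $X_i^h$ and the centred responses $Y_i$ are already available, e.g. from the earlier snippet); it exploits that $R_{n,h}$ is a step function jumping only at the sample projections:

```python
import numpy as np

def projected_ks_statistic(Xh, Y):
    """Kolmogorov-Smirnov functional of R_{n,h}(x) = n^{-1/2} sum_i 1{X_i^h <= x} Y_i.
    R_{n,h} only jumps at the ordered projections, so the sup over x is a max."""
    n = len(Y)
    order = np.argsort(Xh)
    R = np.cumsum(Y[order]) / np.sqrt(n)  # R_{n,h} evaluated at the jump points
    return np.max(np.abs(R))

# Under H0h (here with m0 = 0), Y behaves as centred noise and R_{n,h} ~ G_1:
rng = np.random.default_rng(2)
Xh, Y = rng.standard_normal(100), rng.standard_normal(100)
print(projected_ks_statistic(Xh, Y))
```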
3 Testing the functional linear model

We focus now on testing the composite null hypothesis, expressed as

$$H_0 : m(X) = \langle X, \rho\rangle = X^\rho, \ \text{for some } \rho \in H.$$

According to Corollary 2.5, testing $H_0$ is $\mu$-a.s. equivalent to testing

$$H_0^h : \mathrm{E}\big[(Y - X^\rho) \mid X^h\big] = 0, \ \text{for some } \rho \in H,$$

where $h$ is sampled from a non-degenerate Gaussian law $\mu$. Again, we construct the associated empirical regression process indexed by the projected covariate following Stute (1997). Therefore, given an estimate $\hat{\rho}$ of $\rho$ under $H_0$, we have

$$T_{n,h}(x) := a_n \sum_{i=1}^n 1_{\{X_i^h \le x\}}\big(Y_i - X_i^{\hat{\rho}}\big) = a_n\big(T_{n,h}^1(x) + T_{n,h}^2(x) + T_{n,h}^3(x)\big), \qquad (3)$$

where $a_n \to 0$ is a normalizing positive sequence to be determined later and

$$T_{n,h}^1(x) := \sum_{i=1}^n 1_{\{X_i^h \le x\}}\big(Y_i - X_i^\rho\big),$$
$$T_{n,h}^2(x) := \sum_{i=1}^n \Big\langle 1_{\{X_i^h \le x\}} X_i - \mathrm{E}\big[1_{\{X^h \le x\}} X\big], \rho - \hat{\rho}\Big\rangle,$$
$$T_{n,h}^3(x) := n\Big\langle \mathrm{E}\big[1_{\{X^h \le x\}} X\big], \rho - \hat{\rho}\Big\rangle.$$

The selection of the right estimator $\hat{\rho}$ has a crucial role in the weak convergence of $T_{n,h}^3$, which poses a substantially more complex proof than for the simple hypothesis. We consider the regularized estimate proposed in Sections 2 and 3 of Cardot et al. (2007) (denoted by CMS in the sequel), whose construction is sketched here for the sake of the exposition of our results.

3.1 Construction of the estimator of ρ

Consider the so-called Karhunen-Loève expansion of $X$:

$$X = \sum_{j=1}^\infty \lambda_j^{1/2} \xi_j e_j, \qquad (4)$$

where $\{e_j\}_{j=1}^\infty$ is a sequence of orthonormal eigenfunctions associated to the covariance operator of $X$, $\Gamma z := \mathrm{E}[(X \otimes X)(z)] = \mathrm{E}[\langle z, X\rangle X]$, $z \in H$, and the $\xi_j$'s are centred real r.v.'s (because $X$ is centred) such that $\mathrm{E}\big[\xi_j \xi_{j'}\big] = \delta_{j,j'}$, where $\delta_{j,j'}$ is the Kronecker delta. We assume that the multiplicity of each eigenvalue is one, so $\lambda_1 > \lambda_2 > \ldots > 0$.

The functional coefficient $\rho$ is determined by the equation $\Delta = \Gamma\rho$, with $\Delta$ the cross-covariance operator of $X$ and $Y$, $\Delta z := \mathrm{E}[(X \otimes Y)(z)] = \mathrm{E}[\langle z, X\rangle Y]$, $z \in H$. To ensure the existence and uniqueness of a solution to $\Delta = \Gamma\rho$ we require the next basic assumptions:

A1. $X$ and $Y$ satisfy $\sum_{j=1}^\infty \frac{1}{\lambda_j^2}\langle \mathrm{E}[XY], e_j\rangle^2 < \infty$.

A2. $\mathrm{Ker}(\Gamma) = \{0\}$.

The estimation of $\rho$ requires the inversion of $\Gamma_n := \frac{1}{n}\sum_{i=1}^n X_i \otimes X_i$, but since $\Gamma_n$ is a.s. a finite rank operator, its inverse does not exist. CMS proposed a regularization yielding a family of continuous estimators for $\Gamma^{-1}$. We employ the one from Example 1 in CMS, which provides an empirical finite rank inverse of $\Gamma_n$, denoted by $\Gamma_n^\dagger$ (we use $\Gamma^\dagger$ for the population version). Consider a sequence of thresholds $c_n \in (0, \lambda_1)$, $n \in \mathbb{N}$, with $c_n \to 0$. Then: i) compute the Functional Principal Components (FPC) of $\Gamma_n$, i.e., calculate its eigenvalues $\{\hat{\lambda}_j\}$ and eigenfunctions $\{\hat{e}_j\}$; ii) define the sequence $\{\delta_j\}$, with $\delta_1 := \lambda_1 - \lambda_2$ and $\delta_j := \min(\lambda_j - \lambda_{j+1}, \lambda_{j-1} - \lambda_j)$ for $j > 1$, and set

$$k_n := \sup\{j \in \mathbb{N} : \lambda_j + \delta_j/2 \ge c_n\};$$

iii) compute $\Gamma_n^\dagger$ (respectively $\Gamma^\dagger$) as the finite rank operator with the same eigenfunctions as $\Gamma_n$ (resp. $\Gamma$) and associated eigenvalues equal to $\hat{\lambda}_j^{-1}$ (resp. $\lambda_j^{-1}$) if $j \le k_n$ and $0$ otherwise. The regularized estimator of $\rho$ is

$$\hat{\rho} := \Gamma_n^\dagger \Delta_n = \frac{1}{n}\sum_{j=1}^{k_n}\sum_{i=1}^n \frac{\langle X_i \otimes Y_i, \hat{e}_j\rangle}{\hat{\lambda}_j}\hat{e}_j. \qquad (5)$$

Note that (5) is not readily computable in practice, since $\{\lambda_j\}$ is usually unknown (and hence $k_n$). As in CMS, we consider the (random) finite rank

$$d_n := \sup\{j \in \mathbb{N} : \hat{\lambda}_j \ge c_n\}$$

as a replacement in practice for the deterministic $k_n$. As seen in Lemma A.2, $\nu[k_n = d_n] \to 1$, hence the estimator (5) has the same asymptotic behaviour with either $k_n$ or $d_n$. Therefore, we consider $k_n$ in (5) due to its convenient probabilistic tractability.
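The following sketch mimics (5) on a uniform grid, with the empirical cutoff $d_n$ in place of $k_n$. It is a minimal illustration under our own discretization conventions (rectangular quadrature, eigenfunctions normalized to unit $L^2[0,1]$ norm), not the authors' code:

```python
import numpy as np

def fpc_estimator(X, Y, t, c_n):
    """Regularized FPC estimate of rho, cf. (5), with d_n = #{j : lambda_hat_j >= c_n}."""
    n = X.shape[0]
    dt = t[1] - t[0]                  # uniform grid spacing
    C = (X.T @ X) / n * dt            # discretized covariance operator Gamma_n
    lam, V = np.linalg.eigh(C)        # eigenpairs in ascending order
    lam, V = lam[::-1], V[:, ::-1]    # reorder: descending eigenvalues
    E = V / np.sqrt(dt)               # eigenfunctions with unit L2[0,1] norm
    d_n = int(np.sum(lam >= c_n))     # empirical finite rank
    scores = X @ E[:, :d_n] * dt      # <X_i, e_hat_j>
    coefs = (scores * Y[:, None]).mean(axis=0) / lam[:d_n]  # <Delta_n, e_hat_j>/lam_j
    return E[:, :d_n] @ coefs         # rho_hat evaluated on the grid

# Toy usage: Brownian covariate and Y = <X, rho> + eps with rho(s) = sin(2 pi s)
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 101); n = 200
X = np.cumsum(rng.standard_normal((n, t.size)) * np.sqrt(t[1] - t[0]), axis=1)
rho = np.sin(2 * np.pi * t)
Y = (X * rho).sum(axis=1) * (t[1] - t[0]) + 0.1 * rng.standard_normal(n)
rho_hat = fpc_estimator(X, Y, t, c_n=1e-3)
```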
The following assumptions allow us to obtain meaningful asymptotic convergences:

A3. $\mathrm{E}\big[\|X\|^2\big] < \infty$.

A4. $\sum_{l=1}^\infty |\langle \rho, e_l\rangle| < \infty$.

A5. For $j$ large, $\lambda_j = \lambda(j)$ with $\lambda(\cdot)$ a convex positive function.

A6. $\frac{\lambda_n n^4}{\log n} = O(1)$.

A7. $\inf\left\{|\langle \rho, e_{k_n}\rangle|, \frac{\lambda_{k_n}}{\sqrt{k_n \log k_n}}\right\} = O\big(n^{-1/2}\big)$.

A8. $\sup_j \max\big(\mathrm{E}\big[\xi_j^4\big], \mathrm{E}\big[|\xi_j|^5\big]\big) \le M < \infty$, for $M \ge 1$.

A9. $c_n = O\big(n^{-1/2}\big)$.

A brief summary of these assumptions is given as follows. A3 is standard to obtain asymptotic distributions; it allows the decomposition (4) and implies $\mathrm{E}\big[Y^2\big] < \infty$, required in Theorem 1.1 of Stute (1997). A4 and A5 are A.1 and A.2 in CMS. A6 is very similar to one assumption in the second part of Theorem 2 in CMS. A7 is the minimum requirement to control $\langle X, L_n\rangle$ when Lemma 7 in CMS is used to prove Lemma A.7. A8 is a reinforcement of A.3 in CMS, where only fourth order moments are used; the reason is that we handle inner products of $\hat{\rho}$ times a non-independent r.v., while in CMS the r.v. is not used to estimate $\rho$. A9 is useful mainly (but see also the final part of Lemma A.6) to control the behaviour of $k_n$. We show this fact in Proposition A.1, with a conclusion very close to assumption (8) in CMS and coinciding with one of the conditions of their Theorem 3 if $\lim_n t_{n,E_{x,h}} < \infty$ (the term $t_{n,E_{x,h}}$ is defined in (6)). Finally, we point out that in CMS the assumptions aim to control the behaviour of $k_n$, while here we have targeted controlling the threshold $c_n$, as this can be modified by the statistician.

3.2 Pointwise asymptotic distribution of $T_{n,h}$

Corollary 2.6 gives the weak convergence of $n^{-1/2}T_{n,h}^1$. We analyse now the pointwise behaviour of $T_{n,h}^2(x)$ and $T_{n,h}^3(x)$ for a fixed $x \in \mathbb{R}$. We will show that $T_{n,h}^2(x) = o_{\mathbb{P}}(n^{1/2})$ and that the rate of $T_{n,h}^3(x)$ depends on the key normalizing sequence $\{t_{n,E_{x,h}}\}$, where

$$t_{n,x} := \sqrt{\sum_{j=1}^{k_n} \frac{\langle x, e_j\rangle^2}{\lambda_j}} \quad \text{and} \quad E_{x,h} := \mathrm{E}\big[1_{\{X^h \le x\}} X\big]. \qquad (6)$$

Theorem 3.1. Under $H_0^h$ and A1-A9, for a fixed $x \in \mathbb{R}$, it happens that:

(a) $n^{-1/2} t_{n,E_{x,h}}^{-1} T_{n,h}^3(x) \rightsquigarrow \mathcal{N}(0, \sigma_\varepsilon^2)$.

(b) If $\lim_n t_{n,E_{x,h}} = \infty$, then with $a_n = n^{-1/2} t_{n,E_{x,h}}^{-1}$ in (3) the asymptotic distribution of $T_{n,h}(x)$ is the one of $n^{-1/2} t_{n,E_{x,h}}^{-1} T_{n,h}^3(x)$.

(c) If $\lim_n t_{n,E_{x,h}} < \infty$, then with $a_n = n^{-1/2}$ in (3) the asymptotic distribution of $T_{n,h}(x)$ is the one of $n^{-1/2}\big(T_{n,h}^1(x) + T_{n,h}^3(x)\big)$.

The behaviour of the sequence $\{t_{n,E_{x,h}}\}$, indexed by $n \in \mathbb{N}$ and with arbitrary $h \in H$ and $x \in \mathbb{R}$, is crucial for the convergence of $T_{n,h}$. Since $\{t_{n,E_{x,h}}\}$ is non-decreasing, it always has a limit (finite or infinite). Its asymptotic behaviour is described next.

Proposition 3.2. The sequence $\{t_{n,E_{x,h}}\}$ has asymptotic orders between $O(1)$ and $O\big(k_n^{1/2}\big)$. In addition, if $X$ is Gaussian and satisfies A3, then $\sigma_h^2 := \mathrm{Var}\big[X^h\big] < \infty$ and $\lim_n t_{n,E_{x,h}} = \varphi(x/\sigma_h)$.
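As a small numerical companion to (6) and Proposition 3.2, the sketch below evaluates $t_{n,x}$ and a Monte Carlo approximation of $E_{x,h}$. It is our own illustration; for a self-contained run it uses the known Karhunen-Loève eigenpairs of Brownian motion rather than estimated FPC:

```python
import numpy as np

def t_n_x(x_fun, E, lam, dt, k_n):
    """t_{n,x} = sqrt(sum_{j=1}^{k_n} <x, e_j>^2 / lambda_j), cf. (6)."""
    proj = (E[:, :k_n] * x_fun[:, None]).sum(axis=0) * dt  # <x, e_j>, j <= k_n
    return np.sqrt(np.sum(proj ** 2 / lam[:k_n]))

def E_x_h(X, Xh, x):
    """Monte Carlo approximation of E_{x,h} = E[1_{X^h <= x} X] from a sample."""
    return (X * (Xh <= x)[:, None]).mean(axis=0)

# Demo with the Brownian-motion eigenpairs e_j(s) = sqrt(2) sin((j - 1/2) pi s),
# lambda_j = ((j - 1/2) pi)^{-2}, so the snippet runs on its own:
t = np.linspace(0, 1, 101); dt = t[1] - t[0]
j = np.arange(1, 21)
E = np.sqrt(2) * np.sin(np.outer(t, (j - 0.5) * np.pi))
lam = 1 / ((j - 0.5) * np.pi) ** 2
x_fun = np.sin(2 * np.pi * t)  # an arbitrary element of H at which to evaluate t_{n,x}
print(t_n_x(x_fun, E, lam, dt, k_n=10))
```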
3.3 Weak convergence of $T_{n,h}$ and of the test statistics

The result given in Theorem 3.1 holds for every $x \in \mathbb{R}$. For the case (c) of Theorem 3.1 (where the estimation of $\rho$ is not dominant) and under an additional assumption, the result can be generalized to functional weak convergence.

Theorem 3.3. Under $H_0^h$, A1-A9 and (c) in Theorem 3.1, it happens that:

(a) The finite dimensional distributions of $T_{n,h}$ converge to a multivariate Gaussian with covariance function $K_2(s,t) := K_1(s,t) + C(s,t) + C(t,s) + V(s,t)$, where

$$C(s,t) := \int_{\{u^h \le s\}} \mathrm{Var}[Y \mid X = u] \big\langle E_{t,h}, \Gamma^{-1}u\big\rangle \, \mathrm{d}P_X(u),$$
$$V(s,t) := \int \mathrm{Var}[Y \mid X = u] \big\langle E_{s,h}, \Gamma^{-1}u\big\rangle \big\langle E_{t,h}, \Gamma^{-1}u\big\rangle \, \mathrm{d}P_X(u).$$

(b) If $\mathrm{E}\big[\|\hat{\rho} - \rho\|^4\big] = O(n^{-2})$, then $T_{n,h} \rightsquigarrow G_2$ in $D(\mathbb{R})$, with $G_2$ a Gaussian process with zero mean and covariance function $K_2$.

Remark 3.3.1. According to Theorem 1 in CMS, it is impossible for $\hat{\rho} - \rho$ to converge to a non-degenerate random element in the topology of $H$. In order to circumvent this issue, we make the assumption $\mathrm{E}\big[\|\hat{\rho} - \rho\|^4\big] = O(n^{-2})$, which implies $\|\hat{\rho} - \rho\| = O_{\mathbb{P}}(n^{-1/2})$, thus a finite-dimensional parametric convergence rate for $\hat{\rho}$. In practice this means that $\rho$ lives in a finite dimensional subspace of $H$.

The next result gives the convergence of the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) statistics for testing the FLM.

Corollary 3.4. Under the assumptions in Theorem 3.3 and $\mathrm{E}\big[\|\hat{\rho} - \rho\|^4\big] = O(n^{-2})$, if $\|T_{n,h}\|_{\mathrm{KS}} := \sup_{x \in \mathbb{R}} |T_{n,h}(x)|$ and $\|T_{n,h}\|_{\mathrm{CvM}} := \int_{\mathbb{R}} T_{n,h}(x)^2 \, \mathrm{d}F_{n,h}(x)$, then

$$\|T_{n,h}\|_{\mathrm{KS}} \rightsquigarrow \|G_2\|_{\mathrm{KS}} \quad \text{and} \quad \|T_{n,h}\|_{\mathrm{CvM}} \rightsquigarrow \int_{\mathbb{R}} G_2(x)^2 \, \mathrm{d}F_h(x).$$

Remark 3.4.1. An alternative to (b) and Corollary 3.4 is to consider a deterministic discretization of the statistics, for which the convergence in law is trivial from (a). For example, if $\|T_{n,h}\|_{\widetilde{\mathrm{KS}}} := \max_{k=1,\ldots,G} |T_{n,h}(x_k)|$ for a grid $\{x_1,\ldots,x_G\}$, then $\|T_{n,h}\|_{\widetilde{\mathrm{KS}}} \rightsquigarrow \|Z_2\|_{\widetilde{\mathrm{KS}}}$, where $Z_2 \sim \mathcal{N}_G(0,\Sigma)$, $\Sigma_{ij} = K_2(x_i, x_j)$.

4 Testing in practice

The major advantage of testing $H_0^h$ over $H_0$ is that in $H_0^h$ the conditioning r.v. is real. The potential drawbacks of this universal method are a possible loss of power and that the outcome of the test may vary for different projections. Both inconveniences can be alleviated by sampling several directions $h_1,\ldots,h_K$, testing the projected hypotheses $H_0^{h_1},\ldots,H_0^{h_K}$ and selecting an appropriate way to mix the resulting p-values. For example, by the FDR method proposed in Benjamini and Yekutieli (2001) (see Section 2.2.2 of Cuesta-Albertos and Febrero-Bande (2010)), it is possible to control the final rejection rate to be at most $\alpha$ under $H_0$. The procedure is described in the following generic algorithm.

Algorithm 4.1 (Testing procedure for $H_0$). Let $T_n$ denote a test for checking $H_0^h$ with $h$ chosen by a non-degenerate Gaussian measure $\mu$ on $H$.

i) For $i = 1,\ldots,K$, denote by $p_i$ the p-value of $H_0^{h_i}$ obtained with the test $T_n$.

ii) Set the final p-value of $H_0$ as $\min_{i=1,\ldots,K} \frac{K}{i} p_{(i)}$, where $p_{(1)} \le \ldots \le p_{(K)}$.
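Step ii) of Algorithm 4.1 is immediate to implement; a minimal sketch follows (the cap at one is our own safeguard, not part of the algorithm as stated):

```python
import numpy as np

def fdr_pvalue(pvals):
    """Final p-value of H_0 from K projected p-values: min_i K * p_(i) / i,
    following the FDR method of Benjamini and Yekutieli (2001)."""
    p = np.sort(np.asarray(pvals))
    K = len(p)
    return min(1.0, np.min(K * p / np.arange(1, K + 1)))

print(fdr_pvalue([0.01, 0.20, 0.35, 0.50, 0.80]))  # min(5*0.01/1, ...) = 0.05
```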
The calibration of the test statistic for $H_0^h$ is done by wild bootstrap resampling. The next algorithm states the steps for testing the FLM. The particular case of the simple null hypothesis corresponds to $\rho = 0$, so its calibration corresponds to setting $\hat{\rho} = \hat{\rho}^* = 0$ in the algorithm.

Algorithm 4.2 (Bootstrap calibration in FLM testing). Let $\{(X_i, Y_i)\}_{i=1}^n$ be a random sample from (1). To test $H_0 : m \in \{\langle\cdot,\rho\rangle : \rho \in H\}$ proceed as follows:

i) Estimate $\rho$ by FPC for a given $d_n$ and obtain $\hat{\varepsilon}_i = Y_i - \langle X_i, \hat{\rho}\rangle$.

ii) Compute $\|T_{n,h}\|_N = \big\|n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}}\hat{\varepsilon}_i\big\|_N$, with $N$ either KS or CvM.

iii) Bootstrap resampling. For $b = 1,\ldots,B$, do:

a) Draw binary i.i.d. r.v.'s $V_1^*,\ldots,V_n^*$ such that $\mathbb{P}\big\{V^* = (1-\sqrt{5})/2\big\} = (5+\sqrt{5})/10$ and $\mathbb{P}\big\{V^* = (1+\sqrt{5})/2\big\} = (5-\sqrt{5})/10$.

b) Set $Y_i^* := \langle X_i, \hat{\rho}\rangle + \varepsilon_i^*$ from the bootstrap residuals $\varepsilon_i^* := V_i^* \hat{\varepsilon}_i$.

c) Estimate $\hat{\rho}^*$ from $\{(X_i, Y_i^*)\}_{i=1}^n$ by FPC using the same $d_n$ of i).

d) Obtain the estimated bootstrap residuals $\hat{\varepsilon}_i^* := Y_i^* - \langle X_i, \hat{\rho}^*\rangle$.

e) Compute $\|T_{n,h}^{*b}\|_N := \big\|n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}}\hat{\varepsilon}_i^*\big\|_N$.

iv) Approximate the p-value by $\frac{1}{B}\sum_{b=1}^B 1_{\{\|T_{n,h}\|_N \le \|T_{n,h}^{*b}\|_N\}}$.

The choice of an adequate $d_n$ for the estimation of $\rho$ can be done in a data-driven way, for example by the corrected Schwartz Information Criterion (McQuarrie, 1999), denoted by SICc. Besides, steps c) and d) can be easily computed using the properties of the linear model; see Section 3.3 of García-Portugués et al. (2014). The bootstrap process we are considering is given by (we consider $a_n = n^{-1/2}$):

$$T_{n,h}^*(x) := n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}}\hat{\varepsilon}_i^* = n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}}\hat{\varepsilon}_i V_i^* + n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}} X_i^{\hat{\rho}-\hat{\rho}^*},$$

which estimates the distribution of

$$T_{n,h}(x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}}\hat{\varepsilon}_i + n^{-1/2}\sum_{i=1}^n 1_{\{X_i^h \le x\}} X_i^{\rho-\hat{\rho}}.$$

The bootstrap consistency could be obtained as an adaptation of Lemma A.1 of Stute et al. (1998) for the first term of $T_{n,h}^*$, and of Lemma A.2 of the same paper for the second term, using the decomposition of $\hat{\rho} - \rho$ given in (11) of CMS.

The drawing of the random directions clearly influences the power of the test. For example, in the extreme case where the projections were orthogonal to the data, that is, $X_i^h = 0$, then $T_{n,h}(x) = \big(n^{-1/2}\sum_{i=1}^n \hat{\varepsilon}_i\big) 1_{\{0 \le x\}}$ and $\|T_{n,h}\|_N = \|T_{n,h}^{*b}\|_N = 0$ under $H_0$. Therefore, Algorithm 4.2 would fail to calibrate the level of the test and potentially yield spurious results due to numerical inaccuracies in $\|T_{n,h}^{*b}\|_N \le \|T_{n,h}\|_N$. A data-driven compromise to avoid drawing projections in subspaces almost orthogonal to the data is the following: i) compute the FPC of $X_1,\ldots,X_n$, i.e., the eigenpairs $\{(\hat{\lambda}_j, \hat{e}_j)\}$; ii) choose $j_n := \min\big\{k = 1,\ldots,n-1 : \big(\sum_{j=1}^k \hat{\lambda}_j^2\big)\big/\big(\sum_{j=1}^{n-1} \hat{\lambda}_j^2\big) \ge r\big\}$ for a variance threshold $r$, e.g. $r = 0.95$; iii) generate the data-driven Gaussian process $h_{j_n} := \sum_{j=1}^{j_n} \eta_j \hat{e}_j$, with $\eta_j \sim \mathcal{N}(0, s_j^2)$ and $s_j^2$ the sample variance of the scores in the $j$-th FPC. Formally, the Gaussian measure $\mu$ associated to $h_{j_n}$ does not respect the assumptions in Theorem 2.4, since it is degenerate (but recall that $\mu$ does not have to be independent from $X$). A non-degenerate Gaussian process can be obtained as $h_{j_n} + G$, with $G$ a Gaussian process tightly concentrated around zero, albeit employing $h_{j_n}$ or $h_{j_n} + G$ has negligible effects in practice.

Figure 1: From left to right and up to down, functional coefficients $\rho$ (black, right scale) and underlying processes (grey, left scale) for the nine different scenarios, labelled S1 to S9. Each graph contains a sample of 100 realizations of the functional covariate $X$ and $\hat{\rho}$ (red) with $d_n$ selected by SICc.
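Closing the section, the sketch below implements the golden-section multipliers of step iii-a) of Algorithm 4.2 and, for the simple-hypothesis case $\hat{\rho} = \hat{\rho}^* = 0$ (where no re-estimation of $\rho$ is needed), the bootstrap p-value of step iv) with the KS norm. It is our own illustration, not the authors' software:

```python
import numpy as np

def golden_multipliers(n, rng):
    """Binary wild bootstrap multipliers V* of step iii-a); by construction
    they satisfy E[V*] = 0 and E[(V*)^2] = 1."""
    s5 = np.sqrt(5.0)
    vals = np.array([(1 - s5) / 2, (1 + s5) / 2])
    probs = np.array([(5 + s5) / 10, (5 - s5) / 10])
    return rng.choice(vals, size=n, p=probs)

def bootstrap_pvalue_simple(Xh, eps_hat, B=1000, seed=0):
    """Steps ii)-iv) of Algorithm 4.2 with rho_hat = rho_hat* = 0 and the KS norm."""
    rng = np.random.default_rng(seed)
    n = len(eps_hat)
    order = np.argsort(Xh)
    ks = lambda e: np.max(np.abs(np.cumsum(e[order]))) / np.sqrt(n)
    stat = ks(eps_hat)
    boot = np.array([ks(eps_hat * golden_multipliers(n, rng)) for _ in range(B)])
    return np.mean(stat <= boot)  # proportion of ||T*b||_N at least ||T||_N

rng = np.random.default_rng(4)
print(bootstrap_pvalue_simple(rng.standard_normal(100), rng.standard_normal(100)))
```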
