ebook img

Estimation for single-index and partially linear single-index integrated models PDF

0.56 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Estimation for single-index and partially linear single-index integrated models

TheAnnalsofStatistics 2016,Vol.44,No.1,425–453 DOI:10.1214/15-AOS1372 (cid:13)c InstituteofMathematicalStatistics,2016 ESTIMATION FOR SINGLE-INDEX AND PARTIALLY LINEAR 6 SINGLE-INDEX INTEGRATED MODELS 1 0 By Chaohua Dong , ,1, Jiti Gao ,1 and Dag Tjøstheim 2 ∗† † ‡ n Southwestern University of Finance and Economics,∗ Monash University† a and University of Bergen ‡ J 2 Estimationmainlyfortwoclassesofpopularmodels,single-index 2 andpartiallylinearsingle-indexmodels,isstudiedinthispaper.Such models feature nonstationarity. Orthogonal series expansion is used ] to approximate theunknownintegrable link functions in themodels T and aprofile approach is used toderivetheestimators. The findings S includethedual rate of convergence of theestimators for thesingle- . h index models and a trio of convergence rates for the partially linear t single-indexmodels. Anewcentrallimit theorem isestablished fora a m plug-inestimatoroftheunknownlinkfunction.Meanwhile,aconsid- erable extension to a class of partially nonlinear single-index models [ is discussed in Section 4. Monte Carlo simulation verifies these the- 1 oretical results. An empirical study furnishes an application of the v proposed estimation procedures in practice. 3 0 0 1. Introduction. In the last decade or so, nonlinear (nonparametric or 6 semiparametric) andnonstationarytimeseriesmodelshavebeenstudiedex- 0 tensively and improved dramatically as witnessed by the literature, such as . 1 thosebasedonthenonparametrickernelapproachbyKarlsenandTjøstheim 0 (2001),Karlsen,MyklebustandTjøstheim(2007),Gao et al.(2009a,2009b), 6 1 Phillips (2009), Wang and Phillips (2009a, 2009b, 2012), Gao (2014), Gao : and Phillips (2013a, 2013b) and Phillips, Li and Gao (2013), among others. v i The main development in the field is the establishment of new estimation X andspecificationtestingproceduresaswellastheresultingasymptoticprop- r a erties. In recent years, the conventional nonparametric kernel-based estima- Received January 2015; revised August 2015. 1SupportedinpartbytwoAustralianResearchCouncilDiscoveryGrants:#DP1096374 and #DP130104229. AMS 2000 subject classifications. 62G05, 62G08, 62G20. Key words and phrases. Integrated time series, orthogonal series expansion, single- index models, partially linear single-index models, dual convergence rates, a trio of con- vergence rates. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2016,Vol. 44, No. 1, 425–453. This reprint differs from the original in pagination and typographic detail. 1 2 C. DONG,J. GAO AND D.TJØSTHEIM tionandspecificationtestingtheoryhasbeenextendedtothenonparametric series based approach; see, for example, Dong and Gao (2013, 2014). We first consider a partially linear single-index model of the form (1.1) y =β x +g(θ x )+e , t=1,...,n, t 0⊤ t 0⊤ t t where y is a scalar process, g(), the so-called link function, is an unknown t nonlinear integrable function f·rom R to R, β and θ are the true but un- 0 0 known d-dimensional column vectors of parameters, the superscript sig- ⊤ nifies the transpose of a vector (or matrix, hereafter), x is a d-dimensional t integrated process, e is an error process and n is sample size. t The motivations of this study are as follows. In a fully nonparametric es- timation context, researchersoften sufferfromtheso-called “curseofdimen- sionality”, and hence dimensionality reduction is particularly of importance in such a situation. One efficient way of doing so is to use index models like model (1.1). Moreover, model (1.1) is also an extension of linear parametric models, since it would become a linear model under the particular choice of the link function. Taking these into account, models such as (1.1) are often used as a reasonable compromise between fully parametric and fully non- parametric modelling. See, for example, Carroll et al. (1997), Xia, Tong and Li (1999), Xia et al. (2002), Yu and Ruppert (2002), Zhu and Xue (2006), Liang et al. (2010), Wang et al. (2010) and Ma and Zhu (2013). Never- theless, most researchers only focus on the stationary covariate case so that theirtheoretical resultsarenotapplicableforpractitioners whousepartially linear single-index model to deal with nonstationary time series data. For example, inmacroeconomic context practitioners may beconcerned within- flation, unemployment rates and other economic indicators. These variables exhibit nonstationary characteristics. Therefore, it is desirable in such cir- cumstances to develop estimation theory for thepartially linear single-index models. Furthermore, recent studies by Gao and Phillips (2013a, 2013b) have pointed out that, for multivariate I(1) processes, the conventional kernel estimation methodmay notbeworkablebecausethelimit theory may break down. This gives rise to a challenge of seeking alternative estimation meth- ods. When β =0, model (1.1) becomes a single-index model 0 (1.2) y =g(θ x )+e , t=1,...,n, t 0⊤ t t which has been studied extensively for the case where x is stationary [see, t e.g., H¨ardle, Hall and Ichimura (1993), Xia and Li (1999) and Wu, Yu and Yu (2010)]. We shall first consider model (1.2), but this is mainly a preliminary stage on our way to the general model (1.1). Standing on its own, model (1.2) has PARTIALLYLINEARSINGLE-INDEXMODELS 3 limited applicability since it is integrable, and among other things, does not include the linear model. Coupled with the assumption that x is a unit t { } root process, this implies that only an order of O(√n) observations can be used in the estimations of θ and g in (1.2). The function g is only capable 0 of describing finite domain behaviour in x . As θ x increases, g(θ x ) goes t 0⊤ t 0⊤ t to zero. All of this will be made precise in the following. When the g function is added as a component in model (1.1), one obtains a model whosebehaviour is governed by the linear component with g super- imposed, whereas as x becomes large it reduces to the linear component. t The resulting model (1.1) can be likened to a smooth transition regression model [STR model, Tera¨svirta, Tjøstheim and Granger (2010)], where as x increases the model changes smoothly to the linear model. Our model in t this sense extends the smooth transition model to a situation where there is an index involved and with a nonstationary input process. We believe that this is a situation which is of interest both theoretically and practically, as witnessed for example by our empirical study where quite different index behaviour is obtained in theclose domain as compared to thefar outregion. It is indeed possible to generalise our model to include a nonlinear be- haviour also far out, which leads to the next stage of modelling. As a lin- ear function is a particular H-regular function while an integrable function belongs to I-regular functions, studied by Park and Phillips (1999, 2001), Wang and Phillips (2009a), in a third stage we extend model (1.1) using a known H-regularfunctiontosubstitutethelinearfunction,inordertomake the model more flexible and applicable. Following the existing identifiability condition, for example, Lin and Ku- lasekera (2007), we assume for models studied later that θ =1 and the 0 k k first nonzero component of θ is positive. Notice that there is no extra con- 0 dition needed for β to make (1.1) identifiable, as discussed in Section 2.2 0 below. To facilitate the theoretical development in the following sections, we assume that θ is an interior point located within a compact and convex 0 parameter space Θ, which is also a usual assumption in a parameter esti- mation context. To focus on the unit root case, we also assume throughout that cointegration will not happen for θ around θ . In other words, there 0 exists a neighbourhood of θ , (θ ,δ) Θ, such that for any θ (θ ,δ), 0 0 0 N ⊂ ∈N θ x is always an I(1) process. ⊤ t The findings of this paper are summarised as follows. The rate of conver- gence of the estimators of θ in both models (1.1) and (1.2) is a composite 0 of two different rates in a new coordinate system where θ is on one axis. 0 θ has a rate of n 1/4 on the θ -axis, and another rate as fast as n 3/4 on n − 0 − all axes orthogonal to θ . Overall, θ possesses convergence rate n 1/4. This 0 n − ibs expected and comes from the integrability of g(), which in turn reduces · the number of effective observatiobns to √n. Moreover, the rate of conver- gence of β to β is n 1, consistent with that of a linear model with a unit n 0 − b 4 C. DONG,J. GAO AND D.TJØSTHEIM root input. The normalisation of θ , θ 1θ , converges to θ with a rate n n − n 0 k k faster than θ in both models. A new central limit theorem for a plug-in n estimator of the form g (u) converbgingbto g(bu), where u R, is comparable n ∈ with the conbventional kernel estimator in the literature. These phenomena are verified with finite sample experiments below. b Theoretical results heavily depend on the level of nonstationarity of the integrated timeseriesandtheintegrability ofthelinkfunctions.Theseprop- erties result in a slow rate of convergence for the link function involving an I(1) process and fast rate of convergence for a linear model with an I(1) process. These are very different from the literature where the regressors are stationary. In addition, Monte Carlo simulations generally need relative larger sample sizes than those for the cases where the regressors are station- ary if the regression function is integrable, since random walk on one hand diverges at rate √n, and on the other hand it possesses recurrent property makingitpossibletoreturntotheeffectivedomainoftheintegrablefunction g. Two papers related to this study are Chang and Park (2003) and Guerre and Moon (2006) in terms of regressor. However, Chang and Park (2003) stipulatethattheirlinkfunctionisasmoothdistributionfunction-liketrans- formation and they are not interested in the estimation of the unknown link function. Guerre and Moon (2006) point out in the discussion section that their method developed for binary choice models may be applicable for the estimation of single-index models where the link function g(x) as →∞ x . Clearly, they are quite different from the setting of this study. | |→∞ Theorganisation of therestof thepaperis as follows. Section 2 gives esti- mation procedures and assumptions for models (1.1) and (1.2). Asymptotic theory is established in Section 3 for the estimator θ in model (1.2) and n the estimator (β , θ ) in model (1.1). A central limit theorem for a plug-in n n estimator of the form g (u) is given in Section 3. Abn extension of model n (1.1) is discussebd inbSection 4 and Monte Carlo simulation experiments are conducted in Section 5. Section 6 shows the implementation of the proposed b estimation schedules with an empirical dataset. Appendix A presents some technical lemmas. The proof of the main results in Section 3 is given in Appendix B. A supplemental document [Dong, Gao and Tjøstheim (2015)] contains Appendices C, D and E where all the proofs of the key lemmas listed in Appendix A as well as some other lemmas are shown in Appendix C, the complete proof of the results in Section 3 is placed in Appendix D and the results in Section 4 are proven in Appendix E. Throughout the paper, the following notation is used. is Euclidean k·k normfor vectors andelement-wise normfor matrices, thatis,if A=(a ) , ij nm A =( n m a2)1/2; I is the d-dimensional identity matrix; [a] is the k k i=1 j=1 ij d maximum integer not exceeding a; R is the real line; for any function f(), P P · PARTIALLYLINEARSINGLE-INDEXMODELS 5 f˙(x), f¨(x) and f···(x) are the derivatives of the first, second and third or- der of f() at x. Here, when f(x) is a vector-valued function its derivatives · shouldbeunderstoodas element-wise. Furthermore, φ() stands fortheden- · sity function of a multivariate standard normal variable; f(w)dw means a multiple integral when w is a vector. Convergence in probability and con- R vergence in distribution are signified as and , respectively. P D → → 2. Estimation procedure and assumptions. Suppose that the link func- tion g() belongs to L2(R)= f(x): f2(x)dx< . It is known that the Hermit·e function sequence H{ (x) is an orthonor∞m}al basis in L2(R) where i { } R by definition x2 (2.1) Hi(x)=(√π2ii!)−1/2Hi(x)exp , i 0, − 2 ≥ (cid:18) (cid:19) and H (x) are Hermite polynomials orthogonal with density exp( x2). The i orthogonality reads H (x)H (x)dx=δ , the Kronecker delta. − i j ij Thus, a continuous function g() L2(R) may be expanded into an or- R · ∈ thogonal series ∞ (2.2) g(x)= c H (x) and c = g(x)H (x)dx. i i i i i=0 Z X Throughout, let k be a positive integer and define gk(x)= ik=−01ciHi(x) as thetruncationseriesandγk(x)=g(x)−gk(x)= ∞i=kciHi(Px)astheresidue after truncation. P 2.1. Estimation procedure for single-index models. By virtueof (2.2), we write model (1.2) for t=1,...,n as y =Z (θ x )c+γ (θ x )+e , t k⊤ 0⊤ t k 0⊤ t t where Z ()=(H (),...,H ()), c =(c ,...,c ) and k is the trunca- k⊤ · 0 · k−1 · ⊤ 0 k−1 tion parameter determined later. Let Y =(y ,...,y ) , Z =(Z (θ x ),...,Z (θ x )) an n k matrix, 1 n ⊤ k 0⊤ 1 k 0⊤ n ⊤ × γ=(γ (θ x ),...,γ (θ x )) and e=(e ,...,e ) .We have amatrix form k 0⊤ 1 k 0⊤ n ⊤ 1 n ⊤ equation Y =Zc+γ+e, and hence by the Ordinary Least Squares (OLS) method, c = c(θ ) = (Z Z) 1Z Y is an estimate for c in terms of θ . 0 ⊤ − ⊤ 0 Nonetheless, since θ is unknown, we only have a form of c. To estimate 0 θ , define for θ Θ, L (θ)= 1 n [y Z (θ x )c(θ)]2. Then we choose 0 e e ∈ n 2 t=1 t − k⊤ ⊤ t an optimum θn such that P e e (2.3) θ =argminL (θ), b n n θ Θ ∈ b 6 C. DONG,J. GAO AND D.TJØSTHEIM as an estimator for θ . Once θ is available, we have a plug-in estimator 0 n g (u) g (u;θ )=Z (u) c for any u R where c=c(θ ), which is purely n n n k ⊤ n ≡ ∈ based on the sample, and hencbe applicable. The estimation procedure pro- pbosed hebre is tbhe profilemebthod [see, Severini andbWonegb(1992), Liang et al. (2010)]. Additionally, to be in concert with the identification condition θ =1, 0 k k we define the normalisation of θ , θ = θ 1θ . An asymptotic theory n n,emp n − n k k for both θ and θ will be studied in Section 3 below. n,emp n b b b b 2.2. Esbtimation prbocedure for partially linear single-index models. Usu- ally, researchers, such as Xia, Tong and Li (1999), impose an identification condition that β is perpendicular to θ on the partially linear single-index 0 0 models. This is because when β is not perpendicular to θ , a new vector 0 0 β (β θ )θ can be used in the place of β and the g function will be re- 0− 0⊤ 0 0 0 placed by g(u)+(β θ )u. However, in model (1.1) the lack of orthogonality 0⊤ 0 between β and θ does not affect the identifiability of the model at all. See 0 0 the verification at the end of Appendix C in the supplementary material [Dong, Gao and Tjøstheim (2015)]. Our estimation procedure in partially linear single-index models is pro- posed as follows. By virtue of (2.2) again, for each t rewrite (1.1) as yt−β0⊤xt=Zk(θ0⊤xt)⊤c+γk(θ0⊤xt)+et, where Z (), c and γ () are defined as before. k k · · Denote X =(x ,x ,...,x ) an n d matrix, and Y,Z,γ,e remain the 1 2 n ⊤ × same as in the last subsection. We have matrix form equation: Y Xβ = − Zc+γ+e. Then the OLS gives that c=c(β ,θ )=(Z Z) 1Z (Y Xβ ). 0 0 ⊤ − ⊤ 0 − Due to the same reason as before, define for generic (β,θ), L (β,θ) = n 1 n [y β x Z (θ x )c(β,θ)]2. The estimator of (β ,θ ) is given by 2 t=1 t− ⊤ t− k⊤ ⊤ t e e 0 0 P β (2.4) n e = argmin Ln(β,θ). (cid:18)θn(cid:19) β Rd,θ Θ b ∈ ∈ Similarly, a plug-in estimator is obtained, g (u) g (u;β ,θ ) = Z (u)c b n ≡ n n n k⊤ wherec=c(β ,θ ).Oncetheestimatorsoftheparametersareavailable, and n n the normalisation θ = θ 1θ is definbed to sabtisfy tbhebidentificationb n,emp n − n k k conditibon.e b b b b b 2.3. Assumptions. Before we establish our main theory in Section 3 be- low, we introduce some necessary conditions. Assumtpion A. (a) Let ε , <j < be a sequence of d-dimen- j { −∞ ∞} sional independent and identically distributed (i.i.d.) continuous random PARTIALLYLINEARSINGLE-INDEXMODELS 7 variables with Eε =0, E[ε ε ]=Ω>0 and E ε p < for some p>2. 1 1 ⊤1 k 1k ∞ The characteristic function of ε is integrable, that is, Eexp(iuε ) du< 1 1 | | . ∞ R (b) Let x =x +v for t 1 and x =O (1), where v is a linear t t 1 t 0 P t − ≥ { } process defined by vt= ∞j=0ρjεt−j, in which {ρj} is a square matrix such that ρ0=Id, ∞j=0kρjkP<∞ and ρ= ∞j=0ρj is of full rank. (c) There is a σ-field such that (e , ) is a martingale difference se- t t t P F P F quence, that is, for all t, E(e ) = 0 almost surely (a.s.). Also, t t 1 E(e2 )=σ2 a.s. and µ :=sup|F− E(e4 )< a.s. (dt)|Fxt−1is adaepted with 4 . 1≤t≤n t|Ft−1 ∞ t t 1 (e) LetV (r)= 1 [nFr]v− andU (r)= 1 [nr]e .Supposethat(U (r), n √n i=1 i n √n i=1 i n V (r)) (U(r),V(r)) as n . Here, (U(r),V(r)) is a (d+1)-vector of n →D P →∞ P Brownian motions. Remark 2.1. All conditions in Assumption A are routine requirements in the nonstationary model estimation context. Conditions (a) and (b) stip- ulate that the regressor x is an integrated process generated by a lin- t ear process v which has the i.i.d. sequence ε , <j < as building t j { −∞ ∞} blocks. Meanwhile, (c), (d) and (e) are extensively used in related papers such as Park and Phillips (2000), Wang and Phillips (2009a, 2009b, 2012), Gao et al. (2009a, 2009b), Gao, Tjøstheim and Yin (2012), among others. The σ-field may be taken as =σ(...,ε ,ε ;e ,...,e ). t t t t+1 1 t F F By Skorohod representation theorem [Pollard (1984), page 71] there ex- ists (U0(r),V0(r)) in a richer probability space such that (U (r),V (r))= n n n n D (U0(r),V0(r)) for which (U0(r),V0(r)) (U(r),V(r)) uniformly on n n n n →a.s. [0,1]d+1. To avoid the repetitious embedding procedure of (U (r),V (r)) to n n the richer probability space where (U0(r),V0(r)) is defined,we simply write n n (U (r), n V (r)) = (U0(r),V0(r)) instead of (U (r),V (r)) = (U0(r),V0(r)). Since n n n n n D n n Lemma A.2 below is derived in this richer probability space, all proofs in the paper should be understood in the richer space as well. We will not repeat this again. Assumtpion B. (a) g(x) is differentiable on R and g(m ℓ)(x)xℓ L2(R) − ∈ for ℓ=0,1,...,m with some given integer m. (b) k=[a nκ] with some constant a>0, κ (0,1/8) and κ(m 3) 1 · ∈ − ≥ 2 with m as in (a) above. Remark 2.2. Condition (a) ensures the negligibility of the truncation residuals(seethederivationatthebeginningofLemmaC.1ofAppendixCof 8 C. DONG,J. GAO AND D.TJØSTHEIM the supplementary document). Regardingcondition (b), although it is strin- gent for κ, we may choose, for example, κ [ 5 , 5 ] and m=8 in practice. ∈ 44 41 Large m and small κ are chosen such that the orthogonal series expansion for the link function converges so fast that all residues after truncation do not affect the limit theory, as can beseen in the proof of Theorem 3.1 in the supplementary material Dong, Gao and Tjøstheim (2015). 3. Asymptotic theory. 3.1. Asymptotic theory for single-index models. Toderive an asymptotic theoryforθ givenby(2.3),weshallusebasicideasfromWooldridge(1994). n Let S (θ) = ∂ L (θ) and J (θ) = ∂2 L (θ) be the score and Hessian, n ∂θ n n ∂θ∂θ⊤ n respectivelyb. As usual, we have the expansion (3.1) 0=S (θ )=S (θ )+J (θ )(θ θ ), n n n 0 n n n 0 − where J (θ ) is the Hessian matrix with the rows evaluated at a point θ n n n b b between θ and θ . n 0 To facilitate the development of the asymptotic theory, we consider coor- dinate robtation in Rd. Let Q=(θ0,Q2) be an orthogonal matrix. Note that such Q does exist since θ =0. We shall use the orthogonal matrix Q to 0 rotate all vectors in Rd. In p6 articular, α0:=Q⊤θ0=(α10,α⊤20)⊤ where α10=kθ0k2=1,α20=Q⊤2θ0=0, (3.2) zt:=Q⊤xt=(x1t,x⊤2t)⊤ where x1t:=θ0⊤xt,x2t:=Q⊤2xt, α:=Q θ for any generic θ. ⊤ Accordingly, wecanrewritethesingle-indexmodelas y =g(θ QQ x )+ t 0⊤ ⊤ t e =g(α z )+e .Inaddition,byAssumptionAandthecontinuousmapping t ⊤0 t t theorem, we have for r [0,1], ∈ 1 1 (3.3) x V (r)=θ V(r) and x V (r)=Q V(r). √n 1[nr]→D 1 0⊤ √n 2[nr]→D 2 ⊤2 Itisnoteworthy thattherotation is notnecessary inpractice, as shownin thesimulationsection,anditisalsologically impossiblesinceθ isunknown. 0 The rotation is only used as a tool to derive an asymptotic theory for the proposed estimator. If α is the nonlinear least squares estimator of α , then α =Q θ . n 0 n ⊤ n Moreover, thescorefunction S (α) andtheHessian J (α) fortheparameter n n α canbbe obtained from those for θ. More precisely, Sn(α)=Qb⊤Sn(θ) abnd J (α)=Q J (θ)Q. Premultiplying equation (3.1) by Q , we have n ⊤ n ⊤ (3.4) 0=S (α )=S (α )+J (α )(α α ). n n n 0 n n n 0 − Thefollowing theorem gives asymptoticdistributionsforthescore S (α ) n 0 and the Hessian J (α ) abs well as α α . b n 0 n 0 − b PARTIALLYLINEARSINGLE-INDEXMODELS 9 Theorem 3.1. Denote D =diag(n1/4,n3/4I ). Under Assumptions n d 1 − A and B, as n →∞ (3.5) D 1S (α ) R1/2W(1) and D 1J (α )D 1 R, n− n 0 →D n− n 0 n− →P where W(1) is a d-dimensional vector of standard normal random variables independent of V(r), and the symmetric block matrix R=(r11 r12) is given r21 r22 by 1 r =L (1,0) s2g˙2(s)ds, r = V (r)dL (r,0) sg˙2(s)ds, 11 1 12 2⊤ 1 Z Z0 Z 1 r =r , r = V (r)V (r)dL (r,0) g˙2(s)ds, 21 1⊤2 22 2 2⊤ 1 Z0 Z in which V and V given by (3.3) are Brownian motions of dimension 1 1 2 and d 1, respectively, L (r,0) denotes the local time process of Brownian 1 − motion V (), standing for the sojourning time of V at zero over [0,r]. 1 1 · As a result, under the same conditions, α is consistent and as n n →∞ (3.6) D (α α ) R 1/2W(1). n n 0 D − − → b A standard book introducing the local time process of Brownian motion b is Revuz and Yor (2005). In view of the structure of D , we have two limits n from (3.6), (3.7) n1/4(α 1) MN(0,ρ ) and n3/4α MN(0,ρ ), 1n D 11 2n D 22 − → → where α =(α ,α ) , MN(0,Ξ) stands for mixture normal distribution n 1n ⊤2n ⊤ for the casebwhere the covariance matrix Ξ is sbtochastic, ρ and ρ are 11 22 diagonal blocks on the matrix R 1=(ρ11 ρ12), b b b − ρ21 ρ22 (3.8) ρ11=(r11−r12r2−21r21)−1 and ρ22=(r22−r21r1−11r12)−1. Hence, α has two different convergence rates for its components. n Note by (3.7) that in the coordinate system Q where θ is an axis, the 0 estimatobr θn has dual convergence rates: the rate of convergence for the coordinate on θ (i.e., α ) is n 1/4, while on all directions orthogonal to 0 1n − θ0 the ratebof convergence for the coordinates (i.e., α2n) is as fast as n−3/4. This difference in convergence rate can be explained in the following way. b Due to the unit root behaviour of x its probability mass is spreading t b { } out in a Lebesgue type fashion. Since g is integrable, g(θ x) 0 outside 0⊤ ≈ the effective range of g. This means that only moderate values of x can t { } contribute to g along θ , but in such directions that are orthogonal to θ , 0 0 there is no restriction on x , so that even for far out values of x , they t t { } { } can contribute, and hence increase the effective sample size. Certainly, no such effect can take place in univariate models. 10 C. DONG,J. GAO AND D.TJØSTHEIM As defined in Section 2, θ = θ 1θ . Intuitively, θ might have n,emp n − n n,emp k k a faster rate of convergence than that of θ . This can be seen using the α- n representation oftherotatebdsystem.bBecaubseofθ =Qα bandhence θ = n n n k k α , θ =Qα , where α =(bα1 ,(α2 ) ) = α 1α . k nk n,emp n,unit n,unit n,unit n,unit ⊤ ⊤ k nk− n The following results give the rates of convergenbce forbαn,unit and thebn for θnbandbθn,emp, resbpectively. b b b b b b b Corobllary 3.1. Under Assumptions A and B, we have as n , →∞ n3/2(α1 1) 1 ξ 2 and n3/4α2 ξ, n,unit− →D −2k k n,unit→D where ξ MN(0,ρ ) is the limit given by (3.7). 22 ∼ b b Note that, after the normalisation, the slow rate becomes as fast as n 3/2 − whereas the fast rate remains the same. Note also that α1 1 but n,unit →P α1 = α 1α 1. The intuitive reason for the fast rate of α1 is n,unit k nk− 1n ≤ n,unit that it takes advantage of the direction orthogonal to θ0b, where there is larger supply of information from far out x ’s as explained above. The rates b b b t b are also verified with Monte Carlo simulation. The resulting rates for θ are n given in Theorem 3.2 below. b Theorem 3.2. Under Assumptions A and B, we have as n , →∞ (3.9) n1/4(θ θ ) MN(0,ρ θ θ ), n− 0 →D 11 0 0⊤ (3.10) n3/4(θ θ ) MN(0,Q ρ Q ). n,embp− 0 →D 2 22 ⊤2 Remark 3.1. As cban be seen, θ θ at rate of n 1/4 and θ n P 0 − n,emp P → → θ at rate of n 3/4. Again roughly speaking, the normalisation scales θ to 0 − n the unit ball, and hence acceleratbes the slow rate of θ . Due tbo the fast n convergence of θn,emp, all the following assertions regarding θn remainbtrue if θ is replaced by θ . A geometric illustration is gbiven in Appendix C n n,emp of the supplemebntary material [Dong, Gao and Tjøstheim (20b15)] to explain thebslow and fast ratbes. We do not wish to repeat this again. Furthermore, by Theorem 3.2 we have θ MN(θ ,n 1/4ρ θ θ ). We n ∼ 0 − 11 0 0⊤ next show that the estimator of the covariance matrix of θ is the inverse n of the Hessian matrix of the form [J (θb)] 1 or even [J (θ )] 1, where n n − n n − J (θ)= n g˙2(θ x )x x is the leading term of J (θ). Mbeanwhile, define n t=1 n ⊤ t t ⊤t n the estimators for σ and L (1,0) by b e b e 1 P e b n n 1 1 (3.11) σ2= [y g (θ x )]2 and L (1,0)= H 2(θ x ), e n t− n n⊤ t n1 √n 0 n⊤ t t=1 t=1 X X respectibvely, where H (b) ibs the first functbion in the Hermite sequebnce. 0 ·

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.