Simultaneous Confidence Band for Partially Linear Panel Data Models with Fixed Effects

Xiujuan YANG¹, Suigen YANG² and Gaorong LI³ ∗

¹ College of Applied Sciences, Beijing University of Technology, Beijing 100124, China
² College of Sciences, Tianjin University of Commerce, Tianjin 300134, China
³ Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China

arXiv:1701.05647v1 [stat.ME] 20 Jan 2017

Abstract

In this paper, we construct the simultaneous confidence band (SCB) for the nonparametric component in partially linear panel data models with fixed effects. We remove the fixed effects and then obtain estimators of the parametric and nonparametric components that do not depend on the fixed effects. Under some suitable conditions, we establish the asymptotic distribution of the maximum absolute deviation between the estimated nonparametric component and the true nonparametric component, and this result can be used to construct the simultaneous confidence band of the nonparametric component. However, constructing the band directly from the asymptotic distribution is difficult, because the asymptotic distribution involves estimators of the asymptotic bias and conditional variance, as well as the choice of a bandwidth for estimating the second derivative of the nonparametric function. These steps cause computational burden and accumulated errors. To overcome these problems, we propose a Bootstrap method to construct the simultaneous confidence band. Simulation studies indicate that the proposed Bootstrap method exhibits better performance in finite samples.

Supported by the National Natural Science Foundation of China (No. 11471029), the Beijing Natural Science Foundation (No. 1142002) and the Science and Technology Project of Beijing Municipal Education Commission (No. KM201410005010).
∗ Gaorong Li is the corresponding author.
E-mail: [email protected]

Key words: partially linear model, simultaneous confidence band, panel data, fixed effects, asymptotic property

2000 MR Subject Classification: 62G08; 62H12; 62G20

1 Introduction

There is a large literature on parametric linear and nonlinear panel data models; Arellano (2003), Baltagi (2005), and Hsiao (2003) provide excellent overviews of parametric panel data model analysis. To relax the strong restrictions assumed in parametric panel data models, nonparametric and semiparametric panel data models have received a lot of attention in recent years. Compared with traditional parametric panel data models, nonparametric and semiparametric panel data models are more flexible and often fit actual data better. Thus, these models have become an active research topic for econometricians and statisticians. For example, Henderson, Carroll and Li (2008), and Li, Peng and Tong (2013) considered the fixed effects nonparametric panel data model. Henderson and Ullah (2005), Lin and Ying (2001), and Wu and Zhang (2002) considered random effects nonparametric panel data models. Li and Stengos (1996) considered a partially linear panel data model with some endogenous regressors via an instrumental variable (IV) approach. Su and Ullah (2006) investigated the fixed effects partially linear panel data model with exogenous regressors. Zhang et al. (2011) considered empirical likelihood inference for the fixed effects partially linear panel data model. Sun, Carroll and Li (2009) considered the problem of estimating a varying coefficient panel data model with fixed effects using a local linear regression approach. Chen, Gao and Li (2013a, 2013b) and Lai, Li and Lian (2013) studied semiparametric estimation for a single-index panel data model, among others.

Recently, fixed effects models have been used frequently in econometrics and biometrics.
In this paper, we consider the following partially linear panel data model with fixed effects:
\[
Y_{it} = X_{it}^{\tau}\beta + g(Z_{it}) + \alpha_i + V_{it}, \quad i = 1,\ldots,n,\ t = 1,\ldots,T, \tag{1.1}
\]
where $X_{it}$ are $p \times 1$ vectors of observable regressors, $\{Z_{it}\}$ are explanatory variables in $[0,1]$, $\beta$ is a $p \times 1$ vector of unknown coefficients, $g(\cdot)$ is an unknown smooth function on $[0,1]$, $\{V_{it}\}$ are random errors with zero mean, and $\{\alpha_i\}$ are fixed effects. In addition, $T$ is the time series length and $n$ is the cross section size.

For model (1.1), we assume that $\{\alpha_i\}$ are unobserved time-invariant individual effects. Model (1.1) is called a partially linear fixed effects model if $\{\alpha_i\}$ are correlated with $\{X_{it}, Z_{it}\}$ with an unknown correlation structure. For identification purposes, we impose $\sum_{i=1}^{n}\alpha_i = 0$. One application of fixed effects models is the study of individual wage rates, where $\alpha_i$ represents the unobserved abilities of individual $i$, such as unmeasured skills or unobservable characteristics, which may be correlated with observed covariates such as age, educational level, job grade, gender, and work experience. As a special case, when $\{\alpha_i\}$ are uncorrelated with $\{X_{it}, Z_{it}\}$, model (1.1) becomes a partially linear random effects model. Baltagi and Li (2002) applied first-order differencing to eliminate the fixed effects and used the series method to estimate the parametric and nonparametric components, and they further established the asymptotic properties. Su and Ullah (2006) considered the estimation of partially linear panel data models with fixed effects. Zhang et al. (2011) applied the empirical likelihood method to model (1.1).

For partially linear panel data models, the existing literature has considered the pointwise asymptotic normality of the estimator of the nonparametric component, and this result can be used to construct pointwise confidence bands.
In practice, we need to construct the simultaneous confidence band of the nonparametric function in the model. The simultaneous confidence band is a powerful tool for checking the graphical representation of the nonparametric function in practical applications. Therefore, there is an extensive literature on the construction of simultaneous confidence bands. For example, Fan and Zhang (2000), and Zhang and Peng (2010) considered simultaneous confidence bands for the coefficient functions in varying-coefficient models; Li, Peng and Tong (2013) considered the simultaneous confidence band for the nonparametric fixed effects panel data model; Li et al. (2014) and Yang et al. (2014) studied the simultaneous confidence band and hypothesis testing for the link function in single-index models. For further references, see Yothers and Sampson (2011), Brabanter et al. (2012), Cao et al. (2012), Liu et al. (2013), and Li and Yang (2015).

In this paper, combining the idea of the least-squares dummy-variable approach in parametric panel data models with the local linear regression technique in nonparametric models, we use the profile least-squares dummy-variable method proposed in Su and Ullah (2006) to remove the fixed effects, and then obtain estimators of the parametric and nonparametric components that do not depend on the fixed effects. Under some suitable conditions, we establish the asymptotic distribution of the maximum absolute deviation between the estimated nonparametric component and the true nonparametric component, and this result can be used to construct the simultaneous confidence band of the nonparametric component. To construct the simultaneous confidence band based on the asymptotic distribution, we first need to estimate the asymptotic bias and conditional variance, and choose the bandwidth for estimating the second derivative of the nonparametric function. These steps cause computational burden and accumulated errors, and make the construction of the simultaneous confidence band difficult. To overcome these problems, we further propose a Bootstrap method to construct the simultaneous confidence band of the nonparametric component in model (1.1).

The rest of the paper is organized as follows. In Section 2, we use the profile least-squares dummy-variable approach to obtain the estimators of the parametric and nonparametric components, and present the asymptotic properties. In Section 3, we propose the Bootstrap method to construct the simultaneous confidence band. In Section 4, simulation studies are used to illustrate the proposed method in finite samples. The technical proofs of the main theorems are presented in the Appendix.

2 Estimation procedure and asymptotic properties

2.1 Estimation procedure

Let $\{(Y_{it}; X_{it}^{\tau}, Z_{it}),\ i = 1,\ldots,n,\ t = 1,\ldots,T\}$ be an independent identically distributed (i.i.d.) random sample from model (1.1). In this paper, we consider the asymptotic theory by letting $n$ approach infinity and holding $T$ fixed. In this section, we describe an estimation procedure that first removes the fixed effects and then obtains the efficient estimators of the parametric and nonparametric components.

For ease of notation, let
\[
\begin{aligned}
Y &= (Y_{11},\ldots,Y_{1T},Y_{21},\ldots,Y_{2T},\ldots,Y_{n1},\ldots,Y_{nT})^{\tau},\\
g &= \big(g(Z_{11}),\ldots,g(Z_{1T}),g(Z_{21}),\ldots,g(Z_{2T}),\ldots,g(Z_{n1}),\ldots,g(Z_{nT})\big)^{\tau},\\
V &= (V_{11},\ldots,V_{1T},V_{21},\ldots,V_{2T},\ldots,V_{n1},\ldots,V_{nT})^{\tau},\\
\alpha_0 &= (\alpha_1,\ldots,\alpha_n)^{\tau},
\end{aligned}
\]
and let $X = (X_{11},\ldots,X_{1T},X_{21},\ldots,X_{2T},\ldots,X_{n1},\ldots,X_{nT})^{\tau}$ be an $nT \times p$ matrix, where $X_{it} = (X_{it1},\ldots,X_{itp})^{\tau}$. Then model (1.1) can be written in the following matrix form:
\[
Y = X\beta + g + (I_n \otimes e_T)\alpha_0 + V, \tag{2.1}
\]
where $I_n$ is an $n \times n$ identity matrix, $e_T$ is a $T$-dimensional column vector with all elements equal to 1, and $\otimes$ denotes the Kronecker product. Furthermore, by the identification assumption $\sum_{i=1}^{n}\alpha_i = 0$, we have $\alpha_1 = -\sum_{i=2}^{n}\alpha_i$. Defining the $(nT) \times (n-1)$ matrix $D = [-e_{n-1}, I_{n-1}]^{\tau} \otimes e_T$ and $\alpha = (\alpha_2,\ldots,\alpha_n)^{\tau}$, model (2.1) can be rewritten as
\[
Y = X\beta + g + D\alpha + V. \tag{2.2}
\]
Given $\alpha$ and $\beta$, model (2.2) is a version of the usual nonparametric fixed effects panel data model
\[
Y - X\beta - D\alpha = g + V. \tag{2.3}
\]

We first apply the local polynomial method (see the details in Fan and Gijbels, 1996) to estimate the nonparametric function $g(\cdot)$. For $Z_{it}$ in a small neighborhood of $z \in [0,1]$, approximate $g(Z_{it})$ by
\[
g(Z_{it}) \approx g(z) + g'(z)(Z_{it} - z). \tag{2.4}
\]
Let $K(\cdot)$ be a kernel function on $\mathbb{R}$, $K_h(z) = K(z/h)/h$, where $h$ is a bandwidth, and let
\[
Z_z = \begin{pmatrix} 1 & \cdots & 1 & \cdots & 1 & \cdots & 1 \\ Z_{11}-z & \cdots & Z_{1T}-z & \cdots & Z_{n1}-z & \cdots & Z_{nT}-z \end{pmatrix}^{\tau},
\]
and let $W_z = \mathrm{diag}\big(K_h(Z_{11}-z),\ldots,K_h(Z_{1T}-z),K_h(Z_{21}-z),\ldots,K_h(Z_{2T}-z),\ldots,K_h(Z_{n1}-z),\ldots,K_h(Z_{nT}-z)\big)$ be an $(nT) \times (nT)$ diagonal matrix. Let $G(z) = (g(z), g'(z))^{\tau}$ and $\eta = (\alpha^{\tau},\beta^{\tau})^{\tau}$. In what follows, we outline the estimation procedure for $\beta$ and $g(\cdot)$.

Given $\eta = (\alpha^{\tau},\beta^{\tau})^{\tau}$, we define the following weighted least-squares objective function:
\[
(Y - X\beta - Z_z G(z) - D\alpha)^{\tau} W_z (Y - X\beta - Z_z G(z) - D\alpha). \tag{2.5}
\]
Minimizing the objective function (2.5) with respect to $G(z)$, we obtain the solution
\[
\tilde{G}(z,\eta) = (Z_z^{\tau} W_z Z_z)^{-1} Z_z^{\tau} W_z (Y - X\beta - D\alpha). \tag{2.6}
\]
Define the smoothing operator by
\[
M(z) = (Z_z^{\tau} W_z Z_z)^{-1} Z_z^{\tau} W_z.
\]
Then, we can define the estimator of $g(z)$ by
\[
\tilde{g}(z,\eta) = m^{\tau}(z)(Y - X\beta - D\alpha), \tag{2.7}
\]
where $m^{\tau}(z) = e^{\tau} M(z)$ and $e = (1,0)^{\tau}$ is a $2 \times 1$ vector.

Since the fixed effects form an $n$-dimensional unobserved vector, it is difficult to obtain a consistent estimator of the fixed effects. Therefore, we first need to remove the fixed effects from the model, and then obtain the estimators of the parametric and nonparametric components.
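By (2.7), we define the following objective function:
\[
\begin{aligned}
&(Y - X\beta - \tilde{g}_{\eta}(z) - D\alpha)^{\tau}(Y - X\beta - \tilde{g}_{\eta}(z) - D\alpha)\\
&\quad= [Y - X\beta - M(Y - X\beta - D\alpha) - D\alpha]^{\tau}[Y - X\beta - M(Y - X\beta - D\alpha) - D\alpha]\\
&\quad= (\tilde{Y} - \tilde{X}\beta - \tilde{D}\alpha)^{\tau}(\tilde{Y} - \tilde{X}\beta - \tilde{D}\alpha),
\end{aligned} \tag{2.8}
\]
where $\tilde{g}_{\eta}(z) = \big(\tilde{g}(Z_{11},\eta),\ldots,\tilde{g}(Z_{1T},\eta),\ldots,\tilde{g}(Z_{n1},\eta),\ldots,\tilde{g}(Z_{nT},\eta)\big)^{\tau}$, $\tilde{Y} = (I_{nT} - M)Y$, $\tilde{X} = (I_{nT} - M)X$, $\tilde{D} = (I_{nT} - M)D$, $\tilde{Q} = I_{nT} - \tilde{D}(\tilde{D}^{\tau}\tilde{D})^{-1}\tilde{D}^{\tau}$, and $M$ is an $(nT) \times (nT)$ smoothing matrix, that is,
\[
M = \begin{pmatrix}
(1,0)(Z_{Z_{11}}^{\tau} W_{Z_{11}} Z_{Z_{11}})^{-1} Z_{Z_{11}}^{\tau} W_{Z_{11}}\\
\vdots\\
(1,0)(Z_{Z_{1T}}^{\tau} W_{Z_{1T}} Z_{Z_{1T}})^{-1} Z_{Z_{1T}}^{\tau} W_{Z_{1T}}\\
\vdots\\
(1,0)(Z_{Z_{nT}}^{\tau} W_{Z_{nT}} Z_{Z_{nT}})^{-1} Z_{Z_{nT}}^{\tau} W_{Z_{nT}}
\end{pmatrix}.
\]
In addition, let $P = (I_{nT} - M)^{\tau}(I_{nT} - M)$ be an $(nT) \times (nT)$ matrix.

Taking the derivative of (2.8) with respect to $\alpha$ and setting it equal to zero, we have
\[
\tilde{\alpha}(\beta) = (\tilde{D}^{\tau}\tilde{D})^{-1}\tilde{D}^{\tau}(\tilde{Y} - \tilde{X}\beta). \tag{2.9}
\]
Obviously, the estimator of the fixed effects depends on $\beta$. Based on the idea of the least-squares dummy-variable approach in parametric panel data models and the nonparametric local linear regression technique, we then apply the profile least-squares dummy-variable method to estimate the parameter vector $\beta$.

Plugging (2.9) into (2.8), we then minimize the profile least-squares objective function with respect to $\beta$. Thus, we obtain the profile least-squares estimator of $\beta$ as
\[
\hat{\beta} = (\tilde{X}^{\tau}\tilde{Q}\tilde{X})^{-1}\tilde{X}^{\tau}\tilde{Q}\tilde{Y}. \tag{2.10}
\]
By (2.10) and (2.9), we have
\[
\hat{\alpha} = (\hat{\alpha}_2,\ldots,\hat{\alpha}_n)^{\tau} = (\tilde{D}^{\tau}\tilde{D})^{-1}\tilde{D}^{\tau}(\tilde{Y} - \tilde{X}\hat{\beta}). \tag{2.11}
\]
By $\sum_{i=1}^{n}\alpha_i = 0$ and (2.11), the estimator of $\alpha_1$ is $\hat{\alpha}_1 = -\sum_{i=2}^{n}\hat{\alpha}_i$.

By (2.6), (2.10) and (2.11), and some simple calculations, we obtain the estimator of $G(z)$ as follows:
\[
\begin{aligned}
\hat{G}(z) = \hat{G}(z,\hat{\eta}) &= M(z)(Y - X\hat{\beta} - D\hat{\alpha})\\
&= M(z)[Y - X\hat{\beta} - D(\tilde{D}^{\tau}\tilde{D})^{-1}\tilde{D}^{\tau}(\tilde{Y} - \tilde{X}\hat{\beta})]\\
&= M(z)(I_{nT} - D(D^{\tau}PD)^{-1}D^{\tau}P)(Y - X\hat{\beta}).
\end{aligned} \tag{2.12}
\]
By (2.7) and (2.12), we get the estimator of $g(z)$ as
\[
\hat{g}(z) = m^{\tau}(z)(I_{nT} - D(D^{\tau}PD)^{-1}D^{\tau}P)(Y - X\hat{\beta}). \tag{2.13}
\]

Remark 1. From (2.10) and (2.13), it is easy to see that the estimators of $\beta$ and $g(\cdot)$ do not depend on the fixed effects.

2.2 Asymptotic properties

Let $\mu_l = \int z^l K(z)\,dz$ and $\nu_l = \int z^l K^2(z)\,dz$ for $l = 0,1,2$.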
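
As a concrete illustration of the kernel constants $\mu_l$ and $\nu_l$, the sketch below evaluates them numerically by Gauss-Legendre quadrature. The Epanechnikov kernel on $[-1,1]$ and the helper name `kernel_moments` are assumptions made for this example; the paper does not fix a particular kernel at this point.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice): symmetric density on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_moments(kernel, support=1.0, order=2, n_nodes=64):
    """Return (mu, nu) with mu[l] = int z^l K(z) dz and nu[l] = int z^l K(z)^2 dz.

    The integrals are taken over [-support, support] by Gauss-Legendre quadrature.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)  # nodes and weights on [-1, 1]
    x, w = support * x, support * w                  # rescale to [-support, support]
    k = kernel(x / support) / support if support != 1.0 else kernel(x)
    mu = np.array([np.sum(w * x**l * k) for l in range(order + 1)])
    nu = np.array([np.sum(w * x**l * k**2) for l in range(order + 1)])
    return mu, nu

mu, nu = kernel_moments(epanechnikov)
```

For a polynomial kernel such as this one the quadrature is exact up to rounding, which gives a quick check against the closed-form values $\mu_2 = 1/5$ and $\nu_0 = 3/5$.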
Define the observed covariate set by $\mathcal{D} = \{X_{it}, Z_{it},\ 1 \le i \le n,\ 1 \le t \le T\}$. In order to obtain the main results, we first present the following technical conditions.

(C1) $(\alpha_i, V_i, X_i, Z_i)$, $i = 1,\ldots,n$, are i.i.d., where $V_i = (V_{i1}, V_{i2},\ldots,V_{iT})^{\tau}$, and $X_i$ and $Z_i$ are defined similarly. $E\|X_{it}\|^{2+\delta} < \infty$ and $E\|V_{it}\|^{2+\delta} < \infty$ for some $\delta > 0$. Let $\sigma^2(x,z) = \mathrm{Var}(Y_{it} \mid X_{it} = x, Z_{it} = z)$, $\sigma^2(z) = \mathrm{Var}(Y_{it} \mid Z_{it} = z)$, and $0 < \sigma^2(x,z), \sigma^2(z) < \infty$.

(C2) $E(Y_{it} \mid X_i, Z_i, \alpha_i) = E(Y_{it} \mid X_{it}, Z_{it}, \alpha_i) = X_{it}^{\tau}\beta + g(Z_{it}) + \alpha_i$, $i = 1,\ldots,n$, $t = 1,\ldots,T$.

(C3) Let $f(z) = \sum_{t=1}^{T} f_t(z)$, where $f_t(z)$ is the continuous density function of $Z_{it}$, and $f_t(z)$ is bounded away from zero and infinity on $[0,1]$ for each $t = 1,\ldots,T$. Let $\tilde{V}_{it} = V_{it} - \frac{1}{T}\sum_{s=1}^{T} V_{is}$, $\sigma_t^2(z) = E[\tilde{V}_{it}^2 \mid Z_{it} = z]$ and $\bar{\sigma}^2(z) = \sum_{t=1}^{T}\sigma_t^2(z) f(z)$.

(C4) Let $p(z) = E(X_{it} \mid Z_{it} = z)$. The functions $g(\cdot)$ and $p(\cdot)$ have bounded and continuous second derivatives on $[0,1]$.

(C5) The kernel function $K(\cdot)$ is a symmetric density function, and is absolutely continuous on its support set $[-A, A]$. In addition, either
(C5a) $K(A) \neq 0$, or
(C5b) $K(A) = 0$, $K(t)$ is absolutely continuous, and $K^2(t)$ and $[K'(t)]^2$ are integrable on $(-\infty, +\infty)$.

(C6) The bandwidth $h$ satisfies $nh^3/\log n \to \infty$ and $nh^5\log n \to 0$ as $n \to \infty$.

Theorem 1. Assume that conditions (C1)–(C6) hold. Let $b(z) = h^2\mu_2 g''(z)/2$, $\Sigma_g = \nu_0\bar{\sigma}^2(z) f^{-2}(z)$ and $\Sigma_{g'} = \nu_2\bar{\sigma}^2(z)/(f^2(z)\mu_2^2)$. Then uniformly for $z \in [0,1]$, we have
\[
\|\hat{\beta} - \beta\| = O_p(n^{-1/2})
\]
and
\[
\sqrt{nh}\,\{\hat{g}(z) - g(z) - b(z)\} \xrightarrow{L} N(0,\Sigma_g),
\qquad
\sqrt{nh^3}\,\{\hat{g}'(z) - g'(z)\} \xrightarrow{L} N(0,\Sigma_{g'}),
\]
where "$\xrightarrow{L}$" denotes convergence in distribution.

Theorem 2. Assume that conditions (C1)–(C6) hold and $h = O(n^{-\rho})$ for $1/5 \le \rho < 1/3$.
Then for all $z \in [0,1]$, we have
\[
P\Big\{(-2\log h)^{1/2}\Big(\sup_{z\in[0,1]}\big|(nh\Sigma_g^{-1})^{1/2}(\hat{g}(z) - g(z) - b(z))\big| - d_n\Big) < u\Big\} \longrightarrow \exp(-2\exp(-u)), \quad \text{as } n \to \infty,
\]
where, if $K(A) \neq 0$,
\[
d_n = (-2\log h)^{1/2} + \frac{1}{(-2\log h)^{1/2}}\Big\{\log\frac{K^2(A)}{\nu_0\pi^{1/2}} + \frac{1}{2}\log\log h^{-1}\Big\},
\]
and, if $K(A) = 0$,
\[
d_n = (-2\log h)^{1/2} + \frac{1}{(-2\log h)^{1/2}}\log\Big\{\frac{1}{4\nu_0\pi}\int (K'(z))^2\,dz\Big\}.
\]

Theorem 2 gives the asymptotic distribution of the maximum absolute deviation between the estimated nonparametric component $\hat{g}(\cdot)$ and the true nonparametric component $g(\cdot)$ when the estimator of $\beta$ is $\sqrt{n}$-consistent. It provides the theoretical foundation for constructing the simultaneous confidence band of the nonparametric function in model (1.1).

Remark 2. If the supremum in Theorem 2 is taken over an interval $[c,d]$ instead of $[0,1]$, Theorem 2 still holds under certain conditions by transformation. The asymptotic distribution is represented as
\[
P\Big\{\big(-2\log\{h/(d-c)\}\big)^{1/2}\Big(\sup_{z\in[c,d]}\big|(nh\Sigma_g^{-1})^{1/2}(\hat{g}(z) - g(z) - b(z))\big| - \tilde{d}_n\Big) < u\Big\} \longrightarrow \exp(-2\exp(-u)),
\]
where $\tilde{d}_n$ is the same as $d_n$ in Theorem 2 except that $h$ is replaced by $h/(d-c)$.

Theorem 3. Assume that conditions (C1)–(C6) hold and $\Sigma_{g'} = \nu_2\bar{\sigma}^2(z)/(f^2(z)\mu_2^2)$. Then for all $z \in [0,1]$, we have
\[
P\Big\{(-2\log h)^{1/2}\Big(\sup_{z\in[0,1]}\big|(nh^3\Sigma_{g'}^{-1})^{1/2}(\hat{g}'(z) - g'(z))\big| - d_{n1}\Big) < u\Big\} \longrightarrow \exp(-2\exp(-u)), \quad \text{as } n \to \infty,
\]
where
\[
d_{n1} = (-2\log h)^{1/2} + \frac{1}{(-2\log h)^{1/2}}\log\Big\{\frac{1}{2\pi\sqrt{\nu_2}}\Big(\int z^2 (K'(z))^2\,dz\Big)^{1/2}\Big\},
\]
provided that $K(c_0) = 0$, $K(z)$ is absolutely continuous, and $K^2(z)$ and $(K'(z))^2$ are integrable on $(-\infty, +\infty)$.
Theorem 3 presents the asymptotic distribution of the maximum absolute deviation for $\hat{g}'(\cdot)$.

2.3 Simultaneous confidence band for the nonparametric function

Since the asymptotic bias and variance of $\hat{g}(\cdot)$ in Theorem 2 involve some unknown quantities, we cannot apply Theorem 2 directly to construct the simultaneous confidence band of $g(\cdot)$. In order to construct the simultaneous confidence band of $g(\cdot)$, we first need consistent estimators of the asymptotic bias and variance of $\hat{g}(\cdot)$. By Theorem 1, the asymptotic bias of $\hat{g}(z)$ is
\[
(h^2\mu_2/2)\,g''(z)(1 + o_p(1)).
\]
Thus, the consistent estimator of the asymptotic bias is $\widehat{\mathrm{bias}}(\hat{g}(z)) = h^2\mu_2\hat{g}''(z)/2$, where the estimator $\hat{g}''(z)$ of $g''(z)$ is obtained by a local cubic fit with an appropriate pilot bandwidth $h_* = O(n^{-1/7})$, which is optimal for estimating $g''(z)$ and can be chosen by the residual squares criterion proposed in Fan and Gijbels (1996).

Next we estimate the asymptotic variance of $\hat{g}(z)$. For simplicity, suppose that the random errors $V_{it}$ are i.i.d. for all $i$ and $t$. From the proofs of the theorems, we have
\[
\mathrm{Var}\{\hat{g}(z) \mid \mathcal{D}\} = (1,0)(Z_z^{\tau}W_zZ_z)^{-1}(Z_z^{\tau}W_zQ_1\Phi_1Q_1W_zZ_z)(Z_z^{\tau}W_zZ_z)^{-1}(1,0)^{\tau},
\]
where $Q_1 = I_{nT} - D(D^{\tau}PD)^{-1}D^{\tau}P$ and $\Phi_1 = \mathrm{diag}\big(\sigma^2(Z_{11}),\ldots,\sigma^2(Z_{1T}),\sigma^2(Z_{21}),\ldots,\sigma^2(Z_{2T}),\ldots,\sigma^2(Z_{n1}),\ldots,\sigma^2(Z_{nT})\big)$. Using an approximate local homoscedasticity argument similar to that in Li, Peng and Tong (2013), the asymptotic variance of $\hat{g}(z)$ is defined by
\[
\mathrm{Var}\{\hat{g}(z) \mid \mathcal{D}\} = (1,0)(Z_z^{\tau}W_zZ_z)^{-1}(Z_z^{\tau}W_zQ_1W_zZ_z)(Z_z^{\tau}W_zZ_z)^{-1}(1,0)^{\tau}\,\sigma^2(z).
\]

Let $\hat{V} = Y - \hat{Y}$ be the residual, where $\hat{Y} = \hat{g} + X\hat{\beta} + D\hat{\alpha}$. By (2.10), (2.11) and (2.13), we have
\[
\begin{aligned}
\hat{V} &= Y - \hat{g} - X\hat{\beta} - D\hat{\alpha}\\
&= Y - X\hat{\beta} - D\hat{\alpha} - M(Y - X\hat{\beta} - D\hat{\alpha})\\
&= (I_{nT} - M)(Y - X\hat{\beta} - D\hat{\alpha})\\
&= (I_{nT} - M)(I_{nT} - D(D^{\tau}PD)^{-1}D^{\tau}P)(Y - X\hat{\beta})\\
&= (I_{nT} - M)Q_1(I_{nT} - X(X^{\tau}PQ_1X)^{-1}X^{\tau}PQ_1)Y\\
&=: (I_{nT} - M)Q_1Q_2Y,
\end{aligned} \tag{2.14}
\]
where $Q_2 = I_{nT} - X(X^{\tau}PQ_1X)^{-1}X^{\tau}PQ_1$. Obviously, the residual $\hat{V}$ does not depend on the fixed effects, and is a linear function of $Y$.
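
The estimation steps (2.10), (2.11), (2.13) and the residual in (2.14) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel, the function names `dummy_matrix`, `smoothing_matrix` and `profile_lsdv`, and the use of dense $(nT) \times (nT)$ matrices are all choices made here for readability. It uses the identities $\tilde{X}^{\tau}\tilde{Q}\tilde{X} = X^{\tau}PQ_1X$ and $\tilde{D}^{\tau}\tilde{D} = D^{\tau}PD$ that follow from the definitions of $P$ and $Q_1$.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def dummy_matrix(n, T):
    """D = [-e_{n-1}, I_{n-1}]' kron e_T, an (nT) x (n-1) matrix."""
    block = np.vstack([-np.ones((1, n - 1)), np.eye(n - 1)])
    return np.kron(block, np.ones((T, 1)))

def smoothing_matrix(Z, h):
    """(nT) x (nT) local linear smoothing matrix M; row k is (1, 0) M(Z_k)."""
    rows = []
    for z in Z:
        Zz = np.column_stack([np.ones_like(Z), Z - z])
        ZtW = Zz.T * (epanechnikov((Z - z) / h) / h)
        rows.append(np.linalg.solve(ZtW @ Zz, ZtW)[0])
    return np.vstack(rows)

def profile_lsdv(Y, X, Z, n, T, h):
    """Profile least-squares dummy-variable estimates for model (2.2).

    Data are stacked by individual, then by time. Returns beta_hat,
    alpha_hat = (alpha_2, ..., alpha_n), g_hat at the sample points,
    and the residual vector V_hat.
    """
    N = n * T
    D = dummy_matrix(n, T)
    M = smoothing_matrix(Z, h)
    I = np.eye(N)
    P = (I - M).T @ (I - M)
    DtP = D.T @ P
    Q1 = I - D @ np.linalg.solve(DtP @ D, DtP)        # Q_1 = I - D (D'PD)^{-1} D'P
    A = X.T @ P @ Q1
    beta = np.linalg.solve(A @ X, A @ Y)              # (2.10), written with P and Q_1
    alpha = np.linalg.solve(DtP @ D, DtP @ (Y - X @ beta))  # (2.11)
    g_hat = M @ Q1 @ (Y - X @ beta)                   # (2.13) at each Z_it
    resid = Y - g_hat - X @ beta - D @ alpha          # V_hat in (2.14)
    return beta, alpha, g_hat, resid
```

Because $Q_1 D = 0$, adding $Dc$ to $Y$ for any vector $c$ leaves `beta`, `g_hat` and `resid` unchanged and shifts `alpha` by $c$, which is a direct numerical check of Remark 1.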
By the normalized weighted residual sum of squares, $\sigma^2(z)$ can be estimated by
\[
\hat{\sigma}^2(z) = \frac{\hat{V}^{\tau}\hat{V}}{\mathrm{tr}(Q_2^{\tau}Q_1^{\tau}PQ_1Q_2)} = \frac{Y^{\tau}(Q_2^{\tau}Q_1^{\tau}PQ_1Q_2)Y}{\mathrm{tr}(Q_2^{\tau}Q_1^{\tau}PQ_1Q_2)}.
\]

Theorem 4. Under the conditions of Theorem 2, assume that $g^{(3)}(\cdot)$ is continuous on $[0,1]$ and the pilot bandwidth $h_*$ satisfies $h_* = O(n^{-1/7})$. Then for all $z \in [0,1]$, we have
\[
P\Big\{(-2\log h)^{1/2}\Big(\sup_{z\in[0,1]}\Big|\frac{\hat{g}(z) - g(z) - \widehat{\mathrm{bias}}(\hat{g}(z) \mid \mathcal{D})}{[\widehat{\mathrm{Var}}\{\hat{g}(z) \mid \mathcal{D}\}]^{1/2}}\Big| - d_n\Big) < u\Big\} \longrightarrow \exp(-2\exp(-u)),
\]
where $d_n$ is defined in Theorem 2.

By Theorem 4, we construct the $(1-\alpha) \times 100\%$ simultaneous confidence band of the nonparametric function $g(z)$ as
\[
\Big(\hat{g}(z) - \widehat{\mathrm{bias}}(\hat{g}(z) \mid \mathcal{D}) \pm \Delta_{1,\alpha}(z)\Big), \tag{2.15}
\]
where $\Delta_{1,\alpha}(z) = \big(d_n + [\log 2 - \log\{-\log(1-\alpha)\}](-2\log h)^{-1/2}\big)\big[\widehat{\mathrm{Var}}\{\hat{g}(z) \mid \mathcal{D}\}\big]^{1/2}$.
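
To make the band (2.15) concrete, the following sketch computes the critical value and the half-width $\Delta_{1,\alpha}(z)$ from given estimates. It is only an illustration under assumptions: it covers the case $K(A) = 0$ of Theorem 2, requires $0 < h < 1$, and takes the estimates of $\hat{g}$, the bias and the variance as inputs rather than computing them. The function names and argument names (`nu0` for $\nu_0$, `int_dK_sq` for $\int (K'(z))^2 dz$) are hypothetical.

```python
import numpy as np

def extreme_value_quantile(alpha):
    """Solve exp(-2 exp(-u)) = 1 - alpha for u."""
    return np.log(2.0) - np.log(-np.log(1.0 - alpha))

def d_n(h, nu0, int_dK_sq):
    """Centering constant d_n of Theorem 2 in the case K(A) = 0 (needs 0 < h < 1)."""
    a = np.sqrt(-2.0 * np.log(h))
    return a + np.log(int_dK_sq / (4.0 * nu0 * np.pi)) / a

def scb(g_hat, bias_hat, var_hat, h, alpha, nu0, int_dK_sq):
    """Lower and upper limits of the (1 - alpha) band in (2.15) on a grid of z values."""
    a = np.sqrt(-2.0 * np.log(h))
    crit = d_n(h, nu0, int_dK_sq) + extreme_value_quantile(alpha) / a
    delta = crit * np.sqrt(np.asarray(var_hat, dtype=float))  # Delta_{1,alpha}(z)
    center = np.asarray(g_hat, dtype=float) - np.asarray(bias_hat, dtype=float)
    return center - delta, center + delta
```

For the Epanechnikov kernel on $[-1,1]$, for example, one would pass `nu0=0.6` and `int_dK_sq=1.5`; other kernels need their own constants.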