Supplement to "Efficient estimation in semivarying coefficient models for longitudinal/clustered data" by Ming-Yen Cheng, Toshio Honda, and Jialiang Li

S.1. Additional simulation results.

S.1.1. Nonparametric component estimates. In Step 7 of our estimation procedure we give both local linear and spline approaches to estimating the nonparametric component after the efficient estimator $\hat\beta_{\hat\Sigma}$ is obtained. In this section we examine the finite sample performance via simulations. For comparison, we also computed the respective initial estimates, that is, the versions using $\hat\beta_I$ instead of $\hat\beta_{\hat\Sigma}$. We considered the same settings as in Section 4, and we used cross-validation to choose the bandwidth used in the local linear estimation. We computed the mean integrated squared error (MISE) for all the function estimates and took their average. The results are given in Table S.1.

The figures in Table S.1 indicate that it is clearly advantageous to update the nonparametric component after efficient estimation of the parametric component. In addition, we observe that the refined local linear and spline estimators perform roughly the same in terms of MISE.

Table S.1
MISE for simulation studies.

                Local linear estimate     Spline estimate
                Initial     Refined       Initial     Refined
n = 100
  ρ = .4        .0449       .0354         .0492       .0376
  ρ = .8        .0691       .0597         .0639       .0593
n = 200
  ρ = .4        .0390       .0315         .0415       .0355
  ρ = .8        .0595       .0589         .0584       .0576

S.1.2. Parametric component estimates. Recall that we adjusted the estimated covariance function $\hat\sigma(s,t)$ by setting all negative eigenvalues to zero. We also considered a strictly positive threshold $\lambda_L = 0.05$ and set all eigenvalues lower than $\lambda_L$ to zero. The estimator using this covariance estimate is denoted by "Positive" in Table S.2. The "Positive" estimator adjusts the estimated covariance function by setting all eigenvalues below a positive cut-off to zero, whereas the efficient estimator adjusts only the negative eigenvalues. Therefore, it is slightly more biased than the efficient estimator. In all the considered cases, the crude and positive estimators are still more efficient than the working independence estimator.

Recall that in all the numerical analyses reported in the paper, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ used in the estimation of the covariance structure was set to $h_3 = 2h_1$. To examine the effect of the bandwidth choice, we considered various choices of $h_3$ in the numerical studies and obtained quite similar results. Under the column "Different $h_3$", we report the results for another case, $h_3 = 1.5h_1$, which are similar to those obtained with $h_3 = 2h_1$.

Our procedure does not require any iteration. In practice it may be interesting to refine the estimation of the coefficients and covariances using iterations and obtain a final estimate upon convergence. We report the numerical results under the "Iterative" column. The bias and SE are very close to those obtained without iteration.
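The eigenvalue adjustments described above are straightforward to implement. The following sketch is ours, not from the paper: `numpy` is assumed, the function name is hypothetical, and the adjustment is shown for a covariance matrix evaluated on a grid of time points. It covers both variants: zeroing only the negative eigenvalues, as in the efficient estimator, and zeroing all eigenvalues below the cut-off $\lambda_L = 0.05$, as in the "Positive" estimator.

```python
import numpy as np

def threshold_covariance(sigma_hat, cutoff=0.0):
    """Zero every eigenvalue of an estimated covariance matrix below
    `cutoff`. cutoff=0.0 keeps all nonnegative eigenvalues (the efficient
    estimator's adjustment); cutoff=0.05 gives the "Positive" variant."""
    # Symmetrize first to guard against numerical asymmetry.
    sigma_hat = (sigma_hat + sigma_hat.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    eigvals[eigvals < cutoff] = 0.0
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

# Example: an indefinite "raw" covariance estimate on a 5-point grid.
rng = np.random.default_rng(0)
raw = rng.standard_normal((5, 5))
raw = (raw + raw.T) / 2.0
sigma_eff = threshold_covariance(raw)        # negative eigenvalues zeroed
sigma_pos = threshold_covariance(raw, 0.05)  # "Positive" adjustment
```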
Table S.2
Estimation results of 200 simulations. "Positive" means we set a positive threshold for the covariance eigenvalues; "Different $h_3$" means using a different choice of $h_3$ in our efficient estimation; "Iterative" indicates an iterative estimation approach.

                       Positive           Different h3        Iterative
  n    ρ               bias      SE       bias      SE        bias      SE
 100   0.4   β1        .0173     .0411    -.0152    .0375     -.0146    .0361
             β2        .0176     .0423    -.0098    .0375     -.0095    .0352
             β3        .0205     .0425    -.0122    .0369     -.0099    .0360
             β4        -.0096    .0425    .0098     .0373     -.0086    .0362
 200   0.4   β1        -.0113    .0329    .0056     .0274     .0045     .0228
             β2        -.0164    .0334    -.0099    .0274     -.0066    .0219
             β3        .0120     .0323    .0072     .0273     .0034     .0259
             β4        -.0095    .0329    -.0043    .0276     -.0035    .0274
 100   0.8   β1        .0202     .0366    .0082     .0336     .0065     .0325
             β2        .0163     .0378    -.0075    .0335     -.0034    .0323
             β3        .0197     .0372    .0166     .0337     .0121     .0328
             β4        -.0168    .0354    -.0182    .0338     .0157     .0325
 200   0.8   β1        -.0044    .0214    -.0124    .0202     .0056     .0199
             β2        .0036     .0215    .0138     .0200     -.0049    .0199
             β3        .0042     .0215    .0165     .0204     .0052     .0178
             β4        -.0038    .0214    -.0148    .0200     -.0050    .0179

S.2. Proofs of Propositions 1-3 and Lemma 1. In this section, we outline the proofs of Propositions 1-3 and present the proof of Lemma 1. When $m_i$ is uniformly bounded, we have the same results for general link functions by closely following the arguments of [3]; we outline those results at the end of this supplement. Note that the sub-Gaussian error assumption is necessary in that case. We outline the proofs of Propositions 1-3 since we allow some of the $m_i$'s to diverge as in Assumptions A1 and A2.

Proof of Proposition 1. First we consider the properties of $\hat\Gamma_V$. The $(k,l)$th element of $n^{-1}H_{11\cdot2}$ is given by
\[
\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n^V.
\]
From Lemma 1 (v)-(vii), we have
\[
\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n^V
= \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle_n^V + o_p(1)
= \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle^V + o_p(1).
\]
This and (2.5) imply that for some positive constants $C_1$ and $C_2$, we have
\[
C_1 \le \lambda_{\min}(n^{-1}H_{11\cdot2}) \le \lambda_{\max}(n^{-1}H_{11\cdot2}) \le C_2
\]
and hence
\[
(S.1)\qquad \frac{1}{nC_2} \le \lambda_{\min}(H^{11}) \le \lambda_{\max}(H^{11}) \le \frac{1}{nC_1}
\]
with probability tending to 1. Note that $\mathrm{Var}(\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}) = \hat\Gamma_V$, and Theorem 1 of [13] implies that $\hat\Gamma_V - H^{11}$ is nonnegative definite when $H^{11}$ is defined with $V_i = \Sigma_i$. Hence for some positive constant $C_3$, we have
\[
\lambda_{\min}(\hat\Gamma_V) \ge \frac{C_3}{n}
\]
with probability tending to 1. Now we prove the asymptotic normality of
\[
\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\}
= H^{11}\Big(\sum_{i=1}^n X_i^TV_i^{-1}\epsilon_i - H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^TV_i^{-1}\epsilon_i\Big).
\]
As in the proof of Theorem 2 of [13], we take $c \in \mathbb{R}^p$ such that $|c| = 1$ and write
\[
c^T(\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\}) = \sum_{i=1}^n a_i\eta_i \quad(\text{say}),
\]
where
\[
a_i^2 = c^TH^{11}(X_i - W_iH_{22}^{-1}H_{21})^TV_i^{-1}\Sigma_iV_i^{-1}(X_i - W_iH_{22}^{-1}H_{21})H^{11}c
\]
and $\{\eta_i\}$ is a sequence of conditionally independent random variables with $E\{\eta_i \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\} = 0$ and $\mathrm{Var}(\eta_i \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}) = 1$.

We have from (S.1) and Lemma 1 (vii) that
\[
\max_{1\le i\le n} a_i^2 = O_p\Big(\frac{m_{\max}^2}{n^2}\sum_{k=1}^p \|X_k - Z^T\hat\varphi_{Vk}\|_\infty^2\Big) = O_p\Big(\frac{m_{\max}^2}{n^2}\Big).
\]
On the other hand, we have for some positive constant $C_4$,
\[
\sum_{i=1}^n a_i^2 = c^T\hat\Gamma_Vc \ge \frac{C_4}{n}
\]
with probability tending to 1. Hence we have established
\[
\frac{\max_{1\le i\le n} a_i^2}{\sum_{i=1}^n a_i^2} = O_p(n^{-1}m_{\max}^2) = o_p(1)
\]
and it follows from the standard argument that
\[
(S.2)\qquad \Big(\sum_{i=1}^n a_i^2\Big)^{-1/2}\sum_{i=1}^n a_i\eta_i \to_d N(0,1).
\]
Finally we evaluate the conditional bias:
\[
\mathrm{Bias}_\beta = E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\} - \beta_0.
\]
Take $\tilde g_0 \in G_B$ such that $\|\tilde g_0 - g_0\|_{G,\infty} = O(K_n^{-2})$ and set
\[
\delta_0 = g_0 - \tilde g_0 \quad\text{and}\quad \bar\delta_0 = Z^T\delta_0.
\]
Note that $\|\delta_0\|_\infty = O(K_n^{-2})$ and $\|\bar\delta_0\|^V = O(K_n^{-2})$. We also take $\tilde\varphi_{Vk} \in G_B$ such that $\|\varphi^*_{Vk} - \tilde\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$.
Then we have the following expression for the conditional bias:
\[
\mathrm{Bias}_\beta = nH^{11}(S_1,\ldots,S_p)^T,
\]
where
\[
S_k = \langle X_k,\, \bar\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V
= \langle X_k - Z^T\hat\varphi_{Vk},\, \bar\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V
\]
\[
= \langle X_k - Z^T\varphi^*_{Vk},\, \bar\delta_0 - Z^T\Pi_{Vn}\delta_0\rangle_n^V
+ \langle X_k - Z^T\varphi^*_{Vk},\, Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V
+ \langle Z^T\varphi^*_{Vk} - Z^T\hat\varphi_{Vk},\, \bar\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V
= S_{1k} + S_{2k} + S_{3k} \quad(\text{say}).
\]
Note that
\[
E\{S_{1k}\} = 0 \quad\text{and}\quad E\{S_{1k}^2\} = O\Big(\frac{(\|X_k - Z^T\varphi^*_{Vk}\|^V)^2}{K_n^3n}\Big)
\]
since $S_{1k}$ is a sum of independent random variables, $\varphi^*_{Vk} = \Pi_VX_k$, $\bar\delta_0 = Z^T\delta_0$, and
\[
\|\bar\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty \le \|\bar\delta_0\|_\infty + CK_n^{1/2}\|Z^T\Pi_{Vn}\delta_0\|^V
\le \|\bar\delta_0\|_\infty + CK_n^{1/2}\|\bar\delta_0\|^V = O(K_n^{-3/2}).
\]
Hence we have $S_{1k} = O_p(1/(nK_n^3)^{1/2}) = o_p(n^{-1/2})$.

Now we deal with $S_{2k}$. From Lemma 1 (vi) and the fact that $\|\bar\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty = O(K_n^{-3/2})$, we have
\[
\|Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_n^V
= \sup_{g\in G_B}\frac{|\langle\bar\delta_0 - Z^T\Pi_{Vn}\delta_0,\, Z^Tg\rangle_n^V - \langle\bar\delta_0 - Z^T\Pi_{Vn}\delta_0,\, Z^Tg\rangle^V|}{\|Z^Tg\|_n^V}
= O_p\Big(K_n^{-3/2}\sqrt{\frac{K_n}{n}}\Big) = O_p(K_n^{-1}n^{-1/2}).
\]
Thus we have $|S_{2k}| = o_p(n^{-1/2})$.

We also have
\[
|S_{3k}| \le \|\bar\delta_0\|_n^V\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|_n^V = O_p(K_n^{-4}) = o_p(n^{-1/2})
\]
since $\|\bar\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_n^V \le \|\bar\delta_0\|_n^V$. Hence we have
\[
\mathrm{Bias}_\beta = o_p(n^{-1/2}).
\]
The desired result follows from (S.2) and the above equality.

As for Proposition 2, there is almost no change in the calculation of the score functions in [13] and [4], and we omit the outline. This is because $m_i$ is bounded for any fixed $n$.

Proof of Proposition 3. When $V_i = \Sigma_i$, we have
\[
\hat\Gamma_V = H^{11} = (H_{11\cdot2})^{-1} \quad\text{and}\quad \varphi^*_{\Sigma k} = \varphi^*_{\mathrm{eff},k}.
\]
Lemma 1 (vii) implies that
\[
\frac{1}{n}\hat\Gamma_V^{-1} = \frac{1}{n}H_{11\cdot2} = E\{l_\beta^*(l_\beta^*)^T\} + o_p(1) = \Omega_\Sigma + o_p(1).
\]
The desired result follows from the above result and Proposition 1.

Proof of Lemma 1. The proof consists of seven parts.

(i) Recall that
\[
(\|Z^Tg\|^V)^2 = \frac{1}{n}\sum_{i=1}^n E\{(Z_i^Tg)^TV_i^{-1}(Z_i^Tg)\}.
\]
We have from Assumptions A4 and A5 that
\[
(S.3)\qquad \frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}E\Big\{\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\}
\le (\|Z^Tg\|^V)^2
\le \frac{C_2}{n}\sum_{i=1}^nE\Big\{\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\}
\]
for some positive constants $C_1$ and $C_2$. Assumptions A2 and A3 imply that for some positive constants $C_3$ and $C_4$,
\[
(S.4)\qquad C_3\sum_{l=1}^q\int g_l^2(t)\,dt \le \frac{1}{n}\sum_{i=1}^n\frac{1}{m_i}E\Big\{\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\}
\quad\text{and}\quad
\frac{1}{n}\sum_{i=1}^nE\Big\{\sum_{j=1}^{m_i}g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le C_4\sum_{l=1}^q\int g_l^2(t)\,dt.
\]
The desired result follows from (S.3) and (S.4).

(ii) This is a well-known result in the spline regression literature. See, for example, A.2 of [12].

(iii) The result in (ii) implies
\[
\|X^T\beta + Z^Tg\|_\infty^2 \le CK_n\big(|\beta|^2 + \|g\|_{G,2}^2\big)
\]
for some positive constant $C$. Recall that $p$ and $q$ are fixed in this paper. On the other hand, we have from Assumptions A1-A3 and A5 that for some positive constants $C_1$, $C_2$, and $C_3$,
\[
(\|X^T\beta + Z^Tg\|^V)^2
\ge \frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}E\Big\{\sum_{j=1}^{m_i}(\beta^T\ g^T(T_{ij}))
\begin{pmatrix} X_{ij}X_{ij}^T & X_{ij}Z_{ij}^T\\ Z_{ij}X_{ij}^T & Z_{ij}Z_{ij}^T\end{pmatrix}
\begin{pmatrix}\beta\\ g(T_{ij})\end{pmatrix}\Big\}
\ge \frac{C_2}{n}\sum_{i=1}^n\frac{1}{m_i}E\Big\{\sum_{j=1}^{m_i}\Big|\begin{pmatrix}\beta\\ g(T_{ij})\end{pmatrix}\Big|^2\Big\}
\ge C_3\big(|\beta|^2 + \|g\|_{G,2}^2\big).
\]
Besides, we have for some positive constants $C_1$ and $C_2$,
\[
(\|v\|^V)^2 \le \frac{C_1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}|v_{ij}|^2 \le C_2\|v\|_\infty^2.
\]
Hence the desired results are established.

(iv) For $g_1 \in G_B$ and $g_2 \in G_B$, we have
\[
\langle Z^Tg_1, Z^Tg_2\rangle_n^V = \gamma_1^T\Big\{\frac{1}{n}\sum_{i=1}^n W_i^TV_i^{-1}W_i\Big\}\gamma_2 = \gamma_1^T\Delta_{Vn}\gamma_2 \quad(\text{say}),
\]
where $\Delta_{Vn}$ is a $qK_n \times qK_n$ matrix and $\gamma_1$ and $\gamma_2$ correspond to $g_1$ and $g_2$, respectively. The elements of $n^{-1}\sum_{i=1}^n W_i^TV_i^{-1}W_i$ are written as
\[
(S.5)\qquad \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2} v_i^{j_1j_2}B_{k_1}(T_{ij_1})B_{k_2}(T_{ij_2})Z_{ij_1l_1}Z_{ij_2l_2} = \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} \quad(\text{say}),
\]
where $v_i^{j_1j_2}$ is defined in (??), $1\le k_1,k_2\le K_n$, and $1\le l_1,l_2\le q$. By evaluating the variance of (S.5) and using the Bernstein inequality for independent bounded random variables, together with Assumptions A1 and A2, we have, uniformly in $k_1$, $k_2$, $l_1$, and $l_2$,
\[
(S.6)\qquad \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n^2}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t)\equiv 0
\]
and
\[
(S.7)\qquad \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t)\not\equiv 0.
\]
By exploiting (S.6), (S.7), and the local property of the B-spline basis, we obtain
\[
(S.8)\qquad \max\big\{|\lambda_{\min}(\Delta_{Vn} - E(\Delta_{Vn}))|,\ |\lambda_{\max}(\Delta_{Vn} - E(\Delta_{Vn}))|\big\} = O_p\Big(\sqrt{\frac{\log n}{n}}\Big).
\]
We also have
\[
(S.9)\qquad \frac{C_1}{K_n} \le \lambda_{\min}(E(\Delta_{Vn})) \le \lambda_{\max}(E(\Delta_{Vn})) \le \frac{C_2}{K_n}
\]
since Assumptions A2 and A3 yield
\[
\frac{C_3}{n}\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij}))
\le \Delta_{Vn}
\le \frac{C_4}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij}))
\]
for some positive constants $C_3$ and $C_4$. See the proof of Lemma A.3 of [12]. Hence the desired result follows from (S.8) and (S.9).
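To fix ideas, the following sketch (ours, not from the paper) builds the matrix $\Delta_{Vn} = n^{-1}\sum_i W_i^TV_i^{-1}W_i$ from rows $Z_{ij}\otimes B(T_{ij})$ with a cubic B-spline basis, so the objects in (S.5)-(S.9) can be inspected numerically. The dimensions, the choice $V_i = I$, and the basis normalization are illustrative assumptions; the paper's exact construction of $G_B$ and $V_i$ is not reproduced.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t_obs, n_basis, degree=3):
    """Evaluate a clamped B-spline basis (n_basis functions on [0,1])."""
    n_knots = n_basis + degree + 1
    interior = np.linspace(0.0, 1.0, n_knots - 2 * degree)
    knots = np.concatenate([np.zeros(degree), interior, np.ones(degree)])
    basis = np.empty((len(t_obs), n_basis))
    for k in range(n_basis):
        coef = np.zeros(n_basis)
        coef[k] = 1.0
        basis[:, k] = BSpline(knots, coef, degree)(t_obs)
    return basis

rng = np.random.default_rng(1)
n, q, Kn = 50, 2, 8                          # subjects, dim of Z, basis size
delta = np.zeros((q * Kn, q * Kn))
for i in range(n):
    m_i = rng.integers(3, 8)                 # cluster size m_i
    T_i = rng.uniform(0.0, 1.0, m_i)         # observation times
    Z_i = rng.standard_normal((m_i, q))
    B_i = bspline_basis(T_i, Kn)             # m_i x Kn basis evaluations
    # Row j of W_i is the Kronecker product Z_ij (x) B(T_ij).
    W_i = np.einsum('jl,jk->jlk', Z_i, B_i).reshape(m_i, q * Kn)
    delta += W_i.T @ W_i / n                 # V_i = I in this illustration
eigvals = np.linalg.eigvalsh(delta)          # on the order of 1/Kn, cf. (S.9)
```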
(v) This follows from (iv) and (vi).

(vi) Using Assumptions A1 and A2, we have
\[
\langle\delta_n, Z_lB_k\rangle_n^V = \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2}\delta_{n,ij_1}v_i^{j_1j_2}Z_{ij_2l}B_k(T_{ij_2})
\]
and
\[
\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n^V) \le \frac{C_1\|\delta_n\|_\infty^2}{n^2}\sum_{i=1}^n\frac{1}{m_i^2}\sum_{j_1,j_2}E\{B_k^2(T_{ij_1})B_k^2(T_{ij_2})\} \le \frac{C_2\|\delta_n\|_\infty^2}{nK_n}
\]
for some positive constants $C_1$ and $C_2$. Hence we have
\[
\sum_{l=1}^q\sum_{k=1}^{K_n}\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n^V) \le \frac{C\|\delta_n\|_\infty^2}{n}
\]
for some positive constant $C$, and the desired result follows from (S.9).

(vii) Take $\tilde\varphi_{Vk} \in G_B$ such that $\|\tilde\varphi_{Vk} - \varphi^*_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have for some positive constant $C$,
\[
(S.10)\qquad \|Z^T(\varphi_{Vk} - \varphi^*_{Vk})\|_\infty
\le \|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|_\infty + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty
\]
\[
\le CK_n^{1/2}\|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty
\le CK_n^{1/2}\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty
= O(K_n^{-3/2}).
\]
Here we used the fact that $\varphi_{Vk} = \Pi_{Vn}X_k \in G_B$ and $\varphi^*_{Vk} = \Pi_VX_k$. Inequality (S.10) implies $\|Z^T\varphi_{Vk}\|_\infty = O(1)$, and we only have to evaluate $Z^T(\hat\varphi_{Vk} - \varphi_{Vk})$. We just follow the arguments on p.16 of [3], replacing $\hat\varphi_{k,n}$ and $\varphi_{k,n}$ there with $Z^T\hat\varphi_{Vk}$ and $Z^T\varphi_{Vk}$, since those arguments employ (iv) and (vi) and do not depend on $m_i$. Then we have
\[
\|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|_\infty = o_p(1),\qquad
\|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|_n^V = O_p(\sqrt{K_n/n}),
\]
and
\[
\|Z^T(\hat\varphi_{Vk} - \varphi_{Vk})\|^V = O_p(\sqrt{K_n/n}).
\]
The desired results follow from the above equations and (S.10).

S.3. Proof of Proposition 4. In the proof, we repeatedly use arguments based on exponential inequalities, truncation, and division of regions into small rectangles to prove uniform convergence results, as in [S3]. We do not give the details of these arguments since they are standard in nonparametric kernel methods. Since we impose Assumption A2 and do not use $\Sigma_i$ or $V_i$ in the construction of $\hat g(t)$, $\hat\sigma^2(t)$, and $\hat\sigma(s,t)$, the effects of the diverging $m_i$ appear explicitly only when we apply the exponential inequality for generalized U-statistics. Recall that we assume three times continuous differentiability of the relevant functions in this proposition.

The proof consists of four parts: (i) representation of $\hat g(t)$, (ii) representation of $\hat\epsilon_{ij}$, (iii) representation of $\hat\sigma^2(t)$, and (iv) representation of $\hat\sigma(s,t)$.
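Before turning to part (i), the sketch below shows a generic local linear varying-coefficient fit of the kind analyzed there; it is only meant to make the roles of the design matrix (the analogue of $\hat A_{1n}(t)$) and the bandwidth $h_1$ concrete. The Epanechnikov kernel and all names are our assumptions, and we do not reproduce the exact weighting of (3.2).

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_g(t, T, Z, y_tilde, h):
    """Local linear varying-coefficient fit at time t.
    T: (N,) pooled observation times; Z: (N, q) covariates;
    y_tilde: (N,) partial residuals Y - X' beta_I; h: bandwidth h_1.
    Returns the q-vector estimate of g(t). Assumes enough observations
    fall inside the bandwidth window so the system is nonsingular."""
    u = (T - t) / h
    w = epanechnikov(u)
    # Each design row is (Z_ij, u_ij * Z_ij), a 2q-vector: local level
    # and local slope terms for every component of g.
    D = np.hstack([Z, u[:, None] * Z])
    # 2q x 2q kernel-weighted Gram matrix; the 1/(N_1 h_1) normalization
    # of A_1n(t) cancels in the solve, so it is omitted here.
    A = D.T @ (w[:, None] * D)
    b = D.T @ (w * y_tilde)
    sol = np.linalg.solve(A, b)
    return sol[:Z.shape[1]]          # first q entries estimate g(t)
```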
By 0′ 0′1 0′q 0′′ 0′′1 0′′q plugging (S.11) into (3.2), we have uniformly in t, (S.12) g(t) = g (t)+D (L (t)) 1L (t)(β β ) 0 q 1 − 2 0 I − h2 b + 21Dq(Lb1(t))−1Lb3(t)g0′′(t)+bDq(L1(t))−1E0(t)+Op(h31), b b b 10 where L (t)= A (t) defined after (3.2), 1 1n b L (t) = 1 n mi Z 1 K Tij −t XT, 2 N1h1 Xi=1Xj=1 ij ⊗ Tihj1−t! (cid:16) h1 (cid:17) ij Lb (t) = 1 n mi (Z ZT) (Tihj1−t)2 K Tij −t , 3 N1h1 Xi=1Xj=1 ij ij ⊗ (Tihj1−t)3! (cid:16) h1 (cid:17) b 1 n mi 1 T t ij E (t) = Z K − ǫ . 0 N1h1 Xi=1Xj=1 ij ⊗ Tihj1−t! (cid:16) h1 (cid:17) ij By following standard arguments such as those in [S3], we obtain for j = 1,2,3, logn (S.13) L (t) = L (t)+O uniformly in t, j j p nh r 1 (cid:16) (cid:17) b where L = E L (t) , and j j { } b logn (S.14) E (t) = O uniformly in t. 0 p nh r 1 (cid:16) (cid:17) Assumption A2 implies that (S.15) C I L (t) C I 1 2q 1 2 2q ≤ ≤ for some positive constants C and C . From (S.12)-(S.15), we have uni- 1 2 formly in t, (S.16) h2 g(t) = g (t)+D (L (t)) 1L (t)(β β )+ 1D (L (t)) 1L (t)g (t) 0 q 1 − 2 0− I 2 q 1 − 3 0′′ logn logn b +Dq(L1(t))−1E0(t)+Opb(h31)+Op nh +Op h21 nh 1 r 1 (cid:16) (cid:17) (cid:16) (cid:17) = g (t)+L (t)(β β )+h2L (t)g (t)+L (t)E (t) 0 4 0− I 1 5 0′′ 6 0 logn logn +Op(h31)+Obp nh +Op h21 nh (say). 1 r 1 (cid:16) (cid:17) (cid:16) (cid:17) Note that all the elements of L (t), j = 4,5,6, are bounded functions of t. j (ii) Representation of ǫ . We have ij ǫ = ǫ +XT(β β )+ZT(g (T ) g(T )). ij ij b ij 0− I ij 0 ij − ij b b b