Efficient Analysis of Q-Level Nested Hierarchical General Linear Models Given Ignorable Missing Data

Yongyun Shin and Stephen W. Raudenbush

The International Journal of Biostatistics, v9 n1 p109-133, 2013

Abstract

This paper extends single-level missing data methods to efficient estimation of a Q-level nested hierarchical general linear model given ignorable missing data with a general missing pattern at any of the Q levels. The key idea is to reexpress a desired hierarchical model as the joint distribution of all variables, including the outcome, that are subject to missingness, conditional on all of the covariates that are completely observed; and to estimate the joint model under normal theory. The unconstrained joint model, however, identifies extraneous parameters that are not of interest in subsequent analysis of the hierarchical model, and that rapidly multiply as the number of levels, the number of variables subject to missingness, and the number of random coefficients grow. Therefore, the joint model may be extremely high dimensional and difficult to estimate well unless constraints are imposed to avoid the proliferation of extraneous covariance components at each level. Furthermore, the over-identified hierarchical model may produce considerably biased inferences. The challenge is to represent the constraints within the framework of the Q-level model in a way that is uniform without regard to Q; in a way that facilitates efficient computation for any number of Q levels; and also in a way that produces unbiased and efficient analysis of the hierarchical model. Our approach yields Q-step recursive estimation and imputation procedures whose qth step computation involves only level-q data given higher-level computation components. We illustrate the approach with a study of the growth in body mass index, analyzing a national sample of elementary school children.
KEY WORDS: Child Health; Hierarchical General Linear Model; Ignorable Missing Data; Maximum Likelihood; Multiple Imputation.

Acknowledgements: This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grants R305D090022 to NORC and R305D130033 to VCU. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education. The first author was also supported by grant 1U01HL101064 from the National Institutes of Health.

1 Introduction

A seminal contribution to statistical methodology is the development of efficient methods for handling missing data within the framework of a general linear model (GLM, Rubin, 1976, 1987, Dempster, Laird, and Rubin, 1977, Meng, 1994, Schafer, 1997, 2003, Little and Rubin, 2002). These methods provide efficient estimation of the GLM given incomplete data. In particular, model-based multiple imputation now provides state-of-the-art methods for handling missing data (Rubin, 1987). These approaches are founded on an assumption that is comparatively mild in many applications: that missing data are ignorable (Rubin, 1976, Little and Rubin, 2002).

This paper extends the methodology to an arbitrary Q-level hierarchical GLM where lower-level units are nested within higher-level units (Raudenbush and Bryk, 2002, Goldstein, 2003). Many multilevel observational studies and controlled experiments produce missing data. In cluster-randomized experiments, the dominant design involves the random assignment of whole schools, hospitals, or communities, rather than students, patients, or adults, to treatments (Bingenheimer and Raudenbush, 2004). Multilevel analysis is pervasive in health, education, and social science studies (Datar and Sturm, 2004, Gable, Chang, and Krull, 2007, Danner, 2008, Shin and Raudenbush, 2010). Surveys involve multi-stage sampling designs (Tourangeau, Nord, Lê, Sorongon, and Najarian, 2009).
A ubiquitous problem is that explanatory as well as outcome variables may be subject to missingness at any of the levels. In longitudinal studies, hierarchical data subject to missingness may be estimated by maximum likelihood (ML) in a structural equation model (SEM) where latent means include missing data (Allison, 1987, Muthén, Kaplan, and Hollis, 1987, Muthén, 1993, Arbuckle, 1996, Enders and Peugh, 2004). SEM software such as Mplus (Muthén and Muthén, 2010), Amos (Arbuckle, 2003) and EQS (Bentler, 2007) performs ML estimation of such models. When these models are formulated by multi-group analysis, the number of groups is the number of missing patterns (Allison, 1987, Muthén et al., 1987, Muthén, 1993).

Recent advances have extended the single-level methods to multilevel ignorable missing data. Liu, Taylor, and Belin (2000) considered Bayes inference for longitudinal designs having a fixed within-subject design with repeated measures at level 1 nested within persons at level 2 where the data are missing at both levels. Schafer and Yucel (2002) developed Bayes and ML inference for a broader class of two-level designs in which the level-1 design could vary across level-2 units with level-1 data subject to missingness. Goldstein and Browne (2002, 2005) took a Bayesian approach to a two-level factor analysis where missing outcomes were imputed by a Gibbs sampling step. Shin and Raudenbush (2007, 2010) extended these methods to a two-level model where the outcome and covariates may have missing values at both levels. Shin and Raudenbush (2011) and Shin (2012) illustrated an efficient ML method to estimate a three-level model with incomplete data.
To estimate a three-level hierarchical linear model with incomplete data, Yucel (2008) modified a single-level imputation method (Schafer, 1997, 1999) and a two-level imputation method (Schafer and Yucel, 2002) to carry out the Gibbs sampler to sequentially impute cluster-level missing values, intermediate-level missing values given the multiply imputed cluster-level data, and then the lowest-level missing values given the multiply imputed data at higher levels. These advances guide us with continuous outcomes. Goldstein, Carpenter, Kenward, and Levin (2009) and Goldstein and Kounali (2009) used a Markov chain Monte Carlo method to impute a mixture of continuous and discrete outcomes subject to missingness in a two-level model.

Shin and Raudenbush (2007) illustrated two ways to handle two-level missing data: direct ML estimation (MLE on Y_obs) and a two-stage procedure of multiple imputation followed by second-stage analysis of the multiply imputed data (MLE on Y_mi). This paper generalizes the two methods to an arbitrary number of Q levels and an arbitrary number of outcomes defined at any level. A key emphasis in this paper is the difference in logic and assumptions between the two methods. Using MLE on Y_obs, one first writes down a desired hierarchical model, then reparameterizes the model in terms of the joint distribution of outcome and covariates subject to missingness given the completely observed covariates. Great care must be taken so that the transformation is one-to-one in order to ensure unbiased estimation (Shin and Raudenbush, 2007, 2010). This task generally requires the imposition of constraints on regression surfaces at each level to avoid the proliferation of covariance components at higher levels of the joint distribution. One challenge for this paper is to formulate these constraints within the framework of the Q-level model.
We show that the unconstrained joint model identifies contextual effects (Raudenbush and Bryk, 2002) and interaction effects that are typically extraneous for the desired model. In contrast, MLE on Y_mi generally implies that the imputation model should be unconstrained, allowing the data analyst to impose the desired constraints at the second stage when using conventional software to analyze the imputed data. A challenge with MLE on Y_mi is that the need to avoid constraints at stage one may lead to the formulation of extremely high-dimensional imputation models that may be difficult to estimate well. The two methods have characteristic advantages and disadvantages. MLE on Y_obs imposes more assumptions than does MLE on Y_mi, because MLE on Y_obs imposes distributional assumptions on all variables subject to missingness, while for MLE on Y_mi such assumptions do not affect the observed data. Given the pluses and minuses, this paper considers a hybrid approach that combines the two methods.

Our aim is to formulate a Q-level model that unifies single- and multi-level models into a single expression, facilitating extension of existing missing data methods to an arbitrary number of levels of a linear model with efficient estimation and computation. The model has two representations: a hierarchical linear model for a response variable conditional on covariates, and a marginal model that represents the joint distribution of all variables - the response and covariates - that are subject to missingness conditional on the completely observed covariates. It is essential to clarify the relationship between these two models; in particular, the conditional model is always equivalent to the joint model after imposing certain constraints on the more general joint model. To clarify the needed constraints in a general Q-level setting, we find it revealing to re-parameterize both forms of the model such that all variables subject to missingness are decomposed orthogonally by level.
In this model, random components are correlated within but uncorrelated between levels. The required constraints then fall out naturally for any number of levels.

The next section defines the joint model and decomposes all variables subject to missingness into orthogonal random components. Section 3 considers the problem of estimation and multiple imputation. The orthogonal decomposition is helpful: the key insight is that, if we stack the joint model by level, we can write down estimation formulas at level q using level-q data only, given higher-level computation components that are uniform for all q, even if we ignore all other-level data. This recursive representation yields efficient computations using conventional ML methods such as the EM algorithm (Dempster, Laird, and Rubin, 1977). Thus, orthogonal decomposition by level transforms a seemingly intractable computational problem into a sequence of familiar, solvable problems as described in Section 3. Section 4 introduces the conditional hierarchical linear model. We show that the joint model represents more parameters than desired in the hierarchical model, and describe how to constrain the joint model for identification of the desired hierarchical model. Incorporating the constraints is essential to avoid bias for MLE on Y_obs. For MLE on Y_mi, the constraints do not reduce bias but may nonetheless be practically necessary for computation involving high-dimensional models. Analyzing data from a large, nationally representative longitudinal sample of children, we illustrate these methods to study predictors of the growth of body mass index (BMI) between ages 5 and 15 in Section 5. This is a three-level problem, with up to seven repeated measures on children who are sampled within their elementary schools. Section 6 concludes with a discussion of limitations and next steps in the Q-level research agenda.
2 Joint Model

All of the models described in this article are subsets of a multilevel p-variate model

    Y = μ + Zb ∼ N(μ, V),  b ∼ N(0, Π)    (1)

where every element of Y is subject to missingness, μ may be a linear function of completely observed covariates, Z is a matrix of completely observed covariates having random effects b, and V = ZΠZ^T. For simplicity of exposition, we shall assume μ = 0 in this section. We partition Y = [Y_1^T Y_2^T ··· Y_Q^T]^T such that Y_q is a vector of p_q variables at level q in a hierarchy of Q levels for p = Σ_{q=1}^Q p_q. In the case of Q = 2 where occasions are nested within children, for example, elements of Y_1 such as body mass index and daily TV viewing hours vary across occasions at the lower level 1; and elements of Y_2 such as years of highest parent education and birth weight vary among children at the higher level 2.

The variance-covariance matrix V may be structured by the fact that Y_q varies at level q or higher. In the case of Q = 2, for example, the body mass index in Y_1 varies within as well as between children while the birth weight in Y_2 varies between children but not within a child. Thus, we partition Z = ⊕_{q=1}^Q Z_q = diag{Z_1, Z_2, ···, Z_Q}, a diagonal matrix having diagonal submatrices (Z_1, Z_2, ···, Z_Q), and b = [b_1^T b_2^T ··· b_Q^T]^T, and decompose Y_q orthogonally by level as

    Y_q = Z_q b_q ∼ N(0, V_{qq}),  b_q ∼ N(0, Π_{qq})    (2)

for Z_q = [Z_{qq} Z_{(q+1)q} ··· Z_{Qq}], b_q = [ε_{qq}^T ε_{(q+1)q}^T ··· ε_{Qq}^T]^T and V_{qq} = Z_q Π_{qq} Z_q^T, where Z_{rq} is a matrix of known covariates having level-r unit-specific random effects ε_{rq} ∼ N(0, π^r_{qq}) for subscript and superscript r denoting the level of variation. The orthogonal random effects ε_{rq} are independent between levels so that Π_{qq} = ⊕_{r=q}^Q π^r_{qq}, but correlated within levels by cov(ε_{rq}, ε_{rs}) = π^r_{qs} so that

    cov(b_q, b_s) = Π_{qs} = [ 0 ; ⊕_{r=s}^Q π^r_{qs} ]  for q < s.

Then, Π = [Π_{ij}], a matrix with (i,j) submatrices Π_{ij} for i,j = 1,2,···,Q. We may now express Equation (2) as Y_q = Σ_{r=q}^Q Z_{rq} ε_{rq} having cov(Y_q, Y_s) = V_{qs} = Z_q Π_{qs} Z_s^T = Σ_{r=s}^Q v^r_{qs} for q ≤ s where v^r_{qs} = Z_{rq} π^r_{qs} Z_{rs}^T, and decompose V = [V_{ij}] for i,j = 1,2,···,Q by the level of variation as

        [Y_1]   [ v^1_{11} 0 ··· 0 ]   [ v^2_{11} v^2_{12} ··· 0 ]         [ v^Q_{11} v^Q_{12} ··· v^Q_{1Q} ]
    var [Y_2] = [ 0        0 ··· 0 ] + [ v^2_{21} v^2_{22} ··· 0 ] + ··· + [ v^Q_{21} v^Q_{22} ··· v^Q_{2Q} ] .    (3)
        [ ⋮ ]   [ ⋮        ⋮     ⋮ ]   [ ⋮        ⋮          ⋮  ]         [ ⋮        ⋮             ⋮      ]
        [Y_Q]   [ 0        0 ··· 0 ]   [ 0        0      ···  0 ]         [ v^Q_{Q1} v^Q_{Q2} ··· v^Q_{QQ} ]

To reveal the orthogonal decomposition explicitly, we show all random components of the joint model (1) in Table 1. Row label Y_q indicates a vector of variables that are decomposed into the random components b_q = [ε_{qq}^T ε_{(q+1)q}^T ··· ε_{Qq}^T]^T listed in the row.

    Table 1: All random components of Y in Equation (1).
            1       2       3      ···    Q
    Y_1   ε_{11}  ε_{21}  ε_{31}   ···  ε_{Q1}
    Y_2           ε_{22}  ε_{32}   ···  ε_{Q2}
    Y_3                   ε_{33}   ···  ε_{Q3}
     ⋮                              ⋱     ⋮
    Y_Q                                 ε_{QQ}

The random components (b_1, b_2, ···, b_Q) enable us to write down estimation formulas at level q that are uniform for all q and that use level-q data only given higher-level computation components. This representation facilitates construction of efficient computation formulas as will be explained in Section 3. The column label shows the level at which the random components in the column vary. Table 1 lists random effects that are correlated within, but uncorrelated across, columns or levels. Column q in Table 1 lists level-q unit-specific random effects ε_q ∼ N(0, π_q) for

    ε_q = [ε_{q1}^T ε_{q2}^T ··· ε_{qq}^T]^T,  π_q = [ π^q_{11} π^q_{12} ··· π^q_{1q} ; π^q_{21} π^q_{22} ··· π^q_{2q} ; ⋮ ; π^q_{q1} π^q_{q2} ··· π^q_{qq} ].    (4)

Thus, all random effects b of the joint model (1) may also be expressed as (ε_1, ε_2, ···, ε_Q), displayed vertically in Table 1. Their parameters are π = (π_1, π_2, ···, π_Q). This expression is useful for deriving estimators as will be described in the next section.

3 Estimation of the Joint Model

The orthogonal decomposition by level shown in Table 1 enables us to write down the joint model (1) as a familiar mixed linear model (2) for any level q. Next, we shall exploit the fact that if we stack these level-specific models such that there is an equation at level q and a second equation stacked at all levels higher than q, we can write down familiar estimation formulas that use level-q data only given higher-level computation components even when we completely ignore all data at other levels. Moreover, the estimation formulas remain uniform for all q to produce efficient computation formulas as will be explained below. Therefore, the orthogonal decomposition of the joint model (1) enables us to obtain general, recursive, and familiar estimation formulas for the Q-level problem.

3.1 Structure of Joint Model for All Data at or above Level q

Based on the model (2) at level q, we stack Y^q = [Y_q^T Y_{q+1}^T ··· Y_Q^T]^T, Z^q = ⊕_{r=q}^Q Z_r and b^q = [b_q^T b_{q+1}^T ··· b_Q^T]^T to express the joint model (1) at level q or above as

    Y^q = Z^q b^q ∼ N(0, V^{qq}),  b^q ∼ N(0, Π^{qq})    (5)

for Y^q = [Y_q^T Y^{(q+1)T}]^T, Z^q = diag{Z_q, Z^{q+1}}, b^q = [b_q^T b^{(q+1)T}]^T,

    Π^{qq} = [ Π_{qq}  Π^{q(q+1)} ; Π^{(q+1)q}  Π^{(q+1)(q+1)} ]  and
    V^{qq} = Z^q Π^{qq} Z^{qT} = [ V_{qq}  V^{q(q+1)} ; V^{(q+1)q}  V^{(q+1)(q+1)} ]

where Π^{q(q+1)} = [Π_{q(q+1)} Π_{q(q+2)} ··· Π_{qQ}] and V^{q(q+1)} = Z_q Π^{q(q+1)} Z^{(q+1)T} = [V_{q(q+1)} V_{q(q+2)} ··· V_{qQ}] for the transpose Z^{(q+1)T} of Z^{q+1}. Note that we express Y^q, Π^{qq} and V^{qq} uniformly for all q to contain Y^{q+1}, Π^{(q+1)(q+1)} and V^{(q+1)(q+1)}, respectively.
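The level-wise decomposition in Equations (2)-(3) can be checked numerically. Below is a minimal numpy sketch (not the authors' code; all dimensions and covariance values are invented) for Q = 2 with identity Z_rq blocks, confirming that V_11 = v^1_11 + v^2_11 and V_12 = v^2_12:

```python
import numpy as np

# Sketch of the orthogonal decomposition by level for Q = 2 with Z_rq = I.
# All numeric values below are made up for illustration.
p1 = 2  # variables in Y_1 (level 1), e.g. BMI and TV hours
p2 = 1  # variables in Y_2 (level 2), e.g. birth weight

pi1_11 = np.array([[1.0, 0.3], [0.3, 1.0]])  # pi^1_11: level-1 variation of Y_1
pi2_11 = np.array([[0.5, 0.1], [0.1, 0.5]])  # pi^2_11: level-2 variation of Y_1
pi2_22 = np.array([[0.8]])                   # pi^2_22: level-2 variation of Y_2
pi2_12 = np.array([[0.2], [0.1]])            # pi^2_12 = cov(eps_21, eps_22)

# Z_1 = [Z_11 Z_21] with identity blocks; b_1 = [eps_11; eps_21]
Z1 = np.hstack([np.eye(p1), np.eye(p1)])              # 2 x 4
Pi_11 = np.block([[pi1_11, np.zeros((p1, p1))],
                  [np.zeros((p1, p1)), pi2_11]])      # direct sum over r = 1, 2
Pi_12 = np.vstack([np.zeros((p1, p2)), pi2_12])       # zero block stacked over pi^2_12
Z2 = np.eye(p2)                                       # Z_2 = Z_22

V_11 = Z1 @ Pi_11 @ Z1.T   # = v^1_11 + v^2_11 = pi^1_11 + pi^2_11
V_12 = Z1 @ Pi_12 @ Z2.T   # = v^2_12 = pi^2_12
print(np.allclose(V_11, pi1_11 + pi2_11), np.allclose(V_12, pi2_12))  # True True
```

The block-diagonal Pi_11 encodes the between-level independence of ε_11 and ε_21, while the nonzero pi2_12 encodes the within-level-2 correlation.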
For Q = 3, for example, Y^3 = Z^3 b^3 ∼ N(0, V^{33}) for Y^3 = Y_3, Z^3 = Z_3 = Z_{33}, b^3 = b_3 = ε_{33}, Π^{33} = Π_{33} = π^3_{33} and V^{33} = V_{33} = Z_3 Π_{33} Z_3^T; Y^2 = Z^2 b^2 ∼ N(0, V^{22}) for Y^2 = [Y_2^T Y_3^T]^T, Z^2 = diag{Z_2, Z^3}, b^2 = [b_2^T b^{3T}]^T, Π^{22} = [Π_{22} Π^{23} ; Π^{32} Π^{33}] and V^{22} = [V_{22} V^{23} ; V^{32} V^{33}] where Z_2 = [Z_{22} Z_{32}], b_2 = [ε_{22}^T ε_{32}^T]^T, Π_{22} = ⊕_{r=2}^3 π^r_{22}, Π^{23} = Π_{23} = [0 ; π^3_{23}] and V^{23} = V_{23} = Z_2 Π_{23} Z_3^T; and Y^1 = Z^1 b^1 ∼ N(0, V^{11}) for Y^1 = [Y_1^T Y^{2T}]^T, Z^1 = diag{Z_1, Z^2}, b^1 = [b_1^T b^{2T}]^T, Π^{11} = [Π_{11} Π^{12} ; Π^{21} Π^{22}] and V^{11} = [V_{11} V^{12} ; V^{21} V^{22}] where Z_1 = [Z_{11} Z_{21} Z_{31}], b_1 = [ε_{11}^T ε_{21}^T ε_{31}^T]^T, Π_{11} = ⊕_{r=1}^3 π^r_{11}, Π^{12} = [Π_{12} Π_{13}] and V^{12} = [V_{12} V_{13}] for

    Π_{12} = [ 0 0 ; π^2_{12} 0 ; 0 π^3_{12} ],  Π_{13} = [ 0 ; 0 ; π^3_{13} ],

V_{12} = Z_1 Π_{12} Z_2^T and V_{13} = Z_1 Π_{13} Z_3^T.

Equation (5) for q = 1 expresses the joint model (1) where Z^1 may include covariates having random effects at all levels. Often, the model (5) itself is of interest (Shin and Raudenbush, 2011, Shin, 2012). For a positive integer n, let I_n denote an n by n identity matrix. In this paper, we focus on estimation of the model (5) where Z_{qq} = I_{p_q} for all q, as many applications do. In what follows, we use the Kronecker product A⊗B that multiplies matrix B by each scalar element of matrix A (Magnus and Neudecker, 1988). In particular, I_n ⊗ B = diag{B, ···, B}.

3.2 Estimation

In deriving estimators, it is essential to aggregate the stacked-up joint model (5) at the highest-level-Q cluster m. Hereafter, we refer to subscript m as a unit at the highest level Q for m = 1, 2, ···, N_Q; N_{qm} as the number of units at level q nested within the cluster m; and N_q = Σ_{m=1}^{N_Q} N_{qm}.
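The Kronecker identity I_n ⊗ B = diag{B, ···, B} is what allows the per-unit covariance blocks to be replicated across the N_qm units of a cluster. A small numpy sketch (values made up) makes the block-diagonal structure explicit:

```python
import numpy as np

# Sketch of I_n (x) B = diag{B, ..., B}, the identity used to aggregate
# var(eps_qm) over the N_qm level-q units in cluster m. B is made up.
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # a within-unit covariance block
n = 3                          # N_qm, number of level-q units in cluster m

K = np.kron(np.eye(n), B)      # 6 x 6 block-diagonal matrix

# verify the block-diagonal structure directly
manual = np.zeros((n * 2, n * 2))
for i in range(n):
    manual[2*i:2*i+2, 2*i:2*i+2] = B
print(np.allclose(K, manual))  # True
```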
Let Y_{qm} = [Y_{q1m}^T Y_{q2m}^T ··· Y_{qN_{qm}m}^T]^T and ε_{qsm} = [ε_{qs1m}^T ε_{qs2m}^T ··· ε_{qsN_{qm}m}^T]^T aggregate all Y_q and ε_{qs} in cluster m for s ≤ q. The level-q unit-specific random effects in column q of Table 1 are aggregated to form the aggregated Equations (4) for cluster m:

    ε_{qm} = [ε_{q1m}^T ε_{q2m}^T ··· ε_{qqm}^T]^T,  var(ε_{qm}) = [ I_{N_{qm}}⊗π^q_{11} ··· I_{N_{qm}}⊗π^q_{1q} ; ⋮ ; I_{N_{qm}}⊗π^q_{q1} ··· I_{N_{qm}}⊗π^q_{qq} ].

Table 1 may also be aggregated to reveal all random components for cluster m. The aggregated table is of the same form as Table 1 with row label Y_{qm} replacing Y_q, the same column label, and the vector ε_{qm} instead of ε_q in column q. The random components in row Y_{qm} of the table are b_{qm} = [ε_{qqm}^T ε_{(q+1)qm}^T ··· ε_{Qqm}^T]^T to form the aggregated model (2)

    Y_{qm} = Z_{qm} b_{qm} ∼ N(0, V_{qqm}),  b_{qm} ∼ N(0, Π_{qqm})    (6)

for a conformable matrix of known covariates Z_{qm} = [Z_{qqm} Z_{(q+1)qm} ··· Z_{Qqm}], Π_{qqm} = ⊕_{r=q}^Q I_{N_{rm}} ⊗ π^r_{qq} and V_{qqm} = Z_{qm} Π_{qqm} Z_{qm}^T where Z_{qqm} = I_{N_{qm}×p_q}. Then,

    cov(b_{qm}, b_{sm}) = Π_{qsm} = [ 0 ; ⊕_{r=s}^Q I_{N_{rm}} ⊗ π^r_{qs} ]  for q < s,

and Y_{qm} = Σ_{r=q}^Q Z_{rqm} ε_{rqm} has cov(Y_{qm}, Y_{sm}) = V_{qsm} = Z_{qm} Π_{qsm} Z_{sm}^T = Σ_{r=s}^Q Z_{rqm} (I_{N_{rm}} ⊗ π^r_{qs}) Z_{rsm}^T for q ≤ s.

Next, we stack Y^q_m = [Y_{qm}^T Y_{(q+1)m}^T ··· Y_{Qm}^T]^T, Z^q_m = ⊕_{r=q}^Q Z_{rm} and b^q_m = [b_{qm}^T b_{(q+1)m}^T ··· b_{Qm}^T]^T to express the aggregated model (5) uniformly for all q as

    Y^q_m = Z^q_m b^q_m ∼ N(0, V^{qq}_m),  b^q_m ∼ N(0, Π^{qq}_m)    (7)

for Y^q_m = [Y_{qm}^T Y^{(q+1)T}_m]^T, Z^q_m = diag{Z_{qm}, Z^{q+1}_m}, b^q_m = [b_{qm}^T b^{(q+1)T}_m]^T, Π^{qq}_m = [Π_{qqm} Π^{q(q+1)}_m ; Π^{(q+1)q}_m Π^{(q+1)(q+1)}_m] and V^{qq}_m = Z^q_m Π^{qq}_m Z^{qT}_m = [V_{qqm} V^{q(q+1)}_m ; V^{(q+1)q}_m V^{(q+1)(q+1)}_m] where Π^{q(q+1)}_m = [Π_{q(q+1)m} Π_{q(q+2)m} ··· Π_{qQm}] and V^{q(q+1)}_m = Z_{qm} Π^{q(q+1)}_m Z^{(q+1)T}_m = [V_{q(q+1)m} V_{q(q+2)m} ··· V_{qQm}].

By expressing Z_{qm} = [I_{N_{qm}×p_q} Z_{−qqm}] having b_{qm} = [ε_{qqm}^T ε_{−qqm}^T]^T for Z_{−qqm} = [Z_{(q+1)qm} ··· Z_{Qqm}] and ε_{−qqm} = [ε_{(q+1)qm}^T ··· ε_{Qqm}^T]^T so that ε_{qqm} ∼ N(0, I_{N_{qm}} ⊗ π^q_{qq}), ε_{−qqm} ∼ N(0, Φ_{qqm}) and Π^{q(q+1)}_m = [cov(ε_{qqm}, b^{q+1}_m) ; cov(ε_{−qqm}, b^{q+1}_m)] = [0 ; Φ^{q(q+1)}_m] for Φ_{qqm} = ⊕_{r=q+1}^Q I_{N_{rm}} ⊗ π^r_{qq}, we structure

    V^{qq}_m = [ I_{N_{qm}}⊗π^q_{qq} + Z_{−qqm} Φ_{qqm} Z_{−qqm}^T    Z_{−qqm} Φ^{q(q+1)}_m Z^{(q+1)T}_m ;
                 Z^{q+1}_m Φ^{q(q+1)T}_m Z_{−qqm}^T                  V^{(q+1)(q+1)}_m ]

in a familiar form (Shin and Raudenbush, 2007) that is uniform for all q and has recursive V^{(q+1)(q+1)}_m = Z^{q+1}_m Π^{(q+1)(q+1)}_m Z^{(q+1)T}_m.

Shin and Raudenbush (2007) illustrated how to efficiently estimate the two-level model (7) for Y^1_m = [Y_{1m}^T Y_{2m}^T]^T by ML via the EM algorithm where Y_{1m} and Y_{2m} are vectors of arbitrary length. The key insight of the model (7) is to express the familiar two-level form Y^q_m = [Y_{qm}^T Y^{(q+1)T}_m]^T uniformly at each level q and estimate it by the method of Shin and Raudenbush (2007) given computation components at level q+1, starting from the highest level q = Q until we estimate the desired model for Y^1_m with arbitrary Q levels at q = 1.
The initial step is to estimate the single-level model (7) for q = Q, a special case of Shin and Raudenbush's two-level model. We formalize the recursive estimation within each iteration of the EM algorithm after defining notation for estimation below. A major advantage of this approach is that the step-q estimation uses only level-q data given higher-level computation components for efficient computation, as will be shown in the following section.

To express the relationship between complete and observed data, let O_{qm} = ⊕_{i=1}^{N_{qm}} O_{qim} be a matrix of dummy indicators for observed values in Y_{qm} = [Y_{q1m}^T Y_{q2m}^T ··· Y_{qN_{qm}m}^T]^T so that Y_{qmobs} = O_{qm} Y_{qm} and Z_{qmobs} = O_{qm} Z_{qm} = [O_{qm} Z_{−qqmobs}] for Z_{−qqmobs} = O_{qm} Z_{−qqm} (Shin and Raudenbush, 2007). At level Q, N_{Qm} = 1 and O_{Qm} = O_{Q1m} for all m. Let Y^q_{mobs} = O^q_m Y^q_m and Z^q_{mobs} = O^q_m Z^q_m for O^q_m = ⊕_{r=q}^Q O_{rm} to express the observed model (7)

    Y^q_{mobs} = Z^q_{mobs} b^q_m ∼ N(0, V^{qq}_{mobs}),  V^{qq}_{mobs} = O^q_m V^{qq}_m O^{qT}_m = Z^q_{mobs} Π^{qq}_m Z^{qT}_{mobs}.    (8)

Estimation of π = (π_1, π_2, ···, π_Q) is carried out via the EM algorithm. The complete data (CD) for cluster m may be viewed as b^1_m given π. Let ε = (b^1_1, b^1_2, ···, b^1_{N_Q}) for the entire sample. If we denote ε_{qim} = ε_q in column q of Table 1 for unit i nested within cluster m, the CD log likelihood of π may be expressed as l(π|ε) = Σ_{q=1}^Q Σ_{m=1}^{N_Q} Σ_{i=1}^{N_{qm}} ln f(ε_{qim}|π) for the density f(ε_{qim}|π) of level-q unit-specific ε_{qim} ∼ N(0, π_q). The CD ML estimators are π̂_q = Σ_{m=1}^{N_Q} Σ_{i=1}^{N_{qm}} ε_{qim} ε_{qim}^T / N_q. The E components are from b^1_m|Y^1_{mobs} ∼ N(b̃^1_m, Π̃^{11}_m), the conventional estimation of which requires inversion of V^{11}_{mobs} that may be extremely high dimensional.
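The complete-data M step above is just an average of outer products of the level-q random effects. A hedged numpy sketch (simulated random effects with an invented covariance; not the authors' code) illustrates the estimator π̂_q = Σ ε_{qim} ε_{qim}^T / N_q:

```python
import numpy as np

# Sketch of the complete-data ML estimator pi_hat_q: if the level-q random
# effects eps_qim were observed, the MLE of pi_q is the average outer product.
# pi_q_true and the sample size are made up for illustration.
rng = np.random.default_rng(4)
N_q, dim = 1000, 2
pi_q_true = np.array([[1.0, 0.4], [0.4, 0.8]])
L = np.linalg.cholesky(pi_q_true)
eps = rng.normal(size=(N_q, dim)) @ L.T     # eps_qim ~ N(0, pi_q_true)

pi_hat_q = eps.T @ eps / N_q                # CD ML estimator of pi_q
print(np.round(pi_hat_q, 2))                # close to pi_q_true
```

In the actual EM algorithm the unobserved outer products are replaced by their conditional expectations from the E step.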
The orthogonal decomposition by level of the joint model (1) enables expression of b^q_m|Y^q_{mobs} ∼ N(b̃^q_m, Π̃^{qq}_m) that is uniform for all q and based on level-q data only given computation components from higher levels. As will be explained below, this recursive expression yields successive Q-step estimation formulas from (b̃^Q_m, Π̃^{QQ}_m) down to (b̃^1_m, Π̃^{11}_m) for efficient computation of the E step without directly inverting V^{11}_{mobs}.

To estimate fixed effects, let μ = [μ_1 μ_2 ··· μ_Q] for μ_q = X_q β_q in the model (1) where X_q is a matrix of completely observed covariates having fixed effects β_q. We replace Y_q = Z_q b_q with d_q = Y_q − X_q β_q = Z_q b_q in the level-q model (2) to express the stacked-up model (5) as d^q = Y^q − X^q β^q = Z^q b^q for d^q = [d_q^T d_{q+1}^T ··· d_Q^T]^T, X^q = ⊕_{r=q}^Q X_r and β^q = [β_q^T β_{q+1}^T ··· β_Q^T]^T. Let d_{qm} = Y_{qm} − X_{qm} β_q aggregate d_q = Y_q − X_q β_q such that the corresponding model (7) has d^q_m = Y^q_m − X^q_m β^q = Z^q_m b^q_m for d^q_m = [d_{qm}^T d_{(q+1)m}^T ··· d_{Qm}^T]^T and X^q_m = ⊕_{r=q}^Q X_{rm}. Then, d^q_{mobs} = O^q_m d^q_m and X^q_{mobs} = O^q_m X^q_m so that the observed model (8) is d^q_{mobs} = Y^q_{mobs} − X^q_{mobs} β^q = Z^q_{mobs} b^q_m. The desired parameters are θ = (π, β) for β = β^1. For simplicity of notation, let V^{−q}_{mobs} and V^{−(q+1)}_{mobs} denote the inverses of V^{qq}_{mobs} and V^{(q+1)(q+1)}_{mobs}, respectively. Given current β_0, the Fisher scoring equivalent to the Newton-Raphson update is

    β̂ = β_0 + [ Σ_{m=1}^{N_Q} X^{1T}_{mobs} V^{−1}_{mobs} X^1_{mobs} ]^{−1} Σ_{m=1}^{N_Q} X^{1T}_{mobs} V^{−1}_{mobs} d^1_{mobs}.    (9)

The following section describes efficient recursive computation of β̂ based on X^{qT}_{mobs} V^{−q}_{mobs} X^q_{mobs} and X^{qT}_{mobs} V^{−q}_{mobs} d^q_{mobs}.

Equation (7) expresses a single-level GLM when Z^1_m is an identity matrix and b^1_m has Π^{11}_m = Π^{11} for all m. The clustering of sample data discussed above gives rise to a Q-level GLM.
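The Fisher scoring update in Equation (9) can be sketched naively by inverting each cluster's covariance directly, which is exactly the cost the recursion of Section 3.3 avoids. The numpy sketch below uses invented shapes, covariates, and a made-up working covariance; it also checks that one scoring step from any β_0 lands on the GLS solution, since the update is exact for this linear problem:

```python
import numpy as np

# Naive sketch of the Fisher scoring / GLS update in Equation (9).
# All shapes, covariates and the working V below are made up.
rng = np.random.default_rng(1)
N_Q, n, k = 5, 4, 2                      # clusters, obs per cluster, fixed effects
beta0 = np.array([1.0, -1.0])            # current beta_0

XtVX = np.zeros((k, k))                  # accumulates X' V^-1 X over clusters
XtVd = np.zeros(k)                       # accumulates X' V^-1 d over clusters
XtVY = np.zeros(k)                       # for checking against direct GLS
for m in range(N_Q):
    X = rng.normal(size=(n, k))          # X^1_mobs
    Y = rng.normal(size=n)               # Y^1_mobs
    V = 0.3 + 0.7 * np.eye(n)            # V^11_mobs: equicorrelated, made up
    Vinv = np.linalg.inv(V)
    d = Y - X @ beta0                    # d^1_mobs = Y - X beta_0
    XtVX += X.T @ Vinv @ X
    XtVd += X.T @ Vinv @ d
    XtVY += X.T @ Vinv @ Y
beta_hat = beta0 + np.linalg.solve(XtVX, XtVd)    # Equation (9)

# one scoring step from any beta_0 reaches the GLS solution for fixed V
print(np.allclose(beta_hat, np.linalg.solve(XtVX, XtVY)))  # True
```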
Next, we show that the aggregated joint model (7) enables us to write down efficient Q-step recursive estimation formulas, the qth-step computation of which involves level-q data only and thus is not unduly burdened with respect to the number of levels Q, the number of variables p and the number of random effects.

3.3 Efficient Computation

The conventional E step based on b^1_m|Y^1_{mobs} ∼ N(b̃^1_m, Π̃^{11}_m) requires inversion of V^{11}_{mobs}, which may be extremely high dimensional and, thus, take long to compute within each iteration of the EM algorithm. On the other hand, the E step based on Equation (8) produces Q-step estimation formulas the qth step of which is to estimate b^q_m|Y^q_{mobs}, θ ∼ N(b̃^q_m, Π̃^{qq}_m) where

    b̃^q_m = Π^{qq}_m A^q_{mobs},  Π̃^{qq}_m = Π^{qq}_m − Π^{qq}_m B^q_{mobs} Π^{qq}_m    (10)

for A^q_{mobs} = Z^{qT}_{mobs} V^{−q}_{mobs} d^q_{mobs} and B^q_{mobs} = Z^{qT}_{mobs} V^{−q}_{mobs} Z^q_{mobs}. The key advantages of the E step via Equations (10) are that b̃^q_m and Π̃^{qq}_m stay uniform for all q; that computation of b̃^q_m and Π̃^{qq}_m uses level-q data only, given A^{q+1}_{mobs}, B^{q+1}_{mobs} and θ; that the expressions (10) enable efficient computation of b̃^1_m and Π̃^{11}_m via computation of recursive components A^q_{mobs} and B^q_{mobs}; and that the E step does not require direct inversion of V^{11}_{mobs}. Estimation of b̃^1_m and Π̃^{11}_m starts at the highest level q = Q with initial components

    A^Q_{mobs} = O_{Qm}^T V^{−Q}_{mobs} d_{Qmobs},  B^Q_{mobs} = O_{Qm}^T V^{−Q}_{mobs} O_{Qm}    (11)

for V^{QQ}_{mobs} = π_{QQm} = O_{Qm} π^Q_{QQ} O_{Qm}^T, computes A^q_{mobs} and B^q_{mobs} using level-q data only, given A^{q+1}_{mobs} and B^{q+1}_{mobs} at step q, and finally evaluates Equations (10) given A^1_{mobs} and B^1_{mobs} at q = 1 within each iteration of the EM algorithm.
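The E-step formulas in Equation (10) are the standard conditional-normal moments computed through A and B. A brute-force numpy sketch on a tiny invented problem (not the paper's recursion; everything here is a made-up single cluster with fully observed data) shows one useful sanity check: because the stacked model has Y = Zb exactly, the posterior mean reproduces the observed data, Z b̃ = d:

```python
import numpy as np

# Sketch of Equation (10): btilde = Pi A, Pitilde = Pi - Pi B Pi with
# A = Z' V^-1 d, B = Z' V^-1 Z, computed by brute force. Data are made up.
rng = np.random.default_rng(2)
p, k = 3, 2                                   # level-q dim, higher-level effects
W = rng.normal(size=(p, k))
Z = np.hstack([np.eye(p), W])                 # Z_qm = [I  Z_-qqm]
pi_low = np.diag([1.0, 0.8, 1.2])             # level-q residual covariance, made up
Phi = np.array([[0.6, 0.2], [0.2, 0.9]])      # higher-level covariance, made up
Pi = np.block([[pi_low, np.zeros((p, k))],
               [np.zeros((k, p)), Phi]])      # block diagonal by level
V = Z @ Pi @ Z.T                              # V_qq = Z Pi Z'
d = rng.normal(size=p)                        # centered observed data d_mobs

Vinv = np.linalg.inv(V)
A = Z.T @ Vinv @ d                            # A_mobs
B = Z.T @ Vinv @ Z                            # B_mobs
btilde = Pi @ A                               # E(b | Y_obs)
Pitilde = Pi - Pi @ B @ Pi                    # var(b | Y_obs)

# since Y = Zb exactly, the posterior mean reproduces the observed data
print(np.allclose(Z @ btilde, d))             # True
```

The point of Section 3.3 is that A and B need never be formed by inverting the full V^11_mobs; they are accumulated level by level.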
To formulate the recursive computation, let

    ε̃_{−qqm} = E(ε_{−qqm}|Y^q_{mobs}) = Δ^{−1}_{qm} (Z^T_{−qqmobs} ψ^{−1}_{qm} d_{qmobs} + Ω^{−1}_{qm} ε̃̃_{−qqm}),    (12)
    V^{−q11}_{mobs} = ψ^{−1}_{qm} − ψ^{−1}_{qm} Z_{−qqmobs} Δ^{−1}_{qm} Z^T_{−qqmobs} ψ^{−1}_{qm},
    V^{−q}_{mobs} = [ V^{−q11}_{mobs} V^{−q12}_{mobs} ; V^{−q21}_{mobs} V^{−q22}_{mobs} ]

where Δ_{qm} = Z^T_{−qqmobs} ψ^{−1}_{qm} Z_{−qqmobs} + Ω^{−1}_{qm}, ψ_{qm} = ⊕_{i=1}^{N_{qm}} π^q_{qqim}, Ω_{qm} = var(ε_{−qqm}|Y^{q+1}_{mobs}) = Φ_{qqm} − Φ^{q(q+1)}_m B^{q+1}_{mobs} Φ^{q(q+1)T}_m and ε̃̃_{−qqm} = E(ε_{−qqm}|Y^{q+1}_{mobs}) = Φ^{q(q+1)}_m A^{q+1}_{mobs} for π^q_{qqim} = O_{qim} π^q_{qq} O^T_{qim}. Note that computation of ε̃_{−qqm} and V^{−q11}_{mobs} requires level-q Z_{−qqmobs} and d_{qmobs} only, given A^{q+1}_{mobs}, B^{q+1}_{mobs} and θ. The following result shows that A^q_{mobs} and B^q_{mobs} depend on level-q data only, given higher-level components A^{q+1}_{mobs} and B^{q+1}_{mobs}. See the Appendix for proofs.

Proposition 3.1 Z^{qT}_{mobs} V^{−q}_{mobs} d^q_{mobs} and Z^{qT}_{mobs} V^{−q}_{mobs} Z^q_{mobs} depend on level-q data Y_{qmobs}, X_{qmobs} and Z_{−qqmobs} only, given Z^{(q+1)T}_{mobs} V^{−(q+1)}_{mobs} d^{q+1}_{mobs}, Z^{(q+1)T}_{mobs} V^{−(q+1)}_{mobs} Z^{q+1}_{mobs} and θ for all q < Q.

Proposition 3.1 enables a Q-step recursive computation of b̃^1_m and Π̃^{11}_m the qth step of which involves level-q data only without directly inverting V^{11}_{mobs}.

Theorem 3.2 b̃^1_m and Π̃^{11}_m can be computed by a Q-step recursive procedure from level Q down to level 1 given θ where step q involves level-q data only.

The E step for Q = 3, for example, computes A^3_{mobs} = O_{3m}^T π^{−3}_{33m} d_{3mobs} and B^3_{mobs} = O_{3m}^T π^{−3}_{33m} O_{3m} in Equations (11) initially for the inverse π^{−3}_{33m} of π^3_{33m}; A^2_{mobs} and B^2_{mobs} in Equations (32) and (33) given A^3_{mobs} and B^3_{mobs} at q = 2; and A^1_{mobs} and B^1_{mobs} in Equations (32) and (33) given A^2_{mobs} and B^2_{mobs} to finally yield b̃^1_m and Π̃^{11}_m in Equations (10) at q = 1.

Fisher scoring on β may also be based on recursive computation of A^q_{mobs}, B^q_{mobs},

    F^q_{mobs} = X^{qT}_{mobs} V^{−q}_{mobs} d^q_{mobs},  G^q_{mobs} = X^{qT}_{mobs} V^{−q}_{mobs} X^q_{mobs},  H^q_{mobs} = Z^{qT}_{mobs} V^{−q}_{mobs} X^q_{mobs}.    (13)
(13) mobs mobs mobs mobs mobs mobs mobs mobs mobs mobs mobs mobs The following result shows that Fq , Gq and Hq depend on level-q data only, given mobs mobs mobs Aq+1 , Bq+1 , Fq+1 , Gq+1 , Hq+1 and θ. mobs mobs mobs mobs mobs Proposition 3.3 XqT V−q dq , XqT V−q Xq and ZqT V−q Xq depend on level- mobs mobs mobs mobs mobs mobs mobs mobs mobs q data Y , X and Z only, given Z(q+1)TV−(q+1)dq+1 , Z(q+1)TV−(q+1)Zq+1 , qmobs qmobs −qqmobs mobs mobs mobs mobs mobs mobs X(q+1)TV−(q+1)dq+1 , X(q+1)TV−(q+1)Xq+1 , Z(q+1)TV−(q+1)Xq+1 and θ for all q < Q. mobs mobs mobs mobs mobs mobs mobs mobs mobs ˆ Propositions 3.1 and 3.3 enable β to be computed recursively. Theorem 3.4 X1T V−1 X1 and X1T V−1 d1 can be computed by a Q-step recursive mobs mobs mobs mobs mobs mobs procedure from level Q down to level 1 given θ where step q involves level-q data only. The Fisher scoring on β for Q = 3, for example, computes A3 = OT π−3 d , mobs 3m 33m 3mobs B3 = OT π−3 O , F3 = XT π−3 d , G3 = XT π−3 X and H3 = mobs 3m 33m 3m mobs 3mobs 33m 3mobs mobs 3mobs 33m 3mobs mobs OT π−3 X initially; A2 , B2 , F2 , G2 and H2 in Equations (32) to (36) 3mobs 33m 3mobs mobs mobs mobs mobs mobs from level-2 data, given A3 , B3 , F3 , G3 and H3 at q = 2; F1 and G1 mobs mobs mobs mobs mobs mobs mobs in Equations (34) and (35) from level-1 data given A2 , B2 , F2 , G2 and H2 to mobs mobs mobs mobs mobs ˆ finally yield β in Equation (9) at q = 1. The recursive estimation efficiently handles missing data one level at a time. Conse- quently, the computation will be efficient given a number of variables subject to missingness at higher levels. In that case, the observed joint model (8) yields recursive computation that is not excessively burdened with respect to Q, p and the number of random effects. On the other hand, given missing data at level 1 only, this approach amounts to the conventional ˆ EM algorithm. The inverse of the Fisher information matrix yields var(θ). 
Multiple imputation is based on Y^1_m|Y^1_{mobs}, θ̂ ∼ N(X^1_m β^1 + Z^1_m b̃^1_m, Z^1_m Π̃^{11}_m Z^{1T}_m) given the ML θ̂. Let v_{ii} and v_{ij} be variances of log(π_{ii}) and log[(1+ρ_{ij})/(1−ρ_{ij})] where i ≠ j and ρ_{ij} = π_{ij}/√(π_{ii}π_{jj}) for diagonal π_{ii} and off-diagonal π_{ij} in (π_1, π_2, ···, π_Q). Then, N[log(π̂_{ii}), v̂_{ii}] and N[log((1+ρ̂_{ij})/(1−ρ̂_{ij})), v̂_{ij}] estimated by ML imply the joint distribution of a vector, φ_q, of distinct log(π_{ii}) and log[(1+ρ_{ij})/(1−ρ_{ij})] from π_q. Let the distributions of φ_q and β be N[φ̂_q, var(φ̂_q)] and N[β̂, var(β̂)] estimated by
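The log-variance and log[(1+ρ)/(1−ρ)] transformations above map the constrained covariance parameters to an unconstrained vector φ_q on which a normal approximation is reasonable. A numpy sketch of the forward and inverse maps (the example π_q block is made up; this is not the authors' imputation code):

```python
import numpy as np

# Sketch of the transforms used to build phi_q: log variances and
# log[(1+rho)/(1-rho)] for correlations, with their inverse maps.
pi_q = np.array([[1.0, 0.3],
                 [0.3, 0.5]])                 # an estimated pi_q block, made up

sd = np.sqrt(np.diag(pi_q))
rho = pi_q / np.outer(sd, sd)                 # rho_ij = pi_ij / sqrt(pi_ii pi_jj)
log_var = np.log(np.diag(pi_q))               # log(pi_ii), unconstrained
i, j = np.triu_indices_from(pi_q, k=1)
z = np.log((1 + rho[i, j]) / (1 - rho[i, j])) # log[(1+rho)/(1-rho)], unconstrained
phi_q = np.concatenate([log_var, z])          # unconstrained parameter vector

# inverse map: recover the variances and correlations from phi_q
var_back = np.exp(phi_q[:2])
rho_back = (np.exp(phi_q[2:]) - 1) / (np.exp(phi_q[2:]) + 1)
print(np.allclose(var_back, np.diag(pi_q)), np.allclose(rho_back, rho[i, j]))
```

Drawing φ_q from its approximate normal distribution and inverting the map always yields positive variances and correlations in (−1, 1).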
