LMest: an R package for latent Markov models for categorical longitudinal data

Francesco Bartolucci (University of Perugia), Alessio Farcomeni (Sapienza - University of Rome), Silvia Pandolfi (University of Perugia), Fulvia Pennoni (University of Milano-Bicocca)

January 20, 2015

arXiv:1501.04448v1 [stat.CO] 19 Jan 2015

Abstract

Latent Markov (LM) models represent an important class of models for the analysis of longitudinal data (Bartolucci et al., 2013), especially when response variables are categorical. These models have a great potential of application for the analysis of social, medical, and behavioral data as well as in other disciplines. We propose the R package LMest, which is tailored to deal with these types of model. In particular, we consider a general framework for extended LM models by including individual covariates and by formulating a mixed approach to take into account additional dependence structures in the data. Such extensions lead to a very flexible class of models, which allows us to fit different types of longitudinal data. Model parameters are estimated through the expectation-maximization algorithm, based on the forward-backward recursions, which is implemented in the main functions of the package. The package also allows us to perform local and global decoding and to obtain standard errors for the parameter estimates. We illustrate its use and its most important features on the basis of examples involving applications in health and criminology.

Keywords: expectation-maximization algorithm, forward-backward recursions, hidden Markov model

1 Introduction

In this paper we illustrate the R package LMest V2.0, which replaces version V1.0, available from http://CRAN.R-project.org/package=LMest. The new version extends V1.0 in several ways and is aimed at providing a collection of tools that can be used to estimate the class of latent Markov (LM) models for longitudinal categorical data.
The package can be seen as a companion to the book by Bartolucci et al. (2013), which is focused on these models; we will closely refer to this book for the theoretical and some practical details. Additional insights are given in the discussion paper by Bartolucci et al. (2014b).

LM models are designed to deal with univariate and multivariate longitudinal data based on the repeated observation of a panel of subjects across time. More in detail, LM models are specially tailored to study the evolution of an individual characteristic of interest that is not directly observable. This characteristic is represented by a latent process following a Markov chain, as in a hidden Markov (HM) model (Zucchini and MacDonald, 2009). These models also allow us to account for time-varying unobserved heterogeneity in addition to the effect of observable covariates on the response variables.

The initial LM formulation introduced by Wiggins (1973), see also Wiggins (1955), has been developed in several directions and in connection with applications in many fields, such as psychology, sociology, economics, and medicine. In particular, the basic LM model, relying on a homogeneous Markov chain of first order, has several extensions based on parameterizations that allow us to include hypotheses and constraints of interest. The most important extension is the inclusion of individual covariates that may affect the distribution of the latent process (Vermunt et al., 1999; Bartolucci et al., 2007) or the conditional distribution of the response variables given this process (Bartolucci and Farcomeni, 2009). LM models may also be formulated to take into account additional dependence structures in the data. In particular, the basic LM model has been generalized to the mixed LM approach by van de Pol and Langeheine (1990) and to the LM model with random effects (Altman, 2007; Maruotti, 2011; Bartolucci et al., 2011).
These formulations are related to extended models in which the parameters are allowed to vary across different latent subpopulations; for a complete review see Bartolucci et al. (2013) and Bartolucci et al. (2014b).

As already mentioned, LM models are closely related to HM models (Zucchini and MacDonald, 2009), which are specific for time-series data. LM models are formulated quite similarly, but they are tailored for longitudinal data, where many subjects are observed only at a few occasions, typically no more than ten. Most importantly, differently from LM models, extensions of HM models to include covariates or to handle complex data structures are rarely applied, because these structures are typical of longitudinal studies.

In R there are some packages that can handle models related to LM models. In particular, HM models can be fit through the packages HMM (Himmelmann, 2010), HiddenMarkov (Harte, 2012), and depmixS4 (Visser and Speekenbrink, 2010). The last one is the most closely related to LMest, as it is tailored to deal with HM models based on a generalized linear formulation, which can be specified conditionally on individual covariates. depmixS4 is nonetheless designed for repeated measurements on a single unit, as in a time-series context. The packages mhsmm (O'Connell and Hojsgaard, 2011) and hsmm (Bulla and Bulla, 2013) deal with hidden semi-Markov models, which are HM models in which separate assumptions are specified for the persistence in a latent state. Finally, the package msm (Jackson, 2011) can deal with HM and related models evolving in continuous time. A commercial software program that can be used to fit certain LM models is Latent Gold (Vermunt and Magidson, 2003). Additionally, certain Matlab toolboxes are available for this aim; see for example the HMM toolbox proposed by Murphy (1998).
The distinguishing features of the LMest package with respect to the packages above are the following:

• it is primarily designed to work with longitudinal data, that is, with (even many) i.i.d. replicates of (usually short) sequences;

• it can deal with univariate and multivariate categorical outcomes;

• standard errors for the parameter estimates are obtained by exact computation or through reliable approximations of the observed information matrix;

• additional random effects can be used to build mixed LM models;

• individual covariates are included through efficient parameterizations of either the measurement or the latent model;

• computationally efficient algorithms are implemented for predictions;

• the package is flexible and it minimizes the use of time resources by relying on certain Fortran routines.

The paper is organized as follows. Section 2 briefly outlines the general formulation of LM models and their maximum likelihood estimation, together with a brief description of the basic LM model. Section 3 describes the use of the LMest package to estimate LM models with individual covariates included in the measurement model, while in Section 4 we deal with individual covariates affecting the distribution of the latent process. Both formulations are illustrated through examples based on real data about self-reported health status. In Section 5 we introduce the mixed LM models and we describe the use of the R function aimed at estimating this model formulation. Here, the illustrative example is based on data in the field of criminology. Finally, Section 6 contains the main conclusions.

2 Latent Markov models for longitudinal data

In the following we provide a brief review of the statistical methodology on which LM models are based, trying to reduce the mathematical details as much as possible. The illustration closely follows the recent paper by Bartolucci et al. (2014b).
We focus in particular on the maximum likelihood estimation of these models on the basis of the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). We also deal with more advanced topics which are important for the correct use of the LMest package, such as the selection of the number of latent states and local and global decoding (Viterbi, 1967; Juang and Rabiner, 1991).

2.1 The general latent Markov model formulation

Consider the multivariate case where we observe a vector $Y^{(t)}$ of $r$ categorical response variables $Y_j^{(t)}$, with $c_j$ categories labeled from $0$ to $c_j-1$, $j=1,\ldots,r$, which are available at the $t$-th time occasion, $t=1,\ldots,T$. Also let $\tilde{Y}$ be the vector obtained by stacking $Y^{(t)}$ for $t=1,\ldots,T$; this vector then has $rT$ elements. Obviously, in the univariate case we have a single response variable $Y^{(t)}$ for each time occasion, and $\tilde{Y}$ is made of $T$ elements. When available, we also denote by $X^{(t)}$ the vector of individual covariates at the $t$-th time occasion and by $\tilde{X}$ the vector of all the individual covariates, obtained by stacking the vectors $X^{(t)}$, $t=1,\ldots,T$.

The general formulation of the model assumes the existence of a latent process, denoted by $U = (U^{(1)},\ldots,U^{(T)})$, which affects the distribution of the response variables. Such a process is assumed to follow a first-order Markov chain with state space $\{1,\ldots,k\}$, where $k$ is the number of latent states. Under the local independence assumption, the response vectors $Y^{(1)},\ldots,Y^{(T)}$ are assumed to be conditionally independent given the latent process $U$. Moreover, the elements $Y_j^{(t)}$ within $Y^{(t)}$, $t=1,\ldots,T$, are conditionally independent given $U^{(t)}$. This assumption leads to a strong simplification of the model, but it can be relaxed by allowing serial dependence through the inclusion of the lagged response variables among the covariates (see Bartolucci and Farcomeni, 2009, among others).
Parameters of the measurement model are the conditional response probabilities

$$\phi^{(t)}_{jy|ux} = p(Y_j^{(t)} = y \mid U^{(t)} = u, X^{(t)} = x), \quad j=1,\ldots,r, \; y=0,\ldots,c_j-1,$$

which reduce to

$$\phi^{(t)}_{y|ux} = p(Y^{(t)} = y \mid U^{(t)} = u, X^{(t)} = x), \quad y=0,\ldots,c-1,$$

in the univariate case, where $t=1,\ldots,T$ and $u=1,\ldots,k$. Parameters of the latent process are the initial probabilities

$$\pi_{u|x} = p(U^{(1)} = u \mid X^{(1)} = x), \quad u=1,\ldots,k,$$

and the transition probabilities

$$\pi^{(t)}_{u|\bar{u}x} = p(U^{(t)} = u \mid U^{(t-1)} = \bar{u}, X^{(t)} = x), \quad t=2,\ldots,T, \; \bar{u},u=1,\ldots,k.$$

In the expressions above, $x$ denotes a realization of $X^{(t)}$, $y$ denotes a realization of $Y_j^{(t)}$, $u$ denotes a realization of $U^{(t)}$, and $\bar{u}$ denotes a realization of $U^{(t-1)}$.

On the basis of the above parameters, the conditional distribution of $U$ given $\tilde{X}$ may be expressed as

$$p(U = u \mid \tilde{X} = \tilde{x}) = \pi_{u^{(1)}|x^{(1)}} \prod_{t>1} \pi^{(t)}_{u^{(t)}|u^{(t-1)}x^{(t)}},$$

where $u = (u^{(1)},\ldots,u^{(T)})$ and $\tilde{x}$ denotes a realization of the vector of all the individual covariates $\tilde{X}$. Moreover, the conditional distribution of $\tilde{Y}$ given $U$ and $\tilde{X}$ may be expressed as

$$p(\tilde{Y} = \tilde{y} \mid U = u, \tilde{X} = \tilde{x}) = \prod_t \phi^{(t)}_{y^{(t)}|u^{(t)}x^{(t)}},$$

where, in general, we define $\phi^{(t)}_{y|ux} = p(Y^{(t)} = y \mid U^{(t)} = u, X^{(t)} = x)$ and, due to the assumption of local independence, we have

$$\phi^{(t)}_{y|ux} = \prod_j \phi^{(t)}_{jy_j|ux}.$$

Note that in the above expressions $\tilde{y}$ is a realization of $\tilde{Y}$ made of the subvectors $y^{(t)} = (y_1^{(t)},\ldots,y_r^{(t)})$, whereas $y$ is a realization of $Y^{(t)}$ with elements $y_j$, $j=1,\ldots,r$.

When we deal with the inclusion of individual covariates, the manifest distribution of the response variables corresponds to the conditional distribution of $\tilde{Y}$ given $\tilde{X}$, defined as

$$p(\tilde{y}|\tilde{x}) = p(\tilde{Y} = \tilde{y} \mid \tilde{X} = \tilde{x}) = \sum_u \pi_{u^{(1)}|x^{(1)}} \, \pi^{(2)}_{u^{(2)}|u^{(1)}x^{(2)}} \cdots \pi^{(T)}_{u^{(T)}|u^{(T-1)}x^{(T)}} \, \phi^{(1)}_{y^{(1)}|u^{(1)}x^{(1)}} \cdots \phi^{(T)}_{y^{(T)}|u^{(T)}x^{(T)}}. \quad (1)$$

In the basic version of the model, none of the assumptions is formulated conditionally on individual covariates.
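To make the data-generating process above concrete, here is a minimal simulation sketch for the basic (covariate-free, univariate) version: a latent path is drawn from the first-order Markov chain and each response is drawn from the conditional distribution given the current state. This is not LMest code; the function name `simulate_lm` and the toy parameters are ours for illustration only.

```python
import numpy as np

def simulate_lm(T, piv, Pi, Phi, rng):
    """Draw one latent path U and one response sequence Y from a basic LM model.

    piv : initial probabilities, shape (k,)
    Pi  : transition matrix, shape (k, k), rows summing to 1
    Phi : conditional response probabilities, shape (k, c)
    """
    k = len(piv)
    u = rng.choice(k, p=piv)          # U^(1) from the initial distribution
    U, Y = [], []
    for t in range(T):
        if t > 0:
            u = rng.choice(k, p=Pi[u])  # U^(t) | U^(t-1) from the transition row
        U.append(int(u))
        Y.append(int(rng.choice(Phi.shape[1], p=Phi[u])))  # Y^(t) | U^(t)
    return U, Y
```

With degenerate probabilities (deterministic chain and emissions) the output is the alternating sequence the transition matrix encodes, which makes the mechanics easy to check by hand.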
Moreover, when such covariates are available, we suggest avoiding that they simultaneously affect both the distribution of the latent process and the conditional distribution of the response variables given this process. In fact, the two formulations have different interpretations, as explained in more detail in the following, and the resulting model would be difficult to interpret and to estimate.

Finally, it is important to note that computing $p(\tilde{y}|\tilde{x})$, or $p(\tilde{y})$ in the basic version, involves a sum extended to all the $k^T$ possible configurations of the vector $u$; this typically requires a considerable computational effort. In order to compute such a probability efficiently we can use a forward recursion (Baum et al., 1970); see Bartolucci et al. (2013) for details.

2.2 Maximum likelihood estimation

Given a sample of $n$ independent units that provide the response vectors $\tilde{y}_1,\ldots,\tilde{y}_n$, and given the corresponding vectors of covariates $\tilde{x}_1,\ldots,\tilde{x}_n$, the model log-likelihood assumes the following expression:

$$\ell(\theta) = \sum_{i=1}^n \log p(\tilde{y}_i|\tilde{x}_i).$$

Note that in this case each vector $\tilde{y}_i$ is a realization of $\tilde{Y}$ which, in the multivariate case, is made up of the subvectors $y_i^{(t)}$, $t=1,\ldots,T$, that, in turn, have elements $y_{ij}^{(t)}$, $j=1,\ldots,r$. Moreover, $\tilde{x}_i$ may be decomposed into the time-specific subvectors of covariates $x_i^{(1)},\ldots,x_i^{(T)}$. Also note that $p(\tilde{y}_i|\tilde{x}_i)$ corresponds to the manifest probability of the responses provided by subject $i$, given the covariates; see equation (1). Moreover, $\theta$ is the vector of all free parameters affecting $p(\tilde{y}_i|\tilde{x}_i)$.

The above log-likelihood function can be maximized by the EM algorithm (Baum et al., 1970; Dempster et al., 1977), as described in the following section.
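The forward recursion mentioned above reduces the $k^T$ sum in the manifest probability to $O(T k^2)$ operations. A minimal sketch for the basic univariate model follows; it is a language-neutral illustration of the Baum et al. (1970) recursion, not LMest code, and the function name and toy parameters are ours.

```python
import numpy as np

def manifest_probability(y, piv, Pi, Phi):
    """Forward recursion for p(y^(1), ..., y^(T)) in a basic LM model.

    y   : observed responses, length T (integer codes 0..c-1)
    piv : initial probabilities, shape (k,)
    Pi  : transition matrix, shape (k, k), rows summing to 1
    Phi : conditional response probabilities, shape (k, c)
    """
    # alpha[u] = p(y^(1), ..., y^(t), U^(t) = u)
    alpha = piv * Phi[:, y[0]]
    for t in range(1, len(y)):
        alpha = (alpha @ Pi) * Phi[:, y[t]]
    return alpha.sum()  # marginalize the last latent state
```

For a small example the result can be checked against the brute-force sum over all $k^T$ latent paths; the two agree to machine precision, which is exactly the point of the recursion.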
2.2.1 Expectation-Maximization algorithm

The EM algorithm is based on the complete data log-likelihood that, with multivariate categorical data, has the following expression:

$$\ell^*(\theta) = \sum_{j=1}^r \sum_{t=1}^T \sum_{u=1}^k \sum_x \sum_{y=0}^{c_j-1} a^{(t)}_{juxy} \log \phi^{(t)}_{jy|ux} + \sum_{u=1}^k \sum_x b^{(1)}_{ux} \log \pi_{u|x} + \sum_{t=2}^T \sum_{\bar{u}=1}^k \sum_{u=1}^k \sum_x b^{(t)}_{\bar{u}ux} \log \pi^{(t)}_{u|\bar{u}x}, \quad (2)$$

where, with reference to occasion $t$ and covariate configuration $x$, $a^{(t)}_{juxy}$ is the number of subjects that are in latent state $u$ and provide response $y$ to response variable $j$, $b^{(t)}_{ux}$ is the frequency of latent state $u$, and $b^{(t)}_{\bar{u}ux}$ is the number of transitions from state $\bar{u}$ to state $u$. The EM algorithm alternates the following two steps until convergence:

• E-step: compute the posterior (given the observed data) expected value of each indicator variable involved in (2) by suitable forward-backward recursions (Baum et al., 1970); these expected values are denoted by $\hat{a}^{(t)}_{juxy}$, $\hat{b}^{(t)}_{ux}$, and $\hat{b}^{(t)}_{\bar{u}ux}$;

• M-step: maximize the complete data log-likelihood expressed as in (2), with each indicator variable substituted by the corresponding expected value. How to maximize this function depends on the specific formulation of the model and, in particular, on whether the covariates are included in the measurement model or in the latent model.

We refer the reader to Bartolucci et al. (2013) for a detailed description of these steps, which are implemented in suitable functions of the LMest package.

The EM algorithm could converge to a mode of the log-likelihood which does not correspond to the global maximum. In this regard, we suggest combining a deterministic and a random starting rule for the EM algorithm, so as to take as final estimate the one corresponding to the highest log-likelihood obtained at convergence of the algorithm; the corresponding estimate is denoted by $\hat{\theta}$. This may prevent problems due to the multimodality of this function.
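The E-step posteriors come from the forward-backward recursions. The sketch below computes $p(U^{(t)} = u \mid y)$ for the basic univariate model, from which the expected state frequencies in (2) are assembled (expected transition counts require an analogous pairwise recursion, omitted here). Again this is an illustrative sketch, not LMest code.

```python
import numpy as np

def posterior_probabilities(y, piv, Pi, Phi):
    """Forward-backward recursions: returns a (T, k) array whose row t
    holds p(U^(t) = u | y^(1), ..., y^(T)) for a basic LM model."""
    T, k = len(y), len(piv)
    alpha = np.empty((T, k))  # alpha[t, u] = p(y^(1..t), U^(t) = u)
    beta = np.empty((T, k))   # beta[t, u]  = p(y^(t+1..T) | U^(t) = u)
    alpha[0] = piv * Phi[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ Pi) * Phi[:, y[t]]
    beta[T-1] = 1.0
    for t in range(T-2, -1, -1):
        beta[t] = Pi @ (Phi[:, y[t+1]] * beta[t+1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)  # normalize each occasion
```

Each row of the result sums to one, and for a short sequence the posteriors match a brute-force Bayes computation over all latent paths.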
In particular, the random initialization is based on suitably normalized random numbers drawn from a uniform distribution between 0 and 1 for the initial and transition probabilities of the Markov chain and for the conditional response probabilities. The convergence of the EM algorithm is checked on the basis of the relative log-likelihood difference, that is,

$$\frac{\ell(\theta^{(s)}) - \ell(\theta^{(s-1)})}{|\ell(\theta^{(s)})|} < \epsilon, \quad (3)$$

where $\theta^{(s)}$ is the parameter estimate obtained at the end of the $s$-th M-step and $\epsilon$ is a predefined tolerance level which can be set by the user.

2.2.2 Standard errors

After the model is estimated, standard errors may be obtained on the basis of the observed information matrix, denoted by $J(\hat{\theta})$, by standard rules. In particular, they are computed as the square roots of the diagonal elements of the inverse of this matrix. The LMest package computes this matrix, and then provides the standard errors, by using either the numerical method proposed by Bartolucci and Farcomeni (2009) or the exact computation method proposed by Bartolucci and Farcomeni (2014b), according to the model of interest.

With the approximate method, $J(\hat{\theta})$ is obtained as minus the numerical derivative of the score vector $s(\hat{\theta})$ at convergence. The score vector, in turn, is obtained as the first derivative of the expected value of the complete data log-likelihood, which is based on the expected frequencies $\hat{a}^{(t)}_{juxy}$, $\hat{b}^{(t)}_{ux}$, and $\hat{b}^{(t)}_{\bar{u}ux}$ computed with the final parameter estimates $\hat{\theta}$; for details see Bartolucci et al. (2013) and Pennoni (2014).

The exact computation of $J(\hat{\theta})$ is instead based on the Oakes identity (Oakes, 1999). This method uses the complete data information matrix, produced by the EM algorithm, and a correction matrix computed on the basis of the first derivative of the posterior probabilities obtained from the backward-forward recursions.
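The approximate method can be sketched generically: differentiate the score numerically (central differences) to get $J(\hat{\theta})$, then take square roots of the diagonal of its inverse. This is our own generic illustration of the idea, under the assumption that a score function is available; it is not the LMest implementation.

```python
import numpy as np

def observed_information(score, theta, eps=1e-6):
    """Approximate J(theta) as minus the numerical derivative of the score.

    score : function returning the score vector at a parameter value
    theta : point at which to evaluate (typically the MLE)
    """
    p = len(theta)
    J = np.empty((p, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        # central difference of the score along coordinate j
        J[:, j] = -(score(theta + step) - score(theta - step)) / (2 * eps)
    return (J + J.T) / 2  # symmetrize against rounding error

def standard_errors(J):
    """Square roots of the diagonal of the inverse observed information."""
    return np.sqrt(np.diag(np.linalg.inv(J)))
```

For a quadratic log-likelihood $\ell(\theta) = -\tfrac12 \theta^\top A \theta$ the score is $-A\theta$, so the routine should recover $J = A$ exactly up to numerical error, which provides an easy sanity check.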
For the basic LM model and for the model with individual covariates affecting the distribution of the latent process, the LMest package also provides functions aimed at performing a parametric bootstrap (Davison and Hinkley, 1997) in order to obtain standard errors.

2.3 Criteria for selecting the number of latent states

In certain applications the number of latent states, $k$, can be defined a priori, as in the univariate case in which the number of latent states may be fixed equal to the number of response categories. Otherwise, the following criteria are typically used to select the number of latent states: the Akaike information criterion (AIC) of Akaike (1973) and the Bayesian information criterion (BIC) of Schwarz (1978). They are based on the indices

$$\mathrm{AIC} = -2\hat{\ell} + 2\,n_{\mathrm{par}},$$
$$\mathrm{BIC} = -2\hat{\ell} + \log(n)\,n_{\mathrm{par}},$$

where $\hat{\ell}$ denotes the maximum of the log-likelihood of the model of interest and $n_{\mathrm{par}}$ denotes the number of its free parameters.

According to the above criteria, the optimal number of latent states is the one corresponding to the minimum value of the above indices; this model represents the best compromise between goodness-of-fit and complexity. However, other criteria may be used, such as those taking into account the quality of the classification; for reviews see Bacci et al. (2014) and Bartolucci et al. (2014a).

2.4 Local and global decoding

The LMest package allows us to perform decoding, that is, prediction of the sequence of the latent states for a certain sample unit on the basis of the data observed for this unit. In particular, the EM algorithm directly provides the estimated posterior probabilities of $U^{(t)}$, that is, $p(U^{(t)} = u \mid \tilde{X} = \tilde{x}, \tilde{Y} = \tilde{y})$, for $t=1,\ldots,T$, $u=1,\ldots,k$, and for every covariate and response configuration $(\tilde{x},\tilde{y})$ observed at least once. These probabilities can be directly maximized to obtain a prediction of the latent state of each subject at each time occasion $t$; this is the so-called local decoding.
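The AIC/BIC selection rule of Section 2.3 amounts to fitting the model for a grid of $k$ values and keeping the minimizer of the chosen index. A small helper sketching this bookkeeping, with made-up log-likelihood values for illustration (the function name and inputs are ours, not part of LMest):

```python
import numpy as np

def select_k(loglik, n_par, n, criterion="BIC"):
    """Pick the number of latent states minimizing AIC or BIC.

    loglik : dict mapping k to the maximized log-likelihood
    n_par  : dict mapping k to the number of free parameters
    n      : sample size (used only by BIC)
    """
    penalty = 2.0 if criterion == "AIC" else np.log(n)
    scores = {k: -2.0 * loglik[k] + penalty * n_par[k] for k in loglik}
    return min(scores, key=scores.get), scores
```

In typical use, each candidate $k$ would be fitted separately (with several starting values, as recommended above) before its maximized log-likelihood enters this comparison.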
It shall be noted that local decoding minimizes the classification error at each time occasion, but may yield sub-optimal or even inconsistent predicted sequences of $U^{(1)},\ldots,U^{(T)}$. In order to track the latent state of a subject over time, the a posteriori most likely sequence of states must be obtained through the so-called global decoding, which is based on an adaptation of the Viterbi (1967) algorithm; see also Juang and Rabiner (1991). The algorithm proceeds through a forward-backward recursion of a complexity similar to the recursions adopted for maximum likelihood estimation within the EM algorithm, so that global decoding may be applied even to long sequences of data.

2.5 Basic latent Markov model

Following Bartolucci et al. (2013), and as already mentioned above, the basic LM model rules out individual covariates and assumes that the conditional response probabilities are time homogeneous. In symbols, we have that $\phi^{(t)}_{y|ux} = \phi_{y|u}$ in the univariate case and $\phi^{(t)}_{jy|ux} = \phi_{jy|u}$ in the multivariate case; we also have $\pi_{u|x} = \pi_u$ and $\pi^{(t)}_{u|\bar{u}x} = \pi_{u|\bar{u}}$.

In order to fit this model, we can use the function est_lm_basic, which is one of the main functions of the LMest package. Here, we skip a detailed illustration of this function, as it is already provided in the Appendix of the book by Bartolucci et al. (2013), and we prefer to focus on functions that estimate more sophisticated LM versions which have only recently become available in the package. These versions include individual covariates and further latent
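Returning to the global decoding of Section 2.4, the Viterbi-style dynamic program can be sketched for the basic univariate model as follows (log scale for numerical stability). This is an illustrative sketch of the classical algorithm, not LMest code.

```python
import numpy as np

def global_decoding(y, piv, Pi, Phi):
    """Most likely latent path given the observed sequence (Viterbi, 1967).

    y   : observed responses, length T (integer codes)
    piv : initial probabilities, shape (k,)
    Pi  : transition matrix, shape (k, k)
    Phi : conditional response probabilities, shape (k, c)
    """
    T, k = len(y), len(piv)
    logv = np.log(piv) + np.log(Phi[:, y[0]])  # best log-prob ending in each state
    back = np.zeros((T, k), dtype=int)         # backpointers
    for t in range(1, T):
        cand = logv[:, None] + np.log(Pi)      # cand[u_prev, u]
        back[t] = cand.argmax(axis=0)
        logv = cand.max(axis=0) + np.log(Phi[:, y[t]])
    path = np.empty(T, dtype=int)
    path[T-1] = logv.argmax()
    for t in range(T-1, 0, -1):                # backtrack the optimal path
        path[t-1] = back[t, path[t]]
    return path
```

For short sequences the result can be verified against a brute-force maximization of the joint probability over all $k^T$ paths; unlike local decoding, the returned sequence is jointly, not occasion-wise, optimal.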
