Bayesian Analysis of Switching ARCH Models

Sylvia Kaufmann
University of Vienna, Department of Economics
Hohenstaufengasse 9, 1010 Vienna, Austria
Tel. +43 1 4277-37430 (email: [email protected])

Sylvia Frühwirth-Schnatter (corresponding author)
Vienna University of Economics and Business Administration, Department of Statistics
Augasse 2-6, 1090 Vienna, Austria
Tel. +43 1 31336-5053 (email: [email protected])

August 28, 2000

Abstract

We consider a time series model with autoregressive conditional heteroskedasticity that is subject to changes in regime. The regimes evolve according to a multistate latent Markov switching process with unknown transition probabilities, and it is the constant in the variance process of the innovations that is subject to regime shifts. The joint estimation of the latent process and all model parameters is performed within a Bayesian framework using the method of Markov chain Monte Carlo (MCMC) simulation. We perform model selection with respect to the number of states and the number of autoregressive parameters in the variance process using Bayes factors and model likelihoods. To this aim, the model likelihood is estimated by the method of bridge sampling. The usefulness of the sampler is demonstrated by applying it to the data set previously used by Hamilton and Susmel (1994), who investigated models with switching autoregressive conditional heteroskedasticity using maximum likelihood methods. The paper concludes with some issues related to maximum likelihood methods, to classical model selection, and to potential straightforward extensions of the model presented here.

Keywords: Bayesian analysis, bridge sampling, MCMC estimation, model selection, switching ARCH models

1 Introduction

The basic switching ARCH model was introduced independently by Cai (1994) and Hamilton and Susmel (1994) and has been generalized by Gray (1996). In these papers the basic switching ARCH model is derived from the classical ARCH model

    h_t = γ + α_1 u_{t-1}^2 + ... + α_m u_{t-m}^2,    (1)

where the parameter γ is time invariant, by assuming that γ changes over time. From now on we will use the notation γ_t rather than γ to emphasize its time-varying nature:

    h_t = γ_t + α_1 u_{t-1}^2 + ... + α_m u_{t-m}^2.    (2)

The classical ARCH model (1) obviously is the special case of (2) where γ_t ≡ γ. Note that there exists an alternative way to parameterize the ARCH model (1), namely:

    u_t = √γ · ũ_t,   ũ_t = √h_t · v_t,
    h_t = 1 + α_1 ũ_{t-1}^2 + ... + α_m ũ_{t-m}^2.    (3)

Again we could introduce a time-varying parameter γ_t rather than a constant parameter γ. It is easy to verify that such a model can be rewritten as:

    u_t = √(γ_t h_t) · v_t,
    h_t = 1 + α_1 u_{t-1}^2/γ_{t-1} + ... + α_m u_{t-m}^2/γ_{t-m}.    (4)

For a constant parameter γ_t ≡ γ the alternative parameterization is equivalent to parameterization (1). Models (2) and (4) obviously differ, however, if γ_t is time dependent.
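To see this difference numerically, the following minimal Python sketch (our illustration; parameter values and variable names are hypothetical) runs recursions (2) and (4) on the same innovations with an alternating γ_t. With a constant γ_t the two series coincide; with a time-varying γ_t they do not.

```python
import numpy as np

rng = np.random.default_rng(0)

N, m = 500, 2
alpha = np.array([0.25, 0.15])                        # alpha_1, ..., alpha_m
gamma = np.where(np.arange(N) % 100 < 50, 0.5, 2.0)   # alternating gamma_t
v = rng.standard_normal(N)                            # common innovations v_t

# Parameterization (2): h_t = gamma_t + sum_l alpha_l u_{t-l}^2
u_a = np.zeros(N)
for t in range(N):
    lags = [u_a[t - l] ** 2 if t >= l else 0.0 for l in range(1, m + 1)]
    u_a[t] = np.sqrt(gamma[t] + alpha @ np.array(lags)) * v[t]

# Parameterization (4): u_t = sqrt(gamma_t h_t) v_t,
#                       h_t = 1 + sum_l alpha_l u_{t-l}^2 / gamma_{t-l}
u_b = np.zeros(N)
for t in range(N):
    lags = [u_b[t - l] ** 2 / gamma[t - l] if t >= l else 0.0
            for l in range(1, m + 1)]
    u_b[t] = np.sqrt(gamma[t] * (1.0 + alpha @ np.array(lags))) * v[t]

# Zero for constant gamma_t, clearly positive for the alternating gamma_t above:
print(np.max(np.abs(u_a - u_b)))
```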
To obtain an identified model one needs further assumptions on the way γ_t changes over time. In Cai (1994), Hamilton and Susmel (1994) and Gray (1996), changes of γ_t are described within the framework of Markov switching models introduced by Hamilton (1989). One assumes that γ_t takes one out of K different values according to a hidden Markov chain I_t taking values between 1 and K: γ_t = γ_{I_t}.

The switching ARCH model of Hamilton and Susmel (1994) is obtained by introducing such a switching parameter into parameterization (4):

    u_t = √(γ_{I_t} h_t) · v_t,
    h_t = 1 + α_1 u_{t-1}^2/γ_{I_{t-1}} + ... + α_m u_{t-m}^2/γ_{I_{t-m}}.    (5)

In the present paper we study the slightly different switching ARCH model where the switching parameter is introduced into parameterization (2):

    u_t = √h_t · v_t,
    h_t = γ_{I_t} + α_1 u_{t-1}^2 + ... + α_m u_{t-m}^2.    (6)

It is easy to verify that the switching ARCH model of Cai (1994), which is restricted to K = 2 and uses the state-dependent formulation γ_t = γ_0 + S_t γ_1, with S_t either 0 or 1, is a special case of (6). Gray (1996) also uses this parameterization and additionally introduces switching into the coefficients of the ARCH process.

In the present paper we carry out a fully Bayesian analysis of the basic switching ARCH model (6). The joint estimation of the latent Markov switching process and all model parameters for fixed orders K and m of the switching ARCH model is performed using the method of Markov chain Monte Carlo (MCMC) simulation (see e.g. Smith and Roberts, 1993, for a general introduction to MCMC methods). The design of suitable MCMC methods to generate a sample from the posterior of a switching ARCH model has not been studied before; previous papers used maximum likelihood methods for parameter estimation. We combine well-known results for multi-move sampling of a hidden Markov process (Carter and Kohn, 1994; Shephard, 1994; Chib, 1996) with recent results available for MCMC estimation of ARCH models (Nakatsuma, 2000). One iteration of the sampler involves first a multi-move step to simulate the latent process from its conditional distribution. The Gibbs sampler can then be used to simulate those parameters, in particular the transition probabilities, for which the full conditional posterior distribution is known. For most parameters, however, the full conditionals do not belong to any well-known family of distributions; for these, the simulations are based on the Metropolis-Hastings algorithm with carefully chosen proposal densities.

For practical volatility modeling, however, the order parameters will be unknown. We perform model selection with respect to the number of states K and the number of autoregressive parameters m in the variance process using Bayes factors and model likelihoods. We derive an estimate of the model likelihood for given orders K and m from the MCMC output by combining the candidate's formula (Chib, 1995) with importance sampling, where the importance density is constructed from the MCMC sample. An alternative Bayesian approach, not pursued in the present paper, would be the inclusion of K and m into the MCMC scheme along the lines of the jump-diffusion approach outlined by Richardson and Green (1997).
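To fix ideas before the formal definition in the next section, the following minimal sketch (our illustration, with hypothetical parameter values) simulates a path from the SWARCH(K, m)-model (6); the regression part is omitted, so u_t is observed directly, and I_0 is drawn uniformly for simplicity.

```python
import numpy as np

def simulate_swarch(gamma, alpha, eta, N, rng):
    """Simulate u_1, ..., u_N and I_1, ..., I_N from the SWARCH(K, m) model (6)."""
    K, m = len(gamma), len(alpha)
    u = np.zeros(N)
    states = np.zeros(N, dtype=int)
    s = rng.integers(K)                      # I_0 drawn uniformly (simplification)
    for t in range(N):
        s = rng.choice(K, p=eta[s])          # Markov transition I_{t-1} -> I_t
        states[t] = s
        lags = [u[t - l] ** 2 if t >= l else 0.0 for l in range(1, m + 1)]
        h = gamma[s] + np.dot(alpha, lags)   # h_t = gamma_{I_t} + sum_l alpha_l u_{t-l}^2
        u[t] = np.sqrt(h) * rng.standard_normal()
    return u, states

rng = np.random.default_rng(1)
eta = np.array([[0.95, 0.05],                # row i holds the transition distribution eta_i.
                [0.10, 0.90]])
u, states = simulate_swarch(gamma=np.array([0.5, 5.0]),
                            alpha=np.array([0.3]), eta=eta, N=2000, rng=rng)
print(u.var(), np.bincount(states) / len(states))
```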
The next section presents the basic switching ARCH model along with the prior specification. MCMC estimation is discussed in detail in Section 3, followed by issues of Bayesian model selection in Section 4. The usefulness of the Bayesian approach is illustrated in Section 5 by reanalyzing the data set used in Hamilton and Susmel (1994). The paper concludes in Section 6 with some comments on maximum likelihood versus Bayesian methods, some issues related to classical model selection, and some potential straightforward extensions of the methods presented here.

2 The Basic Switching ARCH Model

2.1 Model formulation

Let y^N = (y_1, ..., y_N) denote a sequence of N observations y_t. We assume that the observations y_t are generated by the following model:

    y_t = z_t' β + u_t,    (7)

where the predictor z_t may contain exogenous variables as well as lagged values of y_t. The error u_t is conditionally normal:

    u_t = √h_t · v_t,    (8)

where v_t is an iid normal sequence with zero mean and unit variance: E(v_t) = 0, E(v_t^2) = 1. We assume that h_t follows the switching ARCH model defined in (6):

    h_t = γ_{I_t} + α_1 u_{t-1}^2 + ... + α_m u_{t-m}^2.    (9)

As in Cai (1994) and Hamilton and Susmel (1994), I_t is a latent discrete variable modeled as a stationary, irreducible Markov process with discrete state space {1, ..., K} and unknown transition probabilities η_ij = Pr{I_t = j | I_{t-1} = i}. I_0 is assumed to have some starting distribution Pr{I_0 = i_0} = π(i_0). For convenience, we refer to the K-state, m-th order Markov switching ARCH model defined by (9) as SWARCH(K, m).

Subsequently we will use the following notation: η = (η_1·, ..., η_K·), where η_i· = (η_i1, ..., η_iK) is the conditional transition distribution of state i, collects all probabilities of the transition matrix of I_t; φ = (α, β, γ, η) summarizes all unknown model parameters, with α = (α_1, ..., α_m) and γ = (γ_1, ..., γ_K); and finally I^N = (I_0, I_1, ..., I_N) denotes the whole sequence of switching variables, with i^N denoting a realization of I^N. The SWARCH(K, m) model has the structure of a hierarchical model including latent variables:

1. Conditionally on a known realization i^N of the switching process I^N and on a known model parameter φ, the conditional distribution of y_1, ..., y_N factorizes in the following way:

    f(y_1, ..., y_N | i^N, φ) = ∏_{t=1}^N f(y_t | y^{t-1}, α, β, γ_{i_t}),    (10)

where the one-step-ahead predictive densities are Gaussian:

    f(y_t | y^{t-1}, α, β, γ_{i_t}) = 1/√(2π h_t(α, β, γ_{i_t}; y^{t-1})) · exp( −(y_t − z_t' β)^2 / (2 h_t(α, β, γ_{i_t}; y^{t-1})) ).    (11)

2. For each φ, the latent switching process I^N is a Markov chain with transition matrix depending on η = (η_1·, ..., η_K·). The density π(i^N | η) of the prior distribution of I^N w.r.t. the counting measure is equal to

    π(i^N | φ) = π(i^N | η) = ∏_{t=1}^N η_{i_{t-1}, i_t} · π(i_0 | φ) ∝ ∏_{j=1}^K ∏_{i=1}^K η_{ij}^{N_ij} · π(i_0),

where N_ij = #{I_t = j | I_{t-1} = i} counts the transitions from state i to state j.

3. φ has a prior distribution with density π(φ).

The factorization (10) shows that the observation density f(y_t | y^{t-1}, i^N, φ) depends only on the present value i_t of the switching process I^N. This property results from introducing the switching parameter directly into the parameterization of h_t given by (2). For the parameterization (4), which has been used by Hamilton and Susmel (1994), the conditional density f(y_t | y^{t-1}, i^N, φ) depends on i_t as well as on the lagged values i_{t-1}, ..., i_{t-m}.
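Given a realization of the state path, the complete-data likelihood (10)-(11) is straightforward to evaluate. The following sketch (our illustration) does so, setting pre-sample squared residuals to zero, which is one possible initialization of the recursion.

```python
import numpy as np

def complete_data_loglik(y, Z, i_path, alpha, beta, gamma):
    """Log of the complete-data likelihood (10): the sum over t of the log
    Gaussian one-step-ahead densities (11), given a state path i_path.
    Pre-sample squared residuals are set to zero (one possible convention)."""
    N, m = len(y), len(alpha)
    u = y - Z @ beta                                  # residuals u_t = y_t - z_t' beta
    loglik = 0.0
    for t in range(N):
        lags = [u[t - l] ** 2 if t >= l else 0.0 for l in range(1, m + 1)]
        h = gamma[i_path[t]] + np.dot(alpha, lags)    # h_t from (9)
        loglik += -0.5 * (np.log(2.0 * np.pi * h) + u[t] ** 2 / h)
    return loglik
```

Quantities of this form enter the full conditionals of α, β and γ in the MCMC scheme of Section 3, since each full conditional is proportional to the complete-data likelihood times the corresponding prior.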
Note that the first two layers of the model are sufficient to compute the marginal likelihood L(y^N | φ), which could be maximized, as in Hamilton and Susmel (1994), to obtain ML estimates of the model parameters φ. For a complete Bayesian analysis of the model, the third layer specifying the prior π(φ) on the model parameters φ has to be added (see subsection 2.3 for further details).

We conclude this section by discussing the marginal model implied by the basic SWARCH(K, m) model. Define the matrix A from the transition matrix η of I_t by A_jl = η_lj − η_Kj, j, l = 1, ..., K−1. Under the well-known condition that the absolute values of all eigenvalues of A are smaller than one, the switching process I_t has a stationary distribution η* = (η*_1, ..., η*_K), where

    (η*_1, ..., η*_{K−1})' = (I − A)^{−1} (η_{K1}, ..., η_{K,K−1})',   η*_K = 1 − Σ_{j=1}^{K−1} η*_j.

Then the marginal distribution of y_t, where the latent process I^N is integrated out, is a mixture of normal distributions with weights η*_j:

    F(y_t | F_{t−1}) = Σ_{j=1}^K η*_j Φ( (y_t − z_t' β) / √(h_t(α, β, γ_j; y^{t−1})) ),

where F(y_t | F_{t−1}) denotes the distribution function of y_t given the information up to t−1, and Φ(·) is the standard Gaussian distribution function. Thus the basic SWARCH(K, m) model can be regarded as a special case of the MAR-ARCH model discussed in Wong and Li (1999). From Theorem 2 of that paper we obtain covariance stationarity of the process u_t = y_t − z_t' β under the necessary and sufficient condition that the roots of the equation

    1 − Σ_{l=1}^m α_l x^l = 0    (12)

all lie outside the unit circle. The second unconditional moment of u_t is given by:

    E(u_t^2) = Σ_{j=1}^K γ_j η*_j / (1 − Σ_{l=1}^m α_l).    (13)

For the SWARCH(2, m) process this result has already been proven by Cai (1994). Further application of the results derived in Wong and Li (1999) leads to the conclusion that the third-order moment of u_t is zero, implying symmetry of the unconditional distribution, and that for a SWARCH(K, 1) model u_t^2 is covariance stationary if α_1^2 < 1/3, with autocorrelation function Corr(u_t^2, u_{t−l}^2) = α_1^l.
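As a numerical illustration of these results, the following sketch (ours, with hypothetical parameter values) computes the stationary distribution η* via the matrix A and the unconditional variance (13) for a two-state example.

```python
import numpy as np

def stationary_distribution(eta):
    """eta* from (I - A)^{-1}(eta_{K1}, ..., eta_{K,K-1})', with A_jl = eta_lj - eta_Kj."""
    K = eta.shape[0]
    A = eta[:K - 1, :K - 1].T - eta[K - 1, :K - 1][:, None]
    head = np.linalg.solve(np.eye(K - 1) - A, eta[K - 1, :K - 1])
    return np.append(head, 1.0 - head.sum())

def unconditional_variance(gamma, alpha, eta):
    """Second unconditional moment of u_t as given in (13)."""
    return gamma @ stationary_distribution(eta) / (1.0 - alpha.sum())

eta = np.array([[0.95, 0.05],
                [0.10, 0.90]])
print(stationary_distribution(eta))                      # -> [2/3, 1/3]
print(unconditional_variance(np.array([0.5, 5.0]),
                             np.array([0.3]), eta))      # (0.5*2/3 + 5*1/3) / 0.7
```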
2.2 Identifiability

It is well known that for any model including a latent, discrete structure an identifiability problem is present, since the labeling of the states of the switching variable I_t can be permuted without changing the (marginal) likelihood: there exists a φ̃ ≠ φ such that L(y^N | φ) = L(y^N | φ̃) (see e.g. Frühwirth-Schnatter (2000) for a recent discussion of this issue). Therefore the unconstrained SWARCH(K, m) model is not identifiable in a strict sense. It is nevertheless possible to estimate quantities from the unconstrained SWARCH(K, m) model which are invariant to relabeling, the most important examples being the time-varying parameter γ_{I_t} and the volatility h_t defined in (9). Invariance of γ_{I_t} follows from:

    γ_{I_t} = γ_1 S_t^1 + ... + γ_K S_t^K = γ_{ρ(1)} S_t^{ρ(1)} + ... + γ_{ρ(K)} S_t^{ρ(K)},

where S_t^k is the indicator that I_t = k and ρ(1), ..., ρ(K) is an arbitrary permutation of 1, ..., K. Thus extracting volatility estimates from a SWARCH(K, m) model is possible without identifying a unique labeling. If the focus lies on estimating the variance γ_i of each state i individually, or on estimating the probability of being in a certain state at a certain time, however, it is necessary to introduce a unique labeling.

To render the model identified, an identifiability constraint is usually put on the state-specific parameters. For a SWARCH(K, m) model a standard constraint, based on the notion that the first state is the state of lowest volatility whereas the last state is the state of highest volatility, would be a constraint on the state-specific variances γ_1, ..., γ_K:

    γ_1 < ... < γ_K.    (14)

Such a standard constraint, however, may turn out to be a poor one, especially for a higher number of classes. In our case study in Section 5 we found within the 4-state model two states of about medium volatility which were difficult to separate through γ_j; the main difference between these states is primarily not the level of volatility but the persistence of remaining in the current state. For such a model a suitable identifiability constraint would be:

    γ_1 < min(γ_2, ..., γ_4),   γ_4 > max(γ_2, γ_3),   η_22 > η_33.    (15)

2.3 Choice of the prior

In this paper the focus lies on Bayesian estimation in situations where we in general do not have strong prior information. From a theoretical point of view, being completely non-informative about φ is possible only for state-independent parameters such as α and β. Being non-informative about γ = (γ_1, ..., γ_K) and η is not possible, as improper priors on γ and η result in improper posteriors: there is always the possibility that no observation is allocated to a certain state, say j, and improper priors on γ_j and η_j· will then lead to improper posterior distributions.

In what follows we will make use of the following independence assumption concerning the prior distribution of φ: π(φ) = π(β) π(α) π(γ, η). As is common with regression-type models such as (7), we assume a normal prior N(b_0, B_0) for the regression parameter β. The choice of the prior on α should reflect important constraints, namely that all α_l are positive and that the persistence parameter Σ_{l=1}^m α_l is smaller than 1.⁴ Therefore we assume that (α, α_{m+1}), where α_{m+1} = 1 − Σ_{l=1}^m α_l, follows a Dirichlet distribution D(a_1, ..., a_{m+1}). The Dirichlet distribution ensures that the constraints mentioned above are fulfilled with probability 1. Furthermore this choice implies that the marginal distribution of the persistence parameter Σ_{l=1}^m α_l is a Beta distribution; it can therefore be regarded as an extension of the Beta prior imposed in Kim et al. (1998) on the persistence parameter of a GARCH(1,1) model.

⁴ We want to thank the referee for drawing our attention to the fact that the normal prior N(0, κI) we assumed in a previous version of the paper was a particularly poor choice in this respect.
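The following sketch (our illustration; the hyperparameter choice a_1 = ... = a_{m+1} = 1 is hypothetical) verifies by simulation that the Dirichlet prior enforces these constraints and that the persistence parameter has the stated Beta marginal.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 2
a = np.ones(m + 1)                          # hyperparameters a_1, ..., a_{m+1}
draws = rng.dirichlet(a, size=100_000)      # rows: (alpha_1, ..., alpha_m, alpha_{m+1})
alpha = draws[:, :m]
persistence = alpha.sum(axis=1)             # sum_l alpha_l = 1 - alpha_{m+1}

# Positivity and persistence < 1 hold with probability 1 by construction:
assert (alpha > 0).all() and (persistence < 1).all()

# The persistence parameter has a Beta(a_1 + ... + a_m, a_{m+1}) marginal;
# here Beta(2, 1), with mean 2/3:
print(persistence.mean())
```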
Concerning the priors for the state-dependent parameters γ and η, we discuss two possible choices. First, we could assume that the parameters corresponding to the various states are independent a priori and that the hyperparameters of the state-specific priors are the same for all states. A natural choice for the distribution of each γ_i is an inverted Gamma prior IG(g_0, G_0), with g_0 and G_0 being state independent, whereas it is standard to assume that each conditional transition distribution η_i· follows a Dirichlet distribution D(e_1, ..., e_K), with e_1, ..., e_K being state independent. This choice leads to a symmetric prior in the sense that the resulting prior is invariant to relabeling the states and gives equal probability to each labeling pattern. The prior integrates to 1 over the unconstrained parameter space and to 1/K! over each subspace corresponding to a unique labeling.

This prior may be unsatisfactory if we have vague prior ideas concerning the differences between the states. With a state-independent prior we could not include, for instance, the prior belief that one state corresponds to a persistent, low-volatility state, whereas another state corresponds to a non-persistent, high-volatility state. To include such information we could take the priors used above and make the hyperparameters state dependent: IG(g_{0,i}, G_{0,i}) and D(e_{i1}, ..., e_{iK}), i = 1, ..., K. The problem is then how to associate these state-dependent priors with the various components of γ and η. Based on some a priori labeling we could connect each state-dependent prior with a certain component through γ_i ~ IG(g_{0,i}, G_{0,i}) and η_i· ~ D(e_{i1}, ..., e_{iK}). This strategy, however, leads to a prior which is no longer invariant to relabeling. To preserve invariance of the prior on the one hand, and to include state-specific information on the other hand, we use the following mixture prior:

    π(γ, η) = (1/K!) Σ_{m=1}^{K!} ∏_{i=1}^K IG(γ_i; g_{0,ρ_m(i)}, G_{0,ρ_m(i)}) ∏_{i=1}^K D(η_i·; e_{ρ_m(i),ρ_m(1)}, ..., e_{ρ_m(i),ρ_m(K)}),    (16)

where ρ_m(1), ..., ρ_m(K), m = 1, ..., K!, correspond to the K! possible ways of relabeling the states, with ρ_1(·) being the identity. With such a prior it is possible to express, for instance, the belief that one state has higher variance and lower persistence than another state, without being specific about which state this will be. Note that this prior implies a priori dependence between γ and η whenever the hyperparameters are actually different; if all hyperparameters are the same, it collapses to the state-independent prior discussed above. The mixture prior (16) is by definition invariant to relabeling the states, gives equal probability to each labeling pattern, and integrates to 1 over the unconstrained parameter space and to 1/K! over each subspace corresponding to a unique labeling.
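For moderate K the mixture prior (16) can be evaluated by direct enumeration of the K! permutations. The following sketch (our illustration; all hyperparameter values are hypothetical, and IG(g, G) is taken as the inverted Gamma with shape g and scale G, which need not match every textbook convention) computes its log density.

```python
import itertools
import math
import numpy as np
from scipy.special import logsumexp
from scipy.stats import dirichlet, invgamma

def log_mixture_prior(gamma, eta, g0, G0, e):
    """Log density of the mixture prior (16).
    gamma: (K,) state variances; eta: (K, K) transition matrix with rows eta_i.;
    g0, G0: (K,) inverted Gamma hyperparameters; e: (K, K) Dirichlet hyperparameters.
    IG(g, G) is taken to have density proportional to x^{-(g+1)} exp(-G/x)."""
    K = len(gamma)
    log_terms = []
    for rho in itertools.permutations(range(K)):     # the K! relabelings rho_m
        lp = 0.0
        for i in range(K):
            lp += invgamma.logpdf(gamma[i], a=g0[rho[i]], scale=G0[rho[i]])
            lp += dirichlet.logpdf(eta[i], e[rho[i], list(rho)])
        log_terms.append(lp)
    return logsumexp(log_terms) - math.log(math.factorial(K))

# Hypothetical hyperparameters expressing one low-variance/persistent and one
# high-variance/less persistent state, without fixing which label is which:
g0, G0 = np.array([2.0, 2.0]), np.array([0.5, 5.0])
e = np.array([[9.0, 1.0],
              [4.0, 6.0]])
print(log_mixture_prior(np.array([0.4, 4.0]),
                        np.array([[0.9, 0.1], [0.3, 0.7]]), g0, G0, e))
```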