Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series: a sequence of random variables y ,y ,...,y , or a stochastic process {y }T , 1 2 T t t=1 of which we only observe the empirical realisations. It is continuous when the observations are recorded continuously over some time interval, discretely sampled when the observa- tions are recorded at equally spaced time intervals. Observed time series:{y }T is a selection (technically, a sub-sequence because limited t t=1 to a finite sample) of the realized values of a family of random variables {y }+∞defined on t −∞ an appropriate probability space. Time series model for the observations {y }T : a specification of the joint distribution t t=1 of the set of random variables of which the sampled data are a realization. • A complete probabilistic time series model would be impractical (very large number of parameters), so we often specify only the first- and second-order moments of such a joint distribution, i.e. the mean, variances and covariances of {y }. t • If the joint distribution of the variables in the random sequence {y } is multivariate t normal, the second-order moments of the process are sufficient to give a complete statistical characterization of {y }. t • Whentheassumptionaboveisnotrespected, ifthestochasticprocesscanbeassumed 1 to be linear, its second-order characterization is still sufficient. Indeed, the theory of minimum mean squared error linear prediction depends only on the second-order moment properties of the process. Linear process: Atimeseries{y }issaidtobealinearprocessifithastherepresentation t ∞ (cid:88) y = µ+ φ z ∀t t j t−j j=−∞ where µ is a constant, {φ } is a sequence of constant coefficients where φ = 1 and j 0 (cid:80)∞ |φ | < ∞, and {z } is a sequence of IID random variables with a defined dis- j=−∞ j t tribution function. In particular, we assume that the distribution of z is continuous, with t E[z ] = 0 and Var[z ] = σ2 . Noticeably, if σ2(cid:80)∞ φ2 < ∞, then y is weakly stationary. t t z z i=1 i t 1.2 Stationarity • In order to use past realizations of a variable of interest to forecast its future values, it is necessary for the stochastic process to be stationary. Strict stationarity: a process is strictly stationary if the joint distribution of the variable associated to any sub-sequence of times t ,t ,...,t is the same as the joint distribution of 1 2 n the sequence of all times t ,t ,...,t (where k is an arbitrary time shift). In other 1+k 2+k n+k words, a strictly stationary time series {y } has the following properties: t 1. the random variables y are identically distributed; t 2. the two random vectors [y ,y ](cid:48) and [y ,y ](cid:48) have the same joint distribution for t t+k 1 1+k any t and k. • Strictstationarityrequiresthatallthemomentsofthedistributionaretimeinvariant. Weak stationarity: a stochastic process {y } is weakly stationary (or, alternatively, t covariance stationary) if it has time invariant first and second moments, i.e., if for any choice of t = 1,2,...,∞ , the following conditions hold: 1. µ ≡ E(y ) with |µ | < ∞ y t y 2. σ2 ≡ E[(y −µ )(y −µ )] = E[(y −µ )2] < ∞ y t y t y t y 3. γ ≡ E[(y −µ )(y −µ )] ∀h with |γ | < ∞ h t y t−h y h where h = ...,−3,−2,−1,1,2,3,.... 2 • Weak stationarity requires that the mean and the variance are time invariant. • Weak stationarity implies that Cov(y ,y ) = Cov(y ,y ) = γ , that is it does t t−h t−j t−j−h h not vary over time, but only depends on h. Autocovariancefunction(ACVF):γ ≡ Cov(y ,y ) for h = ...,−3,−2,−1,1,2,3,.... h t t−h Autocorrelation function (ACF) : ρ = γ /γ where γ is the variance. h h 0 0 • Autocorrelations convey more useful information than autocovariance coefficients do, as the latter depend on the units of measurement of y . t • The autocovariance and autocorrelation functions are important for the characteri- zation and classification of time series, given that, for stationary processes, both γ() and ρ() should eventually decay to zero. • Strict stationarity implies weak stationarity (provided that the first and the second moments exist) while the reverse is generally not true. Figure 1: Plot of Data Simulated from a Gaussian white noise process 3 White Noise: a white noise (WN) process is a sequence of random variables {z } with t mean equal to zero, constant variance equal to σ2 , and zero autocovariances (and autocor- relations) except at lag zero. If {z } is normally distributed, we shall speak of a Gaussian t white noise. The figure shows an example of Gaussian white noise, from which 1,000 simulations have been generated with a standard deviation of 0.2. It represents pure noise in the sense that—apart from lucky patterns that may accidentally appear - there is no appreciable structure in the plotted data, either in terms of its levels or in terms of its tendency to randomly fluctuate. 1.3 Sample Autocorrelations and Sample Partial Autocorrela- tions • Plotting the data is not always sufficient to determine whether the data generating process (DGP) is stationary or not. • Since for a series that originates from a stationary process autocorrelations tend to die out quickly as the length h increases, they may be used to investigate the prop- erties of the data. • Weonlyobserveoneofthepossiblerealizedandnecessarilyfinitesequencesofsample data from the process {y }, so we shall only be able to compute sample autocorrela- t tions. Sampleautocorrelations: givenasampleofT observationsofthevariabley ,y ,y ,...,y , t 1 2 T the estimated or sample autocorrelation function ρˆ (where h is a positive integer) is com- h puted as (cid:80)T (y −µˆ)(y −µˆ) γˆ ρˆ = t=h+1 t t−h = h h (cid:80)T (y −µˆ)2 γˆ t=1 t 0 where µˆ is the sample mean computed as µˆ = T−1(cid:80)T y t=1 t γˆ is the sample autocovariance h γˆ is the sample variance. 0 • Sample autocorrelations only provide estimates of the true and unobserved autocor- relations of {y } and may contain large sample variation and substantial uncertainty. t • They measure the strength of the linear relationship between y and the lagged val- t ues of the series (the higher the autocorrelation coefficient, the stronger the linear predictive relationship between past and future values of the series). 4 • If {y } is an IID process with finite variance, then for a large sample, the estimated t autocorrelations will be asymptotically normally distributed as N(0,1/T). • Tests of hypotheses on the autocorrelation coefficients can determine whether they are significantly different from zero, e.g. the 95% confidence interval given by √ ρˆ ±1.96 T. h • If {y } is a weakly stationary time series satisfying y = µ+(cid:80)q φ z where φ = 1 t t j=0 j t−j 0 and {z } is a Gaussian white noise, then: j q ρˆ ∼a N(0,T−1(1+2(cid:88)ρ2)) h j j=1 • In finite samples ρˆ is a biased estimator of ρ , with a bias of the order of 1/T , h h meaning that it can be large for small samples, but it becomes less and less relevant as the number of observations increases. • Jointly test whether several M consecutive autocorrelation coefficients are equal to zero using: 1. Box and Pierce test: M Q(M) ≡ T (cid:88)ρˆ2 ∼a χ2 h M h=1 under the null hypothesis H = ρ = ρ = ... = ρ = 0 0 1 2 M Therefore, if Q(M) exceeds the critical value from the χ2 distribution with degrees of freedom at a probability 1−α, then H can be rejected and at least 0 one ρ is significantly different from zero. 2. Ljung and Box test, with better small sample properties than the previous one: M ρˆ2 Q∗(M) = T(T +2)(cid:88) h ∼ χ2 T −h M h=1 Sample partial autocorrelations: formally, the partial autocorrelation a is defined as h a = Corr(y ,y |y ,...,y ) h t t−h t−1 t−h+1 Partial autocorrelation function (PACF): sequence a ,a ,..,a as a function of h = 1 2 T 1,2,... of partial autocorrelations. 5 • The sample estimate aˆ of the partial autocorrelation at lag h is obtained as the h ordinary least square estimator of φ in an autoregressive model y = φ +φ y + h t 0 1 t−1 ...+φ y +(cid:15) . h t−h t • They measure the “added” predictive power of the h-th lag of the series y , when t−h y ,...,y are already accounted for in the predictive regression. t−1 t−h+1 • ACFs and PACFs can only measure the degree of linear association between y and t y for k ≥ 1 and thus they may fail to detect important nonlinear dependence t−k relationships in the data. 2 Moving Average and Autoregressive Processes 2.1 Finite Order Moving Average Processes Moving average process (MA): a q-th order moving average, MA(q), is a process that can be represented as y = µ+(cid:15) +θ (cid:15) +θ (cid:15) +...+θ (cid:15) t t 1 t−1 2 t−2 q t−q where the process of {(cid:15) } is an IID white noise with mean zero and constant variance equal t to σ2 . In a compact form: (cid:15) q (cid:88) y = µ+(cid:15) + θ (cid:15) t t j t−j j=1 Properties: Always stationary as they are finite, linear combination of white noise processes for which the first two moments are time-invariant. Therefore, MA(q) has constant mean, variance and autocovariances that may be different from zero up to lag q, but are equal to zero afterwards. 1. Mean: E(y ) = µ t 2. Variance: Var(y ) = γ = (1+θ2 +θ2 +...+θ2)σ2 t 0 1 2 q 3. Covariance: γ = Cov(y ,y ) = (θ + θ θ + θ θ + ... + θ θ )σ2 for h = h t t−h h h+1 1 h+2 2 q q−h 1,2,...,q and γ = 0 for h > q h 6 2.2 Autoregressive Processes Autoregressive process: a p-th order autoregressive process, denoted as AR(p), is a process that can be represented by the p-th order stochastic difference equation y = φ +φ y +φ y +...+φ y +(cid:15) t 0 1 t−1 2 t−2 p t−p t where the process of {(cid:15) } is an IID white noise with mean zero and constant variance σ2 . t (cid:15) In a compact form: p (cid:88) y = φ + φ y +(cid:15) t 0 j t−j t j=1 • AR(p) models are simple univariate devices to capture the Markovian nature of fi- nancial and macroeconomic data, i.e., the fact that the series tends to be influenced at most by a finite number of past values of the same series. Lag operator (L): it shifts the time index of a variables regularly sampled over time backward by one unit, e.g. Ly = y . t t−1 Difference operator (∆): it expresses the difference between consecutive realizations of a time series, e.g ∆y = y −y . t t t−1 We can rewrite the AR model using the lag operator (1−φ L−φ L2 −...−φ Lp)y = φ +(cid:15) 1 2 p t 0 t In a compact form φ(L)y = φ +(cid:15) t 0 t where φ(L) is the polynomial of order p, (1−φ L−φ L2 −...−φ Lp). 1 2 p (Reverse) Characteristic equation: the equation obtained by replacing in the polyno- mial φ(L) the lag operator L by a variable λ and setting it equal to zero, i.e., φ(λ) = 0. Root of the polynomial φ(L): any value of λ which satisfies the polynomial equation φ(λ) = 0. It is a determinant of the behavior of the time series: if the absolute value of all the roots of the characteristic equations is higher than one the process is said to be stable. • A stable process is always weakly stationary. 7 Properties: 1. Mean: µ = φ0 in case of an AR(1) 1−φ1 µ = φ0 in case of an AR(p) 1−φ1−φ2−...−φp 2. Variance: Var[y ] = σ(cid:15)2 in case of an AR(1) t 1−φ2 1 Var[y ] = σ(cid:15)2 in case of an AR(p) t 1−φ2−φ2−...−φ2 1 2 p Wold’s decomposition: Every weakly stationary, purely non-deterministic, stochastic process (y − µ) can be written as an infinite, linear combination of a sequence of white t noise components: ∞ y −µ = (cid:15) +φ (cid:15) +φ2(cid:15) +... = (cid:88)φi(cid:15) t t 1 t−1 1 t−2 1 t−i i=0 • Wold’s theorem states that only an autoregressive process of order p with no con- stant and no other predetermined, fixed terms can be expressed as an infinite order moving average process, MA(∞). • This result is useful for deriving the autocorrelation function of an autoregressive process. • It can be used to check the stationary condition of the mean and the variance of an AR model. Example: 1. Mean: Starting from an AR(1) model y = φ +φ y +(cid:15) t 0 1 t−1 t Take the expectation of the model equation E(y ) = µ+φ E(y ) t 1 t−1 Under the stationarity condition, it must be that E(y ) = E(y ) = µ t t−1 and hence E(y ) = µ = φ +φ µ t 0 1 8 which solved for the unknown unconditional mean gives: φ 0 µ = 1−φ 1 and then φ = (1−φ )µ. 0 1 For µ to be constant and to exist, it must be that φ (cid:54)= 1. Substitute φ above in 1 0 the initial model equation and obtain y −µ = φ (y −µ)+(cid:15) t 1 t−1 t It must also be the case that y −µ = φ (y −µ)+(cid:15) t−1 1 t−2 t−1 and therefore y −µ = φ (φ (y −µ)+(cid:15) −µ)+(cid:15) t 1 1 t−2 t−1 t By a process of infinite backward substitution, we obtain ∞ y −µ = (cid:15) +φ (cid:15) +φ2(cid:15) +... = (cid:88)φi(cid:15) t t 1 t−1 1 t−2 1 t−i i=0 Following similar algebraic steps, we can derive also the unconditional mean of an AR(p) model φ 0 µ = 1−φ −φ −...−φ 1 2 p For µ to be constant and to exist, it must be that |φ +φ +...+φ | < 1 1 2 p 2. Variance: Starting from an AR(1) model E[(y −µ)2] = φ2E[(y −µ)2]+E[(cid:15)2] t 1 t−1 t under the stationarity assumption E[(y −µ)] = E[(y −µ)2] = Var[y ] it becomes t t−1 t Var[y ] = φ2Var[y ]+σ2 t 1 t (cid:15) and therefore σ2 Var[y ] = (cid:15) t 1−φ2 1 provided that φ2 > 1. 1 Therefore, putting all the conditions derived before together, for an AR(1) model, weak stationarity requires |φ | < 1. 1 In case of a general AR(p) model the formula of the variance becomes σ2 Var[y ] = (cid:15) t 1−φ2 −φ2 −...−φ2 1 2 p 9 In the general case, covariance stationarity can be checked using the Schur criterion. Schur criterion: Construct two lower-triangular matrices, A and A of the form: 1 2 1 0 ... 0 0 −φ 0 ... 0 0 p −φ 1 0 −φ −φ 0 1 p−1 p −φ −φ −φ −φ A1 ≡ ...2 −φ1 A2 ≡ .p..−2 −φp−1 2 p−2 ... 0 ... 0 −φ −φ ... −φ 1 −φ −φ ... −φ −φ p−1 p−2 1 1 2 p−1 p The AR(p) process is covariance stationary if and only if the matrix S = A A(cid:48) −A A(cid:48) 1 1 2 2 is positive definite. The autocovariances and autocorrelations of AR(p) processes can be computed by solving a set of Yule-Walker equations. Yule-Walker equations for an AR(2) process where µ = 0 y = φ y +φ y +(cid:15) t 1 t−1 2 t−2 t We multiply the previous equation by y with s = 1,2,... and take the expectation of t−s each resulting stochastic difference equation E[y y ] = φ E[y y ]+φ [y y ]+E[y (cid:15) ] t t 1 t−1 t 2 t−2 t t t E[y y ] = φ E[y y ]+φ [y y ]+E[y (cid:15) ] t t−1 1 t−1 t−1 2 t−2 t−1 t−1 t E[y y ] = φ E[y y ]+φ [y y ]+E[y (cid:15) ] t t−2 1 t−1 t−2 2 t−2 t−2 t−2 t E[y y ] = φ E[y y ]+φ [y y ]+E[y (cid:15) ] t t−s 1 t−1 t−s 2 t−2 t−s t−s t By definition E[y y ] = E[y y ] = γ t t−s t−k t−k−s s We know that E[(cid:15) y ] = σ2 and E[y (cid:15) ] = 0 t t t−s t Therefore γ = φ γ +φ γ +σ2 0 1 1 2 2 γ = φ γ +φ γ 1 1 0 2 1 γ = φ γ +φ γ s ≥ 2 s 1 s−1 2 s−2 Divide γ and γ by γ 1 s 0 ρ = φ ρ +φ ρ 1 1 0 2 1 ρ = φ ρ +φ ρ s ≥ 2 s 1 s−1 2 s−2 By construction ρ = 1, then 0 φ 1 ρ = 1 1−φ 2 and, consequently, we can solve by recursive substitution ρ for any s ≥ 2. s 10
Description: