Nonlinear Kalman Filtering in Affine Term Structure Models*

Peter Christoffersen (University of Toronto, CBS and CREATES)
Christian Dorion (HEC Montreal)
Kris Jacobs (University of Houston and Tilburg University)
Lotfi Karoui (Goldman, Sachs & Co.)

May 14, 2012

CREATES Research Paper 2012-49. Department of Economics and Business, Aarhus University, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark. Email: [email protected]. Tel: +45 8716 5515

Abstract

When the relationship between security prices and state variables in dynamic term structure models is nonlinear, existing studies usually linearize this relationship because nonlinear filtering is computationally demanding. We conduct an extensive investigation of this linearization and analyze the potential of the unscented Kalman filter to properly capture nonlinearities. To illustrate the advantages of the unscented Kalman filter, we analyze the cross section of swap rates, which are relatively simple nonlinear instruments, and cap prices, which are highly nonlinear in the states. An extensive Monte Carlo experiment demonstrates that the unscented Kalman filter is much more accurate than its extended counterpart in filtering the states and forecasting swap rates and caps. Our findings suggest that the unscented Kalman filter may prove to be a good approach for a number of other problems in fixed income pricing with nonlinear relationships between the state vector and the observations, such as the estimation of term structure models using coupon bonds and the estimation of quadratic term structure models.

JEL Classification: G12

Keywords: Kalman filtering; nonlinearity; term structure models; swaps; caps.

* Dorion is also affiliated with CIRPEE and thanks IFM2 for financial support.
Christoffersen and Karoui were supported by grants from IFM2 and FQRSC. We would like to thank Luca Benzoni, Bob Kimmel, two anonymous referees, an associate editor, and the editor (Wei Xiong) for helpful comments. Any remaining inadequacies are ours alone. Correspondence to: Kris Jacobs, C.T. Bauer College of Business, University of Houston, E-mail: [email protected].

Electronic copy available at: http://ssrn.com/abstract=2065624

1 Introduction

Multifactor affine term structure models (ATSMs) have become the standard in the literature on the valuation of fixed income securities, such as government bonds, corporate bonds, interest rate swaps, credit default swaps, and interest rate derivatives. Even though we have made significant progress in specifying these models, their implementation is still subject to substantial challenges.

One of these challenges is the proper identification of the parameters governing the dynamics of the risk premia (see Dai and Singleton (2002)). It has been recognized in the literature that the use of contracts that are nonlinear in the state variables, such as interest rate derivatives, can potentially help achieve such identification. Nonlinear contracts can also enhance the ability of affine models to capture time variation in excess returns and conditional volatility (see Bikbov and Chernov (2009) and Almeida, Graveline and Joslin (2011)).

Given the potentially valuable information content of nonlinear securities, efficient implementation of ATSMs for these securities is of paramount importance. One of the most popular techniques used in the literature, the extended Kalman filter (EKF), relies on a linearized version of the measurement equation, which links observed security prices to the model's state variables. Our paper is the first to extensively investigate the impact of this linearization.
We show that this approximation leads to significant noise and biases in the filtered state variables as well as in the forecasts of security prices. These biases are particularly pronounced when using securities that are very nonlinear in the state variables, such as interest rate derivatives. We propose the use of the unscented Kalman filter (UKF), which avoids this linearization, to implement affine term structure models with nonlinear securities, and we extensively analyze the properties of this filter. The main advantage of the unscented Kalman filter is that it accounts for the nonlinear relationship between the observed security prices and the underlying state variables.

We use an extensive Monte Carlo experiment that involves a cross section of LIBOR and swap rates, as well as interest rate caps, to investigate the quality of both filters as well as their in- and out-of-sample forecasting ability. The unscented Kalman filter significantly outperforms the extended Kalman filter. First, the UKF outperforms the EKF in filtering the unobserved state variables. Using the root-mean-square error (RMSE) of the filtered state variables as a gauge for the performance of both filters, we find that the UKF robustly outperforms the EKF across models and securities. In some cases, the median RMSE for the EKF is up to 33 times larger than the median RMSE for the UKF. The outperformance of the unscented Kalman filter is particularly pronounced when interest rate caps are included in the filtering exercise. We also find that the UKF is numerically much more stable than the EKF, exhibiting a much lower dispersion of the RMSE across the Monte Carlo trajectories. Second, the improved precision of the UKF in filtering the state variables translates into more accurate forecasts for LIBOR rates, swap rates, and cap prices.
The outperformance of the UKF is particularly pronounced for short horizons. It is also critically important that the superior performance of the UKF comes at a reasonable computational cost. In our applications, the unscented Kalman filter took about twice as long to run as the extended Kalman filter.

Throughout this paper we keep the structural parameters fixed at their true values. However, the poor results obtained when using the EKF to filter states suggest that parameter estimation based on this technique would be highly unreliable, as the filter is unable to correctly fit rates and prices even when provided with the true model parameters. The dramatic improvements brought by the UKF suggest that it will also improve parameter estimation, especially when derivative prices are used to estimate the parameters, which is of critical importance in the identification of the risk premium parameters.

Even though the unscented Kalman filter has become popular in the engineering literature (see, for instance, Julier (2000) and Julier and Uhlmann (2004)), it has not been used extensively in the empirical asset pricing literature.¹ Our results suggest that the unscented Kalman filter may prove to be a good approach to tackle a number of problems in fixed income pricing, especially when the relationship between the state vector and the observations is highly nonlinear. Examples include the estimation of term structure models of credit spreads using a cross section of coupon bonds or credit derivatives, and the estimation of quadratic term structure models.²

The paper proceeds as follows. Section 2 briefly discusses the pricing of LIBOR, swaps, and caps in affine term structure models. Section 3 discusses Kalman filtering in ATSMs, including the extended Kalman filter and the unscented Kalman filter. Section 4 reports the results of our Monte Carlo experiments.
Section 5 discusses implications for parameter estimation, and Section 6 concludes.

¹ See Carr and Wu (2007) and Bakshi, Carr and Wu (2008) for applications to equity options. van Binsbergen and Koijen (2012) use the unscented Kalman filter to estimate present-value models.

² See Fontaine and Garcia (2012) for a recent application of the unscented Kalman filter to the estimation of term structure models for coupon bonds. See Chen, Cheng, Fabozzi, and Liu (2008) for an application of the unscented Kalman filter to the estimation of quadratic term structure models.

2 Affine Term Structure Models

In this section, we define the risk-neutral dynamics in ATSMs, a pricing kernel, and the pricing formulas for LIBOR rates, swap rates, and cap prices. We follow the literature on term structure models and assume that the swap and LIBOR contracts as well as the interest rate caps are default-free. See Dai and Singleton (2000), Collin-Dufresne and Solnik (2001), and Feldhütter and Lando (2008) for further discussion.
2.1 Risk-Neutral Dynamics

Affine term structure models (ATSMs) assume that the short rate is given by

$$ r_t = \delta_0 + \delta_1' x_t, $$

and that the state vector $x_t$ follows an affine diffusion under the risk-neutral measure $Q$:

$$ dx_t = \tilde{\kappa}\left(\tilde{\theta} - x_t\right)dt + \Sigma\sqrt{S_t}\,d\tilde{W}_t, \tag{1} $$

where $\tilde{W}_t$ is an $N$-dimensional vector of independent standard $Q$-Brownian motions, $\tilde{\kappa}$ and $\Sigma$ are $N \times N$ matrices, and $S_t$ is a diagonal matrix whose $i$th diagonal element is given by

$$ [S_t]_{ii} = \alpha_i + \beta_i' x_t. \tag{2} $$

Following Duffie and Kan (1996), we write

$$ Q(u,t,\tau) = E_t^Q\left[ e^{-\int_t^{t+\tau} r_s\,ds}\, e^{u' x_{t+\tau}} \right] = \exp\left\{ A_u(\tau) - B_u(\tau)' x_t \right\}, \tag{3} $$

where $\tau$ is the time to maturity, and $A_u(\tau)$ and $B_u(\tau)$ satisfy the following Riccati ODEs:

$$ \frac{dA_u(\tau)}{d\tau} = -\tilde{\theta}'\tilde{\kappa}' B_u(\tau) + \frac{1}{2}\sum_{i=1}^{N}\left[\Sigma' B_u(\tau)\right]_i^2 \alpha_i - \delta_0 \tag{4} $$

and

$$ \frac{dB_u(\tau)}{d\tau} = -\tilde{\kappa}' B_u(\tau) - \frac{1}{2}\sum_{i=1}^{N}\left[\Sigma' B_u(\tau)\right]_i^2 \beta_i + \delta_1. \tag{5} $$

Equations (4) and (5) can be solved numerically with initial conditions $A_u(0) = 0$ and $B_u(0) = -u$. The resulting zero-coupon bond price is exponentially affine in the state vector:

$$ P(t,\tau) = Q(0,t,\tau) = \exp\left\{ A_0(\tau) - B_0(\tau)' x_t \right\}. \tag{6} $$

2.2 Pricing Kernel

The model is completely specified once the dynamics of the state price density are known. The dynamics of the pricing kernel $\pi_t$ are assumed to be of the form

$$ \frac{d\pi_t}{\pi_t} = -r_t\,dt - \Lambda_t' dW_t, \tag{7} $$

where $W_t$ is an $N$-dimensional vector of independent standard $P$-Brownian motions and $\Lambda_t$ denotes the market price of risk. Since $d\tilde{W}_t = dW_t + \Lambda_t\,dt$, the dynamics of the state vector under the actual measure $P$ are obtained by adding $\Sigma\sqrt{S_t}\,\Lambda_t$ to the drift of equation (1).
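Returning to equations (4) to (6), the sketch below integrates the Riccati ODEs numerically for the simplest case that can be checked by hand: a hypothetical one-factor square-root (CIR) model with $N = 1$, $\alpha = 0$, $\beta = 1$, $\delta_0 = 0$ and $\delta_1 = 1$, so that $r_t = x_t$. The parameter values are illustrative assumptions, not estimates from this paper, and the code assumes NumPy and SciPy (`scipy.integrate.solve_ivp`). For this special case the bond price has a well-known closed form, which the numerical solution of (4) and (5) should reproduce.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical one-factor CIR parameters under Q (illustration only):
# dx = KAPPA * (THETA - x) dt + SIGMA * sqrt(x) dW,  r = x.
KAPPA, THETA, SIGMA = 0.5, 0.05, 0.1
ALPHA, BETA, DELTA0, DELTA1 = 0.0, 1.0, 0.0, 1.0


def riccati_rhs(tau, y):
    """Right-hand side of the Riccati ODEs (4)-(5) for N = 1."""
    A, B = y
    dA = -THETA * KAPPA * B + 0.5 * (SIGMA * B) ** 2 * ALPHA - DELTA0
    dB = -KAPPA * B - 0.5 * (SIGMA * B) ** 2 * BETA + DELTA1
    return [dA, dB]


def solve_AB(tau, u=0.0):
    """Integrate (4)-(5) from A_u(0) = 0, B_u(0) = -u up to maturity tau > 0."""
    sol = solve_ivp(riccati_rhs, (0.0, tau), [0.0, -u], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1], sol.y[1, -1]


def bond_price_numerical(tau, x):
    """Zero-coupon bond price (6): exp(A_0(tau) - B_0(tau) x)."""
    A, B = solve_AB(tau)
    return np.exp(A - B * x)


def bond_price_cir_closed_form(tau, x):
    """Textbook CIR bond price, used here only as a check."""
    g = np.sqrt(KAPPA**2 + 2.0 * SIGMA**2)
    denom = (g + KAPPA) * np.expm1(g * tau) + 2.0 * g
    B = 2.0 * np.expm1(g * tau) / denom
    log_A = (2.0 * KAPPA * THETA / SIGMA**2) * np.log(
        2.0 * g * np.exp(0.5 * (KAPPA + g) * tau) / denom)
    return np.exp(log_A - B * x)


x0 = 0.04
for tau in (0.5, 2.0, 10.0):
    print(tau, bond_price_numerical(tau, x0), bond_price_cir_closed_form(tau, x0))
```

With several factors or nonzero $\alpha_i$, only `riccati_rhs` would change; for the complex arguments $u$ needed in Section 2.4, `solve_ivp` accepts a complex initial state.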
The market price of risk $\Lambda_t$ does not depend on the maturity of the bond and is a function of the current value of the state vector $x_t$. We study completely affine models, which specify the market price of risk as

$$ \Lambda_t = \sqrt{S_t}\,\lambda_0. \tag{8} $$

See Cheridito, Filipović and Kimmel (2007), Duffee (2002), and Duarte (2004) for alternative specifications of the market price of risk.

2.3 LIBOR and Swap Rates

In ATSMs, the time-$t$ LIBOR rate on a loan maturing at $t+\tau$ is given by

$$ L(t,\tau) = \frac{1 - P(t,\tau)}{\tau P(t,\tau)} = \frac{\exp\left(-A_0(\tau) + B_0(\tau)' x_t\right) - 1}{\tau}, \tag{9} $$

while the fair rate at time $t$ on a swap contract with semi-annual payments up to maturity $t+\tau$ can be written as

$$ SR(t,\tau) = \frac{1 - P(t,\tau)}{0.5\sum_{j=1}^{2\tau} P(t,0.5j)} = \frac{1 - \exp\left(A_0(\tau) - B_0(\tau)' x_t\right)}{0.5\sum_{j=1}^{2\tau} \exp\left(A_0(0.5j) - B_0(0.5j)' x_t\right)}. \tag{10} $$

As mentioned earlier, $A_0(\tau)$ and $B_0(\tau)$ can be obtained numerically from equations (4) and (5).

2.4 Cap Prices

Computing cap prices is more computationally intensive. Given the current latent state $x_0$, the value of an at-the-money cap $C_L$ on the 3-month LIBOR rate $L(t,0.25)$ with strike $\bar{R} = L(0,0.25)$ and maturity in $T$ years is

$$ C_L(0,T,\bar{R}) = \sum_{j=2}^{T/0.25} E^Q\left[ e^{-\int_0^{T_j} r_s\,ds}\; 0.25\left(L(T_{j-1},0.25) - \bar{R}\right)^+ \right] = \sum_{j=2}^{T/0.25} c_L\left(0,T_j,\bar{R}\right), \tag{11} $$

where $T_j = 0.25j$. The cap price is thus the sum of the values of the caplets $c_L(0,T_j,\bar{R})$ with strike $\bar{R}$ and maturity $T_j$.

The payoff $\Pi_{T_{j-1}}$ of caplet $c_L(0,T_j,\bar{R})$ is known at time $T_{j-1}$ but paid at $T_j$.
It is given by

$$ \begin{aligned} \Pi_{T_{j-1}} &= 0.25\left(L(T_{j-1},0.25) - \bar{R}\right)^+ \\ &= 0.25\left(\frac{1 - P(T_{j-1},0.25)}{0.25\,P(T_{j-1},0.25)} - \bar{R}\right)^+ \\ &= \frac{1 + 0.25\bar{R}}{P(T_{j-1},0.25)}\left(\frac{1}{1 + 0.25\bar{R}} - P(T_{j-1},0.25)\right)^+. \end{aligned} \tag{12} $$

Since the discounted value of the caplet is a martingale under the risk-neutral measure, and since $\Pi_{T_{j-1}}$ is known at $T_{j-1}$ (so that discounting it from $T_j$ back to $T_{j-1}$ multiplies it by $P(T_{j-1},0.25)$), we have, for $K = 1/(1 + 0.25\bar{R})$,

$$ \begin{aligned} c_L\left(0,T_j,\bar{R}\right) &= E^Q\left[ e^{-\int_0^{T_j} r_s\,ds}\,\Pi_{T_{j-1}} \right] \\ &= \frac{1}{K}\,E^Q\left[ e^{-\int_0^{T_{j-1}} r_s\,ds}\left(K - P(T_{j-1},0.25)\right)^+ \right] \\ &= \frac{1}{K}\,\mathcal{P}\left(0,T_{j-1},T_j,K\right). \end{aligned} \tag{13} $$

Equation (13) represents the time-0 value of $1/K$ puts with maturity $T_{j-1}$ and strike $K$ on a zero-coupon bond maturing in $T_j$ years. Duffie, Pan, and Singleton (2000) show that the price of such a put option is given by

$$ \begin{aligned} \mathcal{P}\left(0,T_{j-1},T_j,K\right) &= E^Q\left[ e^{-\int_0^{T_{j-1}} r_s\,ds}\left(K - \exp\left\{A_0(0.25) - B_0(0.25)' x_{T_{j-1}}\right\}\right)^+ \right] \\ &= e^{A_0(0.25)}\,E^Q\left[ e^{-\int_0^{T_{j-1}} r_s\,ds}\left(e^{-A_0(0.25)}K - \exp\left\{-B_0(0.25)' x_{T_{j-1}}\right\}\right)^+ \right] \\ &= e^{A_0(0.25)}\left[ c\,G_{0,d}\left(\log c;0,T_{j-1}\right) - G_{d,d}\left(\log c;0,T_{j-1}\right) \right], \end{aligned} \tag{14} $$

where $c = e^{-A_0(0.25)}K$, $d = -B_0(0.25)$, and

$$ G_{a,b}\left(y;0,T_{j-1}\right) = \frac{Q\left(a,0,T_{j-1}\right)}{2} - \frac{1}{\pi}\int_0^{\infty} \frac{\operatorname{Im}\left[ Q\left(a + i v b,0,T_{j-1}\right) e^{-i v y} \right]}{v}\,dv. \tag{15} $$

In general, the integral in (15) can only be evaluated numerically. Note that this requires solving the Riccati ODEs for $A_u(\tau)$ and $B_u(\tau)$ in (4) and (5) at each point $u = a + i v b$. Empirical studies of cap pricing and hedging can be found in Li and Zhao (2006) and Jarrow, Li and Zhao (2007).
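The inversion formula (15) and the put decomposition (14) can be checked on a toy problem whose answer is known. The sketch below replaces $Q(u,0,T_{j-1})$ with the transform $E[e^{uX}]$ of a scalar Gaussian variable $X$ and switches discounting off (equivalently, sets $e^{A_0(0.25)} = 1$). Both are assumptions made purely so that the result reduces to a lognormal put; in the model itself, every evaluation of the hypothetical `psi` below would instead require solving (4) and (5) at the complex point $u = a + ivb$. It assumes SciPy's `quad` and `norm`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical toy setting: X ~ N(MU, SIG^2) and no discounting, so that
# psi(u) = E[exp(u X)] stands in for Q(u, 0, T_{j-1}) in (15).
MU, SIG = -0.05, 0.2


def psi(u):
    """Transform of the toy Gaussian variable; accepts complex u."""
    return np.exp(u * MU + 0.5 * u**2 * SIG**2)


def G(a, b, y):
    """Inversion formula (15) for scalar a, b: E[exp(a X) 1{b X <= y}]."""
    def integrand(v):
        return np.imag(psi(a + 1j * v * b) * np.exp(-1j * v * y)) / v
    integral, _ = quad(integrand, 0.0, np.inf, limit=200)
    return psi(a).real / 2.0 - integral / np.pi


def put_via_14(c, d):
    """Toy analogue of (14): E[(c - exp(d X))^+] = c G_{0,d}(log c) - G_{d,d}(log c)."""
    y = np.log(c)
    return c * G(0.0, d, y) - G(d, d, y)


def put_closed_form(c, d):
    """Lognormal put for Y = d X ~ N(d MU, (d SIG)^2), used as a check."""
    m, s = d * MU, abs(d) * SIG
    k = np.log(c)
    return c * norm.cdf((k - m) / s) - np.exp(m + 0.5 * s**2) * norm.cdf((k - m - s**2) / s)


for d in (1.0, -1.0):
    print(d, put_via_14(0.97, d), put_closed_form(0.97, d))
```

The case $d = -1$ mirrors the sign of $d = -B_0(0.25)$ in (14) when $B_0$ is positive.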
3 Kalman Filtering the State Vector

Consider the following general nonlinear state-space system:

$$ x_{t+1} = F\left(x_t, \epsilon_{t+1}\right) \tag{16} $$

and

$$ y_t = G(x_t) + u_t, \tag{17} $$

where $y_t$ is a $D$-dimensional vector of observables, $\epsilon_{t+1}$ is the state noise, and $u_t$ is the observation noise, which has zero mean and a covariance matrix denoted by $R$. In term structure applications, the transition function $F$ is specified by the dynamics of the state vector, and the measurement function $G$ is specified by the pricing functions of the fixed income securities being studied. In our application, the transition function $F$ follows from the affine state vector dynamics in (1), $y_t$ are the LIBOR rates, swap rates, and cap prices observed weekly for different maturities, and the function $G$ is given by the pricing functions in (9), (10), and (11).

The transition equation (16) reflects the discrete-time evolution of the state variables, whereas the measurement equation provides the mapping between the unobserved state vector and the observed variables. If $\{x_t, t \leq T\}$ is an affine diffusion process, a discrete-time expression of its dynamics is unavailable except for Gaussian processes. When the state vector is not Gaussian, one can obtain an approximate transition equation by exploiting the availability of the first two conditional moments in closed form and replacing the original state vector with a new Gaussian state vector that has identical first two conditional moments. While this approximation results in inconsistent estimates, Monte Carlo evidence shows that its impact is negligible in practice (see Duan and Simonato (1999) and de Jong (2000)). For the models we are interested in, the conditional expectation of the state vector is an affine function of the state (see Appendix A for explicit expressions of the first two conditional moments).
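To illustrate the moment-matching approximation, consider a hypothetical one-factor square-root (CIR) state, the special case of (1) with $N = 1$, $\alpha = 0$ and $\beta = 1$. Its first two conditional moments are known in closed form, and the conditional mean is affine in the current state, as stated above. The sketch below builds the Gaussian transition from these exact moments and cross-checks them against the exact scaled noncentral chi-square transition using `scipy.stats.ncx2`. The parameter values and the weekly step $h = 1/52$ (in years) are illustrative assumptions; the expressions of Appendix A are not reproduced here.

```python
import numpy as np
from scipy.stats import ncx2

# Hypothetical square-root state: dx = KAPPA * (THETA - x) dt + SIGMA * sqrt(x) dW.
KAPPA, THETA, SIGMA = 0.5, 0.05, 0.1
H = 1.0 / 52.0  # assumed weekly sampling interval, in years


def cir_moments(x, h=H):
    """Exact conditional mean and variance of x_{t+h} given x_t = x."""
    e = np.exp(-KAPPA * h)
    a, b = THETA * (1.0 - e), e  # the mean is affine in x: a + b x
    mean = a + b * x
    var = x * SIGMA**2 / KAPPA * (e - e**2) + THETA * SIGMA**2 / (2.0 * KAPPA) * (1.0 - e) ** 2
    return mean, var


def cir_moments_ncx2(x, h=H):
    """Same moments from the exact scaled noncentral chi-square law (check only)."""
    e = np.exp(-KAPPA * h)
    c = SIGMA**2 * (1.0 - e) / (4.0 * KAPPA)
    df, nc = 4.0 * KAPPA * THETA / SIGMA**2, x * e / c
    return c * ncx2.mean(df, nc), c**2 * ncx2.var(df, nc)


def gaussian_step(x, rng, h=H):
    """One draw from the moment-matched Gaussian transition N(mean, var)."""
    m, v = cir_moments(x, h)
    return m + np.sqrt(v) * rng.standard_normal()


rng = np.random.default_rng(0)
print(cir_moments(0.04), cir_moments_ncx2(0.04), gaussian_step(0.04, rng))
```

A Gaussian draw can fall below zero, which the exact square-root process cannot; this is one concrete source of the inconsistency mentioned above.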
Using (1) and an Euler discretization, the transition equation (16) can therefore be rewritten as

$$ x_{t+1} = F\left(x_t, \epsilon_{t+1}\right) = a + b\,x_t + \epsilon_{t+1}, \tag{18} $$

where $\epsilon_{t+1} \mid x_t \sim N\left(0, v(x_t)\right)$ and $v(x_t)$ is the conditional covariance matrix of the state vector.

Given that $y_t$ is observed and assuming that it is a Gaussian random variable, the Kalman filter recursively provides the optimal minimum-MSE estimate of the state vector. The prediction step consists of

$$ x_{t|t-1} = a + b\,x_{t-1|t-1}, \tag{19} $$

$$ P_{xx(t|t-1)} = b\,P_{xx(t-1|t-1)}\,b' + v\left(x_{t-1|t-1}\right), \tag{20} $$

$$ K_t = P_{xy(t|t-1)}\,P_{yy(t|t-1)}^{-1}, \tag{21} $$

and

$$ y_{t|t-1} = E_{t-1}\left[G(x_t)\right]. \tag{22} $$

The updating is done using

$$ x_{t|t} = x_{t|t-1} + K_t\left(y_t - y_{t|t-1}\right) \tag{23} $$

and

$$ P_{xx(t|t)} = P_{xx(t|t-1)} - K_t\,P_{yy(t|t-1)}\,K_t'. \tag{24} $$

When $G$ in (22) is a linear function, e.g., if the observations are zero-coupon yields, then the covariance matrices $P_{xy(t|t-1)}$ and $P_{yy(t|t-1)}$ can be computed exactly, and the only approximation is therefore induced by the Gaussian transformation of the state vector used in (18). When the relationship between the state vector and the observations is nonlinear, as is the case when swap contracts, coupon bonds, or interest rate options are used, then $G(x_t)$ needs to be well approximated in order to obtain good estimates of the covariance matrices $P_{xy(t|t-1)}$ and $P_{yy(t|t-1)}$. The approximation of $G$ differs across implementations of the filter, which is the topic to which we now turn.
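The prediction step (19) and (20), the gain (21), and the update (23) and (24) translate directly into a few lines of NumPy. In the sketch below the measurement moments $y_{t|t-1}$, $P_{xy(t|t-1)}$ and $P_{yy(t|t-1)}$ are passed in from outside, because that is exactly where the filter variants differ. The usage example fills them in for a hypothetical linear measurement $y_t = c + Hx_t + u_t$, the case in which they are exact; all names and numbers are illustrative.

```python
import numpy as np


def kf_predict(x_filt, P_filt, a, b, v):
    """Prediction step (19)-(20); v(x) returns the conditional state covariance."""
    x_pred = a + b @ x_filt                      # (19)
    P_pred = b @ P_filt @ b.T + v(x_filt)        # (20)
    return x_pred, P_pred


def kf_update(x_pred, P_pred, y, y_pred, P_xy, P_yy):
    """Gain (21) and update (23)-(24), given the measurement moments."""
    K = np.linalg.solve(P_yy, P_xy.T).T          # (21): P_xy P_yy^{-1}, P_yy symmetric
    x_filt = x_pred + K @ (y - y_pred)           # (23)
    P_filt = P_pred - K @ P_yy @ K.T             # (24)
    return x_filt, P_filt


def linear_measurement_moments(x_pred, P_pred, c, H, R):
    """Exact moments when G(x) = c + H x (e.g., zero-coupon yields)."""
    return c + H @ x_pred, P_pred @ H.T, H @ P_pred @ H.T + R


def v_const(x):
    """Hypothetical state-independent covariance v(x) = 1."""
    return np.array([[1.0]])


# Hypothetical scalar example that can be verified by hand.
a, b = np.array([0.0]), np.array([[1.0]])
x_pred, P_pred = kf_predict(np.array([0.0]), np.array([[1.0]]), a, b, v_const)
y_pred, P_xy, P_yy = linear_measurement_moments(
    x_pred, P_pred, np.array([0.0]), np.array([[1.0]]), np.array([[2.0]]))
x_filt, P_filt = kf_update(x_pred, P_pred, np.array([4.0]), y_pred, P_xy, P_yy)
print(x_filt, P_filt)  # predicted variance 2, gain 0.5, so x = 2 and P = 1
```

Solving a linear system instead of forming $P_{yy}^{-1}$ explicitly is a standard numerical choice; it gives the same gain as (21).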
3.1 The Extended Kalman Filter

To deal with nonlinearity in the measurement equation, one can apply the extended Kalman filter (EKF), which relies on a first-order Taylor expansion of the measurement equation around the predicted state $x_{t|t-1}$.³ The measurement equation is therefore rewritten as

$$ y_t = G\left(x_{t|t-1}\right) + J_t\left(x_t - x_{t|t-1}\right) + u_t, \tag{25} $$

where

$$ J_t = \left.\frac{\partial G}{\partial x_t}\right|_{x_t = x_{t|t-1}} $$

denotes the Jacobian matrix of the nonlinear function $G(x_t)$ evaluated at $x_{t|t-1}$. The covariance matrices $P_{xy(t|t-1)}$ and $P_{yy(t|t-1)}$ are then computed as

$$ P_{xy(t|t-1)} = P_{xx(t|t-1)}\,J_t' \tag{26} $$

and

$$ P_{yy(t|t-1)} = J_t\,P_{xx(t|t-1)}\,J_t' + R. \tag{27} $$

The estimate of the state vector is then updated using (23), (24), and

$$ K_t = P_{xx(t|t-1)}\,J_t'\,P_{yy(t|t-1)}^{-1}. \tag{28} $$

³ For applications of the extended Kalman filter see Chen and Scott (1995), Duan and Simonato (1999), and Duffee (1999).

3.2 The Unscented Kalman Filter

Unlike the extended Kalman filter, the unscented Kalman filter uses the exact nonlinear function $G(x_t)$ and does not linearize the measurement equation. Rather than approximating $G(x_t)$, the unscented Kalman filter approximates the conditional distribution of $x_t$ using the scaled unscented transformation (see Julier (2000) for more details), which can be defined as a method for computing the statistics of a nonlinear transformation of a random variable. Julier and Uhlmann (2004) prove that such an approximation is accurate to the third order for Gaussian states and to the second order for non-Gaussian states. Note also that the approximation does not require computing Jacobian or Hessian matrices, and that the computational burden associated with the unscented Kalman filter is not prohibitive compared to that of the extended Kalman filter.
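For comparison with the unscented approach introduced in this subsection, the EKF measurement step of Section 3.1 can be sketched as follows. The Jacobian $J_t$ is approximated here by central finite differences, which is an assumption (an analytical Jacobian of the pricing functions could be used instead); the function then evaluates (25) to (28) and applies the update (23) and (24). Function names and inputs are hypothetical.

```python
import numpy as np


def numerical_jacobian(G, x, h=1e-6):
    """Central-difference approximation of J = dG/dx at x (shape D x N)."""
    x = np.asarray(x, dtype=float)
    g0 = np.atleast_1d(G(x))
    J = np.empty((g0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        J[:, k] = (np.atleast_1d(G(x + step)) - np.atleast_1d(G(x - step))) / (2.0 * h)
    return J


def ekf_update(G, x_pred, P_pred, y, R):
    """One EKF measurement update: linearize G around x_{t|t-1}, then apply (23)-(28)."""
    J = numerical_jacobian(G, x_pred)
    y_pred = np.atleast_1d(G(x_pred))            # G evaluated at the predicted state, from (25)
    P_xy = P_pred @ J.T                          # (26)
    P_yy = J @ P_pred @ J.T + R                  # (27)
    K = np.linalg.solve(P_yy, P_xy.T).T          # (28)
    x_filt = x_pred + K @ (y - y_pred)           # (23)
    P_filt = P_pred - K @ P_yy @ K.T             # (24)
    return x_filt, P_filt, J


# Hypothetical nonlinear measurement: elementwise exponential of a 2-dimensional state.
x_pred = np.array([0.1, -0.2])
P_pred = np.diag([0.04, 0.09])
R = 0.01 * np.eye(2)
y_obs = np.array([1.2, 0.8])
x_filt, P_filt, J = ekf_update(np.exp, x_pred, P_pred, y_obs, R)
print(J)       # close to diag(exp(x_pred))
print(x_filt)
```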
In our application below, the computation time for the unscented Kalman filter was on average twice that of the extended Kalman filter.

Consider the random variable $x$ with mean $\mu_x$ and covariance matrix $P_{xx}$, and the nonlinear transformation $y = G(x)$. The basic idea behind the scaled unscented transformation is to generate a set of points, called sigma points, whose first two sample moments equal $\mu_x$ and $P_{xx}$. The nonlinear transformation is then applied to each sigma point. In particular, the $n_x$-dimensional random variable is approximated by a set of $2n_x + 1$ weighted points given by

$$ \mathcal{X}_0 = \mu_x, \tag{29} $$

$$ \mathcal{X}_i = \mu_x + \left(\sqrt{(n_x+\xi)P_{xx}}\right)_i, \quad i = 1,\ldots,n_x, \tag{30} $$

$$ \mathcal{X}_i = \mu_x - \left(\sqrt{(n_x+\xi)P_{xx}}\right)_{i-n_x}, \quad i = n_x+1,\ldots,2n_x, \tag{31} $$

with weights

$$ W_0^{m} = \frac{\xi}{n_x+\xi}, \qquad W_0^{c} = \frac{\xi}{n_x+\xi} + \left(1 - \rho^2 + \gamma\right), \tag{32} $$

$$ W_i^{m} = W_i^{c} = \frac{1}{2(n_x+\xi)}, \quad i = 1,\ldots,2n_x, \tag{33} $$

where $\xi = \rho^2(n_x+\chi) - n_x$, and $\left(\sqrt{(n_x+\xi)P_{xx}}\right)_i$ is the $i$th column of the matrix square root of $(n_x+\xi)P_{xx}$. The scaling parameter $\rho > 0$ is intended to minimize higher-order effects and can be made arbitrarily small. The restriction $\chi > 0$ guarantees the positivity of the covariance matrix. The parameter $\gamma \geq 0$ can capture higher-order moments of the state
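A minimal NumPy sketch of the scaled unscented transformation (29) to (33) follows. It uses a Cholesky factor as the matrix square root (one common choice; the text only requires some matrix square root) and assembles $y_{t|t-1}$, $P_{xy(t|t-1)}$ and $P_{yy(t|t-1)}$ from the sigma points with the usual weighted sums, which is our assumption about how the moments are combined. The defaults $\rho = 1$, $\chi = 2$, $\gamma = 2$ are illustrative, not the settings used in this paper. The usage example compares the unscented mean of $e^x$ for Gaussian $x$ with $G(\mu_x)$, the value implied by a first-order expansion around the mean as in (25).

```python
import numpy as np


def sigma_points(mu, P, rho=1.0, chi=2.0, gamma=2.0):
    """Sigma points (29)-(31) and weights (32)-(33) of the scaled unscented transformation."""
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    xi = rho**2 * (n + chi) - n
    S = np.linalg.cholesky((n + xi) * np.asarray(P, dtype=float))  # S S' = (n + xi) P
    X = np.empty((2 * n + 1, n))
    X[0] = mu                                   # (29)
    for i in range(n):
        X[1 + i] = mu + S[:, i]                 # (30)
        X[1 + n + i] = mu - S[:, i]             # (31)
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + xi)))  # (33)
    Wc = Wm.copy()
    Wm[0] = xi / (n + xi)                       # (32)
    Wc[0] = xi / (n + xi) + (1.0 - rho**2 + gamma)
    return X, Wm, Wc


def unscented_moments(G, mu, P, R, **params):
    """Approximate y_{t|t-1}, P_xy and P_yy by pushing the sigma points through G."""
    X, Wm, Wc = sigma_points(mu, P, **params)
    Y = np.array([np.atleast_1d(G(x)) for x in X])  # shape (2n + 1, D)
    y_mean = Wm @ Y
    dX = X - X[0]                               # X[0] is the mean
    dY = Y - y_mean
    P_xy = (Wc[:, None] * dX).T @ dY
    P_yy = (Wc[:, None] * dY).T @ dY + R
    return y_mean, P_xy, P_yy


# Usage: scalar x ~ N(0, 0.25) pushed through G(x) = exp(x).
mu, P, R = np.array([0.0]), np.array([[0.25]]), np.array([[0.0]])
y_ut, _, _ = unscented_moments(np.exp, mu, P, R)
exact = np.exp(0.5 * 0.25)   # E[exp(x)] for Gaussian x
ekf_style = np.exp(mu)       # G evaluated at the mean
print(y_ut, ekf_style, exact)
```

With these defaults, $n_x + \xi = 3$ in the scalar case, and the unscented mean of $e^x$ differs from the exact value $e^{0.125} \approx 1.1331$ only in the fourth decimal, whereas $G(\mu_x) = 1$ misses by about 0.13. This is, in miniature, the linearization bias discussed in the introduction.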