Rao-Blackwellization for Adaptive Gaussian Sum Nonlinear Model Propagation

Sean R. Semper∗
NASA Goddard Space Flight Center, Greenbelt, MD 20771

John L. Crassidis†
University at Buffalo, State University of New York, Amherst, NY 14260-4400

Jemin George‡
U.S. Army Research Laboratory, Adelphi, MD 20783

Siddharth Mukherjee§ and Puneet Singla¶
University at Buffalo, State University of New York, Amherst, NY 14260-4400

∗Aerospace Engineer, Guidance, Navigation and Control Hardware and Components Branch. Email: [email protected]. Member AIAA.
†CUBRC Professor in Space Situational Awareness, Department of Mechanical & Aerospace Engineering. Email: johnc@buffalo.edu. Associate Fellow AIAA.
‡Engineer, Networked Sensing and Fusion Branch. Email: [email protected].
§Student, Department of Mechanical & Aerospace Engineering. Email: smukherj@buffalo.edu.
¶Associate Professor, Department of Mechanical & Aerospace Engineering. Email: psingla@buffalo.edu.

Introduction

When dealing with imperfect data and general models of dynamic systems, the best estimate is always sought in the presence of uncertainty or unknown parameters. In many cases, as a first attempt, the extended Kalman filter (EKF) provides sufficient solutions to nonlinear and non-Gaussian estimation problems, but these issues may lead to unacceptable performance and even divergence. In order to accurately capture the nonlinearities of most real-world dynamic systems, advanced filtering methods have been created to reduce filter divergence while enhancing performance. Approaches such as Gaussian sum filtering, grid-based Bayesian methods, and particle filters are well-known examples of advanced methods used to represent and recursively reproduce an approximation to the state probability density function (pdf). Some of these filtering methods were conceptually developed years before their widespread use was realized. Advanced nonlinear filtering methods currently benefit from advancements in computational speed, memory, and parallel processing.

Grid-based methods, multiple-model approaches, and Gaussian sum filtering are numerical solutions that take advantage of different state coordinates or multiple models to reduce the number of approximations used. Choosing an efficient grid is very difficult for multi-dimensional state spaces, and oftentimes expensive computations must be done at each point. For the original Gaussian sum filter, a weighted sum of Gaussian density functions approximates the pdf, but the approach suffers at the update step from the selection of the individual component weights [1]. In order to improve upon the original Gaussian sum filter, Ref. [2] introduces a weight update at the filter propagation stage instead of the measurement update stage. This weight update is performed by minimizing the integral square difference between the true forecast pdf and its Gaussian sum approximation [2]. By adaptively updating each component weight during the nonlinear propagation stage, an approximation of the true pdf can be successfully reconstructed.

Particle filtering (PF) methods have gained popularity recently for solving nonlinear estimation problems due to their straightforward approach and the processing capabilities mentioned above [3]. The basic concept behind PF is to represent any pdf as a set of random samples. As the number of samples increases, they theoretically converge to an exact, equivalent representation of the desired pdf.
When the estimated qth moment is needed, the samples are used for its construction, allowing further analysis of the pdf characteristics [4]. However, filter performance deteriorates as the dimension of the state vector increases. To overcome this problem, Ref. [5] applies a marginalization technique for PF methods, reducing the problem to one linear and one nonlinear state estimation problem.

The marginalization theory was originally developed by Rao and Blackwell independently. According to Ref. [6], it improves any given estimator under every convex loss function. The improvement comes from calculating a conditional expected value, often involving integrating out a supportive statistic. In other words, Rao-Blackwellization allows for smaller but separate computations to be carried out while reaching the main objective of the estimator. In the case of improving an estimator's variance, any supporting statistic can be removed and its variance determined. Next, any other information that depends on the supporting statistic is found, along with its respective variance [6].

A new approach is developed here by utilizing the strengths of the adaptive Gaussian sum propagation in Ref. [2] and a marginalization approach used for PF methods found in Ref. [7]. In the following sections a modified filtering approach is presented, based on a special state-space model within nonlinear systems, to reduce the dimensionality of the optimization problem in Ref. [2]. First, the adaptive Gaussian sum propagation is explained, and then the new marginalized adaptive Gaussian sum propagation is derived. Finally, an example simulation is presented.

Gaussian Sum Nonlinear Model Propagation

In this section, the approach from Ref. [2] is used to approximate the pdf for nonlinear systems. The idea is to approximate the required posterior pdf by a Gaussian mixture, which is a weighted sum of Gaussian density functions [8]. According to Refs. [2,9], with a sufficient collection of Gaussian components, any pdf may be approximated as closely as desired. In this section the notation found in Ref. [2] is followed, with minor differences in symbols, in order to present their new method. Given a pdf $p(x_k)$, the Gaussian sum approximation is

\[
p(x_k) \approx \sum_{i=1}^{q} w_k^i \, \mathcal{N}(x_k \mid \bar{x}_k^i, P_k^i) \tag{1}
\]

where the ith mean and covariance are denoted by $\bar{x}_k^i$ and $P_k^i$, respectively. In order to have a valid pdf, the following requirement must be satisfied: for some $q$ and positive weights $w_k^i$, $\sum_{i=1}^{q} w_k^i = 1$. Each Gaussian density $\mathcal{N}(x_k \mid \bar{x}_k^i, P_k^i)$ is given by

\[
\mathcal{N}(x_k \mid \bar{x}_k^i, P_k^i) = \frac{1}{\|2\pi P_k^i\|^{1/2}} \exp\left[ -\frac{1}{2} (x_k - \bar{x}_k^i)^T (P_k^i)^{-1} (x_k - \bar{x}_k^i) \right] \tag{2}
\]
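As a concrete illustration of Eqs. (1) and (2), the following sketch evaluates a Gaussian sum approximation at a query point. The two-component mixture parameters are hypothetical placeholders chosen only to make the call concrete, not values from Ref. [2].

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density of Eq. (2)."""
    d = x - mean
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

def gaussian_sum_pdf(x, weights, means, covs):
    """Weighted mixture of Eq. (1); weights are positive and sum to one."""
    return sum(w * gaussian_pdf(x, m, P)
               for w, m, P in zip(weights, means, covs))

# Hypothetical two-component mixture in two dimensions
weights = np.array([0.6, 0.4])
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(gaussian_sum_pdf(np.array([1.0, 0.5]), weights, means, covs))
```

Per Refs. [2,9], increasing the number of components $q$ lets such a mixture match the target pdf as closely as desired.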
The pdf at time $k+1$ to be approximated as a Gaussian mixture is given by the Chapman-Kolmogorov equation [8]:

\[
p(x_{k+1}) = \int p(x_{k+1} \mid x_k)\, p(x_k) \, dx_k \tag{3}
\]

where $p(x_{k+1} \mid x_k)$ is the probabilistic model of the state evolution (also known as the state transition pdf). Consider the nonlinear model of the general form

\[
x_{k+1} = f(x_k, u_k, k) + \eta_k \tag{4}
\]

The state transition pdf depends on the pdf of the process noise variable $\eta_k$, usually modeled as an additive zero-mean Gaussian noise process with covariance $Q_k$ [2]. Thus, the state transition pdf can be written as

\[
p(x_{k+1} \mid x_k) = \mathcal{N}\left(x_{k+1} \mid f(x_k, u_k, k), Q_k\right) \tag{5}
\]

In the traditional Gaussian sum approximation, the forecast density function $p(x_{k+1})$ is obtained by linearizing the nonlinear transformation and assuming that the weights of the different components are constant, i.e.,

\[
\hat{p}(x_{k+1}) = \sum_{i=1}^{q} w_{k+1}^i \, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i) \tag{6}
\]

where

\[
w_{k+1}^i = w_k^i \tag{7a}
\]
\[
\bar{x}_{k+1}^i = f(\bar{x}_k^i, u_k, k) \tag{7b}
\]
\[
P_{k+1}^i = F_k(\bar{x}_k^i)\, P_k^i\, F_k^T(\bar{x}_k^i) + Q_k \tag{7c}
\]

and

\[
F_k(\bar{x}_k^i) = \frac{\partial f(x_k, u_k, k)}{\partial x_k} \bigg|_{x_k = \bar{x}_k^i} \tag{8}
\]

One advantage of the Gaussian sum approximation is that, in a low-noise environment, the resulting estimator can be very nearly optimal. Conversely, the problem is how to formulate an algorithmic procedure for the on-line computation of the updated weights. This computation is difficult because the number of components $q$ can grow exponentially with time [2,8]. Therefore, to obtain a favorable posterior pdf approximation, developing better update laws for the future weights is necessary for enhancing performance.
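Before turning to the weight update, a minimal sketch of the constant-weight propagation step of Eqs. (7a)-(7c) is given below. A finite-difference Jacobian stands in for the analytical $F_k$ of Eq. (8), and the dynamics function is a hypothetical placeholder.

```python
import numpy as np

def numerical_jacobian(f, x, u, k, eps=1e-6):
    """Finite-difference stand-in for F_k of Eq. (8)."""
    fx = f(x, u, k)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        J[:, j] = (f(x + dx, u, k) - fx) / eps
    return J

def propagate_component(f, x_bar, P, u, k, Q):
    """Eqs. (7b)-(7c): EKF-style mean/covariance propagation; the
    weight w_k^i is carried over unchanged per Eq. (7a)."""
    F = numerical_jacobian(f, x_bar, u, k)
    return f(x_bar, u, k), F @ P @ F.T + Q

# Hypothetical two-state dynamics, one mixture component
f = lambda x, u, k: np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])
x1, P1 = propagate_component(f, np.array([0.5, -0.2]), np.eye(2),
                             None, 0, 0.01 * np.eye(2))
print(x1, P1)
```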
Weight Update for Discrete-Time Dynamic Systems

While the traditional Gaussian sum approximation keeps the weights constant, Ref. [2] introduces a weight update scheme that utilizes the Chapman-Kolmogorov equation. That is, given the Gaussian sum approximation at some time $k$ as

\[
\hat{p}(x_k) = \sum_{i=1}^{q} w_k^i \, \mathcal{N}(x_k \mid \bar{x}_k^i, P_k^i) \tag{9}
\]

the Gaussian sum approximation at time $k+1$ may be written as

\[
\hat{p}(x_{k+1}) = \sum_{i=1}^{q} w_{k+1}^i \, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i) \tag{10}
\]

The approach in Ref. [2] finds the new weights of the Gaussian mixture such that the difference between the true pdf in Eq. (3) and the approximated pdf in Eq. (10) is minimized over the domain $x_{k+1}$, i.e.,

\[
\min_{w_{k+1}^i} J = \frac{1}{2} \int \left\| p(x_{k+1}) - \hat{p}(x_{k+1}) \right\|^2 dx_{k+1} \tag{11a}
\]
\[
\text{s.t.} \quad \sum_{i=1}^{q} w_{k+1}^i = 1 \tag{11b}
\]
\[
w_{k+1}^i \geq 0, \quad i = 1, \ldots, q \tag{11c}
\]

The terms in the above cost function can be expanded and rewritten as

\[
J = \frac{1}{2} \int \left[ p^2(x_{k+1}) - 2\, p(x_{k+1})\, \hat{p}(x_{k+1}) + \hat{p}^2(x_{k+1}) \right] dx_{k+1} \tag{12}
\]

Substituting Eq. (10) and Eq. (3) gives

\[
\frac{1}{2} \int \hat{p}^2(x_{k+1})\, dx_{k+1} = \frac{1}{2} \int \left[ \sum_{i=1}^{q} w_{k+1}^i \, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i) \right]^2 dx_{k+1} \tag{13}
\]

and

\[
\int 2\, p(x_{k+1})\, \hat{p}(x_{k+1})\, dx_{k+1} = 2 \int \left[ \int p(x_{k+1} \mid x_k)\, p(x_k)\, dx_k \right] \left[ \sum_{i=1}^{q} w_{k+1}^i \, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i) \right] dx_{k+1} \tag{14a}
\]
\[
= 2 \sum_{i=1}^{q} w_{k+1}^i \int \left[ \int p(x_{k+1} \mid x_k)\, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i)\, dx_{k+1} \right] p(x_k)\, dx_k \tag{14b}
\]

which are quadratic and linear terms, respectively. Each linear term from the summation is denoted as $y_i$. Substituting Eq. (5) yields

\[
y_i = \int \left[ \int \mathcal{N}(x_{k+1} \mid f(x_k, u_k, k), Q_k)\, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i)\, dx_{k+1} \right] p(x_k)\, dx_k \tag{15a}
\]
\[
= \int \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) p(x_k)\, dx_k \tag{15b}
\]

The first term, involving $p^2(x_{k+1})$, is an additive constant with respect to the weights and is excluded from the optimization problem. Now the cost function can be written as

\[
J = \frac{1}{2} w_{k+1}^T M w_{k+1} - w_{k+1}^T y \tag{16}
\]

where $w_{k+1} = [w_{k+1}^1 \;\; w_{k+1}^2 \;\; \cdots \;\; w_{k+1}^q]^T$, $y$ contains the components $y_i$, and $M \in \mathbb{R}^{q \times q}$ is a symmetric matrix given by

\[
M = \int \mathcal{M}(x_{k+1})\, \mathcal{M}^T(x_{k+1})\, dx_{k+1} \tag{17}
\]

where $\mathcal{M}$ is a $q \times 1$ vector that contains all the Gaussian components at time $k+1$:

\[
\mathcal{M} = \left[ \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^1, P_{k+1}^1) \;\; \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^2, P_{k+1}^2) \;\; \cdots \;\; \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^q, P_{k+1}^q) \right]^T \tag{18}
\]

As a result, the components of $M$ are given by the product rule of two Gaussian density functions, which yields another Gaussian density function [10]. Integrating the product leaves only the normalization constant. The off-diagonal elements of $M$ are given by

\[
m_{ij} = \int \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^i, P_{k+1}^i)\, \mathcal{N}(x_{k+1} \mid \bar{x}_{k+1}^j, P_{k+1}^j)\, dx_{k+1}, \quad i \neq j \tag{19a}
\]
\[
= \mathcal{N}(\bar{x}_{k+1}^i \mid \bar{x}_{k+1}^j, P_{k+1}^i + P_{k+1}^j) \int \mathcal{N}\left( x_{k+1} \,\middle|\, P_{k+1}^{ij} \left[ (P_{k+1}^i)^{-1} \bar{x}_{k+1}^i + (P_{k+1}^j)^{-1} \bar{x}_{k+1}^j \right], P_{k+1}^{ij} \right) dx_{k+1} = \mathcal{N}(\bar{x}_{k+1}^i \mid \bar{x}_{k+1}^j, P_{k+1}^i + P_{k+1}^j) \tag{19b}
\]
\[
= \|2\pi P_{k+1}^s\|^{-1/2} \exp\left[ -\frac{1}{2} (\bar{x}_{k+1}^i - \bar{x}_{k+1}^j)^T (P_{k+1}^s)^{-1} (\bar{x}_{k+1}^i - \bar{x}_{k+1}^j) \right] \tag{19c}
\]

where $P_{k+1}^s = P_{k+1}^i + P_{k+1}^j$ and $P_{k+1}^{ij} = [(P_{k+1}^i)^{-1} + (P_{k+1}^j)^{-1}]^{-1}$. The diagonal elements of $M$ are given by

\[
m_{ii} = \mathcal{N}(\bar{x}_{k+1}^i \mid \bar{x}_{k+1}^i, P_{k+1}^i + P_{k+1}^i) = \|4\pi P_{k+1}^i\|^{-1/2} \tag{20}
\]

Returning to the linear term $w_{k+1}^T y$, if the prior pdf is approximated by a Gaussian sum, then

\[
y_i = \int \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) p(x_k)\, dx_k \tag{21a}
\]
\[
\approx \int \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) \hat{p}(x_k)\, dx_k \tag{21b}
\]

Therefore,

\[
y_i = \sum_{j=1}^{q} w_k^j \int \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) \mathcal{N}(x_k \mid \bar{x}_k^j, P_k^j)\, dx_k \tag{22a}
\]
\[
= \sum_{j=1}^{q} w_k^j N_{ij} \tag{22b}
\]

The vector $w_k = [w_k^1 \;\; w_k^2 \;\; \cdots \;\; w_k^q]^T$ is the prior weight vector, and the matrix $N \in \mathbb{R}^{q \times q}$ contains the following components:

\[
N_{ij} = \int \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) \mathcal{N}(x_k \mid \bar{x}_k^j, P_k^j)\, dx_k \tag{23a}
\]
\[
= E_{\mathcal{N}(x_k \mid \bar{x}_k^j, P_k^j)} \left[ \mathcal{N}\left( f(x_k, u_k, k) \mid \bar{x}_{k+1}^i, P_{k+1}^i + Q_k \right) \right] \tag{23b}
\]

Note that an unscented transformation (UT)ᵃ can be used to compute the expectations of Eq. (23b). The final optimization problem is presented in quadratic programming form as

\[
\min_{w_{k+1}^i} J(w_{k+1}^i) = \frac{1}{2} w_{k+1}^T M w_{k+1} - w_{k+1}^T N w_k \tag{24a}
\]
\[
\text{s.t.} \quad \mathbf{1}_{q \times 1}^T w_{k+1} = 1 \tag{24b}
\]
\[
w_{k+1} \geq \mathbf{0}_{q \times 1} \tag{24c}
\]

where $\mathbf{1}_{q \times 1}$ is a vector of ones and $\mathbf{0}_{q \times 1}$ is a vector of zeros.

ᵃSee Ref. [2] for caveats concerning the UT.
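The weight update above can be prototyped directly. In the sketch below, $M$ is assembled in closed form from Eqs. (19c) and (20), the expectation of Eq. (23b) is approximated by plain Monte Carlo sampling as a stand-in for the UT, and the quadratic program of Eq. (24) is solved with SciPy's SLSQP routine; all dynamics and mixture parameters are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def gauss(x, mean, cov):
    """Multivariate normal density N(x | mean, cov)."""
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt(
        np.linalg.det(2.0 * np.pi * cov))

def build_M(means, covs):
    """Closed-form M of Eqs. (19c)/(20): m_ij = N(xbar_i | xbar_j, P_i + P_j)."""
    q = len(means)
    return np.array([[gauss(means[i], means[j], covs[i] + covs[j])
                      for j in range(q)] for i in range(q)])

def build_N(f, u, k, Q, prior_m, prior_P, prop_m, prop_P, ns=2000, seed=0):
    """N_ij of Eq. (23b); Monte Carlo here stands in for the UT expectation."""
    rng = np.random.default_rng(seed)
    q = len(prior_m)
    N = np.zeros((q, q))
    for j in range(q):
        xs = rng.multivariate_normal(prior_m[j], prior_P[j], ns)
        fx = [f(x, u, k) for x in xs]
        for i in range(q):
            N[i, j] = np.mean([gauss(v, prop_m[i], prop_P[i] + Q) for v in fx])
    return N

def update_weights(M, N, w_prior):
    """Solve the QP of Eq. (24): min 0.5 w'Mw - w'(N w_prior), simplex-constrained."""
    y = N @ w_prior
    res = minimize(lambda w: 0.5 * w @ M @ w - w @ y, x0=w_prior,
                   jac=lambda w: M @ w - y,
                   bounds=[(0.0, 1.0)] * w_prior.size,
                   constraints={'type': 'eq', 'fun': lambda w: w.sum() - 1.0})
    return res.x

# Tiny demo: q = 2 scalar-state components with hypothetical dynamics
f = lambda x, u, k: x - 0.1 * x**3
prior_m = [np.array([-1.0]), np.array([1.0])]
prior_P = [np.array([[0.2]]), np.array([[0.2]])]
prop_m = [f(m, None, 0) for m in prior_m]
prop_P = [np.array([[0.25]]), np.array([[0.25]])]
M = build_M(prop_m, prop_P)
N = build_N(f, None, 0, np.array([[0.05]]), prior_m, prior_P, prop_m, prop_P)
print(update_weights(M, N, np.array([0.5, 0.5])))
```

A dedicated QP solver could replace SLSQP here; the constraint set of Eqs. (24b)-(24c) is simply the probability simplex.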
Rao-Blackwellization Approach

A Gaussian sum filter allows an initial reduction in computational complexity relative to the classical sequential Monte Carlo (SMC) method based on the sequential importance sampling (SIS) algorithm. This is because an SMC method approximates a continuous distribution of interest by a finite (but large) number of weighted random samples, or particles, in the state space. In theory, any SIS method can approximate a posterior pdf of any form and solve any nonlinear system with any arbitrary distribution, i.e., the PF. Also, by incorporating the familiar Bayesian approach known as Rao-Blackwellization, the SMC complexity can be further reduced by marginalizing out the conditionally linear parts of the nonlinear model [6]. This results in a Rao-Blackwellized particle filter (RBPF), where the linear portion is estimated using a Kalman filter (KF) and the nonlinear portion using the original PF, as mentioned earlier [7,8,11,12]. With the same goal of reducing computational effort as the RBPF, a Rao-Blackwellized Adaptive Gaussian Sum (RB-AGS) approach can reduce the number of Gaussian components by propagating the linear portions using the KF and the nonlinear parts using the original EKF or unscented Kalman filter. Therefore, the Rao-Blackwellization and the new update laws will prove useful for some unique nonlinear systems.

Assume that the nonlinear system from Eq. (4) can be decomposed into a general state-space model:

\[
x_{k+1}^l = f^l(x_k^n, u_k) + \eta_k^l \tag{25a}
\]
\[
x_{k+1}^n = f^n(x_k^n, u_k) + \eta_k^n \tag{25b}
\]

where the process noise $\eta_k$ is

\[
\eta_k = \begin{bmatrix} \eta_k^l \\ \eta_k^n \end{bmatrix} \sim \mathcal{N}(0, Q_k), \quad Q_k = \begin{bmatrix} Q_k^l & 0 \\ 0 & Q_k^n \end{bmatrix} \tag{26}
\]

Here, obtaining the pdf $p(x_k) \equiv p(x_k^l, x_k^n)$ is desired. The pdf $p(x_k)$ is calculated using the following two steps. First it is assumed that, at time $k$, the Gaussian sum approximation of $p(x_k^n)$ is given by

\[
\hat{p}(x_k^n) = \sum_{i=1}^{q} w_k^i \, \mathcal{N}(x_k^n \mid \mu_k^i, \Sigma_k^i) \tag{27}
\]

for some chosen mean $\mu_k^i$ and covariance $\Sigma_k^i$. Now the forecast approximation $\hat{p}(x_{k+1}^n)$ is calculated using the Chapman-Kolmogorov equation:

\[
\hat{p}(x_{k+1}^n) = \int p(x_{k+1}^n \mid x_k^n)\, \hat{p}(x_k^n)\, dx_k^n \tag{28}
\]

Note that the precise expression for the conditional pdf $p(x_{k+1}^n \mid x_k^n)$ can be written as

\[
p(x_{k+1}^n \mid x_k^n) = \mathcal{N}\left( x_{k+1}^n \mid f^n(x_k^n, u_k), Q_k^n \right) \tag{29}
\]

Following the same approach shown previously, the weights corresponding to the individual Gaussian components can be obtained from solving the following optimization problem:

\[
\min_{w_{k+1}^i} J = \frac{1}{2} \int \left\| p(x_{k+1}^n) - \hat{p}(x_{k+1}^n) \right\|^2 dx_{k+1}^n \tag{30a}
\]
\[
\text{s.t.} \quad \sum_{i=1}^{q} w_{k+1}^i = 1 \tag{30b}
\]
\[
w_{k+1}^i \geq 0, \quad i = 1, \ldots, q \tag{30c}
\]

The above optimization problem can be written in vector format as

\[
\min_{w_{k+1}^i} J(w_{k+1}^i) = \frac{1}{2} w_{k+1}^T M w_{k+1} - w_{k+1}^T N w_k \tag{31a}
\]
\[
\text{s.t.} \quad \mathbf{1}_{q \times 1}^T w_{k+1} = 1 \tag{31b}
\]
\[
w_{k+1} \geq \mathbf{0}_{q \times 1} \tag{31c}
\]

where $w_{k+1} = [w_{k+1}^1 \;\; w_{k+1}^2 \;\; \cdots \;\; w_{k+1}^q]^T$, and $w_k = [w_k^1 \;\; w_k^2 \;\; \cdots \;\; w_k^q]^T$ is the prior weight vector. As before, the symmetric matrix of Gaussian components for time $k+1$ is

\[
M = \int \mathcal{M}(x_{k+1}^n)\, \mathcal{M}^T(x_{k+1}^n)\, dx_{k+1}^n \tag{32}
\]

and $\mathcal{M}$ is a $q \times 1$ vector that contains all the Gaussian components at time $k+1$:

\[
\mathcal{M} = \left[ \mathcal{N}(x_{k+1}^n \mid \mu_{k+1}^1, \Sigma_{k+1}^1) \;\; \mathcal{N}(x_{k+1}^n \mid \mu_{k+1}^2, \Sigma_{k+1}^2) \;\; \cdots \;\; \mathcal{N}(x_{k+1}^n \mid \mu_{k+1}^q, \Sigma_{k+1}^q) \right]^T \tag{33}
\]

Once again, the off-diagonal terms of $M$ are

\[
m_{ij} = \|2\pi \Sigma_{k+1}^s\|^{-1/2} \exp\left[ -\frac{1}{2} (\mu_{k+1}^i - \mu_{k+1}^j)^T (\Sigma_{k+1}^s)^{-1} (\mu_{k+1}^i - \mu_{k+1}^j) \right] \tag{34}
\]

where $\Sigma_{k+1}^s = \Sigma_{k+1}^i + \Sigma_{k+1}^j$, and the diagonal terms reduce to

\[
m_{ii} = \mathcal{N}(\mu_{k+1}^i \mid \mu_{k+1}^i, \Sigma_{k+1}^i + \Sigma_{k+1}^i) = \|4\pi \Sigma_{k+1}^i\|^{-1/2} \tag{35}
\]

Next, the linear term $w_{k+1}^T y$ is made up of the individual components given by

\[
y_i = \sum_{j=1}^{q} w_k^j \int \mathcal{N}\left( f^n(x_k^n, u_k) \mid \mu_{k+1}^i, \Sigma_{k+1}^i + Q_k^n \right) \mathcal{N}(x_k^n \mid \mu_k^j, \Sigma_k^j)\, dx_k^n \tag{36a}
\]
\[
= \sum_{j=1}^{q} w_k^j N_{ij} \tag{36b}
\]

Now, with $w_{k+1}^T y \equiv w_{k+1}^T N w_k$, each component $N_{ij}$ can be seen as

\[
N_{ij} = \int \mathcal{N}\left( f^n(x_k^n, u_k) \mid \mu_{k+1}^i, \Sigma_{k+1}^i + Q_k^n \right) \mathcal{N}(x_k^n \mid \mu_k^j, \Sigma_k^j)\, dx_k^n \tag{37a}
\]
\[
= E_{\mathcal{N}(x_k^n \mid \mu_k^j, \Sigma_k^j)} \left[ \mathcal{N}\left( f^n(x_k^n, u_k) \mid \mu_{k+1}^i, \Sigma_{k+1}^i + Q_k^n \right) \right] \tag{37b}
\]

where the expectation above can be computed by the UT. Note that only the Gaussian sum approximation is used for the marginalized nonlinear portion of the special-case state-space model.
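The UT expectation of Eq. (37b) can be sketched as follows. The sigma-point scaling (kappa, with the scaling parameter alpha fixed at 1) is a common convention rather than a value prescribed by the paper, and the dynamics and component parameters are hypothetical placeholders.

```python
import numpy as np

def ut_expectation(g, mu, Sigma, kappa=0.0):
    """Approximate E[g(x)] for x ~ N(mu, Sigma) with 2n+1 sigma points.
    kappa = 3 - n is another common choice."""
    n = mu.size
    L = np.linalg.cholesky((n + kappa) * Sigma)  # columns set the spread
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    wts = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    wts[0] = kappa / (n + kappa)
    return sum(w * g(p) for w, p in zip(wts, pts))

def gauss(x, mean, cov):
    """N(x | mean, cov), used as the integrand of Eq. (37b)."""
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt(
        np.linalg.det(2.0 * np.pi * cov))

# N_ij for one (i, j) pair; f^n, the mu's, Sigma's, and Q^n are hypothetical
fn = lambda x, u: x + 0.1 * np.sin(x)              # stand-in nonlinear dynamics
mu_j, Sig_j = np.array([0.2]), np.array([[0.3]])   # prior component j
mu_i, Sig_i = np.array([0.25]), np.array([[0.4]])  # propagated component i
Qn = np.array([[0.05]])
N_ij = ut_expectation(lambda x: gauss(fn(x, None), mu_i, Sig_i + Qn), mu_j, Sig_j)
print(N_ij)
```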
The joint pdf $p(x_k^l, x_k^n)$ can be calculated as

\[
p(x_{k+1}^l, x_{k+1}^n) = \int p(x_{k+1}^l, x_{k+1}^n \mid x_k^n)\, p(x_k^n)\, dx_k^n \tag{38}
\]

Note that since the conditional pdf $p(x_{k+1}^l, x_{k+1}^n \mid x_k^n)$ is Gaussian and has an analytical expression, it can be easily obtained as

\[
p(x_{k+1}^l, x_{k+1}^n \mid x_k^n) = \mathcal{N}\left( \begin{bmatrix} x_{k+1}^l \\ x_{k+1}^n \end{bmatrix} \,\middle|\, \begin{bmatrix} f^l(x_k^n, u_k) \\ f^n(x_k^n, u_k) \end{bmatrix}, \begin{bmatrix} Q_k^l & 0 \\ 0 & Q_k^n \end{bmatrix} \right) \tag{39}
\]

Substituting the Gaussian sum approximation for $p(x_k^n)$, an approximation for the joint pdf, $\hat{p}(x_{k+1})$, can be written as

\[
\hat{p}(x_{k+1}^l, x_{k+1}^n) = \sum_{i=1}^{q} w_k^i \int \mathcal{N}\left( \begin{bmatrix} x_{k+1}^l \\ x_{k+1}^n \end{bmatrix} \,\middle|\, \begin{bmatrix} f^l(x_k^n, u_k) \\ f^n(x_k^n, u_k) \end{bmatrix}, \begin{bmatrix} Q_k^l & 0 \\ 0 & Q_k^n \end{bmatrix} \right) \mathcal{N}(x_k^n \mid \mu_k^i, \Sigma_k^i)\, dx_k^n \tag{40}
\]

The above integral can be considered as an expectation, since

\[
\int \mathcal{N}\left( x_{k+1} \mid f(x_k^n, u_k), Q_k \right) \mathcal{N}(x_k^n \mid \mu_k^i, \Sigma_k^i)\, dx_k^n = E_{\mathcal{N}(x_k^n \mid \mu_k^i, \Sigma_k^i)} \left[ \mathcal{N}\left( x_{k+1} \mid f(x_k^n, u_k), Q_k \right) \right] \tag{41}
\]

where

\[
f(x_k^n, u_k) = \begin{bmatrix} f^l(x_k^n, u_k) \\ f^n(x_k^n, u_k) \end{bmatrix}
\]

Thus, the marginalized Gaussian sum filter decreases the dimensionality of the optimization problem in Ref. [2] but involves an extra expectation-type integral in Eq. (41) involving Gaussian pdfs, which can be evaluated using sigma points from the UT. However, the marginalized Gaussian sum approximation may yield more accurate results when compared to the full-state Gaussian sum approximation using the same number of Gaussian components for the nonlinear portion of the state-space model.ᵇ The increase in accuracy is due to the fact that, in the marginalized approach, the same number of Gaussian components is used to approximate a lower-dimensional pdf.

ᵇBased on the assumption that increasing the number of Gaussian components will improve the approximation.
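The sketch below evaluates one component integral of Eq. (40) with UT sigma points and moment-matches it to a single Gaussian. Collapsing each integral to one Gaussian per component is an assumption of this sketch (the paper states only that the expectation in Eq. (41) can be evaluated with sigma points), and $f^l$, $f^n$, and all parameters are hypothetical placeholders.

```python
import numpy as np
from scipy.linalg import block_diag

def joint_forecast_component(fl, fn, u, mu, Sigma, Ql, Qn, kappa=0.0):
    """Moment-matched mean/covariance of one component integral of Eq. (40)."""
    n = mu.size
    L = np.linalg.cholesky((n + kappa) * Sigma)
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    wts = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    wts[0] = kappa / (n + kappa)
    # Stacked map f = [f^l; f^n] of Eq. (41) evaluated at each sigma point
    fx = np.array([np.concatenate([fl(p, u), fn(p, u)]) for p in pts])
    mean = wts @ fx
    diff = fx - mean
    cov = diff.T @ (diff * wts[:, None]) + block_diag(Ql, Qn)
    return mean, cov

# One-component example: scalar x^n and scalar x^l (illustrative dynamics)
fl = lambda xn, u: 0.5 * xn
fn = lambda xn, u: xn + 0.1 * np.sin(xn)
mean, cov = joint_forecast_component(fl, fn, None,
                                     np.array([0.3]), np.array([[0.2]]),
                                     Ql=np.array([[0.01]]), Qn=np.array([[0.05]]))
print(mean, cov)  # one Gaussian component of p_hat(x^l_{k+1}, x^n_{k+1})
```

Combining these per-component Gaussians with the weights obtained from the marginalized quadratic program of Eq. (31) then yields a Gaussian sum approximation of the joint forecast pdf.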