The Kalman Like Particle Filter: Optimal Estimation With Quantized Innovations/Measurements

Ravi Teja Sukhavasi and Babak Hassibi

Abstract—We study the problem of optimal estimation using quantized innovations, with application to distributed estimation over sensor networks. We show that the state probability density conditioned on the quantized innovations can be expressed as the sum of a Gaussian random vector and a certain truncated Gaussian vector. This structure bears close resemblance to the full information Kalman filter and so allows us to effectively combine the Kalman structure with a particle filter to recursively compute the state estimate. We call the resulting filter the Kalman like particle filter (KLPF) and observe that it delivers close to optimal performance using far fewer particles than a particle filter applied directly to the original problem. We also note that the conditional state density follows a, so called, generalized closed skew-normal (GCSN) distribution.

Index Terms—Distributed state estimation, Sign of Innovation, Closed Skew Normal Distribution, Particle Filter, Wireless sensor network, Kalman Filter.

I. INTRODUCTION

Recent advances in very large-scale integration and microelectromechanical system technology have led to the availability of cheap, low quality and low power consumption sensors in the market. This has generated a great deal of interest in wireless sensor networks (WSNs) due to their potential applications in several diverse fields [1]. Sensor network constraints such as limited bandwidth and power have inspired a considerable amount of research in developing energy efficient algorithms for network coverage and decentralized detection and estimation using quantized sensor observations [2]–[4].

The problem of estimation with quantized measurements is almost as old as the Kalman filter itself. An early survey on the subject can be found in [5]. However, most of the earlier techniques centered around using numerical integration methods to approximate the optimal state estimate. The advent of particle filtering [6]–[8] created a whole new set of tools to handle non-linear estimation problems. For example, [4] proposes a particle filtering solution for optimal filtering using quantized sensor measurements. But quantizing sensor measurements can lead to large quantization noise when the observed values are large, which in turn leads to poor estimation accuracy. In [9], this limitation is overcome by developing an elegant distributed estimation approach based on quantizing the innovation to one bit (the so-called sign of innovation, or SOI). In [10], this is generalized to handle multiple quantization levels. In both cases, it is assumed that the conditional state density is approximately Gaussian, leading to a linear filter and a very simple characterization of its error performance. Under the Gaussian assumption, the error covariance matrix associated with the state estimation error satisfies a modified Riccati recursion of the type that appears in [11]. The only difference between this modified Riccati and the traditional one is a scaling factor λ multiplying the nonlinear term of the recursion. For the SOI Kalman filter (SOI-KF), λ is 2/π, while [10] presents a formula for λ in the case of multiple quantization levels. Henceforth, these filters will be referred to as SOI-KF and MLQ-KF, and their associated Riccati recursions as SOI-Riccati and MLQ-Riccati, respectively.

For linear time invariant dynamical systems, if the Gaussian assumption were realistic, convergence of the modified Riccati would imply convergence of the corresponding linear filters. Using results presented in [11], one can find linear time invariant systems for which the MLQ-Riccati and SOI-Riccati converge. [12] provides examples for which the actual error performance of SOI-KF and MLQ-KF does not converge to the respective Riccatis, which warrants a closer examination of the assumption of Gaussianity. This is one of the questions addressed in this paper. We derive a novel stochastic characterization of the conditional state density (see Theorem 3.1). Using this, it is straightforward to see that the conditional state density is not Gaussian, as one would expect given the non-linear nature of quantization. In fact, it belongs to a class of distributions which we refer to as generalized closed skew-normal (GCSN) distributions. A careful literature review reveals that a related observation has been made in [13]; in particular, with some effort, [13] can be used to derive Theorem 3.1. Specializing this result to state space models, we develop a novel particle filtering approach to optimally estimate the state using quantized measurements/innovations. In the rest of the paper, we use the words 'measurements' and 'innovations' interchangeably, since the analysis will prove that the general structure of the filter does not depend on whether sensor measurements or innovations are quantized. The proposed filter requires far fewer particles than a particle filter applied directly to the original problem [12], as will be shown through various simulations. We also develop a precise formulation of the conditional state density and observe that it follows what we call a generalized closed skew-normal distribution, which is very similar to those studied in [14]–[19]. Some useful properties of this distribution are provided in the Appendix. The next section introduces the problem setup.

Ravi Teja Sukhavasi is with the department of Electrical Engineering, California Institute of Technology, Pasadena, USA [email protected]
Babak Hassibi is a faculty with the department of Electrical Engineering, California Institute of Technology, Pasadena, USA [email protected]

II. PROBLEM STATEMENT AND PRELIMINARIES
The broader problem that one would like to solve can be cast as causal estimation of a random process {x(n)} using a quantized version of a measurement process {y(n)}, where x(n) and y(n) can be vectors. Without any further structure, this is a difficult problem to analyze. When {x(n)} and {y(n)} are jointly Gaussian, we will provide a novel stochastic characterization of the probability density of x(n) causally conditioned on the quantized measurement process. We use it to propose a novel filtering technique for the above problem which reduces to an elegant particle filter when {x(n)} has state space structure.

A. Notation

The following notation will be used in the rest of the paper.
1) If {u(n)}_{n=-∞}^{∞} is a discrete time random process, u(i:j) denotes the collection of random variables (u(i),...,u(j)). U_n denotes the random vector [u(0),...,u(n)]^T and u_n denotes a realization of U_n.
2) For a random vector Y, ||Y||^2 ≜ E(Y-EY)(Y-EY)^T.
3) For random vectors X, Y, <X,Y> ≜ E(X-EX)(Y-EY)^T.
4) For random variables (X_1,...,X_n), L(X_1,...,X_n) denotes their linear span.
5) For a vector x, x(i) denotes its i-th component. In the context of particle filtering, x^i denotes the i-th particle.
6) N_k(μ,Σ) denotes a k-dim normal random variable with mean μ and covariance Σ. N_k(a,b,μ,Σ) denotes a k-dim normal truncated to lie in (a,b), where a and b are k-dim vectors and the truncation is component-wise.
7) Φ(x) = P(X ≤ x), where X ~ N(0,1); Φ(a,b,μ,σ) = P(X ∈ (a,b)) when X ~ N(μ,σ^2); and, in general, Φ(I_*,μ,σ) = P(X ∈ I_*).
8) The notion of optimality used throughout the paper is mean squared error optimality.

B. Problem Formulation

Suppose {x(n)} and {y(n)} have the following state space structure:

x(n+1) = A(n)x(n) + w(n)   (1a)
y(n) = H(n)x(n) + v(n)   (1b)

where x(n) ∈ R^d is the state, y(n) ∈ R is the observation, and w(n) ∈ R^d and v(n) ∈ R are uncorrelated Gaussian white noises with zero means and covariances W(n) and R(n) ≜ σ_v^2(n), respectively. The initial state x(0) of the system is also a zero mean Gaussian with covariance P(0) and is uncorrelated with both w(n) and v(n). As in [9], [10], we consider the sensor network configuration in which the fusion center has sufficient power to broadcast its predicted output and the corresponding error covariance to its sensors. Sensors are assumed to have limited power and hence their transmission of information should be limited. Here, we assume that the energy required for receiving messages is much less than that for transmitting.

[Fig. 1. WSN with a fusion center: the sensors S(0), S(1), ..., S(n) act as data gathering devices, observing x(0), x(1), ..., x(n) through the measurement noises v(0), v(1), ..., v(n), transmitting quantized values q(0), q(1), ..., q(n) and receiving the prediction E[y(n)/q_{n-1}] from the fusion center.]

Fig. 1 outlines the overall filtering paradigm. Once a scheduling algorithm is in place, at each time instant a sensor S(n) makes a measurement y(n) and computes the innovation ẽ(n) = y(n) - ŷ(n|n-1), where ŷ(n|n-1) = H(n)x̂(n|n-1) and the variance of the innovation ||ẽ(n)||^2 are received by the sensor from the fusion center, with x̂(n|n-1) being the one step predictor of the state. [9], [10] propose methods to quantize ẽ(n) and use the quantized innovations to update the state estimate. These filters take the following shape:

x̂(n/n) = x̂(n/n-1) + L(q(n)) \frac{P(n)H(n)^T}{H(n)P(n)H(n)^T + σ_v^2}
x̂(n+1/n) = A(n)x̂(n/n)
P(n/n) = P(n) - λ \frac{P(n)H(n)^T H(n)P(n)}{H(n)P(n)H(n)^T + σ_v^2}   (2a)
P(n+1) ≜ P(n+1/n) = A(n)P(n/n)A(n)^T + W(n)   (2b)

where q(n) denotes the quantized innovation, while L(q(n)) and the value of λ depend on the quantization scheme used. Eqs (2a) and (2b) constitute the modified Riccati recursion with parameter λ. For SOI-KF, λ = 2/π, and for MLQ-KF, [10] provides a formula for λ and L(q(n)). The above filter is derived based on the assumption that the conditional distribution f(x(n)/q_{n-1}) is Gaussian, which we will prove is generally false. [12] provides examples where the error performance of the filters in [9], [10] does not track the modified Riccati recursions that they were predicted to. Instead, the optimal filter, which was approximated by a particle filter, has been observed to have an error covariance matrix that obeys the modified Riccatis. In order to approximate the optimal filter, [12] employs a very basic particle filtering algorithm, which is outlined below for easy reference.

Alg 1. Particle Filter
1) Set n = 0. For i = 1,...,N, initialize the particles x^i(0|-1) ~ f(x(0)) and set x̂(0|-1) = 0.
2) At time n, set q(n) = Q(y(n) - H(n)x̂(n|n-1)), where Q(.) is a quantization rule.
3) Suppose the quantized value q(n) implies that y(n) - H(n)x̂(n|n-1) ∈ I(n); then v(n) + H(n)(x(n) - x̂(n/n-1)) ∈ I(n). The importance weights are now calculated as follows:
   w^i(n) = Φ(I(n), H(n)(x^i(n/n-1) - x̂(n/n-1)), σ_v(n))
4) Measurement update: x̂(n/n) = Σ_{i=1}^{N} w̄^i(n) x^i(n/n-1), where w̄^i(n) are the normalized weights, i.e., w̄^j(n) = w^j(n) / Σ_{i=1}^{N} w^i(n).
5) Resample N particles with replacement according to Prob(x^i(n/n) = x^j(n/n-1)) = w̄^j(n).
6) For i = 1,...,N, predict new particles according to x^i(n+1/n) ~ f(x(n+1)|x^i(n|n)), i.e., x^i(n+1/n) = A(n)x^i(n/n) + N_k(0,W(n)), where W(n) is the covariance of the process noise at time n.
7) Set x̂(n+1|n) = A(n)x̂(n|n). Also, set n = n+1 and iterate from step 2.
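To make these steps concrete, here is a minimal Python sketch of one iteration of Alg 1 for a scalar measurement. The function name, the argument layout, and the interval-indexed quantizer are our own illustrative choices; only the seven steps themselves come from the algorithm above.

```python
import numpy as np
from scipy.stats import norm

def alg1_step(particles, x_pred, A, H, W, sigma_v, y, edges, rng=np.random.default_rng()):
    """One iteration of Alg 1 (steps 2-7) for a scalar measurement.

    particles: (N, d) array of x^i(n|n-1); x_pred: the prediction x^(n|n-1).
    edges: quantizer cell boundaries r_0 = -inf < r_1 < ... < r_K = +inf.
    """
    # Step 2: quantize the innovation; q(n) identifies a cell I(n) = (lo, hi].
    k = int(np.searchsorted(edges, y - H @ x_pred))
    lo, hi = edges[k - 1], edges[k]
    # Step 3: w^i(n) = P(v(n) + H(x^i - x^) in I(n)), a Gaussian cell probability.
    m = (particles - x_pred) @ H
    w = norm.cdf(hi, m, sigma_v) - norm.cdf(lo, m, sigma_v)
    w /= w.sum()
    # Step 4: measurement update of the state estimate.
    x_filt = w @ particles
    # Step 5: resample N particles with replacement according to the weights.
    resampled = particles[rng.choice(len(particles), size=len(particles), p=w)]
    # Step 6: propagate the particles through the state equation.
    noise = rng.multivariate_normal(np.zeros(W.shape[0]), W, size=len(particles))
    particles_next = resampled @ A.T + noise
    # Step 7: time update of the estimate.
    return particles_next, A @ x_filt, x_filt
```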
The particles in Alg 1 describe the conditional state density f(x(n)/q_n), and simulations suggest that one needs upwards of a few hundred particles to get satisfactory error performance for most systems. In the following sections, we disprove the premise behind MLQ-KF and SOI-KF and develop a novel particle filtering technique (the KLPF) which converges to the optimal filter asymptotically. Simulations show that the KLPF needs far fewer particles than the particle filter of Alg 1. The difference partly lies in using particles to describe a probability density function with much less support than the conditional state density.

III. FULL INFORMATION VS QUANTIZED INNOVATIONS

Suppose {x(n)} and {y(n)} are jointly Gaussian. Then it is well known that the probability density of x(n) conditioned on Y_n is a Gaussian with the following parameters:

x(n)/(Y_n = y_n) ~ N_k(0, R_Δ) + R_xy(n)(R_y(n))^{-1} y_n   (3)

where R_x(n) ≜ ||x(n)||^2, R_y(n) ≜ ||Y_n||^2, R_xy(n) ≜ <x(n),Y_n> and R_Δ ≜ R_x(n) - R_xy(n)(R_y(n))^{-1}R_yx(n).

When {x(n)} has an underlying state space structure and {y(n)} is a linear measurement of {x(n)} corrupted by additive white Gaussian noise, as defined in Eq (1), it is well known that the following Riccati recursion propagates the error covariance P(n) = R_Δ:

P(n) ≜ P(n/n-1) = A(n)P(n-1/n-1)A(n)^T + W(n)
P(n/n) = P(n) - \frac{P(n)H(n)^T H(n)P(n)}{H(n)P(n)H(n)^T + σ_v^2}   (4)
P(0) = ||x(0)||^2

We would like to address the problem of optimal estimation using a quantized version of the observation process. Let Q(.) be a general quantization rule which generates the quantized measurement process {q(n)}, where q(n) = Q(y_n). Note that we allow the quantization rule to depend on all the past measurements y(i), i ≤ n. This includes, as a special case, the method of quantizing the innovations first proposed in [9]. We will show that the probability density of x(n) conditioned on the quantized measurements Q_n admits a characterization very similar to Eq (3). We state the result in the following theorem.

Theorem 3.1: The probability density of x(n) conditioned on the quantized measurements Q_n can be expressed as

x(n)/(Q_n = q_n) ~ N_k(0, R_Δ) + R_xy(n)(R_y(n))^{-1} (Y_n/(Q_n = q_n))   (5)

Proof: The proof is fairly straightforward. The theorem will be proved by showing that the moment generating function of x(n)/(Q_n = q_n) can be seen as the product of the two moment generating functions corresponding to the two random variables in Eq (5). For brevity, we will write x(n)/(Q_n = q_n) as x(n)/q_n. We have

f(x(n)/q_n) = ∫ f(x(n), y_n/q_n) dy_n

Noting that f(x(n)/y_n, q_n) = f(x(n)/y_n), we can write

E[e^{t^T x(n)}/q_n] = ∫ e^{t^T x(n)} f(x(n)/y_n) f(y_n/q_n) dx(n) dy_n
  (*) = e^{\frac{1}{2} t^T R_Δ t} ∫ e^{t^T R_xy(n)(R_y(n))^{-1} y_n} f(y_n/q_n) dy_n

where the remaining integral is the moment generating function of R_xy(n)(R_y(n))^{-1} Y_n/q_n. Hence

M_{x(n)/q_n}(t) = M_Z(t) M_{Y_n/q_n}((R_y(n))^{-1} R_yx(n) t)   (6)

where Z ~ N_k(0, R_Δ). In getting (*), we used the fact that x(n)/y_n ~ N_k(R_xy(n)(R_y(n))^{-1} y_n, R_Δ). For any random variable Y, it is easy to see that M_{AY}(t) = M_Y(A^T t). The result is now obvious from Eq (6).

Comparing Eqs (3) and (5), the only difference is the measurement vector y_n being replaced by the random variable Y_q(n) ≜ Y_n/q_n. It is easy to see that Y_q(n) is a multivariate Gaussian random variable truncated to lie in the region defined by q_n.
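The decomposition in Eq (5) is easy to probe numerically. The sketch below treats a toy scalar case with arbitrarily assumed joint statistics (the values of R_x, R_y, R_xy are our own choices) and a one-bit quantizer q = sign(y); the sample skewness of the resulting x(n)/q_n is clearly nonzero, previewing the non-Gaussianity discussed next.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Assumed toy joint statistics of a scalar x and a scalar measurement y.
Rx, Ry, Rxy = 1.0, 1.5, 0.9
Rdelta = Rx - Rxy**2 / Ry

# One-bit quantization q = sign(y): condition Y on the event {y > 0}.
y = rng.normal(0.0, np.sqrt(Ry), 200_000)
y_q = y[y > 0]                          # samples of Y/q, a truncated Gaussian

# Eq (5): x/q = N(0, R_delta) + Rxy Ry^{-1} (Y/q)
x_cond = rng.normal(0.0, np.sqrt(Rdelta), y_q.size) + (Rxy / Ry) * y_q

print("skewness of x|q:", skew(x_cond))  # clearly nonzero => not Gaussian
```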
It is worth noting that the covariance of x(n)/q_n, ||x(n)/q_n||^2, is given by

R_Δ + R_xy(n)(R_y(n))^{-1} ||Y_n/q_n||^2 (R_y(n))^{-1} R_yx(n)

As the quantization scheme becomes finer (in an appropriate manner), Y_q(n) clearly converges to Y_n and x(n)/q_n approaches a Gaussian, as is well known. Using Theorem 3.1, it is easy to see that x(n)/q_n is not Gaussian in general, contrary to the assumption made in [9], [10]. It belongs to a class of distributions which we call the generalized closed skew-normal (GCSN) distributions, the details of which are provided in the Appendix.

We will use Theorem 3.1 to propose a novel particle filtering scheme. We begin by noting that

E[x(n)/q_n] = R_xy(n)(R_y(n))^{-1} E[Y_q(n)]   (7)

The filter is implemented for the quantization scheme proposed in [9], [10]. At time n, the sensor which is scheduled to make the measurement receives a prediction of its measurement ŷ(n) ≜ ŷ(n/n-1) and the error covariance ||y(n) - ŷ(n/n-1)||^2. The sensor then quantizes ẽ(n)/||ẽ(n)||, where ẽ(n) ≜ y(n) - ŷ(n/n-1), using the following quantizer, and transmits the result to the fusion center:

Q(x) = q_K if x > r_{K-1};  q_i if r_{i-1} < x ≤ r_i, 2 ≤ i ≤ K-1;  q_1 if x ≤ r_1

where (r_1,...,r_{K-1}) are the quantization levels. Using the convention r_0 = -∞ and r_K = ∞, we can re-write the quantization levels as (r_0,...,r_K). The output of the quantizer at time n will be denoted by q(n), and the lower and upper limits of the interval implied by q(n) will be denoted by r_1(n) and r_2(n) respectively. We assume that the fusion center has access to the exact value of q(n) at every instant. Note that ŷ(n/n-1) is an estimate of the measurement, not necessarily the optimal one. Hence y(n) - ŷ(n/n-1) is not an innovation in the true sense of the word, unless the estimator employed at the fusion center is optimal. Nevertheless, we will refer to q(n) as the quantized innovation.
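For reference, the quantizer above can be transcribed directly; the function name and return convention (cell index plus the implied interval (r_1(n), r_2(n))) are ours.

```python
import numpy as np

def quantize(x, levels):
    """Map x to its quantizer cell.

    levels: sorted interior thresholds (r_1, ..., r_{K-1}).
    Returns (q, lo, hi): the cell index q in {1, ..., K} and the implied
    interval, with the convention r_0 = -inf and r_K = +inf.
    """
    edges = np.concatenate(([-np.inf], np.asarray(levels, float), [np.inf]))
    q = int(np.searchsorted(edges, x))   # q = i  <=>  r_{i-1} < x <= r_i
    return q, edges[q - 1], edges[q]

# The 2-bit rule of Section V: five cells, where the middle cell
# (-0.3823, 0.3823) triggers no measurement update.
q, lo, hi = quantize(0.7, [-1.2437, -0.3823, 0.3823, 1.2437])
```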
With this setup, we will develop a filtering technique which needs far fewer particles to achieve optimal performance than the simple particle filter applied to the original problem. We will now propose a particle filtering technique to recursively compute the state estimate.

IV. THE FILTER

We shall begin with the observation that Y_q(n) is a multivariate normal distribution truncated as follows:

Y_q(n) = Y_n/(ẽ(j) ∈ (r_1(j), r_2(j)) ∀ j ≤ n) = Y_n/(y(j) ∈ (r_1(j)+ŷ(j), r_2(j)+ŷ(j)) ∀ j ≤ n)

Let s_1(j) ≜ r_1(j) + ŷ(j) and s_2(j) ≜ r_2(j) + ŷ(j). Then, clearly, Y_q(n) is a multivariate normal with its j-th component truncated to lie in the interval (s_1(j), s_2(j)) ∀ j ≤ n, i.e., Y_q(n) ~ N_{n+1}(s_{1,n}, s_{2,n}, 0, R_y(n)). From Eq (7), it is clear that the optimal state estimate can be computed by first computing the mean of this truncated normal. Before proposing the filter, we need a couple of results for (a) relating the distributions of Y_q(n) and Y_q(n-1), and (b) generating scalar truncated normal random variables.

Lemma 4.1: Let (Z_1, Z_2, ..., Z_n) ~ f_n(z_n), where f_n(z_n) = N_n(z_n; s_{1,n}, s_{2,n}, 0, R_z(n)). Then the marginal of the first n-1 components is given by

(Z_1,...,Z_{n-1}) ~ f_{n-1}(z_{n-1}) ∝ N_{n-1}(z_{n-1}; s_{1,n-1}, s_{2,n-1}, 0, R_z(n-1)) × Φ(s_1(n), s_2(n), E[z(n)/z_{n-1}], \sqrt{R_{Δn}})

Also, the conditional distribution of Z_n given Z_{1:n-1} is given by

f(z(n)/z_{n-1}) = N(z(n); s_1(n), s_2(n), E[z(n)/z_{n-1}], R_{Δn})
E[z(n)/z_{n-1}] = <z(n), Z_{n-1}> (R_z(n-1))^{-1} z_{n-1}
R_{Δn} ≜ ||z(n)||^2 - <z(n), Z_{n-1}> (R_z(n-1))^{-1} <Z_{n-1}, z(n)>

Note that f(z(n)/z_{n-1}) is a one dimensional truncated normal.
Proof: The proof is straightforward and is not presented here due to space limitations.

The following lemma describes a standard technique to generate scalar truncated normal random variables.

Lemma 4.2: Let U ~ U(0,1). Then Y = Φ^{-1}((Φ(b) - Φ(a))U + Φ(a)) is distributed as N(a,b,0,1).
Proof: If F(y) denotes the cumulative distribution function of Y ~ N(a,b,0,1), then it is well known that F(Y) ~ U(0,1). We also have

F(y) = 0 if y < a; (Φ(y) - Φ(a))/(Φ(b) - Φ(a)) if a ≤ y < b; 1 if y ≥ b.

So, if U ~ U(0,1), then F^{-1}(U) ~ N(a,b,0,1), and hence the technique. Noting that N(a,b,μ,σ^2) ≡ μ + σN((a-μ)/σ, (b-μ)/σ, 0, 1), we can generate scalar truncated normal random variables with arbitrary mean and variance.

Alg 2. Truncated Normal Particle Filter
1) At n = 0, generate {y^i(0)}_{i=1}^{N} ~ N(s_1(0), s_2(0), 0, R_y(0)) using the technique outlined in Lemma 4.2.
2) At time n, for each particle y^i_{n-1}, compute the weight as
   w^i(n) = Φ(s_1(n), s_2(n), E[y(n)/y^i_{n-1}], \sqrt{R_{Δn}})   (8)
3) Generate {y^i(n)}_{i=1}^{N} ~ N(s_1(n), s_2(n), E[y(n)/y^i_{n-1}], R_{Δn}) and define y^i_n = [y^i_{n-1}; y^i(n)].
4) Measurement update:
   x̂(n|n) = R_xy(n)(R_y(n))^{-1} \frac{Σ_{i=1}^{N} w^i(n) y^i_n}{Σ_{i=1}^{N} w^i(n)}   (9)
5) Resample the N particles {y^i_n}_{i=1}^{N} with replacement according to Prob(y^i_n) = w̄^i(n), where the normalized weights are given by w̄^i(n) = w^i(n) / Σ_{j=1}^{N} w^j(n).
6) Set x̂(n+1|n) = A(n)x̂(n|n). Also, set n = n+1 and iterate from step 2.
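Lemma 4.2 is the sampling primitive behind steps 1 and 3 of Alg 2 (and later of Alg 3). A direct implementation, including the mean/variance shift noted at the end of the lemma's proof (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def sample_truncated_normal(a, b, mu, sigma, size, rng=np.random.default_rng()):
    """Sample N(a, b, mu, sigma^2): a normal(mu, sigma^2) conditioned on (a, b).

    Lemma 4.2: Y = Phi^{-1}((Phi(beta) - Phi(alpha)) U + Phi(alpha)) with
    alpha = (a - mu)/sigma, beta = (b - mu)/sigma, U ~ Uniform(0, 1).
    """
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma  # works for a = -inf, b = +inf
    u = rng.uniform(size=size)
    z = norm.ppf(norm.cdf(alpha) + u * (norm.cdf(beta) - norm.cdf(alpha)))
    return mu + sigma * z

# e.g. step 1 of Alg 2: draw y^i(0) from a scalar N(s_1(0), s_2(0), 0, R_y(0))
particles0 = sample_truncated_normal(-0.3823, 0.3823, 0.0, 1.0, size=1000)
```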
In the filter proposed above, it is important to note that the dimension of the particle y^i_n grows linearly with time. In the absence of any structure on the random processes {x(n)} and {y(n)}, the computations involved in running the above filter quickly become infeasible. Alg 2 is thus of purely theoretical interest, but the technique presented above is greatly simplified when x(n) has state space structure. In the following subsection, we show how to overcome the problem of increasing particle dimension when {x(n)} is a Gauss Markov process and {y(n)} is a linear measurement of {x(n)} corrupted by additive white Gaussian noise. The filter takes an elegant formulation in which the particle dimension remains fixed and equal to the state dimension. This approach requires far fewer particles than Alg 1, and the filtering technique is quite general, in that it can handle an arbitrary number of quantization levels and arbitrary quantization intervals. We will also observe that the filter requires fewer and fewer particles as the number of quantization levels increases.

A. Exploiting State Space Structure - The Kalman Like Particle Filter

Suppose {x(n)} and {y(n)} have the state space structure defined in Eq (1). Then we know that R_xy(n)(R_y(n))^{-1} y^i_n is the optimal estimate of x(n) given the measurement vector y^i_n and hence can be computed by running a Kalman filter using y^i_n, the specific realization of the measurement vector Y_n. Similarly, E[y(n)/y^i_{n-1}] is the optimal estimate of y(n) given Y_{n-1} = y^i_{n-1}, and R_{Δn} is the resulting error covariance. All these quantities emerge naturally in the following Kalman filtering steps:

x^i(0/0) = \frac{P(0)H(0)^T}{H(0)P(0)H(0)^T + σ_v^2(0)} y^i(0)   (10a)
P(k+1) ≜ P(k+1/k) = A P(k/k) A^T + W(k)   (10b)
P(k+1/k+1) = P(k+1) - \frac{P(k+1)H(k+1)^T H(k+1)P(k+1)}{H(k+1)P(k+1)H(k+1)^T + σ_v^2(k+1)}   (10c)
x^i(k+1) ≜ x^i(k+1/k) = A x^i(k/k)   (10d)
x^i(k+1/k+1) = x^i(k+1) + \frac{P(k+1)H(k+1)^T}{H(k+1)P(k+1)H(k+1)^T + σ_v^2(k+1)} (y^i(k+1) - H(k+1)x^i(k+1))   (10e)

Now, R_xy(n)(R_y(n))^{-1} y^i_n, R_{Δn} and E[y(n)/y^i_{n-1}] can be calculated as follows:

R_xy(n)(R_y(n))^{-1} y^i_n = x^i(n/n)   (11a)
R_{Δn} = H(n)P(n)H(n)^T + σ_v^2(n)   (11b)
E[y(n)/y^i_{n-1}] = H(n)x^i(n)   (11c)

Now consider Eq (9); we can write it alternately as

x̂(n|n) = R_xy(n)(R_y(n))^{-1} Σ_{i=1}^{N} w̄^i(n) y^i_n = Σ_{i=1}^{N} w̄^i(n) R_xy(n)(R_y(n))^{-1} y^i_n = Σ_{i=1}^{N} w̄^i(n) x^i(n/n)   (12)

From Eqs (11) and (12), it is easy to see that all the information in y^i_n is captured in x^i(n/n). Hence, we only need to work with {x^i(n/n)}_{i=1}^{N} at any time n. We can now describe the new filter as follows.

Alg 3. Kalman Like Particle Filter (KLPF)
1) At n = 0, generate {y^i(0)}_{i=1}^{N} ~ N(s_1(0), s_2(0), 0, R_y(0)) using the technique outlined in Lemma 4.2. Compute {x^i(0)} using Eq (10a) and x^i(1/0) = A(0)x^i(0/0).
2) At time n, generate {y^i(n)}_{i=1}^{N} ~ N(s_1(n), s_2(n), H(n)x^i(n), R_{Δn}), using Eq (11b) to compute R_{Δn}. Then form {x^i(n/n)} using Eq (10).
3) For each particle x^i(n), compute the weight w^i(n) = Φ(s_1(n), s_2(n), H(n)x^i(n), \sqrt{R_{Δn}}).
4) (Measurement update) Normalize the weights to get w̄^i(n) = w^i(n)/Σ_{j=1}^{N} w^j(n) and compute the measurement updated state estimate using Eq (12), i.e., x̂(n/n) = Σ_{i=1}^{N} w̄^i(n) x^i(n/n).
5) Resample the N particles {x^i(n/n)}_{i=1}^{N} with replacement according to Prob(x^i(n/n)) = w̄^i(n) and compute x^i(n+1) = A(n)x^i(n/n).
6) Set x̂(n+1|n) = A(n)x̂(n|n). Also, set n = n+1 and iterate from step 2.

Remark 1: It is easy to see that the above filter can be extended to the case when the quantizer employed at the sensor is of the more general form Q(x) = q_j if x ∈ I_j, for 1 ≤ j ≤ K.

From the above implementation, in terms of complexity, the KLPF is clearly equivalent to running N parallel Kalman filters. Hence, the complexity of the KLPF scales linearly in N, the number of particles. Also, it converges to the optimal filter as N → ∞. But for most systems, simulations suggest that the KLPF delivers close to optimal performance for N ≤ 50. The particles in the KLPF describe the random variable R_xy(n)(R_y(n))^{-1}Y_q(n). The support of its distribution clearly decreases with increasing quantization levels. As a result, the KLPF needs fewer particles as the quantization becomes finer, a property that Alg 1 does not share. This will be demonstrated through examples in Section V.
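Combining Eqs (10)–(12), one iteration of Alg 3 reduces to a weighted bank of Kalman measurement updates driven by sampled truncated innovations. The following condensed Python sketch, for a scalar measurement and time-invariant (A, H), is our own rendering of steps 2–6; names and interfaces are illustrative.

```python
import numpy as np
from scipy.stats import norm

def klpf_step(x_part, P_pred, A, H, W, sigma_v2, lo, hi, rng):
    """One KLPF iteration (Alg 3, steps 2-6) for a scalar measurement.

    x_part: (N, d) array of predicted particles x^i(n) = x^i(n|n-1).
    (lo, hi): the cell (s_1(n), s_2(n)) implied by the received q(n).
    P_pred: P(n|n-1), which evolves by the particle-independent Riccati, Eq (10).
    """
    N = len(x_part)
    R_delta = float(H @ P_pred @ H) + sigma_v2         # Eq (11b)
    sd = np.sqrt(R_delta)
    m = x_part @ H                                     # E[y(n)/y^i_{n-1}], Eq (11c)
    # Step 3: weights w^i(n) = Phi(s_1(n), s_2(n), H x^i(n), sqrt(R_delta)).
    F_lo, F_hi = norm.cdf(lo, m, sd), norm.cdf(hi, m, sd)
    w = F_hi - F_lo
    w /= w.sum()
    # Step 2: sample y^i(n) from the truncated normal via Lemma 4.2 ...
    y = norm.ppf(F_lo + rng.uniform(size=N) * (F_hi - F_lo), m, sd)
    # ... and form x^i(n|n) with the Kalman measurement update, Eq (10e).
    K = P_pred @ H / R_delta
    x_filt = x_part + np.outer(y - m, K)
    # Step 4: measurement-updated estimate, Eq (12).
    x_hat = w @ x_filt
    # Step 5: resample with replacement and propagate the particles.
    x_next = x_filt[rng.choice(N, size=N, p=w)] @ A.T
    # Riccati measurement/time updates for the next iteration, Eqs (10b)-(10c).
    P_filt = P_pred - np.outer(P_pred @ H, H @ P_pred) / R_delta
    return x_next, A @ x_hat, A @ P_filt @ A.T + W, x_hat
```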
The scenario considered thus far involves one measurement per time instant, but Alg 3 can be easily extended to handle multiple measurements from different sensors at a given time instant. Here it is assumed that the measurement noise processes are uncorrelated across different sensors. The proof is simple and is not detailed here due to space constraints.

V. SIMULATIONS

In Alg 1, the particles describe the full probability density of the state conditioned on quantized measurements, while in Alg 3, part of the information about the conditional state density is captured neatly by the Kalman filter, so the particles describe a truncated Gaussian which has much less support than the full conditional distribution. We give two examples in this section to demonstrate the effectiveness of the KLPF. Table I summarizes the highlights from the two examples.

TABLE I
SUMMARY OF NUMERICAL RESULTS

         | Example 1       | Example 2
         | SOI   | 2-bit   | SOI   | 2-bit
SOI-KF   | no    | -       | yes   | -
MLQ-KF   | -     | no      | -     | yes
Alg 1    | 2500  | 10000   | 500   | 750
KLPF     | 500   | 90      | 25    | 3

In Table I, a 'yes' indicates that the filter works and is close to optimal, a 'no' indicates that its estimation error diverges, and a '-' means that the quantization method does not apply to the filter. 'SOI' stands for 'sign of innovation' and '2-bit' stands for a quantization rule with quantization intervals given by (-∞,-1.2437), (-1.2437,-0.3823), (-0.3823,0.3823), (0.3823,1.2437) and (1.2437,∞). If the innovation falls in the interval (-0.3823,0.3823), no measurement update is done, so that 2 bits suffice to represent the output of the above quantizer. The numbers in front of Alg 1 and KLPF denote the number of particles required to approximate the optimal filter. Clearly, the KLPF requires far fewer particles than Alg 1. Also evident from Table I is the fact that the KLPF needs dramatically fewer particles as the quantization becomes finer.

A. Example 1

We re-use this example from [12]. Consider a linear time invariant system of the form (1) with the following parameters:

A = [0.95 1 0; 0 0.9 10; 0 0 0.95], h = [1 0 2], W = 2I_3, R ≜ σ_v^2 = 2.5 and P(0) = 0.01I_3

where I_m denotes an m×m identity matrix. Note that A is a stable matrix. As can be seen from the plots, SOI-KF and MLQ-KF diverge, but the KLPF delivers optimal performance with far fewer particles than Alg 1. With the addition of just 1 bit, the required number of particles for the KLPF drops from 500 to 90, whereas with Alg 1, the number of particles goes up from 2500 to 10000.

[Fig. 2. Mean squared error vs time n for Example 1. (a) SOI-KF (Monte Carlo and SOI-Riccati), Alg 1 (N_PF = 2500) and KLPF (N_KLPF = 500). (b) MLQ-KF (Monte Carlo and MLQ-Riccati for 2 bits), Alg 1 (N_PF = 10000) and KLPF (N_KLPF = 90). In (a), SOI-KF clearly diverges while Alg 1 and the KLPF converge to the optimal filter. From (b), MLQ-KF with 4 levels of quantization also diverges while the KLPF converges to the optimal filter with just 90 particles.]

Remark 2: Extensive simulations suggest that the modified Riccati of Eq (2) describes the error performance of the optimal filter, whereas MLQ-KF is itself clearly not the optimal filter. In Examples 1 and 2, by optimality we mean that the mean squared errors of Alg 1 and the KLPF tracked the corresponding modified Riccati recursions very closely. In fact, simulations showed that the mean squared errors of Alg 1 and the KLPF saturate at this value irrespective of the number of particles used.

B. Example 2

A simple tracking system can be characterized by the following parameters: A = [1 τ; 0 1], h = [1 0], W = [τ^4/4 τ^3/2; τ^3/2 τ^2], σ_v^2 = 0.81, P(0) = 0.01I_2 and the sampling period τ = 0.1. In Example 2, note that the KLPF works with much fewer particles than in Example 1. One can attribute this to the much higher value of the optimal mean squared error in Example 1 than in Example 2, as can be seen from the plots.

[Fig. 3. Mean squared error vs time n for Example 2. (a) SOI-KF (2 level Riccati), Alg 1 (N_PF = 500) and KLPF (N_KLPF = 25). (b) MLQ-KF (5 level Riccati), Alg 1 (N_PF = 750) and KLPF (N_KLPF = 3). Both in (a) and (b), all the filters are close to optimal. The KLPF achieves optimal performance with remarkably few particles and hence has a complexity of the same order as that of SOI-KF and MLQ-KF.]

VI. CONCLUSIONS

We propose a Kalman like particle filter (KLPF) that, in the examples studied, required a moderately small number of particles and can therefore obtain close to optimal performance with a computational complexity comparable to that of the conventional Kalman filter. An important open issue is to determine the number of particles necessary to closely approximate the optimal filter. As earlier observed in [12], the error covariance matrix of the optimal filter seems to follow the modified Riccati recursion introduced in [11]. Determining whether this is the case, and why, remains an interesting open question.
REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A survey on sensor networks," IEEE Communications Magazine, vol. 40, no. 8, pp. 102–114, 2002.
[2] A. Krasnopeev, J.-J. Xiao, and Z.-Q. Luo, "Minimum energy decentralized estimation in sensor network with correlated sensor noise," in ICASSP, 2005, vol. 3, pp. iii/673–iii/676.
[3] Z.-Q. Luo and J.-J. Xiao, "Universal decentralized estimation in a bandwidth constrained sensor network," in ICASSP, 2005, vol. 4, pp. iv/829–iv/832.
[4] R. Karlsson and F. Gustafsson, "Particle filtering for quantized sensor information," in 13th European Signal Processing Conference (EUSIPCO), Turkey, 2005.
[5] R. Curry, Estimation and Control with Quantized Measurements, M.I.T. Press, 1970.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings-F Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[7] D. Crisan and A. Doucet, "A survey of convergence results on particle filtering methods for practitioners," IEEE Transactions on Signal Processing, vol. 50, no. 3, pp. 736–746, 2002.
[8] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[9] A. Ribeiro, G. B. Giannakis, and S. I. Roumeliotis, "SOI-KF: Distributed Kalman filtering with low-cost communications using the sign of innovations," IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4782–4795, 2006.
[10] K. You, L. Xie, S. Sun, and W. Xiao, "Multiple-level quantized innovation Kalman filter," in 17th International Federation of Automatic Control World Congress, Seoul, Korea, 2008.
[11] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. I. Jordan, and S. S. Sastry, "Kalman filtering with intermittent observations," in Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, vol. 1, pp. 701–708.
[12] R. T. Sukhavasi and B. Hassibi, "Particle filtering for quantized innovations," in ICASSP, 2009.
[13] R. Curry, W. Vander Velde, and J. Potter, "Nonlinear estimation with quantized measurements: PCM, predictive quantization, and data compression," IEEE Transactions on Information Theory, vol. 16, no. 2, pp. 152–161, 1970.
[14] A. Azzalini and A. Dalla Valle, "The multivariate skew-normal distribution," Biometrika, vol. 83, no. 4, pp. 715–726, 1996.
[15] R. B. Arellano-Valle and M. G. Genton, "On fundamental skew distributions," J. Multivar. Anal., vol. 96, no. 1, pp. 93–116, 2005.
[16] M. G. Genton, L. He, and X. Liu, "Moments of skew-normal random vectors and their quadratic forms," Statistics and Probability Letters, vol. 51, no. 4, pp. 319–325, 2001.
[17] M. G. Genton, "Discussion of the skew normal," Scandinavian Journal of Statistics, vol. 32, no. 2, pp. 189–198, 2005.
[18] P. Naveau, M. G. Genton, and X. Shen, "A skewed Kalman filter," J. Multivar. Anal., vol. 94, no. 2, pp. 382–400, 2005.
[19] M. G. Genton, Skew-Elliptical Distributions and their Applications: A Journey beyond Normality, Chapman and Hall/CRC, 2004.

APPENDIX

A. The Generalized Closed Skew-Normal Distribution

Definition 1: For x ∈ R^k, we define the generalized closed skew-normal distribution GCSN_{k,n}(x; μ, Σ, D, s_1, s_2, Δ) as follows:

GCSN_{k,n}(x; μ, Σ, D, s_1, s_2, Δ) ≜ N_k(x; μ, Σ) L_{k,n}(x)
L_{k,n}(x) = \frac{Φ_n(s_1, s_2; D(x-μ), Δ)}{Φ_n(s_1, s_2; 0, Δ + DΣD^T)}   (13)

where N_k(x; μ, Σ) is a k-dim Gaussian density and Φ_n(y_1, y_2; ν, Δ) = P(y_1(1) ≤ X_1 ≤ y_2(1), ..., y_1(n) ≤ X_n ≤ y_2(n)) with (X_1,...,X_n) ~ N_n(ν, Δ). The sizes of the matrices involved follow accordingly. This is a very simple generalization of the closed skew-normal (CSN) distribution defined in [19]; in fact, it reduces to the CSN if s_1 = -∞ in all its components. Naturally, it inherits most of its closure properties. Skew elliptical distributions generated a lot of interest because they provide a much needed tool to handle skewness in statistical modeling and have a good number of properties in common with the standard normal distribution, such as closure under marginalization and conditioning. We will briefly discuss some of the useful properties.

Lemma 1.1: Let X ~ N_k(μ,Σ) and Z ~ N_n(0,Δ) be independent. Then X/(D(X-μ) - Z ∈ (s_1,s_2)) ~ GCSN_{k,n}(μ, Σ, D, s_1, s_2, Δ).
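For the scalar skewing case n = 1, the density in Eq (13) can be evaluated with ordinary normal CDFs. A small sketch (the function name is ours), mainly useful for visualizing how D, s_1 and s_2 skew the base Gaussian:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def gcsn_pdf_n1(x, mu, Sigma, D, s1, s2, Delta):
    """Evaluate the GCSN_{k,1} density of Eq (13) at x (skewing dimension n = 1).

    For n = 1, Phi_1(s1, s2; m, V) is a difference of scalar normal CDFs.
    """
    D = np.atleast_2d(D)                                    # 1 x k
    base = multivariate_normal.pdf(x, mean=mu, cov=Sigma)   # N_k(x; mu, Sigma)
    m = float(D @ (np.asarray(x) - mu))                     # D(x - mu)
    num = norm.cdf(s2, m, np.sqrt(Delta)) - norm.cdf(s1, m, np.sqrt(Delta))
    R = float(Delta + D @ Sigma @ D.T)
    den = norm.cdf(s2, 0.0, np.sqrt(R)) - norm.cdf(s1, 0.0, np.sqrt(R))
    return base * num / den
```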
In the following proposition, we provide a stochastic characterization of the GCSN, which can be identified as a special case of Theorem 3.1 for state space systems.

Proposition 1.2: Suppose

V ~ N_k(0, Σ - ΣD^T(Δ + DΣD^T)^{-1}DΣ) and U ~ N_n(s_1, s_2, 0, Δ + DΣD^T)

are independent, where N_p(a,b,0,R) denotes a p-variate normal distribution truncated component-wise to lie in (a,b), a and b being p-dim vectors specifying the boundaries of a cube in R^p, i.e., Z ~ N_p(a,b,0,R) implies f_Z(z) = \frac{N_p(z; 0, R)}{P(Z ∈ (a,b))} 1_{z ∈ (a,b)}. Then Y = V + ΣD^T(Δ + DΣD^T)^{-1}U is distributed as GCSN_{k,n}(0, Σ, D, s_1, s_2, Δ).

The proposition will be proved using Lemmas 1.3 and 1.4, which are stated below.

Lemma 1.3: If X ~ GCSN_{k,n}(μ, Σ, D, s_1, s_2, Δ), then the moment generating function of X is given by

M_X(t) = \frac{Φ_n(s_1, s_2; DΣt, Δ + DΣD^T)}{Φ_n(s_1, s_2; 0, Δ + DΣD^T)} \exp(μ^T t + \frac{1}{2} t^T Σ t)

Now we will compute the moment generating function of a truncated multivariate normal distribution.

Lemma 1.4: Let X ~ N_p(a, b, 0, R). Then the moment generating function of X is given by

M_X(t) = \frac{Φ_p(a, b; Rt, R)}{Φ_p(a, b; 0, R)} \exp(\frac{1}{2} t^T R t)   (14)

Proof of Proposition 1.2: Let S = ΣD^T(Δ + DΣD^T)^{-1}. Then Y = V + SU and M_Y(t) = M_V(t)M_{SU}(t). Now, M_{SU}(t) = M_U(S^T t). Plugging in the formula for the moment generating function derived in Lemma 1.4, we see that M_Y(t) is the moment generating function of a GCSN_{k,n}(0, Σ, D, s_1, s_2, Δ), using Lemma 1.3.
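Proposition 1.2 doubles as a sampling recipe for the GCSN: draw the truncated normal U (here for n = 1, via the inverse-CDF method of Lemma 4.2) and add the independent Gaussian V. A sketch under that n = 1 assumption, with a location shift μ added in the spirit of Lemma 1.1 (all names are ours):

```python
import numpy as np
from scipy.stats import norm

def sample_gcsn_n1(mu, Sigma, D, s1, s2, Delta, size, rng=np.random.default_rng()):
    """Sample GCSN_{k,1}(mu, Sigma, D, s1, s2, Delta) via Proposition 1.2.

    D is 1 x k, Delta is a scalar and (s1, s2) a scalar interval.
    """
    D = np.atleast_2d(D)
    R = float(Delta + D @ Sigma @ D.T)     # Delta + D Sigma D^T (scalar for n = 1)
    S = (Sigma @ D.T) / R                  # Sigma D^T (Delta + D Sigma D^T)^{-1}
    # U ~ N(s1, s2, 0, R): scalar truncated normal via the inverse-CDF method.
    a, b = s1 / np.sqrt(R), s2 / np.sqrt(R)
    u = norm.ppf(norm.cdf(a) + rng.uniform(size=size) * (norm.cdf(b) - norm.cdf(a)))
    U = np.sqrt(R) * u
    # V ~ N_k(0, Sigma - S R S^T), independent of U.
    cov_V = Sigma - (S * R) @ S.T
    V = rng.multivariate_normal(np.zeros(len(Sigma)), cov_V, size=size)
    return mu + V + U[:, None] * S.ravel()   # Y = mu + V + S U
```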
Proposition 1.5: Let X ~ GCSN_{k,n}(μ, Σ, D, s_1, s_2, Δ) and W ~ N_k(μ_W, Q) be independent, and define Y = AX + W. Then Y ~ GCSN_{k,n}(μ_Y, Σ_Y, D_Y, s_1, s_2, Δ_Y), where

Σ_Y = AΣA^T + Q, D_Y = DΣA^TΣ_Y^{-1}, Δ_Y = Δ + DΣD^T - D_YΣ_YD_Y^T, μ_Y = Aμ + μ_W

The proof of the proposition follows directly from the following two lemmas. The first states that the GCSN is closed under full rank linear transformations, while the second states its closure under addition of Gaussian noise.

Lemma 1.6: If Y ~ GCSN_{k,n}(μ, Σ, D, s_1, s_2, Δ) and A is a full column rank p×k (p ≥ k) matrix, then AY ~ GCSN_{p,n}(Aμ, AΣA^T, D(A^TA)^{-1}A^T, s_1, s_2, Δ). If A is a full row rank p×k (p ≤ k) matrix, then AY ~ GCSN_{p,n}(μ_A, Σ_A, D_A, s_1, s_2, Δ_A), where

μ_A = Aμ, Σ_A = AΣA^T, D_A = DΣA^TΣ_A^{-1}, Δ_A = Δ + DΣD^T - D_AΣ_AD_A^T

Lemma 1.7: If X ~ GCSN_{k,n}(μ, Σ, D, s_1, s_2, Δ) and W ~ N_k(μ_W, Σ_W) are independent, then X + W ~ GCSN_{k,n}(μ_{X+W}, Σ_{X+W}, D_{X+W}, s_1, s_2, Δ_{X+W}), where μ_{X+W} = μ + μ_W, Σ_{X+W} = Σ + Σ_W, D_{X+W} = DΣ(Σ + Σ_W)^{-1} and Δ_{X+W} = Δ + DΣD^T - D_{X+W}Σ_{X+W}D_{X+W}^T.

B. The Conditional State Density

We will now show that the probability density of the state conditioned on the past quantized innovations, f(x(n)/q_n), follows a GCSN. Time update and measurement update recursions will be provided for the parameters of the conditional density. All the parameters have very simple interpretations in terms of the statistics of the state space model. For ease of exposition, we provide all the recursions for a time-invariant state space model, but the derivations can be trivially extended to the time varying case.

Theorem 1.8 (Time and Measurement Updates): The probability density function of x(n)/q_m is given by GCSN_{k,m+1}(x(n); 0, Σ(n/m), D(n/m), s_{1,n/m}, s_{2,n/m}, Δ(n/m)), for m = n-1, n and ∀ n ≥ 0. The recursions relating the parameters of the distributions of x(n-1)/Q_{n-1}, x(n)/Q_{n-1} and x(n)/Q_n are given by

Σ(n) ≜ Σ(n/n-1) = AΣ(n-1/n-1)A^T + W
D(n) ≜ D(n/n-1) = D(n-1/n-1)Σ(n-1/n-1)A^TΣ(n)^{-1}
s_{1,n-1} ≜ s_{1,n/n-1} = s_{1,n-1/n-1}, s_{2,n-1} ≜ s_{2,n/n-1} = s_{2,n-1/n-1}
Δ(n) ≜ Δ(n/n-1) = Δ(n-1/n-1) + D(n-1/n-1)Σ(n-1/n-1)D(n-1/n-1)^T - D(n)Σ(n)D(n)^T   (15)

Eq (15) constitutes the time update. The measurement update is given by the following recursions:

Σ(n/n) = Σ(n), D(n/n) = [D(n/n-1); H], s_{1,n} = [s_{1,n-1}; s_1(n)], s_{2,n} = [s_{2,n-1}; s_2(n)], Δ(n/n) = [Δ(n) 0; 0 σ_v^2]   (16)

where x̂(n) ≜ x̂(n/n-1) = E(x(n)/q_{n-1}). In the rest of the appendix, we will use the notation R(n) = Δ(n/n) + D(n/n)Σ(n)D(n/n)^T, whose leading block equals Δ(n) + D(n)Σ(n)D(n)^T, which follows from Eq (15).

C. Interpreting the Parameters

Theorem 1.8 outlines how the conditional state density evolves with time, but there is little intuition in the recursions proposed there. In the following lines, we explain the connection of each of the parameters to the state space model, so that one can gain better insight into the overall filtering technique first proposed in [9]. We start by noting that Σ(n) is equal to the state covariance at time n.

Lemma 1.9: Σ(n) = ||x(n)||^2.

The next lemma states that R(n) = ||Y_n||^2, i.e., R(n) is the covariance of the measurement vector [y(0),...,y(n)]^T.

Lemma 1.10: R(n) = ||Y_n||^2 = R_y(n).

The following lemma gives an expression for P(q_n).

Lemma 1.11: The probability of the quantized innovations is given by P(q_n) = Φ_{n+1}(s_{1,n}, s_{2,n}; 0, R_y(n)).

In the following lemma, we look at the cross covariance of x(n) and Y_n.

Lemma 1.12: <Y_n, x(n)> = D(n/n)Σ(n).

Now, applying Proposition 1.2 to the conditional state density, we can write x(n)/(Q_n = q_n) stochastically as

x(n)/q_n = Z + Σ(n)D(n/n)^T(R(n))^{-1}Y_q(n)
Z ~ N_k(0, Σ(n) - Σ(n)D(n/n)^T(R(n))^{-1}D(n/n)Σ(n))
Y_q(n) ~ N_{n+1}(s_{1,n}, s_{2,n}; 0, R_y(n))   (17)

Using Lemmas 1.9, 1.10 and 1.12, one can see that

Σ(n) - Σ(n)D(n/n)^T(R(n))^{-1}D(n/n)Σ(n) = ||x(n)||^2 - <x(n),Y_n>(||Y_n||^2)^{-1}<Y_n,x(n)>   (18)
Σ(n)D(n/n)^T(R(n))^{-1} = <x(n),Y_n>(||Y_n||^2)^{-1} = R_xy(n)(R_y(n))^{-1}   (19)

It is now easy to see that Eqs (18) and (19) are consistent with Theorem 3.1, and hence we have

x(n)/q_n = N_k(0, P_kal(n/n)) + R_xy(n)(R_y(n))^{-1}Y_q(n)   (20)

where P_kal(n/n) is the error covariance of the Kalman filter at time step n. The methodology outlined in this appendix can be seen as an alternate way of arriving at Alg 3; however, it is circumlocutory and not very intuitive. Moreover, the stochastic characterization in Eq (20) cannot be extended to the more general case of Theorem 3.1 using the theory of GCSN. However, it serves as conclusive evidence that the conditional state density is not Gaussian, contrary to what is assumed in [9], [10].