LMMSE Estimation and Interpolation of Continuous-Time Signals from Discrete-Time Samples Using Factor Graphs

Lukas Bolliger, Hans-Andrea Loeliger, and Christian Vogel

(Lukas Bolliger and Hans-Andrea Loeliger are with the Dept. of Information Technology and Electrical Engineering, ETH Zurich, CH-8092 Zurich, Switzerland. Email: [email protected], [email protected]. Christian Vogel is with the Telecomm. Research Center Vienna (FTW), Donau-City-Strasse 1, A-1220 Vienna, Austria. Email: [email protected]. An abbreviated version of this paper was presented at the 2010 Information Theory & Applications Workshop (ITA), La Jolla, CA, Feb. 2010 [1].)

Abstract: The factor graph approach to discrete-time linear Gaussian state space models is well developed. The paper extends this approach to continuous-time linear systems/filters that are driven by white Gaussian noise. By Gaussian message passing, we then obtain MAP/MMSE/LMMSE estimates of the input signal, or of the output signal, or of the state, from noisy observations of the output signal. These estimates may be obtained with arbitrary temporal resolution. The proposed input signal estimation does not seem to have appeared in the prior Kalman filtering literature.

Fig. 1. System model: white Gaussian noise $U(t)$ drives a linear system/filter with output $Y(t)$; the samples $Y_k = Y(t_k)$ are observed as $\tilde{Y}_k = Y_k + Z_k$, where $Z_k$ comes from a discrete-time WGN source.

I. INTRODUCTION

Consider the system model shown in Fig. 1: a continuous-time linear time-invariant system/filter is fed by a continuous-time signal $U(t)$. The system output $Y(t)$ is sampled (at regular or irregular intervals) and the samples are corrupted by discrete-time additive white Gaussian noise. From the noisy samples $\tilde{Y}_k$, we wish to estimate the clean samples $Y_k$, or the clean signal $Y(t)$ at arbitrary instants $t$, or the state trajectory of the system, or, of particular interest in this paper, the input signal $U(t)$ at arbitrary instants $t$. We will not assume that any of these signals is bandlimited (in the strict sense required by the sampling theorem); instead, the key assumption in this paper is that the given linear system has a finite-dimensional state space representation.

Problems of this kind are ubiquitous. For example, Fig. 1 might model an analog-to-digital converter with a non-ideal anti-aliasing filter and with quantization noise $Z_k$; indeed, this application is a main motivation for this paper. As another example, Fig. 1 might model a sensor with some internal dynamics which limits its temporal resolution of the desired quantity $U(t)$. In both examples, we are primarily interested in estimating the input signal $U(t)$.

We will address these estimation problems under the further assumption that the input signal $U(t)$ is white Gaussian noise.
It might perhaps seem at first that this assumption is problematic when $U(t)$ is actually the signal of interest, as in the two mentioned examples. However, we will argue that this assumption is meaningful in such cases and that the LMMSE (linear minimum mean squared error) estimate of $U(t)$ is well defined and useful. An example of such an LMMSE estimate of $U(t)$ is shown in Fig. 11. The nature of this estimate will further be illuminated by reformulating it as a regularized least-squares problem with a penalty term $\int u(t)^2\, dt$, as will be discussed in Section V.

The assumption that $U(t)$ is white Gaussian noise turns our system model (Fig. 1) into a linear Gaussian model, and LMMSE estimation of the state trajectory or of the clean output signal $Y(t)$ amounts essentially to Kalman filtering¹ (or rather Kalman smoothing) [2]–[8]. However, estimation of the continuous-time input signal $U(t)$ does not seem to have been addressed in the Kalman filtering literature.

¹ Note that the Kalman-Bucy filter [3] addresses the different situation where the observations are continuous-time signals as well.

We will also consider some extensions of the system model including time-varying systems, vector signals, and systems with internal noise sources. These extensions are required in some of the motivating applications, but the extensions are straightforward and standard in Kalman filtering.

We will address these estimation problems (as described above) using factor graphs. Factor graphs [9]–[12] and similar graphical models [13]–[16] allow a unified description of system models and algorithms in many different fields. In particular, Gaussian message passing in factor graphs subsumes discrete-time Kalman filtering and many variations of it [9]–[12]. The graphical-model approach has facilitated the use of these techniques as components in more general inference problems, and it has become a mode of teaching discrete-time Kalman filtering itself.

In this paper, we extend the factor graph approach to continuous-time models with discrete-time observations as described above. This extension appears to be new², and it significantly enlarges the domain of graphical models. We note, in particular, that the LMMSE estimates of the continuous-time signals associated with such models (such as $U(t)$ and $Y(t)$ in Fig. 1) become computational objects that can be handled with arbitrary temporal resolution by Gaussian message passing.

² Another extension of graphical models to continuous time are continuous-time Bayesian networks [19]–[21], where the emphasis is on finite-state models and approximate inference. Yet another such extension is [22], where linear RLC circuits are described in terms of factor graphs.

Applications of the methods of this paper (in addition to those already mentioned) have been reported in [17] and [18]. In [17], a new method for sampling jitter correction is proposed that uses the slope of $Y(t)$, which is available in the state space model, in an iterative algorithm. In [18], a new approach to analog-to-digital conversion is proposed which combines unstable analog filters with digital estimation of $U(t)$ as proposed in the present paper. Both of these applications build on [1] (which does not contain the proofs) and rely on the present paper for a full justification of the proposed algorithms. Further applications (including beamforming with sensor arrays and Hilbert transforms) will be reported elsewhere.

In summary, this paper
• extends the factor graph approach to continuous-time models as in Fig. 1;
• extends Kalman smoothing (forward-backward Gaussian message passing) to the estimation of input signals as $U(t)$ in Fig. 1;
• provides the necessary background for subsequent work such as [17] and [18].
The paper builds on, and assumes some familiarity with, the factor graph approach to discrete-time Kalman filtering as given in [11].

The paper is structured as follows. The system model is formally stated in Section II and represented in factor graph notation in Section III. State estimation and output signal estimation are then essentially obvious, but some pertinent comments are given in Section IV. Estimation of the input signal is discussed in Section V. In Section VI, the estimation algorithms are illustrated by some simple numerical examples. A number of extensions of the basic system model are outlined in Section VII, and Section VIII concludes the paper.

The following notation will be used: $\bar{x}$ denotes the complex conjugate of $x$; $A^{\mathsf{T}}$ denotes the transpose of the matrix $A$; $A^{\mathsf{H}} \triangleq \bar{A}^{\mathsf{T}}$ denotes the Hermitian transpose of $A$; $I$ denotes an identity matrix; "$\propto$" denotes equality up to a constant scale factor; $\mathcal{N}(m,\sigma^2)$ or $\mathcal{N}(m,V)$ denotes a normal (Gaussian) distribution with mean $m$ and variance $\sigma^2$, or with mean vector $m$ and covariance matrix $V$, respectively.

II. SYSTEM MODEL

Let $X \in \mathbb{R}^n$ be the state of a linear system (as, e.g., in Fig. 1) which evolves in time according to

\dot{X}(t) = A X(t) + b\, U(t)    (1)

where $\dot{X}$ denotes the derivative with respect to time and where both the matrix $A \in \mathbb{R}^{n \times n}$ and the vector $b \in \mathbb{R}^n$ are known. The system output is the discrete-time signal $Y_1, Y_2, \ldots \in \mathbb{R}^\nu$ with

Y_k = C X(t_k)    (2)

where $t_1, t_2, \ldots \in \mathbb{R}$ (with $t_{k-1} < t_k$) are discrete instants of time and where $C \in \mathbb{R}^{\nu \times n}$ is known. We will usually observe only the noisy output signal $\tilde{Y}_1, \tilde{Y}_2, \ldots$ defined by

\tilde{Y}_k = Y_k + Z_k,    (3)

where $Z_1, Z_2, \ldots$ (the noise) are independent Gaussian random variables, each of which takes values in $\mathbb{R}^\nu$ and has a diagonal covariance matrix $V_Z$.

The (real and scalar) input signal $U(t)$ will be modeled as white Gaussian noise, i.e., for $t < t'$, the integral

\int_t^{t'} U(\tau)\, d\tau    (4)

is a zero-mean Gaussian random variable with variance $\sigma_U^2 (t'-t)$, and any number of such integrals are independent random variables provided that the corresponding integration intervals are disjoint. In consequence, it is appropriate to replace (1) by

dX(t) = A X(t)\, dt + b\, U(t)\, dt    (5)

where $U(t)\,dt$ is a zero-mean Gaussian random variable with infinitesimal variance $\sigma_U^2\, dt$.

As stated in the introduction, we will argue later (in Section V) that modeling $U(t)$ as white Gaussian noise is meaningful even when $U(t)$ is a (presumably smooth) signal of interest that we wish to estimate.

For any fixed initial state $X(t_0) = x(t_0)$, equation (5) induces a probability density $f\big(x(t_1) \,|\, x(t_0)\big)$ over the possible values of $X(t_1)$ (where $t_0$ and $t_1$ are unrelated to the discrete times $\{t_k\}$ in (2)). Specifically, integrating (5) from $t = t_0$ to $t_1 > t_0$ yields

X(t_1) = e^{AT} X(t_0) + \int_0^T e^{A(T-\tau)}\, b\, U(t_0+\tau)\, d\tau    (6)

with $T \triangleq t_1 - t_0 > 0$. If $U(t)$ is white Gaussian noise (with $\sigma_U^2$ as above), then the integral in (6) is a zero-mean Gaussian random vector with covariance matrix³ [23]–[25]

V_S = \sigma_U^2 \int_0^T e^{A(T-\tau)}\, b b^{\mathsf{T}} \big(e^{A(T-\tau)}\big)^{\mathsf{T}}\, d\tau    (7)
    = \sigma_U^2 \int_0^T e^{A\tau}\, b b^{\mathsf{T}} \big(e^{A\tau}\big)^{\mathsf{T}}\, d\tau.    (8)

It is thus clear that, for fixed $X(t_0) = x(t_0)$, $X(t_1)$ is a Gaussian random vector with mean $e^{AT} x(t_0)$ and covariance matrix $V_S$, i.e.,

f\big(x(t_1) \,|\, x(t_0)\big) \propto e^{-\frac{1}{2}\left(x(t_1) - e^{AT} x(t_0)\right)^{\mathsf{T}} V_S^{-1} \left(x(t_1) - e^{AT} x(t_0)\right)}.    (9)

³ This covariance matrix is essentially the controllability Gramian. However, controllability is not required in this paper.
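To make the transition density (9) concrete, the following sketch computes the mean matrix $e^{AT}$ and the covariance $V_S$ of (7)–(8) numerically. It is a minimal illustration under our own assumptions (NumPy/SciPy, trapezoidal quadrature of (8)); the helper name transition_params and the demo values are ours, and Section IV gives a closed form for diagonalizable $A$.

```python
import numpy as np
from scipy.linalg import expm

def transition_params(A, b, sigma_U2, T, num=1000):
    """Mean matrix e^{AT} and covariance V_S of (7)-(8).

    A is n-by-n, b is n-by-1, and sigma_U2 is the spectral density of the
    white noise U(t); the integral (8) is evaluated by the trapezoidal rule.
    """
    taus = np.linspace(0.0, T, num)
    integrand = np.array([expm(A * s) @ b @ b.T @ expm(A * s).T for s in taus])
    V_S = sigma_U2 * np.trapz(integrand, taus, axis=0)
    return expm(A * T), V_S

# Example: a second-order system with poles at -1 and -2 (our demo values).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
b = np.array([[0.0], [1.0]])
F, V_S = transition_params(A, b, sigma_U2=1.0, T=0.5)
```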
III. FACTOR GRAPH OF SYSTEM MODEL

We will use Forney factor graphs (also known as normal factor graphs [26]) as in [10] and [11]. The nodes/boxes in such a factor graph represent factors and the edges in the graph represent variables.

In this notation, the system model of Section II may be represented by the factor graph shown in Fig. 2. More precisely, Fig. 2 represents the joint probability density of the variables in the system model at discrete times $t_1, t_2, \ldots$.

Fig. 2. Factor graph of the system model with observations $\tilde{Y}_k = \tilde{y}_k$.

Note that Fig. 2 shows only a section (from $t_k$ to $t_{k+1}$) of the factor graph; the complete factor graph starts at time $t_0$ and ends at some time $t_K$, and it may contain additional nodes to represent any pertinent initial or final conditions. Note also that, apart from the Gaussian nodes/factors $f\big(x(t_{k+1}) \,|\, x(t_k)\big)$ and $\mathcal{N}(0, V_Z)$, the nodes/boxes in Fig. 2 represent linear constraints. For details of this factor graph notation, we refer to [11].

As shown in Fig. 2, the function (9) can immediately be used as a node in a factor graph. However, the function (9) can itself be represented by nontrivial factor graphs. A first such factor graph is shown in Fig. 3, which corresponds to (6)–(8). Plugging Fig. 3 into Fig. 2 results in a standard discrete-time linear Gaussian factor graph as discussed in depth in [11].

Fig. 3. Factor graph of $f\big(x(t_1) \,|\, x(t_0)\big)$ according to (6)–(8). (The values of $t_0$ and $t_1$ are not restricted to the discrete times $\{t_k\}$ in Fig. 2.)

The factor graph of Fig. 2 is easily refined to arbitrary temporal resolution by splitting the node/factor $f\big(x(t_{k+1}) \,|\, x(t_k)\big)$ as shown in Fig. 4. In this way, both the state $X(t)$ and the output signal $Y(t) = C X(t)$ become available for arbitrary instants $t$ between $t_k$ and $t_{k+1}$. Each of the factors in Fig. 4 can, of course, be replaced by the corresponding decomposition according to Fig. 3.

Fig. 4. Splitting the node/factor $f\big(x(t_{k+1}) \,|\, x(t_k)\big)$ to access the state at an intermediate point in time $t'$.

Note that the input signal $U(t)$ is not explicitly represented in Figures 2–4. For the estimation of $U(t)$, we will therefore need another decomposition of the node/factor $f\big(x(t_{k+1}) \,|\, x(t_k)\big)$.

IV. GAUSSIAN MESSAGE PASSING, STATE ESTIMATION, AND OUTPUT SIGNAL ESTIMATION

Having thus obtained a discrete-time factor graph (with an arbitrary temporal resolution), estimating $X(t)$ or $Y(t)$ from the noisy observations $\tilde{Y}_1 = \tilde{y}_1$, $\tilde{Y}_2 = \tilde{y}_2$, ... by means of Gaussian message passing is standard and discussed in detail in [11] (cf. also [10] and [12]). We therefore confine ourselves to a few general remarks (mostly excerpted from [10] and [11]) and some additional remarks on message passing through the node/factor $f\big(x(t_1) \,|\, x(t_0)\big)$.

A. General Remarks

1) In linear Gaussian factor graphs such as Figures 2–4 (where all factors are either Gaussians or linear constraints), all sum-product messages are Gaussians and sum-product message passing coincides with max-product message passing. Moreover, MAP (maximum a posteriori) estimation coincides both with MMSE (minimum mean squared error) estimation and with LMMSE (linear/affine MMSE) estimation.

2) In general, every edge in the factor graph carries two messages, one in each direction. Since all the edges in Figures 2–4 are directed (i.e., drawn with an arrow), we can unambiguously refer to the forward message $\overrightarrow{\mu}_X$ and the backward message $\overleftarrow{\mu}_X$ along the edge representing some variable $X$.

3) Gaussian messages have the form

\mu(x) \propto e^{-\frac{1}{2}(x-m)^{\mathsf{T}} W (x-m)};    (10)

they are naturally parameterized by the mean vector $m$ and either the matrix $W$ or the covariance matrix $V$ ($= W^{-1}$). Degenerate Gaussians, where either $W$ or $V$ do not have full rank, are often permitted and sometimes unavoidable; in such cases, only $W$ or $V$, but not both, are well defined. We will use the symbols $\overrightarrow{m}_X$ and $\overrightarrow{V}_X$ (or $\overrightarrow{W}_X$) to denote the parameters of the forward message (along some edge/variable $X$) and $\overleftarrow{m}_X$ and $\overleftarrow{V}_X$ (or $\overleftarrow{W}_X$) for the parameters of the backward message.

4) The natural scheduling of the message computations in Fig. 2 consists of a forward recursion for $\overrightarrow{\mu}_{X(t_k)}$ and an independent backward recursion⁴ for $\overleftarrow{\mu}_{X(t_k)}$. Both of these recursions use the messages $\overleftarrow{\mu}_{Y_k}$ with parameters $\overleftarrow{m}_{Y_k} = \tilde{y}_k$ and $\overleftarrow{W}_{Y_k} = V_Z^{-1}$ (assuming $\tilde{Y}_k = \tilde{y}_k$ is known; if $\tilde{Y}_k$ is not observed/unknown, then $\overleftarrow{W}_{Y_k} = 0$ and $\overleftarrow{\mu}_{Y_k}(y_k) = 1$).

⁴ The backward recursion is required for smoothing, i.e., noncausal estimation or estimation with delay; it is absent in basic Kalman filtering as in [2]. In fact, while the backward recursion is obvious from the graphical-model perspective, its development in the traditional approach was not quite so obvious, cf. [8].

5) Since the factor graph in Fig. 2 has no cycles, the a posteriori distribution of any variable $X$ (or $Y$, $Z$, ...) in the factor graph is the product $\overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x)$ of the corresponding two messages, up to a scale factor. The parameters of this marginal distribution are $m_X$ and $W_X$ given by $W_X = \overrightarrow{W}_X + \overleftarrow{W}_X$ and $W_X m_X = \overrightarrow{W}_X \overrightarrow{m}_X + \overleftarrow{W}_X \overleftarrow{m}_X$ (a code sketch of this combination rule follows this list).

6) Tabulated message computation rules (in particular, Tables 2–4 of [11]) allow one to compose a variety of different algorithms to compute the same sum-product messages. The variety arises from different parameterizations of the messages and from local manipulations (including splitting and grouping of nodes) of the factor graph.
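Remark 5 translates directly into code. The following is a minimal sketch (the function name and the NumPy-based formulation are our own) of the combination rule $W_X = \overrightarrow{W}_X + \overleftarrow{W}_X$, $W_X m_X = \overrightarrow{W}_X \overrightarrow{m}_X + \overleftarrow{W}_X \overleftarrow{m}_X$; note that the $W$-parameterization handles degenerate messages (remark 3) naturally, since an uninformative branch simply contributes $W = 0$.

```python
import numpy as np

def combine_messages(mf, Wf, mb, Wb):
    """A posteriori marginal of an edge from its two Gaussian messages
    (remark 5): W = Wf + Wb, and m solves W m = Wf mf + Wb mb."""
    W = Wf + Wb
    m = np.linalg.solve(W, Wf @ mf + Wb @ mb)
    return m, W
```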
B. Message Passing Through $f\big(x(t_1) \,|\, x(t_0)\big)$

Gaussian message passing through the node/factor $f\big(x(t_1) \,|\, x(t_0)\big)$ is summarized in Table I. Both the forward message (with parameters (I.1) and (I.2)) and the backward message (with parameters (I.3) and (I.4)) are easily obtained from Fig. 3, (7) and (8), and Tables 2 and 3 of [11].

TABLE I
COMPUTATION RULES FOR GAUSSIAN MESSAGES THROUGH NODE/FACTOR $f\big(x(t_1) \,|\, x(t_0)\big)$ WITH $t_1 > t_0$.

\overrightarrow{m}_{X(t_1)} = e^{A(t_1-t_0)}\, \overrightarrow{m}_{X(t_0)}    (I.1)

\overrightarrow{V}_{X(t_1)} = e^{A(t_1-t_0)}\, \overrightarrow{V}_{X(t_0)}\, e^{A^{\mathsf{T}}(t_1-t_0)} + \sigma_U^2 \int_0^{t_1-t_0} e^{A\tau}\, b b^{\mathsf{T}}\, e^{A^{\mathsf{T}}\tau}\, d\tau    (I.2)

(the integral in (I.2) equals $Q\, \overrightarrow{\Theta}(t_1-t_0)\, Q^{\mathsf{H}}$, see (12))

\overleftarrow{m}_{X(t_0)} = e^{-A(t_1-t_0)}\, \overleftarrow{m}_{X(t_1)}    (I.3)

\overleftarrow{V}_{X(t_0)} = e^{-A(t_1-t_0)}\, \overleftarrow{V}_{X(t_1)}\, e^{-A^{\mathsf{T}}(t_1-t_0)} + \sigma_U^2 \int_0^{t_1-t_0} e^{-A\tau}\, b b^{\mathsf{T}}\, e^{-A^{\mathsf{T}}\tau}\, d\tau    (I.4)

(the integral in (I.4) equals $Q\, \overleftarrow{\Theta}(t_1-t_0)\, Q^{\mathsf{H}}$, see (14))

\hat{u}(t) = \sigma_U^2\, b^{\mathsf{T}} \left(\overrightarrow{V}_{X(t)} + \overleftarrow{V}_{X(t)}\right)^{-1} \left(\overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X(t)}\right)    (I.5)

If the matrix $A$ is diagonalizable, then the integrals in (I.2) and (I.4) can easily be expressed in closed form. Specifically, if

A = Q \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\, Q^{-1}    (11)

for some complex square matrix $Q$, then

\int_0^t e^{A\tau}\, b b^{\mathsf{T}}\, e^{A^{\mathsf{T}}\tau}\, d\tau = Q\, \overrightarrow{\Theta}(t)\, Q^{\mathsf{H}}    (12)

where the square matrix $\overrightarrow{\Theta}(t)$ is given by

\overrightarrow{\Theta}(t)_{k,\ell} \triangleq \frac{(Q^{-1}b)_k\, \overline{(Q^{-1}b)_\ell}}{\lambda_k + \bar{\lambda}_\ell} \left(e^{(\lambda_k + \bar{\lambda}_\ell)t} - 1\right),    (13)

and

\int_0^t e^{-A\tau}\, b b^{\mathsf{T}}\, e^{-A^{\mathsf{T}}\tau}\, d\tau = Q\, \overleftarrow{\Theta}(t)\, Q^{\mathsf{H}}    (14)

with

\overleftarrow{\Theta}(t)_{k,\ell} \triangleq \frac{(Q^{-1}b)_k\, \overline{(Q^{-1}b)_\ell}}{\lambda_k + \bar{\lambda}_\ell} \left(1 - e^{-(\lambda_k + \bar{\lambda}_\ell)t}\right).    (15)

Note that, in (13) and (15), $(Q^{-1}b)_k$ denotes the $k$-th component of the vector $Q^{-1}b$. The proof of (12) and (14) is given in Appendix A. The remaining entry (I.5) in Table I is Theorem 1 of the next section.
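For a diagonalizable $A$, the closed forms (12)–(15) avoid numerical quadrature in (I.2). A minimal sketch, assuming NumPy/SciPy and using our own function names (theta_fwd, forward_cov); note that (13) requires $\lambda_k + \bar{\lambda}_\ell \neq 0$, which holds, e.g., for stable $A$:

```python
import numpy as np
from scipy.linalg import expm

def theta_fwd(A, b, t):
    """Closed form (13) for the integral in (12)/(I.2)."""
    lam, Q = np.linalg.eig(A)                  # A = Q diag(lam) Q^{-1}, cf. (11)
    v = np.linalg.solve(Q, b).ravel()          # Q^{-1} b
    S = lam[:, None] + np.conj(lam)[None, :]   # lambda_k + conj(lambda_ell)
    return np.outer(v, np.conj(v)) * (np.exp(S * t) - 1.0) / S

def forward_cov(A, b, sigma_U2, Vf, dt):
    """Forward covariance update (I.2), with the integral from (12).

    The product Q Theta Q^H is real for real A and b, so the imaginary
    round-off is discarded."""
    lam, Q = np.linalg.eig(A)
    F = expm(A * dt)
    noise = sigma_U2 * np.real(Q @ theta_fwd(A, b, dt) @ Q.conj().T)
    return F @ Vf @ F.T + noise
```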
V. INPUT SIGNAL ESTIMATION AND REGULARIZED-LEAST-SQUARES INTERPRETATION

We now turn to estimating the input signal $U(t)$ and to clarifying its meaning. To this end, we need the factor graph representation of $f\big(x(t_1) \,|\, x(t_0)\big)$ that is shown in Fig. 5, which corresponds to the decomposition of (6) into $N$ discrete steps and where $T \triangleq t_1 - t_0$. Note that this factor graph is only an approximate representation of $f\big(x(t_1) \,|\, x(t_0)\big)$, but the representation becomes exact in the limit $N \to \infty$. The variables $\tilde{U}(t)$ in Fig. 5 are related to $U(t)$ by

\tilde{U}(t) = \frac{N}{T} \int_{t-T/N}^{t} U(\tau)\, d\tau,    (16)

i.e., $\tilde{U}(t)$ is the average of $U(t)$ over the corresponding interval. The proof of this decomposition is given in Appendix B.

Fig. 5. Decomposition of the node/factor $f\big(x(t_1) \,|\, x(t_0)\big)$ into $N$ discrete time steps (with $T = t_1 - t_0$). This representation is exact only in the limit $N \to \infty$.

For finite $N$, Fig. 5 is a standard linear Gaussian factor graph in which snapshots $\tilde{U}(t)$ of $U(t)$ according to (16) appear explicitly and can therefore be estimated by standard Gaussian message passing. In the resulting expression for the estimate of $\tilde{U}(t)$, we can take the limit $N \to \infty$ and thus obtain an estimate of $U(t)$.

Theorem 1. The MAP/MMSE/LMMSE estimate of $U(t)$ from observations $\tilde{Y}_k = \tilde{y}_k$ according to the system model of Section II is

\hat{u}(t) = \sigma_U^2\, b^{\mathsf{T}} \left(\overrightarrow{V}_{X(t)} + \overleftarrow{V}_{X(t)}\right)^{-1} \left(\overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X(t)}\right)    (17)

where $\overrightarrow{m}_{X(t)}$, $\overrightarrow{V}_{X(t)}$, and $\overleftarrow{m}_{X(t)}$, $\overleftarrow{V}_{X(t)}$ are the parameters of the Gaussian sum-product messages as discussed in Section IV. □

Proof: Consider the factor graph in Fig. 6, which shows the relevant part of Fig. 5 with suitably named variables.

Fig. 6. Factor graph used in the proof of Theorem 1.

We determine the mean $m_{\tilde{U}(t)}$ and the variance $W_{\tilde{U}(t)}^{-1}$ of the a posteriori distribution of $\tilde{U}(t)$ as follows. From [11, eq. (54) and (III.5)], we have

W_{\tilde{U}(t)} = \overrightarrow{W}_{\tilde{U}(t)} + \overleftarrow{W}_{\tilde{U}(t)}    (18)
= \sigma_U^{-2} \frac{T}{N} + \left(\frac{T}{N}\right)^2 b^{\mathsf{T}}\, \overleftarrow{W}_{\tilde{U}'(t)}\, b.    (19)

From [11, eq. (55)], we then have

W_{\tilde{U}(t)}\, m_{\tilde{U}(t)} = \overrightarrow{W}_{\tilde{U}(t)}\, \overrightarrow{m}_{\tilde{U}(t)} + \overleftarrow{W}_{\tilde{U}(t)}\, \overleftarrow{m}_{\tilde{U}(t)};    (20)

inserting $\overrightarrow{m}_{\tilde{U}(t)} = 0$ and using [11, eq. (III.6)] yields

W_{\tilde{U}(t)}\, m_{\tilde{U}(t)} = \frac{T}{N}\, b^{\mathsf{T}}\, \overleftarrow{W}_{\tilde{U}'(t)}\, \overleftarrow{m}_{\tilde{U}'(t)}.    (21)

Using (19) and (21), we obtain

m_{\tilde{U}(t)} = \left(W_{\tilde{U}(t)}\right)^{-1} \left(W_{\tilde{U}(t)}\, m_{\tilde{U}(t)}\right)    (22)
= \left(\sigma_U^{-2} + \frac{T}{N}\, b^{\mathsf{T}}\, \overleftarrow{W}_{\tilde{U}'(t)}\, b\right)^{-1} b^{\mathsf{T}}\, \overleftarrow{W}_{\tilde{U}'(t)}\, \overleftarrow{m}_{\tilde{U}'(t)}    (23)
\approx \sigma_U^2\, b^{\mathsf{T}}\, \overleftarrow{W}_{\tilde{U}'(t)}\, \overleftarrow{m}_{\tilde{U}'(t)},    (24)

and the approximation (24) becomes exact in the limit $N \to \infty$. Using [11, eq. (II.10)], we have

\overleftarrow{m}_{\tilde{U}'(t)} = \overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X'(t)}    (25)
\approx \overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X(t)},    (26)

and using [11, eq. (II.8)], we have

\overleftarrow{W}_{\tilde{U}'(t)} = \left(\overleftarrow{V}_{\tilde{U}'(t)}\right)^{-1}    (27)
= \left(\overrightarrow{V}_{X'(t)} + \overleftarrow{V}_{X(t)}\right)^{-1}    (28)
\approx \left(\overrightarrow{V}_{X(t)} + \overleftarrow{V}_{X(t)}\right)^{-1}.    (29)

Again, the approximations (26) and (29) both become exact in the limit $N \to \infty$. Inserting (26) and (29) into (24) yields

\lim_{N \to \infty} m_{\tilde{U}(t)} = \sigma_U^2\, b^{\mathsf{T}} \left(\overrightarrow{V}_{X(t)} + \overleftarrow{V}_{X(t)}\right)^{-1} \left(\overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X(t)}\right).    (30)

The mean of the a posteriori probability of $\tilde{U}(t)$ is thus well defined even for $N \to \infty$ and given by (30), and the theorem follows. □

While we have thus established that the mean (30) of the a posteriori distribution of $\tilde{U}(t)$ is well defined for $N \to \infty$, it should be pointed out that the variance of this distribution is infinite: taking the limit $N \to \infty$ of (19) yields $W_{\tilde{U}(t)} = 0$. However, this seemingly problematic result does not imply that the estimate (17) is useless; it simply reflects the obvious fact that white noise cannot be fully estimated from discrete noisy samples.
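The estimator of Theorem 1 can be implemented by two sweeps of Gaussian message passing over a time grid that contains the sampling instants, followed by the pointwise evaluation of (17). The following sketch illustrates this for scalar observations; the function names, the quadrature in the propagation step, and the large-but-finite covariance used to initialize the backward message (a numerical stand-in for the degenerate, infinite-variance message) are our own choices, not prescriptions of the paper.

```python
import numpy as np
from scipy.linalg import expm

def lmmse_input_estimate(A, b, c, sigma_U2, sigma_Z2, grid, obs, m0, V0):
    """Input estimate (17) on a time grid via forward/backward messages.

    grid: increasing time points (the sampling instants must be among them);
    obs:  dict mapping a grid index i to the noisy sample observed at grid[i].
    """
    n = A.shape[0]

    def step(m, V, dt, sign):
        # (I.1)-(I.2) for sign=+1 and (I.3)-(I.4) for sign=-1;
        # the integral is evaluated by the trapezoidal rule
        F = expm(sign * A * dt)
        taus = np.linspace(0.0, dt, 50)
        I = np.array([expm(sign * A * s) @ b @ b.T @ expm(sign * A * s).T
                      for s in taus])
        return F @ m, F @ V @ F.T + sigma_U2 * np.trapz(I, taus, axis=0)

    def meas_update(m, V, y):
        # combine a Gaussian message with the observation factor N(y, sigma_Z2)
        s = (c @ V @ c.T).item() + sigma_Z2
        g = V @ c.T / s
        return m + g * (y - (c @ m).item()), V - g @ (c @ V)

    K = len(grid)
    mf, Vf, mb, Vb = [None] * K, [None] * K, [None] * K, [None] * K
    m, V = m0, V0                                  # forward sweep
    for i in range(K):
        if i > 0:
            m, V = step(m, V, grid[i] - grid[i - 1], +1)
        mf[i], Vf[i] = m, V                        # excludes the sample at grid[i]
        if i in obs:
            m, V = meas_update(m, V, obs[i])
    m, V = np.zeros((n, 1)), 1e9 * np.eye(n)       # backward sweep
    for i in range(K - 1, -1, -1):
        if i < K - 1:
            m, V = step(m, V, grid[i + 1] - grid[i], -1)
        if i in obs:                               # includes the sample at grid[i]
            m, V = meas_update(m, V, obs[i])
        mb[i], Vb[i] = m, V
    # pointwise input estimate (17); each sample enters exactly one message
    return np.array([(sigma_U2 * b.T @ np.linalg.solve(Vf[i] + Vb[i],
                     mb[i] - mf[i])).item() for i in range(K)])
```

Refining the grid between the sampling instants directly yields $\hat{u}(t)$ (and, via remark 5, the state and output estimates) at arbitrary temporal resolution.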
The nature of the estimate (17) is elucidated by the following theorem, which reformulates the estimation problem of this paper as an equivalent regularized least-squares problem. For the sake of clarity, we here restrict ourselves to scalar observations $Y_k$ where $\nu = 1$, $c \triangleq C$ is a row vector, and $\sigma_Z^2 \triangleq V_Z$ is a scalar. (The general case is given in [25].)

Theorem 2. Assume that the factor graph in Fig. 2 consists of $K$ sections between $t_0$ and $t_K$ (with observations starting at $t_1$) and assume that the observations $Y_k$ are scalars. Then the estimated pair $\big(\hat{u}(t), \hat{x}(t)\big)$ with $\hat{u}(t)$ as in (17) minimizes

\frac{1}{\sigma_U^2} \int_{t_0}^{t_K} \hat{u}(t)^2\, dt + \frac{1}{\sigma_Z^2} \sum_{k=1}^{K} \left(\tilde{y}_k - c\, \hat{x}(t_k)\right)^2    (31)

subject to the constraints of the system model. □

Proof: Recall the factor graph representation of a least squares problem as in Fig. 7, where the large box on top expresses the given constraints.

Fig. 7. Factor graph of a least squares problem used in the proof of Theorem 2.

Clearly, maximizing the function represented by Fig. 7 amounts to computing

\operatorname*{argmax}_{z_1,\ldots,z_n} \prod_{k=1}^{n} e^{-z_k^2/(2\sigma_k^2)} = \operatorname*{argmin}_{z_1,\ldots,z_n} \sum_{k=1}^{n} z_k^2/\sigma_k^2    (32)

subject to the constraints. The right-hand side of (32) will be called "cost function." Recall that sum-product message passing in cycle-free linear Gaussian factor graphs maximizes the left-hand side of (32) (subject to the constraints) and thus minimizes the cost function [11].

Now plugging Fig. 5 into the factor graph in Fig. 2 results in a factor graph as in Fig. 7 with cost function

\sum_{k=1}^{K} \left( z_k^2/\sigma_Z^2 + \sum_{\ell=1}^{N} \tilde{u}\Big(t_{k-1} + \ell \frac{T_k}{N}\Big)^2\, \frac{T_k}{\sigma_U^2 N} \right)
= \sum_{k=1}^{K} \left( z_k^2/\sigma_Z^2 + \frac{1}{\sigma_U^2} \int_{t_{k-1}}^{t_k} u(t)^2\, dt \right)    (33)

(with $T_k \triangleq t_k - t_{k-1}$, in the limit $N \to \infty$), which is (31). □

According to Theorem 2, minimizing (31) is mathematically equivalent to the statistical estimation problems of this paper; in particular, modeling $U(t)$ as white Gaussian noise amounts to regularizing the second term in (31) by penalizing the power in $\hat{u}(t)$.

The functional (31) is amenable to an informal frequency-domain analysis that considers the relative power in the different frequencies of the input signal $\hat{u}(t)$. In particular, the estimate $\hat{u}(t)$ fits the corresponding output signal $\hat{y}(t) = c\,\hat{x}(t)$ to the observations $\tilde{y}_k$ preferably by those frequencies that appear with little damping in the output signal. Since the transfer function from $U(t)$ to $Y(t) = cX(t)$ of the system (1) is necessarily a (non-ideal) low-pass filter, the estimate $\hat{u}(t)$ will contain little energy in very high frequencies. In this way, the spectrum of $\hat{u}(t)$ is shaped by the transfer function of the linear system.

We also note⁵ that the problem of minimizing (31) may be viewed as an offline control problem where an input signal $u(t)$ is to be determined such that the resulting sampled output signal $y_1, y_2, \ldots$ follows a desired trajectory $\tilde{y}_1, \tilde{y}_2, \ldots$. However, exploring this connection to control theory is beyond the scope of this paper.

⁵ This was pointed out to the authors by Andrew Singer of the University of Illinois at Urbana-Champaign.

VI. NUMERICAL EXAMPLES

We illustrate the estimators of this paper by some simple numerical examples. In all these examples, the output signal $Y(t)$ is scalar, we use regular sampling at rate $f_s$, i.e., $Y_k = Y(k/f_s)$, and the linear system in Fig. 1 is a Butterworth lowpass filter of order 4 or 6 with cut-off frequency (−3 dB frequency) $f_c$ [27]. The amplitude response (i.e., the magnitude of the frequency response) of these filters is plotted in Fig. 8.

Fig. 8. Frequency response (magnitude) of the filters used in Section VI (curves for orders $N = 4$ and $N = 6$, plotted in dB over $f/f_c$).

In these examples, we use the signal-to-noise ratio (SNR) as discussed in Appendix C. Using (57), the SNR of the discrete-time observations turns out to be

\mathrm{SNR} \approx \frac{\sigma_U^2}{\sigma_Z^2}\, f_c \cdot 2.052    (34)

for the 4th-order filter and

\mathrm{SNR} \approx \frac{\sigma_U^2}{\sigma_Z^2}\, f_c \cdot 2.023    (35)

for the 6th-order filter. We will measure the SNR in dB (i.e., $10 \log_{10}(\mathrm{SNR})$).

In some of these plots, the estimator deliberately assumes an incorrect SNR, i.e., an incorrect ratio $\sigma_U^2/\sigma_Z^2$, in order to illustrate the effect of this ratio on (31).
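As an illustration of this experimental setup (not code from the paper), the state-space matrices of such a Butterworth filter can be obtained, e.g., with SciPy; the concrete values of $f_c$ and $f_s$ below are our own demo choices. Since the resulting $A$ is stable and diagonalizable, the closed forms of Table I apply.

```python
import numpy as np
from scipy import signal

f_c = 1.0                                   # cut-off frequency (Hz), demo value
f_s = 10 * f_c                              # sampling rate, as in Fig. 9
# Analog (continuous-time) Butterworth lowpass of order 6; tf2ss gives a
# realization dX = A X dt + b U dt, Y = c X with D = 0, matching (1)-(2).
num, den = signal.butter(6, 2 * np.pi * f_c, analog=True)
A, B, C, D = signal.tf2ss(num, den)
b = B.reshape(-1, 1)                        # input column vector b of (1)
c = C.reshape(1, -1)                        # output row vector, Y(t) = c X(t)
t_obs = np.arange(100) / f_s                # regular sampling instants t_k
```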
Estimation of the output signal $Y(t)$ is illustrated in Fig. 9. In this example, the linear system is a Butterworth filter of order 6. The noisy samples $\tilde{y}_k$ are created with $f_s = 10 f_c$ at an SNR of 10 dB. The corresponding estimate of $Y(t)$ is shown as a solid line in Fig. 9.

Fig. 9. Estimation of output signal $Y(t)$ from noisy samples $\tilde{y}_k$ (fat dots) at SNR = 10 dB. Solid line: estimate of $Y(t)$ at correct SNR. Dashed line: estimation with assumed SNR 100 dB; dotted line: estimation with assumed SNR −10 dB.

Also shown in Fig. 9 is the effect of estimating with an incorrect SNR, i.e., of playing with the ratio $\sigma_U^2/\sigma_Z^2$ as mentioned above. Estimating with an assumed SNR that is too high results in overfitting; estimating with an assumed SNR that is too low reduces the amplitude of the estimated signal.

Fig. 10 shows the effect of $f_s/f_c$ on the normalized estimation error

\mathrm{SNR}_{\mathrm{out}}^{-1} \triangleq \frac{\mathrm{E}\big[(\hat{Y}_k - Y_k)^2\big]}{\mathrm{E}\big[Y_k^2\big]}    (36)

for a Butterworth filter of order 4. For high SNR, we clearly see a critical "Nyquist region" where severe undersampling sets in. For large $f_s/f_c$, the estimate improves by about 2.62 dB with every factor of 2 in $f_s/f_c$, which is less than what would be expected (viz., 3 dB) for strictly bandlimited signals [28], [29].

Fig. 10. Empirical estimation error (36) vs. normalized sampling frequency $f_s/f_c$, parameterized by the SNR (54) from −20 dB to 20 dB, for a Butterworth filter of order 4.

Estimation of the input signal $U(t)$ is illustrated in Fig. 11, for exactly the same setting (with the same discrete-time observations $\tilde{y}_k$) as in Fig. 9. The power and the spectral content of the three different plots in Fig. 11 illustrate the effect of the ratio $\sigma_U^2/\sigma_Z^2$ on (31).

Fig. 11. Input signal estimation for the same cases (and the same time scale) as in Fig. 9. The solid line (top) is the correct MMSE/LMMSE estimate of $U(t)$.

VII. EXTENSIONS

We briefly mention a number of extensions and modifications of the system model that are required in some of the motivating applications and are easily incorporated in the estimation algorithms.

A. Additional Spectral Shaping

The estimate (17) of the input signal $U(t)$ is marked by an implicit spectral shaping (cf. the discussion after Theorem 2). It may sometimes be desirable, however, to control the spectrum of the estimate more explicitly. This can be achieved by assuming that the input signal $U(t)$ is not white Gaussian noise, but white Gaussian noise passed through a suitable (finite-dimensional) linear prefilter, as sketched below. The estimation of $U(t)$ is easily adapted to this case by including the prefilter in the system model. In contrast to unfiltered-input estimation as in Section V, estimation of a filtered input signal by means of Kalman filtering/smoothing is standard.
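For illustration, this augmentation can be written as a single state-space model of the form (1): the white noise drives the prefilter, and the prefilter output drives the original system. The following sketch (our own construction and names, assuming a strictly proper prefilter $(A_p, b_p, c_p)$) builds the stacked model, to which the estimators of this paper apply unchanged:

```python
import numpy as np

def augment_with_prefilter(A, b, Ap, bp, cp):
    """Stacked model for Section VII-A: white noise W(t) drives the
    prefilter (Ap, bp, cp); its output U(t) = cp Xp(t) drives (A, b):

        d/dt [X; Xp] = [[A, b cp], [0, Ap]] [X; Xp] + [0; bp] W(t)

    The shaped input U(t) is a linear function of the augmented state,
    so it can be estimated with the state estimators of Section IV.
    """
    n, p = A.shape[0], Ap.shape[0]
    A_aug = np.block([[A, b @ cp], [np.zeros((p, n)), Ap]])
    b_aug = np.vstack([np.zeros((n, 1)), bp])
    cU = np.hstack([np.zeros((1, n)), cp])   # U(t) = cU [X; Xp]
    return A_aug, b_aug, cU
```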
B. Time-Varying and Affine Systems

In some applications, the dynamics of the system/filter in Fig. 1 may change at discrete instants in time (but it is always known). This situation occurs, e.g., when the analog system/filter is subject to digital control. An example of such a case is given in [18]. We thus generalize the system model (5) and (2) to

dX(t) = \big(A_k X(t) + b_k U(t) + h_k\big)\, dt    (37)

and

Y_k = C_k X(t_k),    (38)

where (37) holds for $t_k \le t < t_{k+1}$, where $A_k$ and $C_k$ are known matrices, and where $b_k$ and $h_k$ are known column vectors.

If $h_k = 0$, both the factor graph representations and the message computation rules remain unchanged except for the addition of subscripts to the involved matrices and vectors. The case $h_k \neq 0$ is included below.

C. Multiple Inputs and Internal Noise

We are also interested in the case where the system/filter in Fig. 1 has internal noise sources. (Again, a main motivation is analog-to-digital converters, where the noise in the analog part cannot be neglected.) Such internal noise can be handled mathematically by extending the input signal $U(t)$ to a vector $U(t) = \big(U_1(t), U_2(t), \ldots\big)^{\mathsf{T}}$, where the first component, $U_1(t)$, is the actual input signal while the remaining components model the internal noise. For $t < t'$, the integral

\int_t^{t'} U(\tau)\, d\tau    (39)

is a zero-mean Gaussian random vector with diagonal covariance matrix $\sigma_U^2 I\, (t'-t)$. The corresponding generalization of (5) is

dX(t) = \big(A X(t) + B U(t) + h\big)\, dt,    (40)

where $B$ is a matrix of suitable dimensions and where we have included a constant offset $h$ (a column vector) as in (37). Note that power differences and correlations among the input signals can be expressed by a suitable matrix $B$.

The corresponding generalization of Table I is shown in Table II. The proofs are straightforward modifications of the proofs of Table I and are omitted.
TABLE II
GENERALIZATION OF TABLE I TO (40).

\overrightarrow{m}_{X(t_1)} = e^{A(t_1-t_0)}\, \overrightarrow{m}_{X(t_0)} + A^{-1}\big(e^{A(t_1-t_0)} - I\big) h    (II.1)

\overrightarrow{V}_{X(t_1)} = e^{A(t_1-t_0)}\, \overrightarrow{V}_{X(t_0)}\, e^{A^{\mathsf{T}}(t_1-t_0)} + \sigma_U^2 \int_0^{t_1-t_0} e^{A\tau}\, B B^{\mathsf{T}}\, e^{A^{\mathsf{T}}\tau}\, d\tau    (II.2)

(the integral in (II.2) equals $Q\, \overrightarrow{\Theta}(t_1-t_0)\, Q^{\mathsf{H}}$, see (41))

\overleftarrow{m}_{X(t_0)} = e^{-A(t_1-t_0)} \left(\overleftarrow{m}_{X(t_1)} - A^{-1}\big(e^{A(t_1-t_0)} - I\big) h\right)    (II.3)

\overleftarrow{V}_{X(t_0)} = e^{-A(t_1-t_0)}\, \overleftarrow{V}_{X(t_1)}\, e^{-A^{\mathsf{T}}(t_1-t_0)} + \sigma_U^2 \int_0^{t_1-t_0} e^{-A\tau}\, B B^{\mathsf{T}}\, e^{-A^{\mathsf{T}}\tau}\, d\tau    (II.4)

(the integral in (II.4) equals $Q\, \overleftarrow{\Theta}(t_1-t_0)\, Q^{\mathsf{H}}$, see (42))

\hat{u}(t) = \sigma_U^2\, B^{\mathsf{T}} \left(\overrightarrow{V}_{X(t)} + \overleftarrow{V}_{X(t)}\right)^{-1} \left(\overleftarrow{m}_{X(t)} - \overrightarrow{m}_{X(t)}\right)    (II.5)

If the matrix $A$ is diagonalizable as in (11), then the integrals in (II.2) and (II.4) can be written as stated in the table, where the square matrices $\overrightarrow{\Theta}(t)$ and $\overleftarrow{\Theta}(t)$ are given by

\overrightarrow{\Theta}(t)_{k,\ell} \triangleq \frac{\psi_{k,\ell}}{\lambda_k + \bar{\lambda}_\ell} \left(e^{(\lambda_k + \bar{\lambda}_\ell)t} - 1\right)    (41)

and

\overleftarrow{\Theta}(t)_{k,\ell} \triangleq \frac{\psi_{k,\ell}}{\lambda_k + \bar{\lambda}_\ell} \left(1 - e^{-(\lambda_k + \bar{\lambda}_\ell)t}\right),    (42)

respectively, and where $\psi_{k,\ell}$ is the entry in row $k$ and column $\ell$ of the matrix

\Psi \triangleq Q^{-1} B \big(Q^{-1} B\big)^{\mathsf{H}}.    (43)

D. Nonlinearities

Mild nonlinearities in the system/filter in Fig. 1 can often be handled by extended Kalman filtering [7], [8], i.e., by iterative estimation using a linearized model based on a tentative estimate of the state trajectory $X(t)$.

VIII. CONCLUSIONS

We have pointed out that exact models of continuous-time linear systems driven by white Gaussian noise can be used in discrete-time factor graphs. The associated continuous-time signals then become computational objects that can be handled with arbitrary temporal resolution by discrete-time Gaussian message passing. Motivated by applications such as dynamical sensors and analog-to-digital converters, we have been particularly interested in estimating the input signal, which does not seem to have been addressed in the prior Kalman filtering literature.

APPENDIX A
PROOF OF (12) AND (14)

Let

\Lambda \triangleq \operatorname{diag}(\lambda_1, \ldots, \lambda_n).    (44)

From (11), we have

e^{A\tau} = Q\, e^{\Lambda\tau}\, Q^{-1}    (45)

and

e^{A^{\mathsf{T}}\tau} = \big(e^{A\tau}\big)^{\mathsf{T}} = \big(\overline{e^{A\tau}}\big)^{\mathsf{H}} = (Q^{-1})^{\mathsf{H}}\, e^{\bar{\Lambda}\tau}\, Q^{\mathsf{H}},    (46)

and thus

\int_0^t e^{A\tau}\, b b^{\mathsf{T}}\, e^{A^{\mathsf{T}}\tau}\, d\tau = Q \left(\int_0^t e^{\Lambda\tau}\, \Psi\, e^{\bar{\Lambda}\tau}\, d\tau\right) Q^{\mathsf{H}}    (47)

with

\Psi \triangleq Q^{-1} b\, \big(Q^{-1} b\big)^{\mathsf{H}}.    (48)

The element in row $k$ and column $\ell$ of the matrix under the integral is

\left(e^{\Lambda\tau}\, \Psi\, e^{\bar{\Lambda}\tau}\right)_{k,\ell} = \psi_{k,\ell}\, e^{(\lambda_k + \bar{\lambda}_\ell)\tau},    (49)

where $\psi_{k,\ell}$ refers to the elements of the matrix $\Psi$, and elementwise integration yields

\left(\int_0^t e^{\Lambda\tau}\, \Psi\, e^{\bar{\Lambda}\tau}\, d\tau\right)_{k,\ell} = \frac{\psi_{k,\ell}}{\lambda_k + \bar{\lambda}_\ell} \left(e^{(\lambda_k + \bar{\lambda}_\ell)t} - 1\right),    (50)

which proves (12). The proof of (14) follows from noting that changing $e^{A\tau}$ into $e^{-A\tau}$ amounts to a sign change of $\Lambda$.
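As a sanity check (our own, with demo values), the closed form (12)–(13) can be compared against direct numerical quadrature of the left-hand side of (12):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # stable and diagonalizable
b = np.array([[0.0], [1.0]])
t = 1.5
lam, Q = np.linalg.eig(A)                     # (11)
v = np.linalg.solve(Q, b).ravel()             # Q^{-1} b
S = lam[:, None] + np.conj(lam)[None, :]
Theta = np.outer(v, np.conj(v)) * (np.exp(S * t) - 1.0) / S   # (13)
closed_form = np.real(Q @ Theta @ Q.conj().T)                 # (12)
taus = np.linspace(0.0, t, 2000)
quadrature = np.trapz(np.array([expm(A * s) @ b @ b.T @ expm(A * s).T
                                for s in taus]), taus, axis=0)
assert np.allclose(closed_form, quadrature, atol=1e-5)
```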
APPENDIX B
PROOF OF THE DISCRETE-TIME DECOMPOSITION IN FIG. 5

We split the integral (6) into $N$ parts, each of width $T/N$ with $T \triangleq t_1 - t_0$:

X(t_1) = e^{AT} X(t_0) + \sum_{k=1}^{N} \int_{(k-1)T/N}^{kT/N} e^{A(T-\tau)}\, b\, U(t_0+\tau)\, d\tau    (51)
\approx e^{AT} X(t_0) + \sum_{k=1}^{N} e^{A(T-kT/N)}\, b \int_{(k-1)T/N}^{kT/N} U(t_0+\tau)\, d\tau    (52)
= e^{AT} X(t_0) + \sum_{k=1}^{N} e^{A(T-kT/N)}\, b\, \frac{T}{N}\, \tilde{U}(t_0 + kT/N),    (53)

where the approximation (52) becomes exact in the limit $N \to \infty$ and where $\tilde{U}(t)$ is defined as in (16). The factor graph of (53) is shown in Fig. 12.

Fig. 12. Decomposition of the node/factor $f\big(x(t_1) \,|\, x(t_0)\big)$ into $N$ discrete time steps according to (53).

The term $e^{AT}$ can also be decomposed into $N$ discrete steps as shown in Fig. 13. Plugging Fig. 13 into Fig. 12 yields a factor graph that is easily seen to be equivalent to Fig. 5.

Fig. 13. Decomposition of $e^{AT}$ into $N$ sections.
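The convergence of the decomposition (53) can also be checked numerically: the covariance contributed by the $N$ averaged inputs (each of variance $\sigma_U^2 N/T$, entering through $e^{A(T-kT/N)}\, b\, T/N$) approaches $V_S$ of (8) as $N$ grows. A small sketch with our own demo values:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
b = np.array([[0.0], [1.0]])
sigma_U2, T = 1.0, 1.0

def cov_from_decomposition(N):
    # Var(U~) = sigma_U2 * N/T, and each term enters through e^{A(T-kT/N)} b T/N,
    # so the total covariance is sigma_U2 * (T/N) * sum_k M_k b b^T M_k^T.
    terms = [expm(A * (T - k * T / N)) @ b @ b.T @ expm(A * (T - k * T / N)).T
             for k in range(1, N + 1)]
    return sigma_U2 * (T / N) * np.sum(terms, axis=0)

for N in (10, 100, 1000):
    print(N, cov_from_decomposition(N)[0, 0])   # stabilizes as N grows, cf. (8)
```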
APPENDIX C
ON SNR

For the system model of Section II, we may wish to relate the input noise power $\sigma_U^2$ to the signal-to-noise ratio (SNR) of the discrete-time observations. For the sake of clarity, we restrict ourselves to scalar observations $Y_k$, i.e., $\nu = 1$, $c \triangleq C$ is a row vector, and $\sigma_Z^2 \triangleq V_Z$ is a scalar. In addition, we assume that the continuous-time linear system is time-invariant and stable and that any initial conditions can be neglected. In this case, we define

\mathrm{SNR} \triangleq \frac{\mathrm{E}\big[Y_k^2\big]}{\sigma_Z^2}    (54)

which (under the stated assumptions) is independent of $k$. We then have

\mathrm{E}\big[Y_k^2\big] = c\, \overrightarrow{V}_{X(\infty)}\, c^{\mathsf{T}}    (55)

with

\overrightarrow{V}_{X(\infty)} \triangleq \sigma_U^2 \lim_{t \to \infty} \int_0^t e^{A\tau}\, b b^{\mathsf{T}}\, e^{A^{\mathsf{T}}\tau}\, d\tau    (56)

from (I.2); if, in addition, the system is diagonalizable as in (11), then

\mathrm{E}\big[Y_k^2\big] = \sigma_U^2\, c\, Q\, \overrightarrow{\Theta}(\infty)\, Q^{\mathsf{H}}\, c^{\mathsf{T}}    (57)

where $\overrightarrow{\Theta}(\infty)$ is a square matrix with entries

\overrightarrow{\Theta}(\infty)_{k,\ell} = -\frac{(Q^{-1}b)_k\, \overline{(Q^{-1}b)_\ell}}{\lambda_k + \bar{\lambda}_\ell}.    (58)

REFERENCES

[1] L. Bolliger, H.-A. Loeliger, and C. Vogel, "Simulation, MMSE estimation, and interpolation of sampled continuous-time signals using factor graphs," 2010 Information Theory & Applications Workshop, UCSD, La Jolla, CA, USA, Jan. 31–Feb. 5, 2010.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME, Series D, J. of Basic Eng., vol. 82, 1960, pp. 35–45.
[3] R. E. Kalman and R. S. Bucy, "New results in linear filtering and prediction theory," Trans. ASME, Series D, J. of Basic Eng., vol. 83, 1961, pp. 95–107.
[4] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Prentice Hall, NJ, 1979.
[5] R. J. Meinhold and N. D. Singpurwalla, "Understanding the Kalman filter," American Statistician, vol. 37, no. 2, pp. 123–127, May 1983.
[6] S. Haykin, Adaptive Filter Theory, 3rd ed. Prentice Hall, NJ, 1996.
[7] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 2nd ed. Wiley, 2001.
[8] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Prentice Hall, NJ, 2000.
[9] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, pp. 498–519, Feb. 2001.
[10] H.-A. Loeliger, "An introduction to factor graphs," IEEE Signal Proc. Mag., Jan. 2004, pp. 28–41.
[11] H.-A. Loeliger, J. Dauwels, Junli Hu, S. Korl, Li Ping, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proceedings of the IEEE, vol. 95, no. 6, pp. 1295–1322, June 2007.
[12] H. Wymeersch, Iterative Receiver Design. Cambridge University Press, 2007.
[13] M. I. Jordan, "Graphical models," Statistical Science, vol. 19, no. 1, pp. 140–155, 2004.
[14] Ch. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer Science+Business Media, 2006.
[15] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.
[16] D. Koller and N. Friedman, Probabilistic Graphical Models. Cambridge, MA: MIT Press, 2009.
[17] L. Bolliger and H.-A. Loeliger, "Sampling jitter correction using factor graphs," Proc. 2011 Europ. Signal Proc. Conf. (EUSIPCO), Barcelona, Spain, Aug. 29–Sept. 2, 2011.
[18] H.-A. Loeliger, L. Bolliger, G. Wilckens, and J. Biveroni, "Analog-to-digital conversion using unstable filters," 2011 Information Theory & Applications Workshop, UCSD, La Jolla, CA, USA, Feb. 6–11, 2011.
[19] U. Nodelman, C. R. Shelton, and D. Koller, "Continuous-time Bayesian networks," Proc. 18th Conf. on Uncertainty in Artificial Intell., pp. 378–387, 2002.
[20] T. El-Hay, I. Cohn, N. Friedman, and R. Kupferman, "Continuous-time belief propagation," Proc. 27th Int. Conf. on Machine Learning, Haifa, Israel, June 21–24, 2010.
[21] I. Cohn, T. El-Hay, N. Friedman, and R. Kupferman, "Mean field variational approximation for continuous-time Bayesian networks," J. Machine Learning Research, vol. 11, pp. 2745–2783, 2010.
[22] P. O. Vontobel and H.-A. Loeliger, "Factor graphs and dynamical electrical networks," Proc. 2003 IEEE Information Theory Workshop, Paris, France, March 31–April 4, 2003, pp. 218–221.
[23] H. Garnier and L. Wang, Identification of Continuous-Time Models from Sampled Data. Springer Verlag, 2008.
[24] T. Söderström, Discrete-Time Stochastic Systems, 2nd ed. London: Springer Verlag, 2002.
[25] L. Bolliger, Digital Estimation of Continuous-Time Signals Using Factor Graphs. PhD thesis no. 20123 at ETH Zurich, 2012.
[26] G. D. Forney, Jr., "Codes on graphs: normal realizations," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 520–548, 2001.
[27] A. Oppenheim and A. Willsky, Signals and Systems, 2nd ed. Prentice Hall, 1996.
[28] W. R. Bennett, "Spectra of quantized signals," Bell Syst. Techn. J., vol. 27, pp. 446–472, July 1948.
[29] N. T. Thao and M. Vetterli, "Lower bound on the mean-squared error in oversampled quantization of periodic signals using vector quantization analysis," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 469–479, March 1996.