ON WELL-POSEDNESS OF BAYESIAN DATA ASSIMILATION AND INVERSE PROBLEMS IN HILBERT SPACE

IVAN KASANICKÝ AND JAN MANDEL

Abstract. A Bayesian inverse problem on an infinite dimensional separable Hilbert space with the whole state observed is well posed when the prior state distribution is a Gaussian probability measure and the data error distribution is a cylindrical Gaussian measure whose covariance has positive lower bound. If the state distribution and the data error distribution are equivalent Gaussian probability measures, then the Bayesian posterior measure is not well defined. If the state covariance and the data error covariance commute, then the Bayesian posterior measure is well defined for all data vectors if and only if the data error covariance has positive lower bound, and the set of data vectors for which the Bayesian posterior measure is not well defined is dense if the data error covariance does not have positive lower bound.

Keywords: White noise, Bayesian inverse problems, cylindrical measures

AMS Subject Classification: 65J22, 62C10, 60B11

Supported partially by the Czech Science Foundation grant 13-34856S and NSF grant 1216481.

1. Introduction

Data assimilation and the solution of inverse problems on infinite dimensional spaces are of interest as a limit case of discrete problems of increasing resolution and thus increasing dimension. Computer implementation is by necessity finite dimensional, yet studying a discretized problem (such as a finite difference or finite element discretization) as an approximation of an infinite-dimensional one (such as a partial differential equation) is a basic principle of numerical mathematics. This principle has recently found use in Bayesian inverse problems as well, for much the same reason: important insights in high-dimensional probability are obtained by considering it in the light of infinite dimension. See [19, 20] for an introduction and an extensive discussion.

Bayesian data assimilation and inverse problems are closely linked; the prior distribution acts as a regularization, and the maximum a posteriori probability (MAP) estimate delivers a single solution to the inverse problem. In the Gaussian case, the prior becomes a type of Tikhonov regularization and the MAP estimate is essentially a regularized least squares solution. Since there is no Lebesgue measure in infinite dimension, the standard probability density does not exist, and the MAP estimate needs to be understood in a generalized sense [5, 7].

However, unlike in finite dimension, even the simplest problems are often ill-posed when the data is infinite dimensional. It is often assumed that the data likelihood is such that the problem is well posed, or, more specifically, that the data space is finite dimensional, e.g., [5, 7, 9, 10, 12, 19, 20]. Well-posedness of the infinite dimensional problem affects the performance of stochastic filtering algorithms for finite dimensional approximations; it was observed computationally [3, Sec. 4.1] that the performance of the ensemble Kalman filter and the particle filter does not deteriorate with increasing dimension when the state distribution approaches a Gaussian probability measure, but the curse of dimensionality sets in when the state distribution approaches white noise. A related theoretical analysis was recently developed in [1].

It was noted in [16] that Bayesian filtering is well defined only for some values of observations when the data space is infinite dimensional.
In [1], necessary and sufficient conditions were given in the Gaussian case for the Bayesian inverse problem to be well posed for all data vectors a.s. with respect to the data distribution, which was understood as a Gaussian measure on a larger space than the given data space. However, in the typical case studied here, such a random data vector is a.s. not in the given data space, so the conditions in [1] are not informative in our setting. See Remark 9 for more details.

In this paper, we study perhaps the simplest case of a Bayesian inverse problem on an infinite dimensional separable Hilbert space: the whole state is observed, the observation operator (the forward operator in inverse problems nomenclature) is the identity, and both the state distribution and the data error distribution (which enters in the data likelihood) are Gaussian. The state distribution is a (standard, σ-additive) Gaussian measure, but the data error distribution is allowed to be only a weak Gaussian measure [2], that is, only a finitely additive cylindrical measure [18]. This way, we may give up the σ-additivity of the data error distribution, but the data vectors are in the given data space. A weak Gaussian measure is σ-additive if and only if its covariance has finite trace. White noise, with covariance bounded away from zero on an infinite dimensional space, is an example of a weak Gaussian measure which is not σ-additive.

It is straightforward that when the data error covariance has positive lower bound, then the least squares, the Kalman filter, and the Bayesian posterior are all well defined (Theorems 2, 3, and 6). The main results of this paper consist of the study of the converse when the state is a Gaussian measure:

(1) Example 1: If the state covariance and the data error covariance are the same operator with finite trace, then the least squares problem is not well posed for some data vectors.
(2) Example 4: If the state distribution and the data error distribution are equivalent Gaussian measures on an infinite dimensional space, then the posterior measure is not well defined.
(3) Theorem 7: If the state covariance and the data error covariance commute, then the posterior measure is well defined for all data vectors if and only if the data error covariance has positive lower bound.
(4) Corollary 8: If the state covariance and the data covariance commute and the data covariance does not have positive lower bound, then the set of data vectors for which the posterior measure is not well defined is dense.

The paper is organized as follows. In Section 2, we recall some background and establish notation. The well-posedness of data assimilation as a least squares problem is considered in Section 3.1, the well-posedness of the Kalman filter formulas in Section 3.2, and the well-posedness of the Bayesian setting in terms of measures in Section 4.

2. Notation

We denote by $H$ a separable Hilbert space with a real-valued inner product $\langle u, v \rangle$ and the norm $|u| = \sqrt{\langle u, u \rangle}$. We assume that $H$ has infinite dimension, though all statements hold in finite dimension as well. We denote by $[H]$ the space of all bounded linear operators from $H$ to $H$. We say that $R \in [H]$ has positive lower bound if

$\langle Ru, u \rangle \ge \alpha \langle u, u \rangle = \alpha |u|^2$

for some $\alpha > 0$ and all $u \in H$. We write $R > 0$ when $R \in [H]$ is symmetric, i.e., $R = R^*$, where $R^*$ denotes the adjoint operator to $R$, and has positive lower bound. The operator $R \in [H]$ is positive semidefinite if $\langle Ru, u \rangle \ge 0$ for all $u \in H$, and we use the notation $R \ge 0$ when $R$ is symmetric and positive semidefinite.
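For a concrete feel for the positive lower bound condition, consider a diagonal operator with eigenvalues $r_i$ with respect to a total orthonormal set, so that $\langle Ru, u \rangle = \sum_i r_i u_i^2$; such an operator has positive lower bound exactly when $\inf_i r_i > 0$. The following Python sketch (illustrative only; the spectra and the truncation dimension are made-up assumptions, not from the paper) contrasts a white-noise-like spectrum with a decaying one, and also records the running sum of the eigenvalues, which reappears as the trace in the next definition.

    import numpy as np

    def lower_bound(eigenvalues):
        """For a diagonal operator R with eigenvalues r_i,
        <Ru, u> = sum_i r_i u_i^2 >= (min_i r_i) |u|^2, and no larger
        constant works (test u = e_i at the minimizing index)."""
        return float(np.min(eigenvalues))

    n = 10_000                                 # truncation dimension (illustrative)
    r_white = np.ones(n)                       # white-noise covariance: r_i = 1
    r_decay = 1.0 / np.arange(1, n + 1) ** 2   # decaying spectrum: r_i = 1/i^2

    for name, r in [("white noise", r_white), ("decaying", r_decay)]:
        print(f"{name:12s}  min r_i = {lower_bound(r):.2e}   sum r_i = {r.sum():.2f}")
    # As n grows, min r_i stays 1 for the white-noise spectrum (positive lower
    # bound, but the eigenvalue sums grow without bound), while for the decaying
    # spectrum the eigenvalue sums stay bounded but min r_i tends to 0.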
We say that $R \ge 0$ is a trace class operator if

$\operatorname{Tr} R = \sum_{i=1}^{\infty} \langle R e_i, e_i \rangle < \infty$,

where $\{e_i\}$ is a total orthonormal set in $H$. $\operatorname{Tr} R$ does not depend on the choice of $\{e_i\}$.

We denote by $L(H)$ the space of all random variables on $H$, i.e., if $X \in L(H)$, then $X$ is a measurable mapping from a probability space $(\Omega, \mathcal{A}, P)$ to $(H, \mathcal{B}(H))$, where $\mathcal{B}(H)$ denotes the Borel σ-algebra on $H$. A weak random variable $W \in L_w(H)$ is a mapping $W : (\Omega, \mathcal{A}, P) \to (H, \mathcal{C}(H))$, where $(\Omega, \mathcal{A}, P)$ is a general probability space and $\mathcal{C}(H)$ denotes the algebra of cylindrical sets on $H$, such that:

(1) for all $D \in \mathcal{C}(H)$, it holds that $W^{-1}(D) \in \mathcal{A}$, and
(2) for any $n \in \mathbb{N}$ and any $e_1, \dots, e_n \in H$, the mapping

$V : \omega \in \Omega \mapsto (\langle e_1, W(\omega) \rangle, \dots, \langle e_n, W(\omega) \rangle)^* \in \mathbb{R}^n$

is measurable, i.e., $V$ is an $n$-dimensional real random vector.

We denote by $L_w(H)$ the space of all weak random variables on $H$. Obviously, when $\dim(H) < \infty$, then $L_w(H) = L(H)$, i.e., weak random variables are interesting only if the dimension of the state is infinite. A weak random variable $W \in L_w(H)$ has a weak Gaussian distribution (also called a cylindrical measure), denoted $W \sim N(m, R)$, where $m \in H$ is the mean of $W$ and $R \in [H]$, $R \ge 0$, is the covariance of $W$, if for any finite set $\{e_1, \dots, e_n\} \subset H$ the random vector $(\langle e_1, W \rangle, \dots, \langle e_n, W \rangle)^*$ has the multivariate Gaussian distribution $N(\mu, \Sigma)$ with mean

$\mu = (\langle e_1, m \rangle, \dots, \langle e_n, m \rangle)^* \in \mathbb{R}^n$

and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, $[\Sigma]_{i,j} = \langle e_i, R e_j \rangle$, where $[\Sigma]_{i,j}$ denotes the element of the matrix $\Sigma$ in the $i$-th row and the $j$-th column.

It can be shown that $W \sim N(m, R)$ is measurable, i.e., $W \in L(H)$, if and only if the covariance $R$ is trace class, e.g., [2, Theorem 6.2.2].

For further background on probability on infinite dimensional spaces and cylindrical measures see, e.g., [4, 6, 21].

3. Data assimilation

Suppose that $\{X^{(t)}\}_{t \in \mathbb{N}}$ is a dynamical system defined on a separable Hilbert space $H$. Data assimilation uses observations of the form

$Y^{(t)} = H X^{(t)} + W^{(t)}, \quad t \in \mathbb{N}$,

where $H \in [H]$ and $W^{(t)} \sim N(0, R)$, to estimate sequentially the states of the dynamical system. In each data assimilation cycle, i.e., for each $t \in \mathbb{N}$, a forecast state $X^{(t),f}$ is combined with the observation $Y^{(t)}$ to produce a better estimate of the true state $X^{(t)}$. Hence, one data assimilation cycle is an inverse problem [19, 20, 5]. Since we are interested in one data assimilation cycle only, we drop the time index for the rest of the paper.

3.1. 3DVAR. The 3DVAR method is based on a minimization of the cost function

(1) $J^{\mathrm{3DVAR}}(x) = |x - x^f|^2_{B^{-1}} + |y - x|^2_{R^{-1}}$,

where $B$ is a known background covariance operator and $R$ is a data noise covariance. If the state space is finite dimensional and the matrix $B$ is regular, then the norm on the right-hand side of (1) is defined by

$|x|^2_{B^{-1}} = \langle B^{-1/2} x, B^{-1/2} x \rangle, \quad x \in H.$

However, when the state space is infinite dimensional, the inverse of a compact linear operator is unbounded and only densely defined. It is then natural to extend the quadratic forms on the right-hand side of (1) as

(2) $|x|^2_{B^{-1}} = \langle B^{-1/2} x, B^{-1/2} x \rangle$ if $x \in B^{1/2}(H)$, and $|x|^2_{B^{-1}} = \infty$ if $x \notin B^{1/2}(H)$,

where $B^{1/2}(H) = \operatorname{Im}(B^{1/2})$, i.e., $B^{1/2}(H)$ denotes the image of the operator $B^{1/2}$, and

(3) $|x|^2_{R^{-1}} = \langle R^{-1/2} x, R^{-1/2} x \rangle$ if $x \in R^{1/2}(H)$, and $|x|^2_{R^{-1}} = \infty$ if $x \notin R^{1/2}(H)$.

Obviously, the 3DVAR cost function can attain infinite values, and, even worse, it is not hard to construct an example in which $J^{\mathrm{3DVAR}}(x) = \infty$ for all $x \in H$, as Example 1 below shows.
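Before the example, here is a small numerical sketch (not from the paper; the diagonal spectra, the data vector, and the truncation dimensions are illustrative assumptions) of how the extended quadratic form (3) behaves under finite-dimensional truncation. With $B = R$ diagonal with eigenvalues $1/i^2$ and a data vector $y$ whose coefficients decay too slowly for $y$ to lie in $B^{1/2}(H)$, the truncated value of $|y - x^f|^2_{R^{-1}}$ grows without bound as the dimension increases, which is the finite-dimensional shadow of $J^{\mathrm{3DVAR}} \equiv \infty$ in Example 1.

    import numpy as np

    # Hypothetical diagonal setting: B = R with eigenvalues b_i = 1/i^2 (trace class),
    # forecast x^f = 0, which lies in B^{1/2}(H), and data y with coefficients y_i = 1/i.
    # Then y is in H (sum of 1/i^2 is finite) but not in B^{1/2}(H) = R^{1/2}(H),
    # since the coefficients of R^{-1/2} y are i * (1/i) = 1, not square-summable.
    def truncated_cost(n):
        i = np.arange(1, n + 1)
        b = 1.0 / i**2            # eigenvalues of B = R
        y = 1.0 / i               # data coefficients
        xf = np.zeros(n)          # forecast coefficients
        x = xf                    # evaluate the cost at x = x^f
        return np.sum((x - xf) ** 2 / b) + np.sum((y - x) ** 2 / b)

    for n in (10, 100, 1000, 10000):
        print(n, truncated_cost(n))   # grows like n: the quadratic form diverges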
Example 1. Suppose that $R = B$, $B$ is a trace class operator,

(4) $x^f \in B^{1/2}(H)$, and

(5) $y \notin B^{1/2}(H)$.

If $x \notin B^{1/2}(H)$, then $x - x^f \notin B^{1/2}(H)$, because $B^{1/2}(H)$ is a linear subspace of $H$ and (4) holds, so

$J^{\mathrm{3DVAR}}(x) = \infty$.

When $x \in B^{1/2}(H)$, then $y - x \notin B^{1/2}(H)$ by (5), so, again,

$J^{\mathrm{3DVAR}}(x) = \infty$.

Therefore, $J^{\mathrm{3DVAR}}(x) = \infty$ for all $x \in H$.

Naturally, a minimization of $J^{\mathrm{3DVAR}}(x)$ does not make sense unless there is at least one $x \in H$ such that $J^{\mathrm{3DVAR}}(x) < \infty$. Fortunately, we can formulate a sufficient condition under which this requirement is fulfilled.

Theorem 2. If at least one of the operators $B$ and $R$ has positive lower bound, then for any possible values of $x^f$ and $y$, there exists at least one $x \in H$ such that $J^{\mathrm{3DVAR}}(x) < \infty$.

Proof. Without loss of generality, assume that $R$ has positive lower bound. Hence,

$|x - y|^2_{R^{-1}} < \infty$

for any combination of $x \in H$ and $y \in H$. Therefore, given $x^f \in H$,

$J^{\mathrm{3DVAR}}(x) < \infty$

for any $x$ such that $x - x^f \in B^{1/2}(H)$, i.e., for any $x \in x^f + B^{1/2}(H)$. □

3.2. KF and EnKF. The ensemble Kalman filter (EnKF) [8, 11], which is based on the Kalman filter (KF) [14, 13], is one of the most popular assimilation methods. The key part of both methods is the Kalman gain operator

(6) $K : P \mapsto H P H^* \left( H P H^* + R \right)^{-1}$,

where $P \in [H]$ and $P \ge 0$. If the data space is finite dimensional, then the matrix $H P H^* + R$ is positive definite, and the inverse is well defined. However, when the data space is infinite dimensional, the operator $(H P H^* + R)^{-1}$ may not be defined on the whole space, since the inverse of a trace class operator is only densely defined. Therefore, the KF update equation

$X^a = X^f + K(P^f)\,(y - H X^f)$, where $P^f = \operatorname{cov}(X^f)$,

may not be applicable, since there is no guarantee that the term $K(P^f)(y - H X^f)$ is defined. Yet, similarly to 3DVAR, there is a sufficient condition under which the Kalman filter algorithm is well defined for any possible values.

Theorem 3. If the data noise covariance $R$ has positive lower bound, then the Kalman gain operator $K(P^f)$ is defined on the whole space $H$.

Proof. If $R$ has positive lower bound, then the linear operator $H P^f H^* + R$ has positive lower bound as well, because $P^f$ is a covariance operator, so $P^f \ge 0$ and $H P^f H^* \ge 0$. The statement now follows from the fact that an operator with positive lower bound has an inverse defined on the whole space. □

4. Bayesian approach

Denote by $\mu^f$ the distribution of $X^f$. Bayes' theorem prescribes the analysis measure by

(7) $\mu^a(B) \propto \int_B d(y|x)\, d\mu^f(x)$

for all $B \in \mathcal{B}(H)$ if

(8) $c(y) = \int_H d(y|x)\, d\mu^f(x) > 0$,

where the given function $d : H \times H \to [0, \infty)$ is called a data likelihood. If the distributions of the forecast and of the data noise are both Gaussian, then

(9) $d(y|x) \propto \exp\left( -\tfrac{1}{2} |y - x|^2_{R^{-1}} \right)$,

where

$|x|^2_{R^{-1}} = \langle R^{-1/2} x, R^{-1/2} x \rangle$ if $x \in R^{1/2}(H)$, and $|x|^2_{R^{-1}} = \infty$ if $x \notin R^{1/2}(H)$.

With the natural convention that $\exp(-\infty) = 0$, we have $d(y|x) = 0$ if $y - x \notin R^{1/2}(H)$.

When both the state and the data spaces are finite dimensional, condition (8) is fulfilled for any possible value of the observation $y \in H$. Unfortunately, when both spaces are infinite dimensional, condition (8) may not be fulfilled, as shown in the next example.

Example 4. Assume that $X^f \sim N(m^f, P^f)$ and $m^f$ belongs to the Cameron-Martin space of $X^f$, i.e., $m^f \in (P^f)^{1/2}(H)$. If the measures $\mu^f$ and $\mu_R \sim N(0, R)$ are equivalent, then both have the same Cameron-Martin space, and

$\mu^f\left( R^{1/2}(H) \right) = \mu_R\left( R^{1/2}(H) \right) = 0$,

so

$\int_H d(0|x)\, d\mu^f(x) = \int_{R^{1/2}(H)} \exp\left( -\tfrac{1}{2} |x|^2_{R^{-1}} \right) d\mu^f(x) + \int_{H \setminus R^{1/2}(H)} \exp(-\infty)\, d\mu^f(x) = 0.$
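Example 4 can be made quantitative in a finite-dimensional truncation. The following sketch (illustrative only; the diagonal spectrum, dimensions, and sample size are made-up assumptions) Monte Carlo-estimates $c(0) = \int_H d(0|x)\, d\mu^f(x) = E\left[ \exp\left( -\tfrac{1}{2} |X^f|^2_{R^{-1}} \right) \right]$ with $P^f = R$, so that $\mu^f$ and $\mu_R$ are trivially equivalent. The estimate decays like $2^{-n/2}$ as the dimension $n$ grows, foreshadowing that the limit integral over $H$ is zero.

    import numpy as np

    rng = np.random.default_rng(0)

    def c_zero_estimate(n, n_samples=100_000):
        """Monte Carlo estimate of c(0) = E[exp(-0.5 * |X|^2_{R^{-1}})]
        for X ~ N(0, P^f) with P^f = R = diag(1/i^2), truncated to dimension n.
        Since P^f = R, each coordinate contributes a factor E[exp(-Z^2/2)] = 1/sqrt(2)
        with Z standard normal, so the exact value is 2^{-n/2}."""
        i = np.arange(1, n + 1)
        p = 1.0 / i**2                                    # eigenvalues of P^f = R
        x = rng.normal(size=(n_samples, n)) * np.sqrt(p)  # samples of X^f
        return np.mean(np.exp(-0.5 * np.sum(x**2 / p, axis=1)))

    for n in (1, 5, 10, 20, 30):
        print(n, c_zero_estimate(n), 2.0 ** (-n / 2))     # estimate vs exact 2^{-n/2}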
Remark 5. Another data likelihood is proposed in [19],

$\tilde{d}(y|x) = d(y|x)$ if $c(y) > 0$, and $\tilde{d}(y|x) = 1$ if $c(y) = 0$,

where $c(y)$ is defined by (8) and $d(y|x)$ is defined by (9). This definition leads to the analysis distribution

(10) $\mu^a(B) = \dfrac{1}{c(y)} \displaystyle\int_B d(y|x)\, d\mu^f(x)$ if $c(y) > 0$, and $\mu^a(B) = \mu^f(B)$ if $c(y) = 0$,

for all $B \in \mathcal{B}(H)$. That is, any data $y$ such that $c(y) = 0$ is ignored.

Obviously, the Bayesian update (7) is useful only if the set

$A = \left\{ y \in H : \int_H d(y|x)\, d\mu^f(x) = 0 \right\}$

is empty. A sufficient condition for the set $A$ to be empty is similar to the conditions under which the previously mentioned assimilation techniques are well defined.

Theorem 6. The set

$A = \left\{ y \in H : \int_H d(y|x)\, d\mu^f(x) = 0 \right\}$,

where $\mu^f \sim N(m^f, P^f)$ and the data likelihood is defined by (9), is empty if the operator $R$ has positive lower bound.

Proof. The operator $R$ has positive lower bound, so the data likelihood function

$d(y|x) \propto \exp\left( -\tfrac{1}{2} |y - x|^2_{R^{-1}} \right)$

is positive for any $x, y \in H$, and it follows that

$\int_H d(y|x)\, d\mu^f(x) > 0$

for all $y \in H$. □

In the special case when the forecast and data covariances commute, we can show that this condition is also necessary for the set $A$ to be empty. Recall that the operators $P^f$ and $R$ commute when $P^f R - R P^f = 0$.

Theorem 7. Assume that $\mu^f \sim N(m^f, P^f)$, and the operators $P^f$ and $R$ commute. Then

$\int_H \exp\left( -\tfrac{1}{2} |y - x|^2_{R^{-1}} \right) d\mu^f(x) > 0$

for all $y \in H$ if and only if the operator $R$ has positive lower bound.

Proof. Without loss of generality, assume that $m^f = 0$. The operators $P^f$ and $R$ are symmetric, commute, and $P^f$ is compact, so there exists a total orthonormal set $\{e_i\}$ of common eigenvectors,

$P^f e_i = p^f_i e_i, \quad R e_i = r_i e_i, \quad \text{for all } i \in \mathbb{N}$,

e.g., [15, Lemma 8], [17], [22, Section II.10]. For any $z \in H$, denote by $\{z_i\}$ its Fourier coefficients with respect to the orthonormal set $\{e_i\}$,

$z_i = \langle z, e_i \rangle, \quad i \in \mathbb{N}$.

Using this notation,

$d(y|x) = \exp\left( -\tfrac{1}{2} |y - x|^2_{R^{-1}} \right) = \prod_{i=1}^{\infty} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right)$,

and

$\int_H d(y|x)\, d\mu^f(x) = \int_H \prod_{i=1}^{\infty} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right) d\mu^f(x)$.

Denote

$f_n(x) = \prod_{i=1}^{n} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right), \quad n \in \mathbb{N}$.

Since $0 < e^{-s^2} \le 1$ for any $s \in \mathbb{R}$, $\{f_n\}$ is a monotone sequence of functions on $H$. The functions $f_n$ are continuous and therefore measurable, and by the monotone convergence theorem,

(11) $\int_H d(y|x)\, d\mu^f(x) = \lim_{n \to \infty} \int_H \prod_{i=1}^{n} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right) d\mu^f(x)$.

For each $i \in \mathbb{N}$, the random variable $\langle X^f, e_i \rangle$ has the $N(0, p^f_i)$ distribution, which we denote by $\mu^f_i$. Additionally,

$E\left( \langle X^f, e_i \rangle \langle X^f, e_j \rangle \right) = \delta_{ij}\, p^f_i, \quad i, j \in \mathbb{N}$,

and, in particular, the random variables $\langle X^f, e_i \rangle$ and $\langle X^f, e_j \rangle$ are independent unless $i = j$. Then,

$\int_H f_n(x)\, d\mu^f(x) = \int_H f_n(x_1, \dots, x_n, \dots)\, d\mu^f_1(x_1) \times \dots \times d\mu^f_n(x_n)$

for all $n \in \mathbb{N}$, and, using Fubini's theorem,

$\int_H f_n(x)\, d\mu^f(x) = \int_{\mathbb{R}} \dots \int_{\mathbb{R}} f_n(x_1, \dots, x_n)\, d\mu^f_1(x_1) \dots d\mu^f_n(x_n)$.

Now (11) yields that

$\int_H d(y|x)\, d\mu^f(x) = \lim_{n \to \infty} \prod_{i=1}^{n} \int_{\mathbb{R}} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right) d\mu^f_i(x_i)$,

and, since each measure $\mu^f_i$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}$,

(12) $\int_H d(y|x)\, d\mu^f(x) = \prod_{i=1}^{\infty} \int_{-\infty}^{\infty} \exp\left( -\frac{(y_i - x_i)^2}{2 r_i} \right) \psi_i(x_i)\, d\lambda(x_i)$,

where

$\psi_i(x) = \frac{1}{\sqrt{2 \pi p^f_i}} \exp\left( -\frac{x^2}{2 p^f_i} \right)$,

i.e., $\psi_i$ is the density of a $N(0, p^f_i)$-distributed random variable.

The identity

$\frac{(y_i - x_i)^2}{r_i} + \frac{x_i^2}{p^f_i} = \left( \frac{1}{p^f_i} + \frac{1}{r_i} \right) \left( x_i^2 - \frac{2 x_i y_i\, p^f_i}{r_i + p^f_i} \right) + \frac{y_i^2}{r_i} = \frac{(x_i - m^a_i)^2}{p^a_i} + \frac{y_i^2}{r_i + p^f_i}$,

with

$m^a_i = \frac{p^f_i}{r_i + p^f_i}\, y_i \quad \text{and} \quad p^a_i = \left( \frac{1}{p^f_i} + \frac{1}{r_i} \right)^{-1}$,

allows us to write (12) in the form

$\int_H d(y|x)\, d\mu^f(x) = \prod_{i=1}^{\infty} \frac{1}{\sqrt{2 \pi p^f_i}} \int_{-\infty}^{\infty} \exp\left( -\frac{(x_i - m^a_i)^2}{2 p^a_i} - \frac{y_i^2}{2 \left( r_i + p^f_i \right)} \right) d\lambda(x_i)$.

By standard properties of the normal distribution,

$\int_{-\infty}^{\infty} \exp\left( -\frac{(x_i - m^a_i)^2}{2 p^a_i} \right) dx_i = \sqrt{2 \pi p^a_i}$

for each $i \in \mathbb{N}$, so

(13) $\int_H d(y|x)\, d\mu^f(x) = \prod_{i=1}^{\infty} \left( \frac{p^a_i}{p^f_i} \right)^{1/2} \exp\left( -\frac{y_i^2}{2 \left( r_i + p^f_i \right)} \right) = \prod_{i=1}^{\infty} \left( 1 + \frac{p^f_i}{r_i} \right)^{-1/2} \exp\left( -\frac{y_i^2}{2 \left( r_i + p^f_i \right)} \right)$,

where we used the computation

$\frac{p^a_i}{p^f_i} = \frac{1}{p^f_i} \cdot \frac{1}{\frac{1}{p^f_i} + \frac{1}{r_i}} = \frac{1}{1 + \frac{p^f_i}{r_i}}$.

The infinite product (13) is nonzero if and only if the following sum converges,

(14) $\sum_{i=1}^{\infty} \log\left( \left( 1 + \frac{p^f_i}{r_i} \right)^{-1/2} \exp\left( -\frac{y_i^2}{2 \left( r_i + p^f_i \right)} \right) \right) = -\frac{1}{2} \left( \sum_{i=1}^{\infty} \log\left( 1 + \frac{p^f_i}{r_i} \right) \right) - \left( \sum_{i=1}^{\infty} \frac{y_i^2}{2 \left( r_i + p^f_i \right)} \right)$.

To conclude the proof, we need to show that (14) converges for all $y \in H$ if and only if

(15) $r = \inf_{i \in \mathbb{N}} \{ r_i \} > 0$.

First, the equivalence

(16) $\sum_{i=1}^{\infty} \ln\left( 1 + \frac{p^f_i}{r_i} \right) < \infty \iff \sum_{i=1}^{\infty} \frac{p^f_i}{r_i} < \infty$

follows from the limit comparison test, because

$\lim_{i \to \infty} \frac{\ln\left( 1 + \frac{p^f_i}{r_i} \right)}{\frac{p^f_i}{r_i}} = 1$

when

(17) $\lim_{i \to \infty} \frac{p^f_i}{r_i} = 0$.

If condition (17) is not satisfied, then both sums in (16) diverge. If $r > 0$, then

$\sum_{i=1}^{\infty} \frac{p^f_i}{r_i} \le \sum_{i=1}^{\infty} \frac{p^f_i}{r} \le r^{-1} \sum_{i=1}^{\infty} p^f_i$,

and this sum converges because $P^f$ is trace class.

Further, if $r > 0$, then

$\sum_{i=1}^{\infty} \frac{y_i^2}{r_i + p^f_i} \le \sum_{i=1}^{\infty} \frac{y_i^2}{r_i} \le r^{-1} \sum_{i=1}^{\infty} y_i^2 = r^{-1} |y|^2 < \infty$,

since $\{y_i\}$ are the Fourier coefficients of $y$. On the other side, when $r = 0$, we will construct $\tilde{y} \in H$ such that $|\tilde{y}| \le 1$ and

$\sum_{i=1}^{\infty} \frac{\tilde{y}_i^2}{r_i + p^f_i} = \infty$.

Since $r = 0$, there exists a subsequence $\{r_{i_k}\}_{k=1}^{\infty}$ such that

$r_{i_k} \le \frac{1}{2^k}, \quad k \in \mathbb{N}$,

and we define

$\tilde{y} = \sum_{i=1}^{\infty} \tilde{y}_i e_i$

with

$\tilde{y}_i = r_i^{1/2}$ if $i \in \{i_k\}_{k \in \mathbb{N}}$, and $\tilde{y}_i = 0$ if $i \notin \{i_k\}_{k \in \mathbb{N}}$.

The element $\tilde{y}$ lies in the closed unit ball, because

$|\tilde{y}|^2 = \sum_{i=1}^{\infty} \tilde{y}_i^2 = \sum_{k=1}^{\infty} r_{i_k} \le \sum_{k=1}^{\infty} \frac{1}{2^k} = 1$,

while

$\sum_{i=1}^{\infty} \frac{\tilde{y}_i^2}{r_i + p^f_i} = \sum_{k=1}^{\infty} \frac{r_{i_k}}{r_{i_k} + p^f_{i_k}} = \sum_{k=1}^{\infty} \frac{1}{1 + \frac{p^f_{i_k}}{r_{i_k}}} = \infty$,

where the last equality follows immediately from (17). Therefore, the sum (14) is finite for all $y \in H$ if and only if $r > 0$. □
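The following numerical sketch (not from the paper; the spectra, the truncation dimension, and the test vectors are illustrative assumptions) evaluates the logarithm of the partial products of (13) for two data error spectra. When the $r_i$ are bounded away from zero, the partial products stabilize at a positive value for any square-summable $y$; when $r_i \to 0$, a generic $y$ may still give a positive value, but along the data vector $\tilde{y}$ constructed in the proof, with $\tilde{y}_i = r_i^{1/2}$ on a subsequence, the partial products tend to zero.

    import numpy as np

    def log_c(y, p, r):
        """log of the truncated product (13):
        sum_i [ -0.5*log(1 + p_i/r_i) - y_i^2 / (2*(r_i + p_i)) ]."""
        return np.sum(-0.5 * np.log1p(p / r) - y**2 / (2.0 * (r + p)))

    n = 100_000                          # truncation dimension (illustrative)
    i = np.arange(1, n + 1, dtype=float)
    p = i**-4                            # prior eigenvalues p_i^f (trace class)

    r_bounded = np.ones(n)               # inf r_i = 1 > 0
    r_decay = 1.0 / i                    # inf r_i -> 0

    y = 1.0 / i                          # a generic square-summable data vector

    # y_tilde from the proof: supported on a subsequence i_k with r_{i_k} <= 2^{-k},
    # here i_k = 2^k, with coefficients r_{i_k}^{1/2}
    y_tilde = np.zeros(n)
    ks = 2 ** np.arange(1, int(np.log2(n)) + 1)
    ks = ks[ks <= n]
    y_tilde[ks - 1] = np.sqrt(r_decay[ks - 1])

    print("r bounded below, generic y :", log_c(y, p, r_bounded))           # finite
    print("r -> 0,          y = 0     :", log_c(np.zeros(n), p, r_decay))   # finite
    print("r -> 0,          y = y~    :", log_c(y_tilde, p, r_decay))
    # The last value decreases without bound as n grows (roughly -0.5 for each
    # additional power of two), i.e., c(y~) = 0 in the infinite-dimensional limit.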
The construction of the element $\tilde{y}$ at the end of the previous proof may be generalized, and it implies the following interesting corollary.

Corollary 8. Assume that the operators $P^f$ and $R$ commute. The set

$A = \left\{ y \in H : \int_H d(y|x)\, d\mu^f(x) = 0 \right\}$,

where $\mu^f \sim N(m^f, P^f)$ and the data likelihood is defined by (9), is dense in $H$ if the operator $R$ does not have positive lower bound.

Proof. To show that $A$ is dense, it is sufficient to show that for each $z \in H$ and any $\delta > 0$,

$A \cap \{ u \in H : |z - u| < \delta \} \ne \emptyset$.

Let $z \in H$ and $\delta > 0$. Similarly as in the previous proof, denote by $\{e_i\}$ the total orthonormal set such that $P^f e_i = p^f_i e_i$ and $R e_i = r_i e_i$. Because $r = \inf\{r_i\} = 0$,
