STOCHASTIC LEAST-SQUARES PETROV–GALERKIN METHOD FOR PARAMETERIZED LINEAR SYSTEMS∗ KOOKJIN LEE†, KEVIN CARLBERG‡, AND HOWARD C. ELMAN§ Abstract. We consider the numerical solution of parameterized linear systems where the sys- tem matrix, the solution, and the right-hand side are parameterized by a set of uncertain input parameters. Weexplorespectralmethodsinwhichthesolutionsareapproximatedinachosenfinite- dimensional subspace. It has been shown that the stochastic Galerkin projection technique fails to minimizeanymeasureofthesolutionerror[20]. Asaremedyforthis,weproposeanovelstochastic 7 least-squares Petrov–Galerkin (LSPG) method. The proposed method is optimal in the sense that 1 it produces the solution that minimizes a weighted (cid:96)2-norm of the residual over all solutions in a 0 given finite-dimensional subspace. Moreover, the method can be adapted to minimize the solution 2 errorindifferentweighted(cid:96)2-normsbysimplyapplyingaweightingfunctionwithintheleast-squares n formulation. In addition, a goal-oriented semi-norm induced by an output quantity of interest can be minimized by defining a weighting function as a linear functional of the solution. We establish a J optimality and error bounds for the proposed method, and extensive numerical experiments show thattheweightedLSPGmethodsoutperformsotherspectralmethodsinminimizingcorresponding 5 targetweightednorms. ] A Key words. stochastic Galerkin, least-squares Petrov–Galerkin projection, residual minimiza- tion,spectralprojection N . AMS subject classifications. 35R60,60H15,60H35,65C20,65N30,93E24 h t a 1. Introduction. Forwarduncertaintypropagationforparameterizedlinearsys- m temsisimportantinarangeofapplicationstocharacterizetheeffectsofuncertainties [ on the output of computational models. Such parameterized linear systems arise in many important problems in science and engineering, including stochastic partial 1 v differential equations (SPDEs) where uncertain input parameters are modeled as a 2 set of random variables (e.g., diffusion/ground water flow simulations where diffu- 9 sivity/permeability is modeled as a random field [15, 30]). It has been shown [11] 4 that intrusive methods (e.g., stochastic Galerkin [2, 10, 13, 19, 28]) for uncertainty 1 propagationcanleadtosmallererrorsforafixedbasisdimension,comparedwithnon- 0 . intrusivemethods[27](e.g.,sampling-basedmethods[3,14,17],stochasticcollocation 1 [1, 21]). 0 7 The stochastic Galerkin method combined with generalized polynomial chaos 1 (gPC) expansions [29] seeks a polynomial approximation of the numerical solution : in the stochastic domain by enforcing a Galerkin orthogonality condition, i.e., the v i residual of the parameterized linear system is forced to be orthogonal to the span of X the stochastic polynomial basis with respect to an inner product associated with an r underlying probability measure. The Galerkin projection scheme is popular for its a simplicity (i.e., the trial and test bases are the same) and its optimality in terms of ∗ThisworkwassupportedbytheU.S.DepartmentofEnergyOfficeofAdvancedScientificCom- putingResearch,AppliedMathematicsprogramunderawardDEC-SC0009301andbytheU.S.Na- tionalScienceFoundationundergrantDMS1418754. †Department of Computer Science, University of Maryland, College Park, MD 20742 ([email protected]). ‡Sandia National Laboratories, Livermore, CA 94550 ([email protected]). Sandia National Laboratoriesisamulti-programlaboratorymanagedandoperatedbySandiaCorporation,awholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National NuclearSecurityAdministrationundercontractDE-AC04-94AL85000. §Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland,CollegePark,MD20742([email protected]). 1 2 K.LEE,K.CARLBERG,ANDH.C.ELMAN minimizing an energy norm of solution errors when the underlying PDE operator is symmetric positive definite. In many applications, however, it has been shown that the stochastic Galerkin method does not exhibit any optimality property [20]. That is, it does not produce solutions that minimize any measure of the solution error. In such cases, the stochastic Galerkin method can lead to poor approximations and non-convergent behavior. To address this issue, we propose a novel optimal projection technique, which we refer to as the stochastic least-squares Petrov–Galerkin (LSPG) method. Inspired by the successes of LSPG methods in nonlinear model reduction [8, 9, 7], finite element methods [4, 5, 16], and iterative linear solvers (e.g., GMRES, GCR) [22], we propose, as an alternative to enforcing the Galerkin orthogonality condition, to directly min- imize the residual of a parameterized linear system over the stochastic domain in a (weighted)(cid:96)2-norm. ThestochasticLSPGmethodproducesanoptimalsolutionfora given stochastic subspace and guarantees that the (cid:96)2-norm of the residual monoton- ically decreases as the stochastic basis is enriched. In addition to producing mono- tonicallyconvergentapproximationsasmeasuredinthechosenweighted(cid:96)2-norm,the method can also be adapted to target output quantities of interest (QoI); this can be accomplished by employing a weighted (cid:96)2-norm used for least-squares minimization that coincides with the (cid:96)2-(semi)norm of the error in the chosen QoI. In addition to proposing the stochastic LSPG method, this study shows that specificchoicesofweightingfunctionsleadtoequivalencebetweenthestochasticLSPG method and both the stochastic Galerkin method and the pseudo-spectral method [26, 27]. We demonstrate the effectiveness of this method with extensive numerical experiments on various SPDEs. The results show that the proposed LSPG technique significantly outperforms the stochastic Galerkin when the solution error is measured indifferentweighted(cid:96)2-norms. Wealsoshowthattheproposedmethodcaneffectively minimize the error in target QoIs. An outline of the paper is as follows. Section 2 formulates parameterized linear algebraic systems and reviews conventional spectral approaches for computing nu- merical solutions. Section 3 develops a residual minimization formulation based on least-squares methods and its adaptation to the stochastic LSPG method. We also provide proofs of optimality and monotonic convergence behavior of the proposed method. Section 4 provides error analysis for stochastic LSPG methods. Section 5 demonstrates the efficiency and the effectiveness of the proposed methods by testing them on various benchmark problems. Finally, Section 6 outlines some conclusions. 2. Spectral methods for parameterized linear systems. We begin by in- troducing a mathematical formulation of parameterized linear systems and briefly reviewing the stochastic Galerkin and the pseudo-spectral methods , which are spec- tral methods for approximating the numerical solutions of such systems. 2.1. Problem formulation. Consider a parameterized linear system (1) A(ξ)u(ξ)=b(ξ), where A:Γ→Rnx×nx, and u,b:Γ→Rnx. The system is parameterized by a set of stochasticinputparametersξ(ω)≡{ξ (ω),...,ξ (ω)}. Here,ω ∈Ωisanelementary 1 nξ event in a probability space (Ω,F,P) and the stochastic domain is denoted by Γ ≡ (cid:81)nξ Γ where ξ :Ω→Γ . We are interested in computing a spectral approximation i=1 i i i ofthenumericalsolutionu(ξ)inann -dimensionalsubspaceS spannedbyafinite ψ nψ STOCHASTICLEAST-SQUARESPETROV–GALERKINMETHOD 3 set of polynomials {ψ (ξ)}nψ such that S ≡span{ψ }nψ ⊆L2(Γ), i.e., i i=1 nψ i i=1 nψ (cid:88) (2) u(ξ)≈u˜(ξ)= u¯ ψ (ξ)=(ψT(ξ)⊗I )u¯, i i nx i=1 where {u¯i}ni=ψ1 with u¯i ∈ Rnx are unknown coefficient vectors, u¯ ≡ [u¯T1 ··· u¯Tnψ]T ∈ Rnxnψ is the vertical concatenation of these coefficient vectors, ψ ≡[ψ1 ··· ψnψ]T ∈ Rnψ is a concatenation of the polynomial basis, ⊗ denotes the Kronecker product, and Inx denotes the identity matrix of dimension nx. Note that u˜ ∈ (Snψ)nx. Typically, the “stochastic” basis {ψ } consists of products of univariate polynomi- i als: ψ ≡ ψ ≡ (cid:81)nξ π (ξ ) where {π }nξ are univariate polynomials, i α(i) k=1 αk(i) k αk(i) k=1 α(i) = (α (i),··· ,α (i)) ∈ Nnξ is a multi-index and α represents the degree of a 1 nξ 0 k polynomial in ξ . The dimension of the stochastic subspace n depends on the num- k ψ berofrandomvariablesn , themaximumpolynomialdegreep, andaconstructionof ξ the polynomial space (e.g., a total-degree space that contains polynomials with total degree up to p, (cid:80)nξ α (i) ≤ p). By substituting u(ξ) with u˜(ξ) in (1), the residual k=1 k can be defined as nψ (cid:88) (3) r(u¯;ξ):=b(ξ)−A(ξ) u¯ ψ (ξ)=b(ξ)−(ψT(ξ)⊗A(ξ))u¯, i i i=1 where ψT(·)⊗A(·):Γ→Rnx×nψnx. It follows from (2) and (3) that our goal now is to compute the unknown co- efficients {u¯ }nψ of the solution expansion. We briefly review two conventional i i=1 approaches for doing so: the stochastic Galerkin method and the pseudo-spectral method. In the following, ρ ≡ ρ(ξ) denotes an underlying measure of the stochastic space Γ and (cid:90) (4) (cid:104)g,h(cid:105) ≡ g(ξ)h(ξ)ρ(ξ)dξ, ρ Γ (cid:90) (5) E[g]≡ g(ξ)ρ(ξ)dξ, Γ define an inner product between scalar-valued functions g(ξ) and h(ξ) with respect to ρ(ξ) and the expectation of g(ξ), respectively. The (cid:96)2-norm of a vector-valued function v(ξ)∈Rnx is defined as (cid:88)nx (cid:90) (6) (cid:107)v(cid:107)2 ≡ v2(ξ)ρ(ξ)dξ =E[vTv]. 2 i i=1 Γ Typically, the polynomial basis is constructed to be orthogonal in the (cid:104)·,·(cid:105) inner ρ product, i.e., (cid:104)ψ ,ψ (cid:105) = (cid:81)nξ (cid:104)π ,π (cid:105) = δ , where δ denotes the Kro- i j ρ k=1 αk(i) αk(j) ρk ij ij necker delta. 2.2. StochasticGalerkinmethod. ThestochasticGalerkinmethodcomputes the unknown coefficients {u¯ }nψ of u˜(ξ) in (2) by imposing orthogonality of the i i=1 residual(3)withrespecttotheinnerproduct(cid:104)·,·(cid:105) inthesubspaceS . ThisGalerkin ρ nψ orthogonality condition can be expressed as follows: Find u¯SG ∈Rnxnψ such that (7) (cid:104)r (u¯SG),ψ (cid:105) =E[r (u¯SG)ψ ]=0, i=1,...,n , j =1,...,n , i j ρ i j x ψ 4 K.LEE,K.CARLBERG,ANDH.C.ELMAN where r ≡[r ··· r ]T. The condition (7) can be represented in matrix notation as 1 nx (8) E[ψ⊗r(u¯SG)]=0. From the definition of the residual (3), this gives a system of linear equations (9) E[ψψT ⊗A]u¯SG =E[ψ⊗b], of dimension n n . This yields an algebraic expression for the stochastic-Galerkin x ψ approximation (10) u˜SG(ξ)=(ψ(ξ)T ⊗I )E[ψψT ⊗A]−1E[ψ⊗Au]. nx If A(ξ) is symmetric positive definite, the solution of linear system (9) minimizes the solution error e(x)≡u−x in the A(ξ)-induced energy norm (cid:107)v(cid:107)2 ≡E[vTAv], i.e., A (11) u˜SG(ξ)= argmin (cid:107)e(x)(cid:107)2. A x∈(Snψ)nx In general, however, the stochastic-Galerkin approximation does not minimize any measure of the solution error. 2.3. Pseudo-spectral method. The pseudo-spectral method directly approx- imates the unknown coefficients {u¯ }nψ of u˜(ξ) in (2) by exploiting orthogonality i i=1 of the polynomial basis {ψ (ξ)}nψ . That is, the coefficients u¯ can be obtained by i i=1 i projecting the numerical solution u(ξ) onto the orthogonal polynomial basis as (12) u¯PS =E[uψ ], i=1,...,n , i i ψ which can be expressed as (13) u¯PS =E[ψ⊗A−1b], or equivalently (14) u˜PS(ξ)=(ψ(ξ)T ⊗I )E[ψ⊗u]. nx The associated optimality property of the approximation, which can be derived from optimality of orthogonal projection, is (15) u˜PS(ξ)= argmin (cid:107)e(x)(cid:107)2. 2 x∈(Snψ)nx In practice, the coefficients {u¯PS}nψ are approximated via numerical quadrature as i i=1 (cid:88)nq (cid:88)nq (cid:16) (cid:17) (16) u¯PS =E[uψ ]= u(ξ(k))ψ (ξ(k))w = A−1(ξ(k))f(ξ(k)) ψ (ξ(k))w , i i i k i k k=1 k=1 where {(ξ(k),w )}nq are the quadrature points and weights. k k=1 While stochastic Galerkin leads to an optimal approximation (11) under certain conditions and pseudo-spectral projection minimizes the (cid:96)2-norm of the solution er- ror (15), neither approach provides the flexibility to tailor the optimality properties of the approximation. This may be important in applications where, for example, minimizing the error in a quantity of interest is desired. To address this, we propose a general optimization-based framework for spectral methods that enables the choice of a targeted weighted (cid:96)2-norm in which the solution error is minimized. STOCHASTICLEAST-SQUARESPETROV–GALERKINMETHOD 5 3. Stochastic least-squares Petrov–Galerkin method. Asastartingpoint, we propose a residual-minimizing formulation that computes the coefficients u¯ by directly minimizing the (cid:96)2-norm of the residual, i.e., u˜LSPG(ξ)= argmin (cid:107)f −Ax(cid:107)2 = argmin (cid:107)e(x)(cid:107)2 , (17) 2 ATA x∈(Snψ)nx x∈(Snψ)nx where (cid:107)v(cid:107)2 ≡ E[vTATAv]. Thus, the (cid:96)2-norm of the residual is equivalent to a ATA weighted (cid:96)2-norm of the solution error. Using (2) and (3), we have (18) u¯LSPG = argmin(cid:107)r(x¯)(cid:107)2. 2 x¯∈Rnxnψ The definition of the residual (3) allows the objective function in (18) to be written in quadratic form as (19) (cid:107)r(x¯)(cid:107)2 =(cid:107)f −(ψT ⊗A)x¯(cid:107)2 =x¯TE[ψψT ⊗ATA]x¯−2E[ψ⊗ATf]Tx¯+E[fTf]. 2 2 Noting that the mapping x¯(cid:55)→(cid:107)r(x¯)(cid:107)2 is convex, the (unique) solution u¯LSPG to (18) 2 is a stationary point of (cid:107)r(x¯)(cid:107)2 and thus satisfies 2 (20) E[ψψT ⊗ATA]u¯LSPG =E[ψ⊗ATf], which can be interpreted as the normal-equations form of the linear least-squares problem (18). Consider a generalization of this idea that minimizes the solution error in a tar- geted weighted (cid:96)2-norm by choosing a specific weighting function. Let us define a weightingfunctionM(ξ)≡Mξ(ξ)⊗Mx(ξ),whereMξ :Γ→RandMx :Γ→Rnx×nx. Then, the stochastic LSPG method can be written as u˜LSPG(M)(ξ)= argmin (cid:107)M(b−Ax)(cid:107)2 = argmin (cid:107)e(x)(cid:107)2 , (21) 2 ATMTMA x∈(Snψ)nx x∈(Snψ)nx with(cid:107)v(cid:107)2 ≡E[vTATMTMAv]=E[(MTM ⊗(M Av)TM Av]. Algebraically, ATMTMA ξ ξ x x this is equivalent to u¯LSPG(M) = argmin(cid:107)Mr(x¯)(cid:107)2 = argmin(cid:107)(M ⊗M )(1⊗b−(cid:0)ψT ⊗A(cid:1)x¯)(cid:107)2 2 ξ x 2 x¯∈Rnxnψ x¯∈Rnxnψ (22) = argmin(cid:107)M ⊗(M b)−(cid:0)(M ψT)⊗(M A)(cid:1)x¯(cid:107)2. ξ x ξ x 2 x¯∈Rnxnψ We will restrict our attention to the case M (ξ) = 1 and denote M (ξ) by M(ξ) for ξ x simplicity. Now, the algebraic stochastic LSPG problem (22) simplifies to (23) u¯LSPG(M) = argmin(cid:107)Mr(x¯)(cid:107)2 = argmin(cid:107)Mb−(ψT ⊗MA)x¯(cid:107)2. 2 2 x¯∈Rnxnψ x¯∈Rnxnψ The objective function in (23) can be written in quadratic form as (24) (cid:107)Mr(x¯)(cid:107)2 =x¯TE[(ψψT ⊗MATMTMA)]x¯−2(E[ψ⊗ATMTMf])Tx¯+E[bTMTMb]. 2 Asbefore,becausethemappingx¯(cid:55)→(cid:107)Mr(x¯)(cid:107)2isconvex,theuniquesolutionu¯LSPG(M) 2 of (23) corresponds to a stationary point of (cid:107)Mr(x¯)(cid:107)2 and thus satisfies 2 (25) E[ψψT ⊗ATMTMA]u¯LSPG(M) =E[ψ⊗ATMTMf], 6 K.LEE,K.CARLBERG,ANDH.C.ELMAN which is the normal-equations form of the linear least-squares problem (23). This yields the following algebraic expression for the stochastic-LSPG approximation: (26) u˜LSPG(M)(ξ)=(ψ(ξ)T ⊗I )E[ψψT ⊗ATMTMA]−1E[ψ⊗ATMTMAu]. nx Petrov–Galerkin projection. Another way of interpreting the normal equa- tions (25) is that the (weighted) residual M(ξ)r(u¯LSPG(M);ξ) is enforced to be or- thogonal to the subspace spanned by the optimal test basis {φ }nψ with φ (ξ) := i i=1 i ψ (ξ)⊗M(ξ)A(ξ) and span{φ }nψ ⊆L2(Γ). That is, this projection is precisely the i i i=1 (least-squares) Petrov–Galerkin projection, (27) E[φT(b−(ψT ⊗MA)u¯LSPG(M))]=0, where φ(ξ)≡[φ (ξ) ··· φ (ξ)]. 1 nψ MonotonicConvergence. Thestochasticleast-squaresPetrov-Galerkinismono- tonicallyconvergent. Thatis,asthetrialsubspaceS isenriched(byaddingpolyno- nψ mialstothebasis),theoptimalvalueoftheconvexobjectivefunction(cid:107)Mr(u¯LSPG(M))(cid:107)2 2 monotonicallydecreases. ThisisapparentfromtheLSPGoptimizationproblem(21): Defining u˜LSPG(cid:48)(M)(ξ)= argmin (cid:107)M(b−Ax)(cid:107)2, (28) 2 x∈(Snψ+1)nx wehave(cid:107)M(b−Au˜LSPG(cid:48)(M))(cid:107)2 ≤(cid:107)M(b−Au˜LSPG(M))(cid:107)2(and(cid:107)u−uLSPG(cid:48)(M)(cid:107) 2 2 ATMTMA ≤(cid:107)u−uLSPG(M)(cid:107) ) if S ⊆S . ATMTMA nψ nψ+1 Weightingstrategies. DifferentchoicesofweightingfunctionM(ξ)allowLSPG to minimize different measures of the error. We focus on four particular choices: 1. M(ξ) = C−1(ξ), where C(ξ) is a Cholesky factor of A(ξ), i.e., A(ξ) = C(ξ)CT(ξ). This decomposition exists if and only if A is symmetric pos- itive semidefinite. In this case, LSPG minimizes the energy norm of the solution error (cid:107)e(x)(cid:107)2 ≡ (cid:107)C−1r(x¯)(cid:107)2 (= (cid:107)e((ΨT ⊗I )x¯)(cid:107)2) and is mathe- A 2 nx A matically equivalent to the stochastic Galerkin method described in Section 2.2, i.e., u˜LSPG(C−1) = u˜SG. This can be seen by comparing (11) and (21) with M =C−1, as ATMTMA=A in this case. 2. M(ξ) = I , where I is the identity matrix of dimension n . In this case, nx nx x LSPG minimizes the (cid:96)2-norm of the residual (cid:107)e(x)(cid:107) ≡(cid:107)r(x¯)(cid:107)2. ATA 2 3. M(ξ)=A−1(ξ). In this case, LSPG minimizes the (cid:96)2-norm of solution error (cid:107)e(x)(cid:107)2 ≡ (cid:107)A−1r(x¯)(cid:107)2. This is mathematically equivalent to the pseudo- 2 2 spectral method described in Section 2.3, i.e., u˜LSPG(A−1) = u˜PS, which can be seen by comparing (15) and (21) with M =A−1. 4. M(ξ) = F(ξ)A−1(ξ) where F : Γ → Rno×nx is a linear functional of the solution associated with a vector of output quantities of interest. In this case, LSPG minimizes the (cid:96)2-norm of the error in the output quantities of interest (cid:107)Fe(x)(cid:107)2 ≡(cid:107)FA−1r(x¯)(cid:107)2. 2 2 We again emphasize that two particular choices of the weighting function M(ξ) lead to equivalence between LSPG and existing spectral-projection methods (stochastic Galerkin and pseudo-spectral projection), i.e., (29) u˜LSPG(C−1) =u˜SG, u˜LSPG(A−1) =u˜PS, where the first equality is valid (i.e., the Cholesky decomposition A(ξ)= C(ξ)CT(ξ) can be computed) if and only if A is symmetric positive semidefinite. Table 1 sum- marizes the target quantities to minimize (i.e., (cid:107)e(x)(cid:107)2 ≡E[e(x)TΘe(x)]), the corre- Θ sponding LSPG weighting functions, and the method names LSPG(Θ). STOCHASTICLEAST-SQUARESPETROV–GALERKINMETHOD 7 Table 1 Different choices for the LSPG weighting function QuantityminimizedbyLSPG Weightingfunction Methodname Quantity Expression Energynormoferror (cid:107)e(x)(cid:107)2 M(ξ)=C−1(ξ) LSPG(A)/SG A (cid:96)2-normofresidual (cid:107)e(x)(cid:107)2ATA M(ξ)=Inx LSPG(ATA) (cid:96)2-normofsolutionerror (cid:107)e(x)(cid:107)2 M(ξ)=A−1(ξ) LSPG(2)/PS 2 (cid:96)2-normoferrorinquantitiesofinterest (cid:107)Fe(x)(cid:107)2 M(ξ)=F(ξ)A−1(ξ) LSPG(FTF) 2 4. Erroranalysis. Ifanapproximationsatisfiesanoptimal-projectioncondition (30) u˜= argmin (cid:107)e(x)(cid:107)2, Θ x∈(Snψ)nx then (31) (cid:107)e(u˜)(cid:107)2 = min (cid:107)e(x)(cid:107)2. Θ Θ x∈(Snψ)nx Using norm equivalence (32) (cid:107)x(cid:107)2 ≤C(cid:107)x(cid:107)2, Θ(cid:48) Θ we can characterize the solution error e(u˜) in any alternative norm Θ(cid:48) as (33) (cid:107)e(u˜)(cid:107)2 ≤C min (cid:107)e(x)(cid:107)2. Θ(cid:48) Θ x∈(Snψ)nx Thus, the error in an alternative norm Θ(cid:48) is controlled by the optimal objective- function value min (cid:107)e(x)(cid:107)2 (which can be made small if the trial space ad- x∈(Snψ)nx Θ mits accurate solutions) and the stability constant C. Table2reportsnorm-equivalenceconstantsforthenormsconsideredinthiswork. Here, we have defined (34) σ (M)≡ inf (cid:107)Mx(cid:107) /(cid:107)x(cid:107) , σ (M)≡ sup (cid:107)Mx(cid:107) /(cid:107)x(cid:107) . min 2 2 max 2 2 x∈(L2(Γ))nx x∈(L2(Γ))nx Table 2 Stability constant C in (32) Θ(cid:48)=A Θ(cid:48)=ATA Θ(cid:48)=2 Θ(cid:48)=FTF Θ=A 1 σmax(A) σmin1(A) σσmmaixn((FA))2 Θ=ATA 1 1 1 σmax(F)2 σmin(A) σmin(A)2 σmin(A)2 Θ=2 σmax(A) σmax(A)2 1 σmax(F)2 Θ=FTF σmax(A) σmax(A)2 1 1 σmin(F)2 σmin(F)2 σmin(F)2 This exposes several interesting conclusions. First, if n < n , then the null space o x of F is nontrivial and so σ (F) = 0. This implies that LSPG(FTF), for which min Θ = FTF, will have an undefined value of C when the solution error is measured in other norms, i.e., for Θ(cid:48) = A, Θ(cid:48) = ATA, and Θ(cid:48) = 2. It will have controlled errors only for Θ(cid:48) =FTF, in which case C =1. Second, note that for problems with small σ (A), the (cid:96)2 norm in the quantities of interest may be large for the LSPG(A)/SG, min orLSPG(ATA),whileitwillremainwellbehavedforLSPG(2)/PSandLSPG(FTF). 8 K.LEE,K.CARLBERG,ANDH.C.ELMAN 5. Numericalexperiments. ThissectionexplorestheperformanceoftheLSPG methods for solving elliptic SPDEs parameterized by one random variable (i.e., n = ξ 1). The maximum polynomial degree used in the stochastic space S is p; thus, nψ the dimension of S is n = p+1. In physical space, the SPDE is defined over a nψ ψ two-dimensional rectangular bounded domain D, and it is discretized using the finite element method with bilinear (Q ) elements as implemented in the Incompressible 1 Flow and Iterative Solver Software (IFISS) package [24]. Sixteen elements are em- ployed in each dimension, leading to n = 225 = 152 degrees of freedom excluding x boundary nodes. All numerical experiments are performed on an Intel 3.1 GHz i7 CPU, 16 GB RAM using Matlab R2015a. Measuring weighted (cid:96)2-norms. ForallLSPGmethods,theweighted(cid:96)2-norms canbemeasuredbyevaluatingtheexpectationsinthequadraticformoftheobjective function shown in (24). This requires evaluation of three expectations (35) (cid:107)Mr(x¯)(cid:107)2 :=x¯TT x¯−2TTx¯+T , 2 1 2 3 with (36) T1 :=E[(ψψT ⊗ATMTMTA)]∈Rnxnψ×nxnψ, (37) T2 :=E[ψ⊗ATMTMb]∈Rnxnψ, (38) T :=E[bTMTMb]∈R. 3 NotethatT doesnotdependonthestochastic-spacedimensionn . Thesequantities 3 ψ can be evaluated by numerical quadrature or analytically if closed-form expressions for those expectations exist. Unless otherwise specified, we compute these quantities usingtheintegralfunctioninMatlab,whichperformsadaptivenumericalquadra- ture based on the 15-point Gauss–Kronrod quadrature formula [23]. Error measures. In the experiments, we assess the error in approximate solu- tions computed using various spectral-projection techniques using four relative error measures (see Table 1): (39) (cid:107)e(x)(cid:107)2 (cid:107)e(x)(cid:107)2 (cid:107)e(x)(cid:107)2 (cid:107)Fe(x)(cid:107)2 η (x):= ATA, η (x):= 2, η (x):= A, η (x):= 2. r (cid:107)f(cid:107)2 e (cid:107)u(cid:107)2 A (cid:107)u(cid:107)2 Q (cid:107)Fu(cid:107)2 2 2 A 2 5.1. Stochastic diffusion problems. Considerthesteady-statestochasticdif- fusion equation with homogeneous boundary conditions, (cid:26) −∇·(a(x,ξ)∇u(x,ξ)) =f(x,ξ) in D×Γ (40) u(x,ξ) =0 on ∂D×Γ, wherethediffusivitya(x,ξ)isarandomfieldandD =[0,1]×[0,1]. Therandomfield a(x,ξ) is specified as an exponential of a truncated Karhunen-Lo`eve (KL) expansion (cid:16) (cid:17) [18] with covariance kernel, C(x, y) ≡ σ2exp −|x1−y1| − |x2−y2| , where c is the c c correlation length, i.e., (41) a(x,ξ)≡exp(µ+σa (x)ξ), 1 where {µ,σ2} are the mean and variance of a and a (x) is the first eigenfunction 1 in the KL expansion. After applying the spatial (finite-element) discretization, the problem can be reformulated as a parameterized linear system of the form (1), where STOCHASTICLEAST-SQUARESPETROV–GALERKINMETHOD 9 A(ξ) is a parameterized stiffness matrix obtained from the weak form of the problem (cid:82) whose(i,j)-elementis[A(ξ)] = ∇a(x,ξ)ϕ (x)·ϕ (x)dx(with{ϕ }standardfinite ij D i j i elementbasisfunctions)andb(ξ)isaparameterizedright-handsidewhoseithelement (cid:82) is [b(ξ)] = f(x,ξ)ϕ (x)dx. Note that A(ξ) is symmetric positive definite for this i D i problem; thus LSPG(A)/SG is a valid projection scheme (the Cholesky factorization A(ξ)=C(ξ)C(ξ)T exists) and is equal to stochastic Galerkin projection. Output quantities of interest. We consider n output quantities of interest o (F(ξ)u(ξ) ∈ Rno) that are random linear functionals of the solution and F(ξ) is of dimension n ×n having the form: o x (1) F1(ξ):=g(ξ)×GwithG∼U(0, 1)no×nx: eachentryofGisdrawnuniformly from[0,1]andg(ξ)isascalar-valuedfunctionofξ. TheresultingoutputQoI, F (ξ)u(ξ), is a vector-valued function of dimension n . 1 o (2) F (ξ):=b(ξ)TM¯: M¯ is a mass matrix defined via [M¯] ≡(cid:82) ϕ (x)ϕ (x)dx. 2 ij D i j The output QoI is a scalar-valued function F (ξ)u(ξ) = b(ξ)TM¯u(ξ), which 2 approximates a spatial average 1 (cid:82) f(x,ξ)u(x,ξ)dx. |D| D 5.1.1. Diffusion problem 1: Lognormal random coefficient and deter- ministic forcing. In this example, we take ξ in (41) to follow a standard normal (cid:16) (cid:17) distribution (i.e., ρ(ξ) = √1 exp −ξ2 and ξ ∈ (−∞,∞)) and f(x,ξ) = 1 is de- 2π 2 terministic. Because ξ is normally distributed, normalized Hermite polynomials (or- thogonal with respect to (cid:104)·,·(cid:105) ) are used as polynomial basis {ψ (ξ)}nψ . ρ i i=1 Figure1reportstherelativeerrors(39)associatedwithsolutionscomputedusing four LSPG methods (LSPG(A)/SG, LSPG(ATA), LSPG(2)/PS, and LSPG(FTF)) for varying polynomial degree p. Here, we consider the random output QoI, i.e., F =F ,n =100,andg(ξ)=ξ. Thisresultshowsthatthreemethods(LSPG(A)/SG, 1 o LSPG(ATA), and LSPG(2)/PS) monotonically converge in all four error measures, whereas LSPG(FTF) does not. This is an artifact of rank deficiency in F , which 1 leads to σ (F ) = 0; as a result, all stability constants C for which Θ = FTF in min 1 Table 2 are unbounded, implying lack of error control. Figure 1 also shows that each LSPG method minimizes its targeted error measure for a given stochastic-subspace dimension (e.g., LSPG minimizes the (cid:96)2-norm of the residual); this is also evident fromTable2,asthestabilityconstantrealizesitsminimumvalue(C =1)forΘ=Θ(cid:48). Table3showsactualvaluesofthestabilityconstantofthisproblemandwellexplains the behaviors of all LSPG methods. For example, the first column of Table 3 shows that the stability constant is increasing in the order (LSPG(A)/SG, LSPG(ATA), LSPG(2)/PS, and LSPG(FTF)), which is represented in Figure 1a. Table 3 Stability constant C of Diffusion problem 1 Θ(cid:48)=A Θ(cid:48)=ATA Θ(cid:48)=2 Θ(cid:48)=FTF Θ=A 1 26.43 2.06 11644.22 Θ=ATA 2.06 1 4.25 24013.48 Θ=1 26.43 698.53 1 5646.32 Θ=FTF ∞ ∞ ∞ 1 The results in Figure 1 do not account for computational costs. This point is addressed in Figure 2, which shows the relative errors as a function of CPU time. As we would like to devise a method that minimizes both the error and computational time, we examine a Pareto front (black dotted line) in each error measure. For a fixed value of p, LSPG(2)/PS is the fastest method because it does not require so- 10 K.LEE,K.CARLBERG,ANDH.C.ELMAN 4 8 ) A η 0 6 1 2 g o (l 4 r 0 solutionerro−−42 ηdual(log)r10−022 mof−6 Resi−4 or LSPG(A)/SG −6 LSPG(A)/SG Energyn−−108 LLLSSSPPPGGG(((2AF)TT/AFP)S) −−108 LLLSSSPPPGGG(((AF2)TT/AFP)S) 2 4 6 8 10 2 4 6 8 10 polynomialdegreep polynomialdegreep (a)RelativeenergynormofsolutionerrorηA (b)Relative(cid:96)2-normofresidualηr 2 0 −1 0 ) ηQ−2 ηg)e10−2 (log10−3 rror(lo−4 utQoI−−45 e p ution−6 nout−6 Sol−8 LLSSPPGG((AA)T/AS)G Errori−7 LLSSPPGG((AA)T/AS)G LSPG(2)/PS −8 LSPG(2)/PS LSPG(FTF) LSPG(FTF) −10 −9 2 4 6 8 10 2 4 6 8 10 polynomialdegreep polynomialdegreep (c)Relative(cid:96)2-normofsolutionerrorηe (d)Relative(cid:96)2-normofoutputQoIerrorηQwith F =F1,no=100,andg(ξ)=ξ Fig. 1. Relative error measures versus polynomial degree for diffusion problem 1: lognormal random coefficient and deterministic forcing. Note that each LSPG method performs best in the error measure it minimizes. lution of a coupled system of linear equations of dimension n n which is required x ψ by the other three LSPG methods (LSPG(A)/SG, LSPG(ATA), and LSPG(FTF)). As a result, pseudo-spectral projection (LSPG(2)/PS) generally yields the best over- all performance in practice, even when it produces larger errors than other methods for a fixed value of p. Also, for a fixed value of p, LSPG(A)/SG is faster than LSPG(ATA) because the weighted stiffness matrix A(ξ) obtained from the finite ele- mentdiscretizationissparserthanAT(ξ)A(ξ). Thatis,thenumberofnonzeroentries to be evaluated for LSPG(A)/SG in numerical quadrature is smaller than the ones for LSPG(ATA), and exploiting this sparsity structure in the numerical quadrature causes LSPG(A)/SG to be faster than LSPG(ATA).