Alma Mater Studiorum - Università di Bologna
PhD Programme in Mathematics (Dottorato di Ricerca in Matematica), Cycle XXVI
Settore Concorsuale di afferenza: 01/A5; Settore Scientifico Disciplinare: MAT/08

Spectral estimates and preconditioning for saddle point systems arising from optimization problems

Candidate: Mattia Tani
PhD Coordinator: Prof. Giovanna Citti
Advisor: Prof. Valeria Simoncini

Final examination: 2015

Contents

1 Introduction
  1.1 Aims and outline

I Numerical linear algebra preliminaries

2 Saddle point systems and Krylov methods background
  2.1 Saddle point systems
  2.2 Krylov methods
  2.3 Preconditioners for saddle point systems

3 The augmented block diagonal preconditioner
  3.1 The preconditioner
  3.2 New spectral estimates for the case C = 0
  3.3 Numerical illustration
  3.4 Conclusions

II Saddle point systems arising in PDE-constrained optimization

4 Refined spectral estimates for a preconditioned system in a nonstandard inner product
  4.1 An optimal control problem with PDE constraints
  4.2 Conjugate Gradient in a nonstandard inner product
  4.3 Refined spectral estimates
  4.4 Numerical experiments
  4.5 Conclusions

5 New preconditioning strategies for optimal control problems with inequality constraints
  5.1 The problem
  5.2 Overview of the current approaches
  5.3 A new approximation to the active-set Schur complement
  5.4 New preconditioners for the active-set Newton method
  5.5 Numerical experiments
  5.6 Conclusions

III Saddle point systems arising in the solution of Quadratic Programming problems

6 Spectral estimates for unreduced symmetric systems
  6.1 Interior Point methods
  6.2 The KKT system
  6.3 Spectral estimates
  6.4 Validation of the spectral bounds for $K_3$
  6.5 Conclusions

7 Preconditioning for the reduced and unreduced formulation
  7.1 Preliminaries
  7.2 Standard preconditioners for the unreduced systems
  7.3 Qualitatively different preconditioning strategies
  7.4 Conclusions

A Technical proofs

Notation

Sets and vector spaces
  $\mathbb{R}$              Set of real numbers
  $\mathbb{R}_+$            Set of positive real numbers
  $\mathbb{R}^n$            Space of real vectors of dimension $n$
  $\mathbb{R}^{m\times n}$  Space of $m \times n$ real matrices
  $\mathcal{P}_k$           Space of polynomials of degree at most $k$

Matrices and vectors
  $I_n$                     Identity matrix of order $n$
  blkdiag$(A,B,\ldots)$     Block diagonal matrix with blocks $A, B, \ldots$
  diag$(A)$                 Diagonal matrix derived from the diagonal of $A$
  diag$(v)$                 Diagonal matrix with vector $v$ on the diagonal
  nnz$(A)$                  Number of nonzero entries of $A$
  rank$(A)$                 Rank of matrix $A$
  $\kappa(A)$               Condition number of matrix $A$
  $A_{ij}$                  Entry of matrix $A$ in the $i$-th row and $j$-th column
  $v_{\max}$                Maximum entry of vector $v$
  $v_{\min}$                Minimum entry of vector $v$
  $v_i$                     $i$-th entry of vector $v$

Eigenvalues and singular values
  spec$(A)$                 Spectrum of $A$
  $\lambda_{\max}(A)$       Maximum eigenvalue of matrix $A$ with real spectrum
  $\lambda_{\min}(A)$       Minimum eigenvalue of matrix $A$ with real spectrum
  $\lambda_i(A)$            $i$-th eigenvalue of a matrix $A$ with real spectrum, sorted in ascending order
  $\sigma_{\max}(B)$        Maximum singular value of matrix $B$
  $\sigma_{\min}(B)$        Minimum singular value of matrix $B$
  $\sigma_i(B)$             $i$-th singular value of matrix $B$, in descending order

Subspaces
  ker$(A)$                  Null space of matrix $A$
  range$(A)$                Range of matrix $A$
  span$\{v_1, v_2, \ldots\}$  Linear space spanned by the vectors $v_1, v_2, \ldots$

Given column vectors $x$ and $y$, we write $(x,y)$ for the column vector given by their concatenation, instead of using $[x^T, y^T]^T$. Given two square symmetric matrices $A$ and $B$, we write $A \succeq B$ to mean that $A - B$ is positive semidefinite, and $A \succ B$ to mean that $A - B$ is positive definite. The Euclidean norm of a vector $v \in \mathbb{R}^n$ is defined as $\|v\| := \sqrt{v^T v}$. The matrix norm induced by the Euclidean norm is defined as $\|B\| := \max_{v \in \mathbb{R}^n,\, v \neq 0} \frac{\|Bv\|}{\|v\|}$, where $B \in \mathbb{R}^{m\times n}$. Note that we use the same symbol, $\|\cdot\|$, to denote both the vector and the matrix norm, as there is no ambiguity between them. Given a symmetric and positive definite matrix $\mathcal{D} \in \mathbb{R}^{n\times n}$, we define the associated vector norm as $\|v\|_{\mathcal{D}} = \sqrt{v^T \mathcal{D} v}$ for $v \in \mathbb{R}^n$, and the matrix norm induced by it as $\|A\|_{\mathcal{D}} := \max_{v \in \mathbb{R}^n,\, v \neq 0} \frac{\|Av\|_{\mathcal{D}}}{\|v\|_{\mathcal{D}}}$ for $A \in \mathbb{R}^{n\times n}$.

Chapter 1
Introduction

We are interested in the numerical solution of the linear system
\[
\mathcal{A} v = b,
\]
where $\mathcal{A} \in \mathbb{R}^{N\times N}$ is nonsingular and $b \in \mathbb{R}^N$. This problem arises very frequently in all areas of scientific computing. Indeed, numerical methods often reduce the solution of more complex problems, like the solution of nonlinear equations or differential equations, to the solution of one or more linear systems. As a consequence, the solution of (possibly many) linear systems often constitutes the main computational effort when numerical methods are applied to real-world problems, and the importance of having fast algorithms for dealing with such problems cannot be overestimated.

Despite the seeming simplicity of linear systems, the actual computation of their solution can be extremely hard when the dimension $N$ of the problem is large. Nowadays applications often lead to very large linear systems, but these are typically sparse and/or have some special structure which can be exploited by modern algorithms to gain in efficiency. Of course, the properties and the structure of a linear system depend on the nature of the original problem.

In this thesis, we focus on linear systems stemming from optimization problems of the form
\[
\begin{aligned}
\min_{x \in \mathbb{R}^n} \;& \tfrac{1}{2} x^T A x - c^T x \\
\text{s.t. } \;& Bx = d, \\
& Dx \geq f,
\end{aligned}
\tag{1.1}
\]
where $A \in \mathbb{R}^{n\times n}$ is symmetric and positive semidefinite, $B \in \mathbb{R}^{m\times n}$ with $m \leq n$, $D \in \mathbb{R}^{l\times n}$, $c \in \mathbb{R}^n$, $d \in \mathbb{R}^m$, $f \in \mathbb{R}^l$, and the inequality between vectors is intended componentwise. Such problems are referred to as convex Quadratic Programming (QP) problems (cf. [95, Chapter 16]).

The numerical solution of (1.1) often requires the solution of (one or more) symmetric indefinite linear systems having a peculiar block structure.
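To fix ideas, the following small sketch (an illustrative addition, not taken from the thesis) sets up a tiny instance of problem (1.1) and solves it with SciPy's general-purpose SLSQP routine; the data below are arbitrary, chosen only so that the problem is convex and feasible, and a dedicated QP solver would normally be preferred.

# Illustrative sketch of a small convex QP of the form (1.1); assumes NumPy and SciPy.
import numpy as np
from scipy.optimize import minimize

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])      # symmetric positive (semi)definite
c = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 1.0, 1.0]])      # one equality constraint: x1 + x2 + x3 = 1
d = np.array([1.0])
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])      # two inequality constraints D x >= f
f = np.array([0.0, 0.2])

obj  = lambda x: 0.5 * x @ A @ x - c @ x
cons = [{'type': 'eq',   'fun': lambda x: B @ x - d},    # Bx = d
        {'type': 'ineq', 'fun': lambda x: D @ x - f}]    # Dx >= f

res = minimize(obj, x0=np.ones(3), method='SLSQP', constraints=cons)
print(res.x, res.fun)                # approximate minimizer and optimal value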
To see where this block structure comes from, consider the simplified case when $D = 0$ and $f = 0$, i.e., when there are no inequality constraints. If we introduce the vector $p \in \mathbb{R}^m$ of Lagrange multipliers, the first order optimality conditions for (1.1) read
\[
Ax + B^T p = c, \qquad Bx = d,
\]
or, written in matrix form,
\[
\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} x \\ p \end{bmatrix}
=
\begin{bmatrix} c \\ d \end{bmatrix}.
\tag{1.2}
\]
Thus, the solution of the optimization problem is equivalent to the solution of the above linear system. A system whose coefficient matrix has the structure given in (1.2) is referred to as a saddle point system. Such systems form a very important family, whose relevance is not limited to optimization problems. Indeed, saddle point systems arise naturally in many areas of computational science and engineering, such as fluid dynamics, linear elasticity, electromagnetism, and many others. As a consequence, the literature on this topic is vast and often focused on particular applications. In the next chapter, we will discuss the main properties of saddle point systems and review some well-known approaches to solve them. For the moment, it is enough to mention that solving saddle point systems is often a challenging task.

When solving a linear system, one has to choose between a direct and an iterative method. Direct methods, which include the LU and Cholesky factorizations, are characterized by the property of delivering the exact solution (if rounding errors are not considered) in a finite number of steps. Direct methods are considered robust and predictable, and they are typically the method of choice when $\mathcal{A}$ is dense and has small dimension.

On the other hand, when direct methods are applied to large and sparse systems, a complication arises: some entries which are zero in $\mathcal{A}$ may become nonzero in the factors. This phenomenon, known as fill-in, may greatly increase the memory storage and the computational cost. Fill-in can be reduced using appropriate strategies, giving rise to the so-called sparse direct methods.

Nevertheless, when $\mathcal{A}$ is large and sparse, iterative methods are often preferred. These methods compute a sequence of approximate solutions $(v_k)_{k \in \mathbb{N}}$ which should converge to the exact solution.

Krylov subspace methods, or, in short, Krylov methods, are a very popular family of iterative methods. They take their name from the so-called Krylov subspaces, which are defined as
\[
\mathcal{K}_k(\mathcal{A}, r_0) = \mathrm{span}\left\{ r_0, \mathcal{A} r_0, \ldots, \mathcal{A}^{k-1} r_0 \right\},
\]
where $r_0 = b - \mathcal{A} v_0$ is the initial residual. In Krylov methods, at each iteration the approximate solution $v_k$ is sought in the $k$-dimensional affine subspace $v_0 + \mathcal{K}_k(\mathcal{A}, r_0)$.

The actual expression of $v_k$ depends on the particular Krylov method employed. In many important cases, $v_k$ satisfies some optimality property; for example, in MINRES [99] and GMRES [115], $v_k$ is the vector of $v_0 + \mathcal{K}_k(\mathcal{A}, r_0)$ which minimizes the Euclidean norm of the residual $r_k = b - \mathcal{A} v_k$.

The eigenvalue distribution of the system matrix often plays a crucial role in Krylov methods. This is especially true when $\mathcal{A}$ is symmetric, as in (1.2). Indeed, the rate of convergence of Krylov methods for symmetric systems can be bounded by a term that depends only on the extreme eigenvalues of $\mathcal{A}$. As an example, consider MINRES, a Krylov method often employed in the context of saddle point systems. For MINRES, if the spectrum of $\mathcal{A}$ is contained in the set $[-d, -c] \cup [c, d]$, where $d > c > 0$, then the following bound can be shown (see e.g. [64, (3.15)]):
\[
\frac{\|r_k\|}{\|r_0\|} \leq 2 \left( \frac{d/c - 1}{d/c + 1} \right)^{[k/2]}, \qquad k = 1, 2, \ldots,
\tag{1.3}
\]
where $[\,\cdot\,]$ denotes the integer part. It is apparent that, if $d/c \approx 1$, convergence will be fast; if, on the other hand, $d/c \gg 1$, then slow convergence might be observed. Generally speaking, in the symmetric case, the more clustered the eigenvalues of $\mathcal{A}$, the faster the convergence.
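The following sketch (again an illustrative addition, assuming NumPy and SciPy are available) assembles the saddle point matrix of (1.2) for a small random equality-constrained QP, confirms that it is symmetric but indefinite, and solves the system with MINRES.

# Illustrative sketch, not from the thesis: a small dense saddle point system (1.2)
# built from random data and solved with MINRES.
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(0)
n, m = 50, 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                  # symmetric positive definite block
B = rng.standard_normal((m, n))          # full row rank (with probability one)
c = rng.standard_normal(n)
d = rng.standard_normal(m)

K = np.block([[A, B.T],
              [B, np.zeros((m, m))]])    # saddle point matrix of (1.2)
rhs = np.concatenate([c, d])

eigs = np.linalg.eigvalsh(K)
print(eigs.min(), eigs.max())            # eigenvalues of both signs: K is indefinite

sol, info = minres(K, rhs)               # info == 0 signals convergence
x, p = sol[:n], sol[n:]
print(np.linalg.norm(A @ x + B.T @ p - c),
      np.linalg.norm(B @ x - d))         # optimality residuals, small up to solver tolerance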
The rate of convergence of Krylov methods can be improved (sometimes drastically) by using a preconditioning strategy. By preconditioning, it is meant that the original problem is replaced with another one, which is equivalent but easier to solve. More precisely, instead of solving $\mathcal{A} v = b$, we solve for example the equivalent system
\[
\mathcal{P}^{-1} \mathcal{A} v = \mathcal{P}^{-1} b,
\tag{1.4}
\]
where $\mathcal{P} \in \mathbb{R}^{N\times N}$ is an invertible matrix called the preconditioner. A good preconditioner has two features. First, since at each iteration of a Krylov method we need to compute a matrix-vector product involving $\mathcal{P}^{-1}$ (or, equivalently, we need to solve a linear system with coefficient matrix $\mathcal{P}$), this operation must be sufficiently cheap. Second, the preconditioned matrix $\mathcal{P}^{-1}\mathcal{A}$ must have better spectral properties than the original matrix $\mathcal{A}$.

System (1.4) is referred to as left preconditioning. Other reformulations of the system $\mathcal{A} v = b$ using the same $\mathcal{P}$ are possible, and will be discussed in the next chapter. For the moment we just note that, even if the original system matrix $\mathcal{A}$ is symmetric, this property may not hold for the preconditioned system.

When a Krylov method is applied to a nonsymmetric system, the role of the eigenvalues of $\mathcal{A}$ is less clear and no simple bound like (1.3) can be found. In some cases, the eigenvalue distribution may even be misleading. Consider for example the GMRES method. It has been proved in [65] that any nonincreasing convergence curve is possible for this method, regardless of the eigenvalues of the system matrix $\mathcal{A}$. Though this result might seem catastrophic at first, it has been observed that for many practical problems well-clustered eigenvalues (away from 0) still result in fast convergence, even in the nonsymmetric case.

As a consequence of the relation between the eigenvalues and the rate of convergence, it is extremely important to have meaningful spectral estimates for the matrices involved. By this we mean that the estimates should accurately reflect the eigenvalue distributions of such matrices. Indeed, meaningful spectral estimates for $\mathcal{P}^{-1}\mathcal{A}$ give a precise idea of the worst-case scenario one can encounter when attempting to solve the linear system, and hence they can be considered a measure of the quality of the preconditioner. This is true even in the nonsymmetric case, although there the information on the spectrum has to be coupled with some other information.

We emphasize that even spectral estimates for the unpreconditioned system may be useful, for a general reason not directly related to Krylov methods. If $\mathcal{A}$ is symmetric, as are the matrices that will be discussed in this thesis, then spectral bounds for the unpreconditioned system also give an upper bound for its condition number, which in turn can be used to estimate the stability of the numerical methods employed for its solution.
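To make the effect of preconditioning concrete, the following sketch (an illustrative addition under the same NumPy/SciPy assumptions, rebuilding the same kind of random saddle point system as above) compares the MINRES iteration counts without preconditioning and with a classical block diagonal preconditioner built from $A$ and the Schur complement $B A^{-1} B^T$; this particular choice is only one of many possible strategies and is used here purely for illustration.

# Illustrative sketch, not from the thesis: MINRES with and without a block
# diagonal preconditioner blkdiag(A, B A^{-1} B^T) on a random saddle point system.
import numpy as np
from scipy.sparse.linalg import minres, LinearOperator

rng = np.random.default_rng(0)
n, m = 50, 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                            # symmetric positive definite
B = rng.standard_normal((m, n))
K = np.block([[A, B.T], [B, np.zeros((m, m))]])    # saddle point matrix of (1.2)
rhs = rng.standard_normal(n + m)

S = B @ np.linalg.solve(A, B.T)                    # Schur complement B A^{-1} B^T
P = np.block([[A, np.zeros((n, m))],
              [np.zeros((m, n)), S]])              # blkdiag(A, S), symmetric positive definite
apply_Pinv = LinearOperator(K.shape, matvec=lambda v: np.linalg.solve(P, v))

unprec, prec = [], []
minres(K, rhs, callback=lambda xk: unprec.append(1))
minres(K, rhs, M=apply_Pinv, callback=lambda xk: prec.append(1))
print(len(unprec), len(prec))    # iteration counts; the preconditioned count is far smaller

For this ideal choice of preconditioner, it is a classical result that the preconditioned matrix has at most three distinct eigenvalues, so MINRES terminates within three iterations in exact arithmetic; practical preconditioners replace the exact blocks with cheap approximations.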
1.1 Aims and outline

The optimization problems that will be discussed in this thesis can be divided into two classes. First, we consider Quadratic Programming problems of the form (1.1) with $D = I_n$ and $f = 0$, i.e., when the inequality constraints are just $x \geq 0$. These are known as QP problems in standard form.

The second class of problems considered in this thesis stems from the discretization of certain PDE-constrained optimal control problems, with possible further inequality constraints on the control and/or on the state. In these cases, the variable $x$ in (1.1) can be split into two parts, namely the state and the control variables.