
Information-based complexity of convex programming PDF

268 Pages·1995·1.185 MB·English


TECHNION - THE ISRAEL INSTITUTE OF TECHNOLOGY
FACULTY OF INDUSTRIAL ENGINEERING & MANAGEMENT

INFORMATION-BASED COMPLEXITY OF CONVEX PROGRAMMING

A. Nemirovski

Fall Semester 1994/95

Information-Based Complexity of Convex Programming

Goals: given a class of Convex Optimization problems, one may look for an efficient algorithm for the class, i.e., an algorithm with a "good" (best possible, polynomial time, ...) theoretical worst-case efficiency estimate on the class. The goal of the course is to present a number of efficient algorithms for several standard classes of Convex Optimization problems.

The course deals with the black-box setting of an optimization problem (all that is known in advance is that the problem belongs to a given "wide" class, say, is convex, convex with a given degree of smoothness, etc.; besides this a priori qualitative information, we have the possibility to ask an "oracle" for quantitative local information on the objective and the constraints, like their values and derivatives at a point). We present results on the complexity of standard problem classes associated with this setting (i.e., the best possible worst-case number of oracle calls that allows one to solve any problem from the class to a given accuracy) and focus on the corresponding optimal algorithms.

Duration: one semester

Prerequisites: knowledge of elementary Calculus, Linear Algebra and the basic concepts of Convex Analysis (like convexity of functions/sets and the notion of a subgradient of a convex function) is welcome, although not absolutely necessary.
Contents:

Introduction: problem complexity and method efficiency in optimization

Methods with linear dimension-dependent convergence
- from bisection to the cutting plane scheme
- how to divide an n-dimensional pie: the Center-of-Gravity method
- the Outer Ellipsoid method
- polynomial solvability of Linear Programming
- the Inner Ellipsoid method
- convex-concave games and variational inequalities with monotone operators

Large-scale problems and methods with dimension-independent convergence
- subgradient and mirror descent methods for nonsmooth convex optimization
- optimal methods for smooth convex minimization
- strongly convex unconstrained problems

How to solve a linear system: optimal iterative methods for unconstrained convex quadratic minimization

About Exercises

The majority of Lectures are accompanied by "Exercise" sections. In several cases, the exercises are devoted to the lecture where they are placed; sometimes they prepare the reader for the next lecture.

The mark ∗ at the word "Exercise" or at an item of an exercise means that you may use hints given in the Appendix "Hints". A hint, in turn, may refer you to the solution of the exercise given in the Appendix "Solutions"; this is denoted by the mark +. Some exercises are marked by + rather than by ∗; this refers you directly to the solution of the exercise.

Exercises marked by # are closely related to the lecture where they are placed; it would be a good thing to solve such an exercise or at least to become acquainted with its solution (if any is given).

Exercises which I find difficult are marked with >.

The exercises, usually, are not that simple. They are in no sense obligatory, and the reader is not expected to solve all or even the majority of them. Those who would like to work on the solutions should take into account that the order of exercises is important: a problem which could cause serious difficulties on its own becomes much simpler in context (at least I hope so).
Contents

1 Introduction: what the course is about
    1.1 Example: one-dimensional convex problems
    1.2 Conclusion
    1.3 Exercises: Brunn, Minkowski and convex pie
        1.3.1 Prerequisites
        1.3.2 Brunn, Minkowski and Convex Pie
2 Methods with linear convergence, I
    2.1 Class of general convex problems: description and complexity
    2.2 Cutting Plane scheme and Center of Gravity Method
        2.2.1 Case of problems without functional constraints
    2.3 The general case: problems with functional constraints
    2.4 Exercises: Extremal Ellipsoids
        2.4.1 Tschebyshev-type results for sums of random vectors
3 Methods with linear convergence, II
    3.1 Lower complexity bound
    3.2 The Ellipsoid method
        3.2.1 Ellipsoids
        3.2.2 The Ellipsoid method
    3.3 Exercises: The Center of Gravity and the Ellipsoid methods
        3.3.1 Is it actually difficult to find the center of gravity?
        3.3.2 Some extensions of the Cutting Plane scheme
4 Polynomial solvability of Linear Programming
    4.1 Classes P and NP
    4.2 Linear Programming
        4.2.1 Polynomial solvability of FLP
        4.2.2 From detecting feasibility to solving linear programs
    4.3 Exercises: Around the Simplex method and other Simplices
        4.3.1 Example of Klee and Minty
        4.3.2 The method of outer simplex
5 Linearly converging methods for games
    5.1 Convex-concave games
    5.2 Cutting plane scheme for games: updating localizers
    5.3 Cutting plane scheme for games: generating solutions
    5.4 Concluding remarks
    5.5 Exercises: Maximal Inscribed Ellipsoid
6 Variational inequalities with monotone operators
    6.1 Variational inequalities with monotone operators
    6.2 Cutting plane scheme for variational inequalities
    6.3 Exercises: Around monotone operators
7 Large-scale optimization problems
    7.1 Goals and motivations
    7.2 The main result
    7.3 Upper complexity bound: the Gradient Descent
    7.4 The lower bound
    7.5 Exercises: Around Subgradient Descent
8 Subgradient Descent and Bundle methods
    8.1 Subgradient Descent method
    8.2 Bundle methods
        8.2.1 The Level method
        8.2.2 Concluding remarks
    8.3 Exercises: Mirror Descent
9 Large-scale games and variational inequalities
    9.1 Subgradient Descent method for variational inequalities
    9.2 Level method for variational inequalities and games
        9.2.1 Level method for games
        9.2.2 Level method for variational inequalities
    9.3 Exercises: Around Level
        9.3.1 "Prox-Level"
        9.3.2 Level for constrained optimization
10 Smooth convex minimization problems
    10.1 Traditional methods
    10.2 Complexity of classes S_n(L,R)
        10.2.1 Upper complexity bound: Nesterov's method
        10.2.2 Lower bound
        10.2.3 Appendix: proof of Proposition 10.2.1
11 Constrained smooth and strongly convex problems
    11.1 Composite problem
    11.2 Gradient mapping
    11.3 Nesterov's method for composite problems
    11.4 Smooth strongly convex problems
12 Unconstrained quadratic optimization
    12.1 Complexity of quadratic problems: motivation
    12.2 Families of source-representable quadratic problems
    12.3 Lower complexity bounds
    12.4 Complexity of linear operator equations
    12.5 Ill-posed problems
    12.6 Exercises: Around quadratic forms
13 Optimality of the Conjugate Gradient method
    13.1 The Conjugate Gradient method
    13.2 Main result
    13.3 Proof of the main result
        13.3.1 CGM and orthogonal polynomials
        13.3.2 Expression for inaccuracy
        13.3.3 Momentum inequality
        13.3.4 Proof of (13.3.20)
        13.3.5 Concluding the proof of Theorem 13.2.1
    13.4 Exercises: Around Conjugate Gradient Method
14 Convex Stochastic Programming
    14.1 Stochastic Approximation: simple case
        14.1.1 Assumptions
        14.1.2 The Stochastic Approximation method
        14.1.3 Comments
    14.2 MinMax Stochastic Programming problems
Hints to exercises
Solutions to exercises

Lecture 1

Introduction: what the course is about

What we are interested in in this course are theoretically efficient methods for convex optimization problems. Almost every word in the previous sentence should be explained, and this explanation, that is, the formulation of our goals, is the main thing I am going to speak about today. I believe that the best way to explain what we are about to do is to start with a simple example, one-dimensional convex minimization, where everything can be seen.

1.1 Example: one-dimensional convex problems

Consider one-dimensional convex problems

\[ \text{minimize } f(x) \quad \text{s.t. } x \in G = [a,b], \]

where $[a,b]$ is a given finite segment on the axis. It is also known that our objective $f$ is a continuous convex function on $G$; for the sake of simplicity, assume that we know bounds, let them be $0$ and $V$, for the values of the objective on $G$. Thus, all we know about the objective is that it belongs to the family

\[ P = \{ f : [a,b] \to \mathbf{R} \mid f \text{ is convex and continuous};\ 0 \le f(x) \le V,\ x \in [a,b] \}. \]

And what we are asked to do is to find, for a given positive $\varepsilon$, an $\varepsilon$-solution to the problem, i.e., a point $\bar x \in G$ such that

\[ f(\bar x) - f^* \equiv f(\bar x) - \min_G f \le \varepsilon. \]

Of course, our a priori knowledge of the objective, given by the inclusion $f \in P$, is, for small $\varepsilon$, far from sufficient for finding an $\varepsilon$-solution, and we need some source of quantitative information on the objective. The standard assumption here, which comes from optimization practice, is that we can compute the value and a subgradient of the objective at a point, i.e., we have access to a subroutine, an oracle $O$, which gets, as input, a point $x$ from our segment and returns the value $f(x)$ and a subgradient $f'(x)$ of the objective at the point.
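The oracle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the objective $f(x) = |x - c|$, the names `make_abs_oracle` and `is_eps_solution`, and the choice of reporting $0$ as the subgradient at the kink are all hypothetical and not taken from the lecture notes. The check `a < x < b` mirrors the restriction on oracle inputs that the text imposes.

```python
from typing import Callable, Tuple

# An oracle maps a query point x to the pair (f(x), f'(x)).
Oracle = Callable[[float], Tuple[float, float]]


def make_abs_oracle(a: float, b: float, c: float) -> Oracle:
    """Oracle for the illustrative objective f(x) = |x - c| on [a, b]."""
    if not (a < c < b):
        raise ValueError("this sketch assumes the kink c lies inside (a, b)")

    def oracle(x: float) -> Tuple[float, float]:
        if not (a < x < b):
            raise ValueError("query point must satisfy a < x < b")
        value = abs(x - c)
        if x > c:
            subgradient = 1.0
        elif x < c:
            subgradient = -1.0
        else:
            # At the kink every number in [-1, 1] is a subgradient;
            # reporting 0 is an arbitrary choice made for this sketch.
            subgradient = 0.0
        return value, subgradient

    return oracle


def is_eps_solution(f_at_candidate: float, f_star: float, eps: float) -> bool:
    """Check the defining inequality f(x_bar) - f* <= eps."""
    return f_at_candidate - f_star <= eps


# Usage: f(x) = |x - 0.3| on [0, 1], so 0 <= f <= V with V = 1 and f* = 0.
oracle = make_abs_oracle(0.0, 1.0, 0.3)
value, subgradient = oracle(0.5)
print(value, subgradient)
print(is_eps_solution(value, 0.0, 0.25))
```

The example objective is convex, continuous and bounded between $0$ and $1$ on $[0,1]$, so it belongs to the family $P$ with $V = 1$.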
We have subjected the input of the subroutine to the restriction $a < x < b$, since the objective, generally speaking, is not defined outside the segment $[a,b]$, and its subgradient might be undefined at the endpoints of the segment as well. I should also add that the oracle is not uniquely defined by the above description; indeed, at some points $f$ may have a "massive" set of subgradients, not a single one, and we did not specify how the oracle chooses the subgradient to be reported at such a point. We need exactly one hypothesis of this type, namely, we assume the oracle to be local: the information on $f$ reported at a point $x$ must be uniquely defined by the behaviour of $f$ in a neighbourhood of $x$:

\[ \{ f, \bar f \in P,\ x \in \operatorname{int} G,\ f \equiv \bar f \text{ in a neighbourhood of } x \} \Rightarrow O(f,x) = O(\bar f,x). \]

What we should do is find a method which, given the desired accuracy $\varepsilon$ as input, produces an $\varepsilon$-solution to the problem after a number of oracle calls. And what we are interested in is the most efficient method of this type. Namely, given a method which solves every problem from our family to the desired accuracy in a finite number of oracle calls, let us define the worst-case complexity $N$ of the method as the maximum, over all problems from the family, of the number of calls; what we are looking for is exactly the method of minimal worst-case complexity. Thus, the question we are interested in is:

Given
- the family $P$ of objectives $f$,
- a possibility to compute values and subgradients of $f$ at a point of $(a,b)$,
- desired accuracy $\varepsilon$,
what is the minimal number, $\mathrm{Compl}(\varepsilon)$, of computations of $f$ and $f'$ which is sufficient, for all $f \in P$, to form an $\varepsilon$-minimizer of $f$? What is the corresponding, i.e., the optimal, minimization method?

Of course, to answer the question we should first specify the notion of a method. This is a very simple task. Indeed, let us think about what a method, let it be called $M$, could be.
It should perform sequential calls to the oracle, at the $i$-th step forwarding to it a certain input $x_i \in (a,b)$; let us call this input the $i$-th search point. The very first input $x_1$ is generated when the method has no specific information on the particular objective $f$ it is applied to; thus, the first search point should be objective-independent:

\[ x_1 = S_1^M. \tag{1.1.1} \]

Now, the second search point is generated after the method knows the value and a subgradient of the objective at the first search point, and $x_2$ should be a certain function of this information:

\[ x_2 = S_2^M(f(x_1), f'(x_1)). \tag{1.1.2} \]

Similarly, the $i$-th search point is generated when the method already knows the values and the subgradients of $f$ at the previous search points, and this is all the method knows about $f$ so far, so that the $i$-th search point should be a certain function of the values and the subgradients of the objective at the previous search points:

\[ x_i = S_i^M(f(x_1), f'(x_1); \ldots; f(x_{i-1}), f'(x_{i-1})). \tag{1.1.3} \]

We conclude that the calls to the oracle are defined by a certain recurrence of the type (1.1.3); the rules governing this recurrence, i.e., the functions $S_i^M(\cdot)$, are specific to the method and form a part of its description.
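To make the recurrence (1.1.1)-(1.1.3) concrete, here is a minimal Python sketch. It is an illustration only: the names `run_method`, `make_bisection_rule` and `quadratic_oracle` are hypothetical, the test objective $f(x) = (x - 0.3)^2$ on $[0,1]$ is an assumption, and bisection is used merely as one familiar instance of a search rule, not as the method the text singles out at this point. The rule receives only the reported pairs $(f(x_j), f'(x_j))$, never the objective itself, which is exactly the restriction expressed by (1.1.3).

```python
from typing import Callable, List, Tuple

Observation = Tuple[float, float]                   # (f(x_i), f'(x_i))
Oracle = Callable[[float], Observation]
SearchRule = Callable[[List[Observation]], float]   # stands in for S_i^M


def run_method(rule: SearchRule, oracle: Oracle, n_calls: int):
    """Generate search points x_1, ..., x_n by the recurrence (1.1.3)."""
    history: List[Observation] = []
    points: List[float] = []
    for _ in range(n_calls):
        x = rule(history)           # x_i is a function of past observations only
        points.append(x)
        history.append(oracle(x))   # one oracle call per step
    return points, history


def make_bisection_rule(a: float, b: float) -> SearchRule:
    """Bisection written as a rule that sees only (value, subgradient) pairs."""
    def rule(history: List[Observation]) -> float:
        lo, hi = a, b
        for _value, g in history:   # replay earlier steps to recover the interval
            mid = 0.5 * (lo + hi)
            if g > 0:
                hi = mid            # a minimizer lies to the left of mid
            elif g < 0:
                lo = mid            # a minimizer lies to the right of mid
            else:
                lo = hi = mid       # zero subgradient: mid is a minimizer
        return 0.5 * (lo + hi)
    return rule


def quadratic_oracle(x: float) -> Observation:
    """Illustrative objective f(x) = (x - 0.3)^2 on [0, 1]; f' is its derivative."""
    return (x - 0.3) ** 2, 2.0 * (x - 0.3)


points, history = run_method(make_bisection_rule(0.0, 1.0), quadratic_oracle, 20)
print(points[0], points[-1])
```

With an empty history the rule returns the midpoint of $[a,b]$, so the first search point does not depend on the objective, as (1.1.1) requires. The argument `n_calls` is the quantity that the worst-case complexity counts.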
