
Nonlinear Programming 2. Proceedings of the Special Interest Group on Mathematical Programming Symposium Conducted by the Computer Sciences Department at the University of Wisconsin–Madison, April 15–17, 1974

358 Pages · 1975 · English

Nonlinear Programming 2

Edited by O. L. Mangasarian, R. R. Meyer, and S. M. Robinson

Proceedings of the Special Interest Group on Mathematical Programming Symposium conducted by the Computer Sciences Department at the University of Wisconsin–Madison, April 15–17, 1974

Academic Press, New York, San Francisco, London, 1975. A Subsidiary of Harcourt Brace Jovanovich, Publishers.

Copyright © 1975 by Academic Press, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc., 111 Fifth Avenue, New York, New York 10003. United Kingdom edition published by Academic Press, Inc. (London) Ltd., 24/28 Oval Road, London NW1.

Library of Congress Cataloging in Publication Data: Symposium on Nonlinear Programming, 2d, Madison, Wis., 1974. Nonlinear programming, 2. Bibliography: p. Includes index. 1. Nonlinear programming. I. Mangasarian, Olvi L. II. Meyer, Robert R. III. Robinson, Stephen M. IV. Association for Computing Machinery. Special Interest Group on Mathematical Programming. V. Wisconsin. University–Madison. Computer Sciences Dept. VI. Title. T57.8.S9 1974 519.7'6 75-9854. ISBN 0-12-468650-8. Printed in the United States of America.

CONTRIBUTORS

Egon Balas (279), GSIA, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213
Richard H. Bartels (231), Dept. of Mathematical Sciences, Johns Hopkins University, Baltimore, Maryland 21218
Dimitri P. Bertsekas (165), Coordinated Science Lab, University of Illinois, Urbana, Illinois 61801
Roger Fletcher (121), Department of Mathematics, The University, Dundee, Scotland
Ubaldo Garcia-Palomares (101), IVIC, Apartado 1827, Caracas, Venezuela
A. A. Goldstein (215), Department of Mathematics, University of Washington, Seattle, Washington 98195
Pierre Huard (29), 21 Rés. Elysée I, 78170 La Celle-St. Cloud, France
Robert G. Jeroslow (313), GSIA, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213
Barry Kort (193), 3D-502 Bell Labs, Holmdel, New Jersey 07733
E. Polak (255), Department of EECS, University of California, Berkeley, California 94720
Michael J. D. Powell (1), Building 8.9, A.E.R.E., Harwell, Didcot, England
Klaus Ritter (55), Mathematisches Institut A, Universität Stuttgart, Herdweg 23, 7 Stuttgart N, Germany
Kurt Spielberg (333), 38 Knollwood Drive, Cherry Hill, New Jersey 08034
Monique Guignard Spielberg (333), 38 Knollwood Drive, Cherry Hill, New Jersey 08034
I. Teodoru (255), Department of EECS, University of California, Berkeley, California 94720

PREFACE

In May 1970 the first symposium in Madison on nonlinear programming took place; its proceedings were published the same year by Academic Press. In April 1974, under the sponsorship of the Special Interest Group on Mathematical Programming (SIGMAP) of the Association for Computing Machinery, a second symposium on nonlinear programming was conducted at Madison by the Computer Sciences Department of the University of Wisconsin. These are the proceedings of that second symposium.

This volume contains thirteen papers. Two of the papers (the ones by Fletcher and Ritter) were not presented at the symposium because the authors were unable to be present. In the paper by Powell, global and superlinear convergence of a class of algorithms is obtained by imposing changing bounds on the variables of the problem.
In the paper by Huard, convergence of the well-known reduced gradient method is established under suitable conditions. In the paper by Ritter, a superlinearly convergent quasi-Newton method for unconstrained minimization is given. Garcia-Palomares gives a superlinearly convergent algorithm for linearly constrained optimization problems. The next three papers, by Fletcher, Bertsekas, and Kort, give exceptionally penetrating presentations of one of the most recent and effective methods for constrained optimization, namely the method of augmented Lagrangians. In the paper by Goldstein, a method for handling minimization problems with discontinuous derivatives is given. Bartels discusses the advantages of factorizations of updatings for Jacobian-related matrices in minimization problems. Polak and Teodoru give Newton-like methods for the solution of nonlinear equations and inequalities. The papers by Balas, by Jeroslow, and by Guignard and Spielberg deal with various aspects of integer programming. It is hoped that these papers communicate the richness and diversity that exist today in the research of some of the leading experts in nonlinear programming.

The editors would like to thank Dr. Michael D. Grigoriadis, SIGMAP Chairman, for his interest in and encouragement of the symposium, and Professor L. Fox, editor of the Journal of the Institute of Mathematics and Its Applications, for permission to include the paper by Fletcher. We also would like to thank Mrs. Dale M. Malm, the symposium secretary, for her efficient handling of the symposium arrangements and her expert typing of the proceedings manuscript.

O. L. Mangasarian
R. R. Meyer
S. M. Robinson

CONVERGENCE PROPERTIES OF A CLASS OF MINIMIZATION ALGORITHMS

by M. J. D. Powell
Computer Science and Systems Division, A.E.R.E. Harwell, Didcot, Berkshire, England

ABSTRACT

Many iterative algorithms for minimizing a function $F(x) = F(x_1, x_2, \ldots, x_n)$ require first derivatives of $F(x)$ to be calculated, but they maintain an approximation to the second derivative matrix automatically. In order that the approximation is useful, the change in $x$ made by each iteration is subject to a bound that is also revised automatically. Some convergence theorems for a class of minimization algorithms of this type are presented, which apply to methods proposed by Powell (1970) and by Fletcher (1972). This theory has the following three valuable features, which are rather uncommon. There is no need for the starting vector $x^{(1)}$ to be close to the solution. The function $F(x)$ need not be convex. Superlinear convergence is proved even though the second derivative approximations may not converge to the true second derivatives at the solution.

1. The class of algorithms

The methods under consideration are iterative. Given a starting vector $x^{(1)}$, they generate a sequence of points $x^{(k)}$ ($k = 1, 2, 3, \ldots$), which is intended to converge to the point at which the objective function $F(x)$ is least. The components of $x$, namely $(x_1, x_2, \ldots, x_n)$, are the variables of the objective function.

At the beginning of each iteration a starting point $x^{(k)}$ is available, with a matrix $B^{(k)}$ and a step-bound $\Delta^{(k)}$. $B^{(k)}$ is a square symmetric matrix whose elements are the approximations

(1.1)   $B_{ij}^{(k)} \approx \partial^2 F(x^{(k)}) / \partial x_i \partial x_j$,

and $\Delta^{(k)}$ bounds the change in $x$ made by the iteration. Both $B^{(k)}$ and $\Delta^{(k)}$ are revised automatically in accordance with some rules given later. At $x^{(k)}$ the first derivative vector

(1.2)   $g^{(k)} = \nabla F(x^{(k)})$

is calculated.
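As a concrete illustration (not part of the paper), the quantities in (1.1) and (1.2) might be set up as follows in Python/NumPy. The function names, the forward-difference gradient, and the choices $B^{(1)} = I$ and $\Delta^{(1)} = 1$ are assumptions made for this sketch; the paper itself allows any symmetric starting matrix.

```python
import numpy as np

def forward_difference_gradient(F, x, h=1.0e-7):
    """Approximate the first derivative vector g = grad F(x) of (1.2)
    by forward differences, for use when analytic gradients are absent."""
    f0 = F(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (F(x + e) - f0) / h
    return g

def initial_state(x1, delta1=1.0):
    """State carried between iterations: the point x(k), the symmetric
    second derivative approximation B(k) of (1.1), and the step bound
    Delta(k).  B(1) = I is an illustrative admissible choice."""
    return x1, np.eye(x1.size), delta1
```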
Also we use the notation $g(x)$ and $G(x)$ to denote the first derivative vector and the second derivative matrix of $F(x)$ at a general point $x$. To define $x^{(k+1)}$ we make use of the quadratic approximation

(1.3)   $F(x^{(k)} + \delta) \approx \Phi(x^{(k)} + \delta) = F(x^{(k)}) + \delta^T g^{(k)} + \tfrac{1}{2} \delta^T B^{(k)} \delta$.

Because most gradient algorithms terminate if $g^{(k)}$ is zero, and because this paper studies convergence properties as $k$ increases, we suppose that $g^{(k)}$ is never identically zero. Therefore we can calculate a value of $\delta$, $\delta^{(k)}$ say, such that the inequality

(1.4)   $\Phi(x^{(k)} + \delta^{(k)}) < F(x^{(k)})$

is satisfied. Then $x^{(k+1)}$ is defined by the equation

(1.5)   $x^{(k+1)} = \begin{cases} x^{(k)} + \delta^{(k)}, & F(x^{(k)} + \delta^{(k)}) < F(x^{(k)}), \\ x^{(k)}, & F(x^{(k)} + \delta^{(k)}) \geq F(x^{(k)}), \end{cases}$

which provides the condition

(1.6)   $F(x^{(k+1)}) \leq F(x^{(k)})$.

In order to obtain fast convergence ultimately, we note that, if the matrix $B^{(k)}$ is positive definite, the least value of expression (1.3) occurs when $\delta$ is the vector $-[B^{(k)}]^{-1} g^{(k)}$. Therefore, if the inequality

(1.7)   $\|[B^{(k)}]^{-1} g^{(k)}\| \leq \Delta^{(k)}$

is satisfied, and if $B^{(k)}$ is positive definite, $\delta^{(k)}$ is defined by the Newton formula

(1.8)   $\delta^{(k)} = -[B^{(k)}]^{-1} g^{(k)}$.

Otherwise the length of $\delta^{(k)}$ is restricted by $\Delta^{(k)}$, so we force the equation

(1.9)   $\|\delta^{(k)}\| = \Delta^{(k)}$.

On every iteration the choice of $\delta^{(k)}$ must satisfy the inequality

(1.10)   $F(x^{(k)}) - \Phi(x^{(k)} + \delta^{(k)}) \geq c_1 \|g^{(k)}\| \min\left[ \|\delta^{(k)}\|, \|g^{(k)}\| / \|B^{(k)}\| \right]$,

where $c_1$ is a positive constant, which is stronger than condition (1.4). It is proved in Section 5 that equation (1.8) satisfies this condition.

Here and throughout the paper the vector norms are Euclidean, and the matrix norms are subordinate to the vector norms. Because the constant $c_1$ in expression (1.10) is arbitrary, this choice of norm does not lose any generality. This remark also applies to the other conditions that are imposed on the class of algorithms.

We have been vague about the definition of $\delta^{(k)}$ deliberately, in order to increase the range of applicability of the convergence theorems that are presented. The generality of condition (1.10) is useful because it allows $\delta^{(k)}$ to be specified by either the Levenberg/Marquardt (Marquardt, 1963) technique or by the dog-leg technique (Powell, 1970). This statement is proved in Section 5.

The initial second derivative approximation $B^{(1)}$ may be any symmetric matrix, and any method may be used to define the sequence of matrices $B^{(k)}$ ($k = 1, 2, 3, \ldots$), provided that the condition

(1.11)   $\|B^{(k)}\| \leq c_2 + c_3 \sum_{i=1}^{k} \|\delta^{(i)}\|$

is satisfied, where $c_2$ and $c_3$ are positive constants. However a stronger condition is needed if one requires the rate of convergence of the sequence $x^{(k)}$ ($k = 1, 2, 3, \ldots$) to be superlinear. A suitable condition is given later.

The general strategy for revising $\Delta^{(k)}$ is that it may sometimes increase, but it is decreased when the actual change in the objective function $[F(x^{(k)}) - F(x^{(k)} + \delta^{(k)})]$ is much worse than the predicted change $[F(x^{(k)}) - \Phi(x^{(k)} + \delta^{(k)})]$. Thus the bound on the change in $x$ is related automatically to the accuracy of the approximation (1.3). Specifically we require positive constants $c_4, c_5, c_6$ and $c_7$ such that, when the inequality

(1.12)   $F(x^{(k)}) - F(x^{(k)} + \delta^{(k)}) \geq c_4 \{ F(x^{(k)}) - \Phi(x^{(k)} + \delta^{(k)}) \}$

is obtained, then $\Delta^{(k+1)}$ satisfies the bounds

(1.13)   $\|\delta^{(k)}\| \leq \Delta^{(k+1)} \leq c_5 \|\delta^{(k)}\|$,

and, when inequality (1.12) fails, $\Delta^{(k+1)}$ satisfies the bounds

(1.14)   $c_6 \|\delta^{(k)}\| \leq \Delta^{(k+1)} \leq c_7 \|\delta^{(k)}\|$.
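To make the step rules and the revision of $\Delta^{(k)}$ concrete, here is a hedged Python/NumPy sketch, not taken from the paper: the dog-leg technique as one admissible way of choosing $\delta^{(k)}$, and a revision rule consistent with (1.12)–(1.14). The particular constants $c_4, c_5, c_6, c_7$ and the bound delta_bar are illustrative values that merely satisfy the stated inequalities.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """One admissible choice of delta(k) (the dog-leg technique,
    Powell, 1970): the Newton step (1.8) when B is positive definite
    and inequality (1.7) holds, otherwise a step of length delta
    as in (1.9)."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0.0:
        # The model decreases without bound along -g: go to the boundary.
        return -(delta / gnorm) * g
    # Cauchy step: minimizer of the quadratic model (1.3) along -g.
    sd = -((gnorm ** 2) / gBg) * g
    if np.linalg.norm(sd) >= delta:
        return -(delta / gnorm) * g                  # enforces (1.9)
    try:
        np.linalg.cholesky(B)                        # positive-definiteness test
        newton = -np.linalg.solve(B, g)              # Newton formula (1.8)
    except np.linalg.LinAlgError:
        return sd
    if np.linalg.norm(newton) <= delta:              # inequality (1.7)
        return newton
    # Follow the segment from the Cauchy step toward the Newton step
    # until it meets the sphere of radius delta, enforcing (1.9).
    d = newton - sd
    a, b, c = d @ d, 2.0 * (sd @ d), sd @ sd - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return sd + tau * d

def revise_delta(actual_red, predicted_red, step_norm, delta_bar,
                 c4=0.1, c5=2.0, c7=0.5):
    """Revision of the step bound following (1.12)-(1.14); the constants
    are illustrative values obeying c4 < 1, c5 >= 1 and c6 <= c7 < 1.
    Since ||delta(k)|| <= delta_bar, the result stays within (1.13)."""
    if actual_red >= c4 * predicted_red:             # inequality (1.12) holds
        return min(c5 * step_norm, delta_bar)        # within the bounds (1.13)
    return c7 * step_norm                            # within the bounds (1.14)
```

A driver loop would take $\delta^{(k)}$ = dogleg_step(g, B, delta), accept or reject the trial point according to (1.5), and pass the actual and predicted reductions to revise_delta to obtain $\Delta^{(k+1)}$.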
^< ^A(^k +1)* <c „ ||, .δÄ(_k ) M 6 7 As well as being positive, these constants must obey the conditions c < 1, c_ ^ 1 and c^ ^ c« < 1 . A4 5 6 7 Moreover we require each algorithm of the class to impose a fixed upper bound on the step-lengths, so we have a condition of the form (1.15) Δ(Κ) < Δ , k = 1,2,3,. . . . The class of algorithms that is analysed con­ sists of the algorithms that meet all the conditions given so far. We claim that these conditions are sensible in practice, and that they allow some use­ ful methods, including an algorithm proposed by Powell (1970) and the version of Fletcher's (1972) hypercube method when there are no constraints on the variables. M. J. D. POWELL In Section 2 it is proved that, under very mild conditions on F(x), each algorithm of the class provides the limit (1.16) lim inf ||g;(k) 11 = 0 . (k) Therefore, if oneof the points x falls into a region where F(x) is locally convex, if F(x) has a local minimum in this region, and if the (k) step bounds Δ and the inequality (1.6) keep (k) the later points of the sequence xv ' (k=l,2,3,...) inside the region, then convergence is obtained to the local minimum. Thus it is common for the (k) sequence x (k=l,2,3,...) to tend to a limit. It is proved in Section 3 that, when the points (k) x tend to a limit at which the second derivative matrix of F(x) is positive definite, then the sum Σ|| &_ || is convergent. In view of the fact (k) that the conditions (1.11) on B are not very restrictive, this result is quite pleasing, and also (k) it shows that the matrices B are uniformly bounded. However one would prefer a much stronger theorem (k) on the convergence of the sequence x (k=l,2,...), for the algorithms of the class that relate the (k) matrices B (k=l,2,3,...) to the second derivatives of F(x). For example the updating formula, that is obtained by symmetrizing the Broyden rank-one formula (Powell, 1970), satisfies condition (1.11) and provides the limit (1.17) ||g(x(kWk)) -g(k) - B (k) <5(k) || / || ^(k) | |- 0 . Therefore it is proved in Section 4 that the algorithms of the class that satisfy condition (1.17) 6
