
The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation PDF

348 Pages·2012·1.259 MB·English

Preview The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation

The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation

The SIAM series on Software, Environments, and Tools focuses on the practical implementation of computational methods and the high performance aspects of scientific computation by emphasizing in-demand software, computing environments, and tools for computing. Software technology development issues such as current status, applications and algorithms, mathematical software, software tools, languages and compilers, computing environments, and visualization are presented.

Editor-in-Chief
Jack J. Dongarra, University of Tennessee and Oak Ridge National Laboratory

Editorial Board
James W. Demmel, University of California, Berkeley
Dennis Gannon, Indiana University
Eric Grosse, AT&T Bell Laboratories
Jorge J. Moré, Argonne National Laboratory

Software, Environments, and Tools
Uwe Naumann, The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation
C. T. Kelley, Implicit Filtering
Jeremy Kepner and John Gilbert, editors, Graph Algorithms in the Language of Linear Algebra
Jeremy Kepner, Parallel MATLAB for Multicore and Multinode Computers
Michael A. Heroux, Padma Raghavan, and Horst D. Simon, editors, Parallel Processing for Scientific Computing
Gérard Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations
Bo Einarsson, editor, Accuracy and Reliability in Scientific Computing
Michael W. Berry and Murray Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition
Craig C. Douglas, Gundolf Haase, and Ulrich Langer, A Tutorial on Elliptic PDE Solvers and Their Parallelization
Louis Komzsik, The Lanczos Method: Evolution and Application
Bard Ermentrout, Simulating, Analyzing, and Animating Dynamical Systems: A Guide to XPPAUT for Researchers and Students
V. A. Barker, L. S. Blackford, J. Dongarra, J. Du Croz, S. Hammarling, M. Marinova, J. Waśniewski, and P. Yalamov, LAPACK95 Users’ Guide
Stefan Goedecker and Adolfy Hoisie, Performance Optimization of Numerically Intensive Codes
Zhaojun Bai, James Demmel, Jack Dongarra, Axel Ruhe, and Henk van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide
Lloyd N. Trefethen, Spectral Methods in MATLAB
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, Third Edition
Michael W. Berry and Murray Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval
Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. van der Vorst, Numerical Linear Algebra for High-Performance Computers
R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods
Randolph E. Bank, PLTMG: A Software Package for Solving Elliptic Partial Differential Equations, Users’ Guide 8.0
L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users’ Guide
Greg Astfalk, editor, Applications on Advanced Architecture Computers
Roger W. Hockney, The Science of Computer Benchmarking
Françoise Chaitin-Chatelin and Valérie Frayssé, Lectures on Finite Precision Computations

Uwe Naumann
RWTH Aachen University
Aachen, Germany

Society for Industrial and Applied Mathematics
Philadelphia

Copyright © 2012 by the Society for Industrial and Applied Mathematics. All rights reserved. Printed in the United States of America.
No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Trademarked names may be used in this book without the inclusion of a trademark symbol. These names are used in an editorial context only; no infringement of trademark is intended.

Ampl is a registered trademark of AMPL Optimization LLC, Lucent Technologies Inc.
Linux is a registered trademark of Linus Torvalds.
Maple is a trademark of Waterloo Maple, Inc.
Mathematica is a registered trademark of Wolfram Research, Inc.
MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7001, [email protected], www.mathworks.com.
NAG is a registered trademark of the Numerical Algorithms Group.

Library of Congress Cataloging-in-Publication Data
Naumann, Uwe, 1969-
The art of differentiating computer programs : an introduction to algorithmic differentiation / Uwe Naumann.
p. cm. -- (Software, environments, and tools)
Includes bibliographical references and index.
ISBN 978-1-611972-06-1
1. Computer programs. 2. Automatic differentiations. 3. Sensitivity theory (Mathematics) I. Title.
QA76.76.A98N38 2011
003’.3--dc23
2011032262

Partial royalties from the sale of this book are placed in a fund to help students attend SIAM meetings and other SIAM-related activities. This fund is administered by SIAM, and qualified individuals are encouraged to write directly to SIAM for guidelines.

To Ines, Pia, and Antonia

Contents

Preface
Acknowledgments
Optimality

1 Motivation and Introduction
  1.1 Motivation: Derivatives for ...
    1.1.1 ... Systems of Nonlinear Equations
    1.1.2 ... Nonlinear Programming
    1.1.3 ... Numerical Libraries
  1.2 Manual Differentiation
  1.3 Approximation of Derivatives
  1.4 Exercises
    1.4.1 Finite Differences and Floating-Point Arithmetic
    1.4.2 Derivatives for Systems of Nonlinear Equations
    1.4.3 Derivatives for Nonlinear Programming
    1.4.4 Derivatives for Numerical Libraries

2 First Derivative Code
  2.1 Tangent-Linear Model
    2.1.1 Tangent-Linear Code by Forward Mode AD
    2.1.2 Tangent-Linear Code by Overloading
    2.1.3 Seeding and Harvesting Tangent-Linear Code
  2.2 Adjoint Model
    2.2.1 Adjoint Code by Reverse Mode AD
    2.2.2 Adjoint Code by Overloading
    2.2.3 Seeding and Harvesting Adjoint Code
  2.3 Call Tree Reversal
    2.3.1 Call Tree Reversal Modes
    2.3.2 Call Tree Reversal Problem
  2.4 Exercises
    2.4.1 Code Differentiation Rules
    2.4.2 Derivatives for Systems of Nonlinear Equations
    2.4.3 Derivatives for Nonlinear Programming
    2.4.4 Derivatives for Numerical Libraries
    2.4.5 Call Tree Reversal

3 Higher Derivative Code
  3.1 Notation and Terminology
  3.2 Second-Order Tangent-Linear Code
    3.2.1 Source Transformation
    3.2.2 Overloading
  3.3 Second-Order Adjoint Code
    3.3.1 Source Transformation
    3.3.2 Overloading
    3.3.3 Compression of Sparse Hessians
  3.4 Higher Derivative Code
    3.4.1 Third-Order Tangent-Linear Code
    3.4.2 Third-Order Adjoint Code
    3.4.3 Fourth and Higher Derivative Code
  3.5 Exercises
    3.5.1 Second Derivative Code
    3.5.2 Use of Second Derivative Models
    3.5.3 Third and Higher Derivative Models

4 Derivative Code Compilers - An Introductory Tutorial
  4.1 Overview
  4.2 Fundamental Concepts and Terminology
  4.3 Lexical Analysis
    4.3.1 From RE to NFA
    4.3.2 From NFA to DFA with Subset Construction
    4.3.3 Minimization of DFA
    4.3.4 flex
  4.4 Syntax Analysis
    4.4.1 Top-Down Parsing
    4.4.2 Bottom-Up Parsing
    4.4.3 A Simple LR Language
    4.4.4 A Simple Operator Precedence Language
    4.4.5 Parser for SL2 Programs with flex and bison
    4.4.6 Interaction between flex and bison
  4.5 Single-Pass Derivative Code Compilers
    4.5.1 Attribute Grammars
    4.5.2 Syntax-Directed Assignment-Level SAC
    4.5.3 Syntax-Directed Tangent-Linear Code
    4.5.4 Syntax-Directed Adjoint Code
  4.6 Toward Multipass Derivative Code Compilers
    4.6.1 Symbol Table
    4.6.2 Parse Tree
  4.7 Exercises
    4.7.1 Lexical Analysis
    4.7.2 Syntax Analysis
    4.7.3 Single-Pass Derivative Code Compilers
    4.7.4 Toward Multipass Derivative Code Compilers

5 dcc - A Prototype Derivative Code Compiler
  5.1 Functionality
    5.1.1 Tangent-Linear Code by dcc
    5.1.2 Adjoint Code by dcc
    5.1.3 Second-Order Tangent-Linear Code by dcc
    5.1.4 Second-Order Adjoint Code by dcc
    5.1.5 Higher Derivative Code by dcc
  5.2 Installation of dcc
  5.3 Use of dcc
  5.4 Intraprocedural Derivative Code by dcc
    5.4.1 Tangent-Linear Code
    5.4.2 Adjoint Code
    5.4.3 Second-Order Tangent-Linear Code
    5.4.4 Second-Order Adjoint Code
    5.4.5 Higher Derivative Code
  5.5 Run Time of Derivative Code by dcc
  5.6 Interprocedural Derivative Code by dcc
    5.6.1 Tangent-Linear Code
    5.6.2 Adjoint Code
    5.6.3 Second-Order Tangent-Linear Code
    5.6.4 Second-Order Adjoint Code
    5.6.5 Higher Derivative Code
  5.7 Toward Reality
    5.7.1 Tangent-Linear Code
    5.7.2 Adjoint Code
    5.7.3 Second-Order Tangent-Linear Code
    5.7.4 Second-Order Adjoint Code
  5.8 Projects

Appendix A: Derivative Code by Overloading
  A.1 Tangent-Linear Code
  A.2 Adjoint Code
  A.3 Second-Order Tangent-Linear Code
  A.4 Second-Order Adjoint Code

Appendix B: Syntax of dcc Input
  B.1 bison Grammar
  B.2 flex Grammar

Appendix C: (Hints on) Solutions
  C.1 Chapter 1
    C.1.1 Exercise 1.4.1
    C.1.2 Exercise 1.4.2
    C.1.3 Exercise 1.4.3
    C.1.4 Exercise 1.4.4
  C.2 Chapter 2
    C.2.1 Exercise 2.4.1
    C.2.2 Exercise 2.4.2
    C.2.3 Exercise 2.4.3
    C.2.4 Exercise 2.4.4
    C.2.5 Exercise 2.4.5
  C.3 Chapter 3
    C.3.1 Exercise 3.5.1
    C.3.2 Exercise 3.5.2
    C.3.3 Exercise 3.5.3
  C.4 Chapter 4
    C.4.1 Exercise 4.7.1
    C.4.2 Exercise 4.7.2
    C.4.3 Exercise 4.7.3
    C.4.4 Exercise 4.7.4

Bibliography
Index

Preface

“How sensitive are the values of the outputs of my computer program with respect to changes in the values of the inputs? How sensitive are these first-order sensitivities with respect to changes in the values of the inputs? How sensitive are the second-order sensitivities with respect to changes in the values of the inputs? ...”

Computational scientists, engineers, and economists as well as quantitative analysts in computational finance tend to ask these questions on a regular basis. They write computer programs in order to simulate diverse real-world phenomena. The underlying mathematical models often depend on a possibly large number of (typically unknown or uncertain) parameters. Values for the corresponding inputs of the numerical simulation programs can, for example, be the result of (typically error-prone) observations and measurements. If very small perturbations in these uncertain values yield large changes in the values of the outputs, then the feasibility of the entire simulation becomes questionable. Nobody should make decisions based on such highly uncertain data.

Quantitative information about the extent of this uncertainty is crucial. First- and higher-order sensitivities of outputs of numerical simulation programs with respect to their inputs (also first and higher derivatives) form the basis for various approximations of uncertainty. They are also crucial ingredients of a large number of numerical algorithms ranging from the solution of (systems of) nonlinear equations to optimization under constraints given as (systems of) partial differential equations. This book describes a set of techniques for modifying the semantics of numerical simulation programs such that the desired first and higher derivatives can be computed accurately and efficiently. Computer programs implement algorithms. Consequently, the subject is known as Algorithmic (also Automatic) Differentiation (AD).

AD provides two fundamental modes. In forward mode, a tangent-linear version of the original program is built. The sensitivities of all outputs of the program with respect to its inputs can be computed at a computational cost that is proportional to the number of inputs. The computational complexity is similar to that of finite difference approximation. At the same time, the desired derivatives are computed with machine accuracy. Truncation is avoided.

Reverse mode yields an adjoint program that can be used to perform the same task at a computational cost that is proportional to the number of outputs. For example, in large-scale nonlinear optimization a scalar objective that is returned by the given computer program can depend on a very large number of input parameters. The adjoint program allows for the computation of the gradient (the first-order sensitivities of the objective with respect to all parameters) at a small constant multiple R (typically between 3 and 30) of the cost of running the original program. It outperforms gradient accumulation routines that are based
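
The two modes described in the preface excerpt above can be illustrated with a minimal sketch. It is not taken from the book: the classes Tangent and Adjoint, the global TAPE, the test function f, and the three gradient_* helpers are hypothetical names chosen for this illustration, and Python is used only for brevity. Under those assumptions, forward mode needs one run of f per input, reverse mode needs one run of f plus one backward sweep for a scalar output, and a forward finite difference serves as the truncation-affected baseline.

```python
import math

class Tangent:
    """Forward mode: a value v paired with its directional derivative t."""
    def __init__(self, v, t=0.0):
        self.v, self.t = v, t
    def __add__(self, o):
        return Tangent(self.v + o.v, self.t + o.t)
    def __mul__(self, o):
        return Tangent(self.v * o.v, self.t * o.v + self.v * o.t)

TAPE = []  # records every Adjoint node in order of creation

class Adjoint:
    """Reverse mode: a value v, its adjoint a, and (argument, local partial) pairs."""
    def __init__(self, v, args=()):
        self.v, self.args, self.a = v, args, 0.0
        TAPE.append(self)
    def __add__(self, o):
        return Adjoint(self.v + o.v, ((self, 1.0), (o, 1.0)))
    def __mul__(self, o):
        return Adjoint(self.v * o.v, ((self, o.v), (o, self.v)))

def sin(x):
    """sin for plain floats, Tangent values, and Adjoint values."""
    if isinstance(x, Tangent):
        return Tangent(math.sin(x.v), math.cos(x.v) * x.t)
    if isinstance(x, Adjoint):
        return Adjoint(math.sin(x.v), ((x, math.cos(x.v)),))
    return math.sin(x)

def f(x):
    # Hypothetical scalar objective: y = sin(x0 * x1) + x0 * x0
    return sin(x[0] * x[1]) + x[0] * x[0]

def gradient_forward(f, x):
    """One tangent-linear run per input: cost grows with len(x)."""
    grad = []
    for i in range(len(x)):
        seeded = [Tangent(xj, 1.0 if j == i else 0.0) for j, xj in enumerate(x)]
        grad.append(f(seeded).t)
    return grad

def gradient_reverse(f, x):
    """One recorded run plus one backward sweep, independent of len(x)."""
    TAPE.clear()
    inputs = [Adjoint(xi) for xi in x]
    y = f(inputs)
    y.a = 1.0  # seed the adjoint of the scalar output
    for node in reversed(TAPE):
        for arg, partial in node.args:
            arg.a += partial * node.a
    return [xi.a for xi in inputs]

def gradient_fd(f, x, h=1e-6):
    """Forward finite differences: len(x) + 1 runs, with truncation error."""
    y0 = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        grad.append((f(xp) - y0) / h)
    return grad

if __name__ == "__main__":
    x = [1.5, 0.5]
    print("forward mode:", gradient_forward(f, x))
    print("reverse mode:", gradient_reverse(f, x))
    print("finite diff.:", gradient_fd(f, x))
```

Running the script prints three gradients at x = (1.5, 0.5). The two AD results should agree with the analytic gradient (x1 cos(x0 x1) + 2 x0, x0 cos(x0 x1)) up to rounding, while the finite difference result depends on the step size h. The tape in this sketch stores every intermediate value, which is the simplest possible design and ignores the memory issues that real adjoint codes have to manage.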
