Lecture Notes in Control and Information Sciences 434 Editors ProfessorDr.-Ing.ManfredThoma InstitutfuerRegelungstechnik,UniversitätHannover,Appelstr.11,30167Hannover, Germany E-mail:[email protected] ProfessorDr.FrankAllgöwer InstituteforSystemsTheoryandAutomaticControl,UniversityofStuttgart, Pfaffenwaldring9,70550Stuttgart,Germany E-mail:[email protected] ProfessorDr.ManfredMorari ETH/ETLI29,Physikstr.3,8092Zürich,Switzerland E-mail:[email protected] SeriesAdvisoryBoard P.Fleming UniversityofSheffield,UK P.Kokotovic UniversityofCalifornia,SantaBarbara,CA,USA A.B.Kurzhanski MoscowStateUniversity,Russia H.Kwakernaak UniversityofTwente,Enschede,TheNetherlands A.Rantzer LundInstituteofTechnology,Sweden J.N.Tsitsiklis MIT,Cambridge,MA,USA Forfurthervolumes: http://www.springer.com/series/642 S. Bhatnagar, H.L. Prasad, and L.A. Prashanth Stochastic Recursive Algorithms for Optimization Simultaneous Perturbation Methods ABC Authors Prof.S.Bhatnagar L.A.Prashanth DepartmentofComputerScience DepartmentofComputerScience andAutomation andAutomation IndianInstituteofScience IndianInstituteofScience Bangalore Bangalore India India H.L.Prasad DepartmentofComputerScience andAutomation IndianInstituteofScience Bangalore India ISSN0170-8643 e-ISSN1610-7411 ISBN978-1-4471-4284-3 e-ISBN978-1-4471-4285-0 DOI10.1007/978-1-4471-4285-0 SpringerLondonHeidelbergNewYorkDordrecht LibraryofCongressControlNumber:2012941740 (cid:2)c Springer-VerlagLondon2013 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped.Exemptedfromthislegalreservationarebriefexcerptsinconnection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’slocation,initscurrentversion,andpermissionforusemustalwaysbeobtainedfromSpringer. PermissionsforusemaybeobtainedthroughRightsLinkattheCopyrightClearanceCenter.Violations areliabletoprosecutionundertherespectiveCopyrightLaw. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Whiletheadviceandinformationinthisbookarebelievedtobetrueandaccurateatthedateofpub- lication,neithertheauthorsnortheeditorsnorthepublishercanacceptanylegalresponsibilityforany errorsoromissionsthatmaybemade.Thepublishermakesnowarranty,expressorimplied,withrespect tothematerialcontainedherein. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) To SB’sparentsDr. G. K. Bhatnagarand Mrs.S.K. Bhatnagar,hiswifeAartiand daughterShriya To HLP’s parentsDr.H. R. Laxminarayana Bhattaand Mrs.G. S.Shreemathi, brother-in-lawGiridharN. Bhat andsister VathsalaG. Bhat To LAP’s daughterSamudyata Preface TheareaofstochasticapproximationhasitsrootsinapaperpublishedbyRobbins and Monro in 1951, where the basic stochastic approximation algorithm was in- troduced.Eversince,ithasbeenappliedinavarietyofapplicationscuttingacross severaldisciplinessuchascontrolandcommunicationengineering,signalprocess- ing,roboticsandmachinelearning. KieferandWolfowitz,inapaperin1952(nearlysixdecadesago)publishedthe first stochastic approximationalgorithmfor optimization.The algorithmproposed by them was a gradient search algorithm that aimed at finding the maximum of a regression function and incorporated finite difference gradient estimates. It was later found that whereas the Kiefer-Wolfowitz algorithm is efficient in scenarios involving scalar parameters, this is not necessarily the case with vector parame- ters,particularlythoseforwhichtheparameterdimensionishigh.Theproblemthat arises is that the number of function measurements needed at each update epoch grows linearly with the parameter dimension. Many times, it is also possible that theobjectivefunctionisnotobservableassuchandoneneedstoresortto simula- tion.Insuchscenarios,withvectorparameters,onerequiresacorresponding(linear intheparameter-dimension)numberofsystemsimulations.Inthecaseoflargeor complexsystems,thiscanresultinasignificantcomputationaloverhead. Subsequently,inapaperpublishedin1992,Spallproposedastochasticapprox- imationschemeforoptimizationthatdoesarandomsearchintheparameterspace and only requires two system simulations regardless of the parameter dimension. This algorithm that came to be known as simultaneous perturbation stochastic approximation or SPSA for short, has become very popular because of its high efficiency, computational simplicity and ease of implementation. Amongst other impressive works, Katkovnik and Kulchitsky, in a paper published in 1972, also proposed a random search scheme (the smoothed functional (SF) algorithm) that onlyrequiresonesystemsimulationregardlessoftheparameterdimension.Subse- quentworkshowedthatatwo-simulationcounterpartofthisschemeperformswell inpractice.BoththeKatkovnik-KulchitskyaswellastheSpallapproachesinvolve perturbing the parameter randomly by generating certain i.i.d. random variables. VIII Preface The difference between these schemes lies in the distributions these perturbation randomvariablescanpossessandtheformsofthegradientestimators. Stochasticapproximationalgorithmsforoptimizationcanbeviewedascounter- partsofdeterministicsearchschemeswithnoise.Whereas,theSPSAandSFalgo- rithmsaregradient-basedalgorithms,duringthelastdecadeorso,therehavebeen paperspublishedonNewton-basedsearchschemesforstochasticoptimization.Ina paperin2000,SpallproposedthefirstNewton-basedalgorithmthatestimatedboth the gradientand the Hessian usinga simultaneousperturbationapproachincorpo- rating SPSA-type estimates. Subsequently, in papers published in 2005 and 2007, Bhatnagar proposed more Newton-based algorithms that develop and incorporate bothSPSAandSFtypeestimatesofthegradientandHessian.Inthistext,wecom- monlyrefertoallapproachesforstochasticoptimizationthatarebasedonrandomly perturbingparametersinordertoestimatethegradient/Hessianofagivenobjective functionassimultaneousperturbationmethods.Bhatnagarandcoauthorshavealso developedandappliedsuchapproachesforconstrainedstochasticoptimization,dis- crete parameter stochastic optimization and reinforcementlearning – an area that deals with the adaptive control of stochastic systems under real or simulated out- comes. The authorsof this bookhave also studied engineeringapplicationsof the simultaneous perturbation approaches for problems of performance optimization in domainssuch as communicationnetworks,vehiculartraffic controland service systems. Themainfocusofthistextisonsimultaneousperturbationmethodsforstochas- ticoptimization.Thisbookisdividedintosixpartsandcontainsatotaloffourteen chapters and five appendices. Part I of the text essentially provides an introduc- tiontooptimizationproblems-bothdeterministicandstochastic,givesanoverview of search algorithms and a basic treatment of the Robbins-Monro stochastic ap- proximationalgorithm as well as a generalmulti-timescale stochastic approxima- tionscheme.PartIIofthetextdealswithgradientsearchstochasticalgorithmsfor optimization.Inparticular,theKiefer-Wolfowitz,SPSAandSFalgorithmsarepre- sentedanddiscussed.PartIIIdealswithNewton-basedalgorithmsthatareinpartic- ularpresentedforthelong-runaveragecostobjective.Thesealgorithmsarebasedon SPSAandSFbasedestimatorsforboththegradientandtheHessian.PartIVofthe book deals with a few variationsto the generalscheme and applicationsof SPSA and SF based approachesthere. In particular, we consider adaptationsof simulta- neous perturbation approaches for problems of discrete optimization, constrained optimization(underfunctionalconstraints) as well as reinforcementlearning.The long-runaveragecostcriterionwillbeconsideredherefortheobjectivefunctions. PartVofthebookdealswiththreeimportantapplicationsrelatedtovehiculartraf- ficcontrol,servicesystemsaswellascommunicationnetworks.Finally,fiveshort appendicesat the end summarize some of the basic material as well as important resultsusedinthetext. This book in many ways summarizesthe variousstrands of research on simul- taneousperturbationapproachesthatSBhasbeeninvolvedwith duringthecourse of the last fifteen years or so. Both HLP and LAP have also been working in this areaforoverfiveyearsnowandhavebeenactivelyinvolvedinthevariousaspects Preface IX oftheresearchreportedhere.Alargeportionofthistext(inparticular,PartsIII-V aswellasportionsofPartII)isbasedmainlyontheauthors’owncontributionsto thisarea.The textprovidesa compactcoverageof thematerialin a waythatboth researchersandpractitionersshouldfinduseful.Thechoiceoftopicsisintendedto coverasufficientwidthwhileremainingtiedtothecommonthemeofsimultaneous perturbationmethods. While we have made attempts at conveyingthe main ideas behindthevariousschemesandalgorithmsaswellastheconvergenceanalyses,we have also included sufficient material on the engineering applications of these al- gorithmsin order to highlightthe usefulness of these methodsin solving real-life engineeringproblems.Asmentionedbefore,anentirepartofthetext,namelyPart IV, comprising of three chapters is dedicated for this purpose. The text in a way providesabalancedcoverageofmaterialrelatedtoboththeoryandapplications. Acknowledgements SBwasfirstintroducedtotheareaofstochasticapproximationduringhisPh.Dwork withProf.VivekBorkarandProf.VinodSharmaattheIndianInstituteofScience. Subsequently,hebegantolookatsimultaneousperturbationapproacheswhiledo- ing a post doctoral with Prof. Steve Marcus and Prof. Michael Fu at the Institute forSystemsResearch,UniversityofMaryland,CollegePark.Hehasalsobenefitted significantlyfromreadingtheworksofProf.JamesSpallandthroughinteractions withhim.Hewouldliketothankallhiscollaboratorsovertheyears.Inparticular, he would like to thank Prof. Vivek Borkar,Prof. Steve Marcus, Prof. MichaelFu, Prof.RichardSutton,Prof.CsabaSzepesvari,Prof.VinodSharma,Prof.Karmeshu, Prof. M. Narasimha Murty, Prof. N. Hemachandra, Dr. Ambedkar Dukkipati and Dr. MohammedShahidAbdulla.He wouldlike to thankProf.AnuragKumarand Prof. K. V. S. Hari for severalhelpfuldiscussions on optimization approachesfor certain problemsin vehicular traffic control(during the course of a joint project), whichis also the topicofChapter 13in thisbook.SB considershimselffortunate to have had the pleasure of guiding and teaching several bright students at IISc. He would like to acknowledge the work done by all the current and former stu- dents of the Stochastic Systems Laboratory. A large part of SB’s research during thelasttenyearsatIISchasbeensupportedthroughprojectsfromtheDepartment of Science and Technology,Departmentof InformationTechnology,TexasInstru- ments, Satyam Computers, EMC and Wibhu Technologies.SB would also like to acknowledge the various institutions where he worked and visited during the last fifteen years where portions of the work reported here have been conducted: The Institute for Systems Research, University of Maryland, College Park; Vrije Uni- versiteit,Amsterdam;IndianInstituteofTechnology,Delhi;TheRLAILaboratory, UniversityofAlberta;andtheIndianInstituteofScience.Amajorpartofthework reportedherehasbeenconductedatIIScitself.Finally,hewouldliketo thankhis parentsDr. G. K. Bhatnagarand Mrs. S. K. Bhatnagarfor their support,help and guidanceallthroughtheyears,hiswifeAartianddaughterShriyafortheirpatience, X Preface understandingandsupport,andhisbrotherDr.Shashankforhisguidanceandteach- ingduringSB’sformativeyears. HLP’s interest in the area of control engineering and decision making, which was essentially sown in him by interactions with Prof. U. R. Prasad at I.I.Sc., Dr. K. N. Shubhanga and Jora M. Gonda at NITK, led him to the area of oper- ations research followed by that of stochastic approximation. He derives inspira- tionsfromtheworksofProf.VivekBorkar,Prof.JamesSpall,Prof.RichardSutton, Prof.ShalabhBhatnagarandseveraleminentpersonalitiesinthefieldofstochastic approximation.He thanksDr. Nirmit V. Desai at IBM Research, India, collabora- tionwithwhom,broughtupseveralnewstochasticapproximationalgorithmswith practicalapplicationstotheareaofservicesystems.HethanksProf.ManjunathKr- ishnapurandProf.P.S.Sastryforequippinghimwithmathematicalrigourneeded forstochasticapproximation.He thanksI.R.RaoatNITKwhohasbeenconstant sourceofmotivation.HethankshisfatherDr.H.R.LaxminarayanaBhatta,mother Mrs.G.S.Shreemathi,brother-in-lawGiridharN.BhatandsisterVathsalaG.Bhat, fortheirconstantsupportandunderstanding. LAP would like to thank his supervising professorSB for introducingstochas- tic optimization during his PhD work. The extensive interactions with SB served to sharpen the understanding of the subject. LAP would also like to thank HLP, collaborationwithwhomhasbeenmostvaluable.LAP’sprojectassociateshipwith Departmentof InformationTechnologyas wellas his internshipat IBM Research presentedmanyopportunitiesfordevelopingaswellasapplyingsimultaneousper- tubation methodsin variouspracticalcontexts and LAP would like to thank Prof. Anurag Kumar, Prof. K.V.S. Hari of ECE department, IISc and Nirmit Desai and GargiDasguptaofIBMResearchforseveralusefulinteractionsonthesubjectmat- ter.Finally,LAPwouldliketothankhisfamilymembers-particularlyhisparents, wifeandhisdaughter,fortheirsupportinthisendeavour. Bangalore, S.Bhatnagar May2012 H.L.Prasad L.A.Prashanth Contents PartI IntroductiontoStochasticRecursiveAlgorithms 1 Introduction................................................... 3 1.1 Introduction ............................................... 3 1.2 OverviewoftheRemainingChapters.......................... 7 1.3 ConcludingRemarks........................................ 11 References..................................................... 11 2 DeterministicAlgorithmsforLocalSearch........................ 13 2.1 Introduction ............................................... 13 2.2 DeterministicAlgorithmsforLocalSearch ..................... 14 References..................................................... 15 3 StochasticApproximationAlgorithms ............................ 17 3.1 Introduction ............................................... 17 3.2 TheRobbins-MonroAlgorithm............................... 18 3.2.1 ConvergenceoftheRobbins-MonroAlgorithm ........... 19 3.3 Multi-timescaleStochasticApproximation ..................... 23 3.3.1 ConvergenceoftheMulti-timescaleAlgorithm ........... 24 3.4 ConcludingRemarks........................................ 26 References..................................................... 27 PartII GradientEstimationSchemes 4 Kiefer-WolfowitzAlgorithm..................................... 31 4.1 Introduction ............................................... 31 4.2 TheBasicAlgorithm........................................ 31 4.2.1 ExtensiontoMulti-dimensionalParameter ............... 35 4.3 VariantsoftheKiefer-WolfowitzAlgorithm .................... 36 4.3.1 FixedPerturbationParameter .......................... 36 4.3.2 One-SidedVariants................................... 37 XII Contents 4.4 ConcludingRemarks........................................ 38 References..................................................... 38 5 GradientSchemes withSimultaneousPerturbationStochastic Approximation ................................................ 41 5.1 Introduction ............................................... 41 5.2 TheBasicSPSAAlgorithm .................................. 41 5.2.1 GradientEstimateUsingSimultaneousPerturbation ....... 42 5.2.2 TheAlgorithm ...................................... 43 5.2.3 ConvergenceAnalysis ................................ 44 5.3 VariantsoftheBasicSPSAAlgorithm ......................... 47 5.3.1 One-MeasurementSPSAAlgorithm .................... 47 5.3.2 One-SidedSPSAAlgorithm ........................... 49 5.3.3 FixedPerturbationParameter .......................... 49 5.4 GeneralRemarksonSPSAAlgorithms ........................ 51 5.5 SPSAAlgorithmswithDeterministicPerturbations .............. 52 5.5.1 PropertiesofDeterministicPerturbationSequences........ 52 5.5.2 HadamardMatrixBasedConstruction................... 54 5.5.3 Two-SidedSPSAwithHadamardMatrixPerturbations .... 56 5.5.4 One-SidedSPSAwithHadamardMatrixPerturbations..... 62 5.5.5 One-MeasurementSPSA with Hadamard Matrix Perturbations........................................ 63 5.6 SPSAAlgorithmsforLong-RunAverageCostObjective ......... 65 5.6.1 TheFramework...................................... 65 5.6.2 TheTwo-SimulationSPSAAlgorithm................... 65 5.6.3 Assumptions ........................................ 66 5.6.4 ConvergenceAnalysis ................................ 68 5.6.5 ProjectedSPSAAlgorithm ............................ 73 5.7 ConcludingRemarks........................................ 75 References..................................................... 75 6 SmoothedFunctionalGradientSchemes.......................... 77 6.1 Introduction ............................................... 77 6.2 GaussianBasedSFAlgorithm................................ 79 6.2.1 GradientEstimationviaSmoothing ..................... 79 6.2.2 TheBasicGaussianSFAlgorithm ...................... 81 6.2.3 ConvergenceAnalysisofGaussianSFAlgorithm ......... 82 6.2.4 Two-MeasurementGaussianSFAlgorithm............... 88 6.3 GeneralConditionsforaCandidateSmoothingFunction.......... 91 6.4 CauchyVariantoftheSFAlgorithm........................... 92 6.4.1 GradientEstimate.................................... 92 6.4.2 CauchySFAlgorithm ................................ 94 6.5 SFAlgorithmsfortheLong-RunAverageCostObjective ......... 94 6.5.1 TheG-SF1Algorithm ................................ 95 6.5.2 TheG-SF2Algorithm ................................ 99