
Bayesian Optimization


Bayesian optimization is a methodology for optimizing expensive objective functions that has proven success in the sciences, engineering, and beyond. This timely text provides a self-contained and comprehensive introduction to the subject, starting from scratch and carefully developing all the key ideas along the way. This bottom-up approach illuminates unifying themes in the design of Bayesian optimization algorithms and builds a solid theoretical foundation for approaching novel situations.

The core of the book is divided into three main parts, covering theoretical and practical aspects of Gaussian process modeling, the Bayesian approach to sequential decision making, and the realization and computation of practical and effective optimization policies.

Following this foundational material, the book provides an overview of theoretical convergence results, a survey of notable extensions, a comprehensive history of Bayesian optimization, and an extensive annotated bibliography of applications.

Roman Garnett is Associate Professor in Computer Science and Engineering at Washington University in St. Louis. He has been a leader in the Bayesian optimization community since 2011, when he cofounded a long-running workshop on the subject at the NeurIPS conference. His research focus is developing Bayesian methods – including Bayesian optimization – for automating scientific discovery.

ROMAN GARNETT
Washington University in St Louis

BAYESIAN OPTIMIZATION

Shaftesbury Road, Cambridge CB2 8EA, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467

Cambridge University Press is part of Cambridge University Press & Assessment, a department of the University of Cambridge.

We share the University's mission to contribute to society through the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108425780
DOI: 10.1017/9781108348973

© Roman Garnett 2023

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press & Assessment.

First published 2023

Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall

A catalogue record for this publication is available from the British Library.

ISBN 978-1-108-42578-0 Hardback

Cambridge University Press & Assessment has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
CONTENTS

Preface
Notation

1 Introduction
1.1 Formalization of Optimization
1.2 The Bayesian Approach

2 Gaussian Processes
2.1 Definition and Basic Properties
2.2 Inference with Exact and Noisy Observations
2.3 Overview of Remainder of Chapter
2.4 Joint Gaussian Processes
2.5 Continuity
2.6 Differentiability
2.7 Existence and Uniqueness of Global Maxima
2.8 Inference with Non-Gaussian Observations and Constraints
2.9 Summary of Major Ideas

3 Modeling with Gaussian Processes
3.1 The Prior Mean Function
3.2 The Prior Covariance Function
3.3 Notable Covariance Functions
3.4 Modifying and Combining Covariance Functions
3.5 Modeling Functions on High-Dimensional Domains
3.6 Summary of Major Ideas

4 Model Assessment, Selection, and Averaging
4.1 Models and Model Structures
4.2 Bayesian Inference over Parametric Model Spaces
4.3 Model Selection via Posterior Maximization
4.4 Model Averaging
4.5 Multiple Model Structures
4.6 Automating Model Structure Search
4.7 Summary of Major Ideas

5 Decision Theory for Optimization
5.1 Introduction to Bayesian Decision Theory
5.2 Sequential Decisions with a Fixed Budget
5.3 Cost and Approximation of the Optimal Policy
5.4 Cost-Aware Optimization and Termination as a Decision
5.5 Summary of Major Ideas

6 Utility Functions for Optimization
6.1 Expected Utility of Terminal Recommendation
6.2 Cumulative Reward
6.3 Information Gain
6.4 Dependence on Model of Objective Function
6.5 Comparison of Utility Functions
6.6 Summary of Major Ideas

7 Common Bayesian Optimization Policies
7.1 Example Optimization Scenario
7.2 Decision-Theoretic Policies
7.3 Expected Improvement
7.4 Knowledge Gradient
7.5 Probability of Improvement
7.6 Mutual Information and Entropy Search
7.7 Multi-Armed Bandits and Optimization
7.8 Maximizing a Statistical Upper Bound
7.9 Thompson Sampling
7.10 Other Ideas in Policy Construction
7.11 Summary of Major Ideas

8 Computing Policies with Gaussian Processes
8.1 Notation for Objective Function Model
8.2 Expected Improvement
8.3 Probability of Improvement
8.4 Upper Confidence Bound
8.5 Approximate Computation for One-Step Lookahead
8.6 Knowledge Gradient
8.7 Thompson Sampling
8.8 Mutual Information with 𝑥∗
8.9 Mutual Information with 𝑓∗
8.10 Averaging over a Space of Gaussian Processes
8.11 Alternative Models: Bayesian Neural Networks, etc.
8.12 Summary of Major Ideas

9 Implementation
9.1 Gaussian Process Inference, Scaling, and Approximation
9.2 Optimizing Acquisition Functions
9.3 Starting and Stopping Optimization
9.4 Summary of Major Ideas

10 Theoretical Analysis
10.1 Regret
10.2 Useful Function Spaces for Studying Convergence
10.3 Relevant Properties of Covariance Functions
10.4 Bayesian Regret with Observation Noise
10.5 Worst-Case Regret with Observation Noise
10.6 The Exact Observation Case
10.7 The Effect of Unknown Hyperparameters
10.8 Summary of Major Ideas

11 Extensions and Related Settings
11.1 Unknown Observation Costs
11.2 Constrained Optimization and Unknown Constraints
11.3 Synchronous Batch Observations
11.4 Asynchronous Observation with Pending Experiments
11.5 Multifidelity Optimization
11.6 Multitask Optimization
11.7 Multiobjective Optimization
11.8 Gradient Observations
11.9 Stochastic and Robust Optimization
11.10 Incremental Optimization of Sequential Procedures
11.11 Non-Gaussian Observation Models and Active Search
11.12 Local Optimization

12 A Brief History of Bayesian Optimization
12.1 Historical Precursors and Optimal Design
12.2 Sequential Analysis and Bayesian Experimental Design
12.3 The Rise of Bayesian Optimization
12.4 Later Rediscovery and Development
12.5 Multi-Armed Bandits to Infinite-Armed Bandits
12.6 What's Next?

A The Gaussian Distribution
B Methods for Approximate Bayesian Inference
C Gradients
D Annotated Bibliography of Applications

References
Index

PREFACE

My interest in Bayesian optimization began in 2007 at the start of my doctoral studies. I was frustrated that there seemed to be a Bayesian approach to every task I cared about, except optimization. Of course, as was often the case at that time (not to mention now!), I was mistaken in this belief, but one should never let ignorance impede inspiration.

Meanwhile, my labmate and soon-to-be frequent collaborator Mike Osborne had a fresh copy of Rasmussen and Williams's Gaussian Processes for Machine Learning and just would not stop talking about GPs at our lab meetings. Through sheer brute force of repetition, I slowly built a hand-wavy intuition for Gaussian processes – my mental model was the "sausage plot" – without even being sure about their precise definition. [Margin figure: "The first of many 'sausage plots' to come."] However, I was pretty sure that marginals were Gaussian (what else?), and one day it occurred to me that one could achieve Bayesian optimization by maximizing the probability of improvement. This was the algorithm I was looking for! In my excitement I shot off an email to Mike that kicked off years of fruitful collaboration:

    Can I ask a dumb question about GPs? Let's say that I'm doing function approximation on an interval with a GP. So I've got this mean function 𝑚(𝑥) and a variance function 𝑣(𝑥). Is it true that if I pick a particular point 𝑥, then 𝑝(𝑓(𝑥)) ∼ N(𝑚(𝑥), 𝑣(𝑥))? Please say yes.

    If this is true, then I think the idea of doing Bayesian optimization using GPs is, dare I say, trivial.

The hubris of youth! Well, it turned out I was 45 years too late in proposing this algorithm,¹ and that it only seemed "trivial" because I had no appreciation for its theoretical foundation. However, truly great ideas are rediscovered many times, and my excitement did not fade. Once I developed a deeper understanding of Gaussian processes and Bayesian decision theory, I came to see them as a "Bayesian crank" I could turn to realize adaptive algorithms for any task. I have been repeatedly astonished to find that the resulting algorithms – seemingly by magic – automatically display intuitive emergent behavior as a result of their careful design. My goal with this book is to paint this grand picture. In effect, it is a gift to my former self: the book I wish I had in the early years of my career.

In the context of machine learning, Bayesian optimization is an ancient idea – Kushner's paper appeared only three years after the term "machine learning" was coined! Despite its advanced age, Bayesian optimization has been enjoying a period of revitalization and rapid progress over the past ten years. The primary driver of this renaissance has been advances in computation, which have enabled increasingly sophisticated tools for Bayesian modeling and inference.

Ironically, however, perhaps the most critical development was not Bayesian at all, but the rise of deep neural networks, another old idea

¹ H. J. Kushner (1962). A Versatile Stochastic Model of a Function of Unknown and Time Varying Form. Journal of Mathematical Analysis and Applications 5(1): 150–167.
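The policy named in the email above is easy to state concretely. What follows is a minimal illustrative sketch, not code from the book: it assumes only the fact the email asks about, that a GP posterior has Gaussian marginals 𝑓(𝑥) ∼ N(𝑚(𝑥), 𝑣(𝑥)). The candidate-grid setup and the gp_mean/gp_var helpers are assumed placeholders for whatever GP library supplies the posterior.

import numpy as np
from scipy.stats import norm

def probability_of_improvement(mean, var, f_best):
    # GP marginals are Gaussian, f(x) ~ N(m(x), v(x)), so the probability
    # that f(x) exceeds the incumbent f_best is a Gaussian tail mass.
    return norm.sf(f_best, loc=mean, scale=np.sqrt(var))

def next_point(candidates, gp_mean, gp_var, f_best):
    # Evaluate next wherever the chance of improvement is largest.
    # gp_mean/gp_var are hypothetical posterior-query functions.
    scores = probability_of_improvement(gp_mean(candidates),
                                        gp_var(candidates), f_best)
    return candidates[np.argmax(scores)]

Maximizing this score over candidate points is exactly "maximizing the probability of improvement" – the rediscovery of Kushner's 1962 algorithm that the preface recounts.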
