Reinforcement Learning: An Introduction PDF

331 Pages·1998·1.143 MB·English

Preview Reinforcement Learning: An Introduction

Reinforcement Learning

Adaptive Computation and Machine Learning
Thomas Dietterich, series editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, associate editors

Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto

Reinforcement Learning
An Introduction
Richard S. Sutton and Andrew G. Barto

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

© 1998 Richard S. Sutton and Andrew G. Barto

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Times Roman by Windfall Software using ZzTEX and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Sutton, Richard S.
Reinforcement learning: an introduction / Richard S. Sutton and Andrew G. Barto.
p. cm. - (Adaptive computation and machine learning)
“A Bradford book.”
Includes bibliographical references and index.
ISBN 0-262-19398-1 (alk. paper)
1. Reinforcement learning (Machine learning) I. Barto, Andrew G. II. Title. III. Series.
Q325.6.S88 1998
006.3'1-dc21
97-26416 CIP

In memory of A. Harry Klopf

Contents

Series Foreword
Preface

I The Problem

1 Introduction
1.1 Reinforcement Learning
1.2 Examples
1.3 Elements of Reinforcement Learning
1.4 An Extended Example: Tic-Tac-Toe
1.5 Summary
1.6 History of Reinforcement Learning
1.7 Bibliographical Remarks

2 Evaluative Feedback
2.1 An n-Armed Bandit Problem
2.2 Action-Value Methods
2.3 Softmax Action Selection
*2.4 Evaluation Versus Instruction
2.5 Incremental Implementation
2.6 Tracking a Nonstationary Problem
2.7 Optimistic Initial Values
*2.8 Reinforcement Comparison
*2.9 Pursuit Methods
*2.10 Associative Search
2.11 Conclusions
2.12 Bibliographical and Historical Remarks

3 The Reinforcement Learning Problem
3.1 The Agent–Environment Interface
3.2 Goals and Rewards
3.3 Returns
3.4 Unified Notation for Episodic and Continuing Tasks
*3.5 The Markov Property
3.6 Markov Decision Processes
3.7 Value Functions
3.8 Optimal Value Functions
3.9 Optimality and Approximation
3.10 Summary
3.11 Bibliographical and Historical Remarks

II Elementary Solution Methods

4 Dynamic Programming
4.1 Policy Evaluation
4.2 Policy Improvement
4.3 Policy Iteration
4.4 Value Iteration
4.5 Asynchronous Dynamic Programming
4.6 Generalized Policy Iteration
4.7 Efficiency of Dynamic Programming
4.8 Summary
4.9 Bibliographical and Historical Remarks

5 Monte Carlo Methods
5.1 Monte Carlo Policy Evaluation
5.2 Monte Carlo Estimation of Action Values
5.3 Monte Carlo Control
5.4 On-Policy Monte Carlo Control
5.5 Evaluating One Policy While Following Another
5.6 Off-Policy Monte Carlo Control
5.7 Incremental Implementation
5.8 Summary
5.9 Bibliographical and Historical Remarks

6 Temporal-Difference Learning
6.1 TD Prediction
6.2 Advantages of TD Prediction Methods
6.3 Optimality of TD(0)
6.4 Sarsa: On-Policy TD Control
6.5 Q-Learning: Off-Policy TD Control
*6.6 Actor–Critic Methods
*6.7 R-Learning for Undiscounted Continuing Tasks
6.8 Games, Afterstates, and Other Special Cases
6.9 Summary
6.10 Bibliographical and Historical Remarks

III A Unified View

7 Eligibility Traces
7.1 n-Step TD Prediction
7.2 The Forward View of TD(λ)
7.3 The Backward View of TD(λ)
7.4 Equivalence of Forward and Backward Views
7.5 Sarsa(λ)
7.6 Q(λ)
*7.7 Eligibility Traces for Actor–Critic Methods
7.8 Replacing Traces
7.9 Implementation Issues
*7.10 Variable λ
7.11 Conclusions
7.12 Bibliographical and Historical Remarks

8 Generalization and Function Approximation
8.1 Value Prediction with Function Approximation
8.2 Gradient-Descent Methods
8.3 Linear Methods
8.4 Control with Function Approximation
8.5 Off-Policy Bootstrapping
8.6 Should We Bootstrap?
8.7 Summary
8.8 Bibliographical and Historical Remarks

9 Planning and Learning
9.1 Models and Planning
9.2 Integrating Planning, Acting, and Learning
9.3 When the Model Is Wrong
9.4 Prioritized Sweeping
9.5 Full vs. Sample Backups
9.6 Trajectory Sampling
9.7 Heuristic Search
9.8 Summary
9.9 Bibliographical and Historical Remarks

10 Dimensions of Reinforcement Learning
10.1 The Unified View
10.2 Other Frontier Dimensions

11 Case Studies
11.1 TD-Gammon
11.2 Samuel’s Checkers Player
11.3 The Acrobot
11.4 Elevator Dispatching
11.5 Dynamic Channel Allocation
11.6 Job-Shop Scheduling

References
Summary of Notation
Index

The list of books you might like

Atomic Habits

James Clear
·6.4 MB

Credence

Penelope Douglas
·487 Pages
·2020
·0.86 MB

Can’t Hurt Me: Master Your Mind and Defy the Odds

David Goggins
·364 Pages
·2018
·2.99 MB

The 48 Laws of Power

Robert Greene
·454 Pages
·2004
·1.92 MB

The Secret History of the Reptilians: The Pervasive Presence of the Serpent in Human History, Religion & Alien Mythos

Scott Alan Roberts
·New Page Books
·2013
·0.9747 MB

Carolina football

2006
·33.2 MB

Regulation and Genetics: Bacterial DNA Viruses

Dietmar Rabussay, E. Peter Geiduschek (auth.), Heinz Fraenkel-Conrat, Robert R. Wagner (eds.)
·363 Pages
·1977
·7.736 MB

C anton (©bsterber

24 Pages
·2010
·42.97 MB

C++ for You++, AP Edition

580 Pages
·2012
·1.69 MB

C++ för dig som kan Java

32 Pages
·2000
·0.73 MB

Hesiod's Theogony: from Near Eastern creation myths to Paradise Lost

Hésiode; Scully, Stephen
·283 Pages
·2015
·10.519 MB

Handbook of Modern Sensors: Physics, Designs, and Applications

Jacob Fraden (auth.)
·670 Pages
·2010
·14.566 MB

Nureyev. La vita

Julie Kavanagh
·2019
·49.445 MB

Recovering Lost Landscapes

Vujadin Ivanišević, Tatjana Veljanovski, David Cowley, Grzegorz Kiarszys & Ivan Bugarski
·176 Pages
·2015
·12.706 MB

Sneakers: Fashion, Gender, and Subculture

Yuniya Kawamura
·2016
·7.894 MB

Afkondigingsblad van Aruba 1993 no. 64

DWJZ - Directie Wetgeving en Juridische Zaken
·3 Pages
·1993
·0.03 MB