Table of Contents

Aske Plaat
Learning to Play
Reinforcement Learning and Games

Aske Plaat
Leiden Institute of Advanced Computer Science
Leiden University
Leiden, The Netherlands

ISBN 978-3-030-59237-0
ISBN 978-3-030-59238-7 (eBook)
https://doi.org/10.1007/978-3-030-59238-7

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my students

Contents

1 Introduction
  1.1 Tuesday March 15, 2016
  1.2 Intended Audience
  1.3 Outline
2 Intelligence and Games
  2.1 Intelligence
  2.2 Games of Strategy
  2.3 Game Playing Programs
3 Reinforcement Learning
  3.1 Markov Decision Processes
  3.2 Policy Function, State-Value Function, Action-Value Function
  3.3 Solution Methods
  3.4 Conclusion
  3.5 Practice
4 Heuristic Planning
  4.1 The Search-Eval Architecture
  4.2 State Space Complexity
  4.3 Search Enhancements
  4.4 Evaluation Enhancements
  4.5 Conclusion
  4.6 Practice
5 Adaptive Sampling
  5.1 Monte Carlo Evaluation
  5.2 Monte Carlo Tree Search
  5.3 *MCTS Enhancements
  5.4 Practice
6 Function Approximation
  6.1 Deep Supervised Learning
  6.2 *Advanced Network Architectures
  6.3 Deep Reinforcement Learning
  6.4 *Deep Reinforcement Learning Enhancements
  6.5 Practice
7 Self-Play
  7.1 Self-Play Architecture
  7.2 AlphaGo
  7.3 *Self-Play Enhancements
  7.4 Practice
8 Conclusion
  8.1 Artificial Intelligence in Games
  8.2 Future Drosophilas
  8.3 A Computer’s Look at Human Intelligence
  8.4 Practice
A Deep Reinforcement Learning Environments
B *AlphaGo Technical Details
  B.1 AlphaGo
  B.2 AlphaGo Zero
  B.3 AlphaZero
C Matches of Fan Hui, Lee Sedol, and Ke Jie
D Learning to Play and Program Go and Chess
  D.1 Learning to Play
  D.2 Learning to Program
  D.3 Scientific Journals and Conferences
E Running Python
References
Figures
Tables
Algorithms
Index

Preface

Amazing breakthroughs in reinforcement learning have taken place. Computers teach themselves to play Chess and Go and beat world champions.
There is talk about expanding application areas towards general artificial intelligence (AI). The breakthroughs in Backgammon, Checkers, Chess, Atari, Go, Poker, and StarCraft have shown that we can build machines that exhibit intelligent game playing of the highest level. These successes have been widely publicized in the media, and inspire AI entrepreneurs and scientists alike. Reinforcement learning in games has become a mainstream AI research topic. It is a broad topic, and the successes build on a range of diverse techniques, from exact planning algorithms, to adaptive sampling, deep function approximation, and ingenious self-play methods.

Perhaps because of the breadth of these technologies, or because of the recency of the breakthroughs, there are few books that explain these methods in depth. This book covers all methods in one comprehensive volume, explaining the latest research, bringing you as close as possible to working implementations, with many references to the original research papers.

The programming examples in this book are in Python, the language in which most current reinforcement learning research is conducted. We help you to get started with machine learning frameworks such as Gym, TensorFlow, and Keras, and provide exercises to help understand how AI is learning to play.

This is not a typical reinforcement learning textbook. Most books on reinforcement learning take the single-agent perspective, of path finding and robot planning. We take as inspiration the breakthroughs in game playing, and use two-agent games to explain the full power of deep reinforcement learning.

Board games have always been associated with reasoning and intelligence. Our games perspective allows us to make connections with artificial intelligence and general intelligence, giving a philosophical flavor to an otherwise technical field.

Artificial Intelligence

Ever since my early days as a student I have been captivated by artificial intelligence, by machines that behave in seemingly intelligent ways. Initially I had been taught that, because computers were deterministic machines, they could never do something new. Yet in AI these machines do complicated things such as recognize patterns, and play Chess games. Actions emerged from these machines, behavior that appeared not to have been programmed into them. The actions seemed new, and even creative, at times.

For my thesis I got to work on game playing programs for combinatorial games such as Chess, Checkers, and Othello. The paradox became even more apparent. These game playing programs all followed an elegant architecture, consisting of a search function and an evaluation function.[1] These two functions together could find good moves all by themselves. Could intelligence be so simple?

The search-evaluation architecture has been around since the earliest days of computer Chess. Together with minimax, it was proposed in a 1952 paper by Alan Turing, mathematician, code-breaking war hero, and one of the fathers of computer science and artificial intelligence. The search-evaluation architecture is also used in Deep Blue, the Chess program that beat World Champion Garry Kasparov in 1997 in New York.

After that historic moment, the attention of the AI community shifted to a new game with which to further develop ideas for intelligent game play. It was the East Asian game of Go that emerged as the new grand test of intelligence. Simple, elegant, and mind-bogglingly complex.
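
As an illustration of the search-evaluation architecture mentioned above, here is a minimal sketch in Python. It is not code from the book. The toy game (players alternately remove 1, 2, or 3 stones, and whoever takes the last stone wins) and the names `legal_moves`, `evaluate`, `minimax`, and `best_move` are assumptions chosen for brevity. The search is written in the negamax form of minimax, where every score is from the viewpoint of the player to move.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Hypothetical toy game: remove 1, 2, or 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def evaluate(stones: int) -> int:
    """Evaluation function: score a position for the player to move.

    For this toy game the score happens to be exact (a multiple of 4 is lost
    for the player to move); in Chess or Go it would only be a heuristic.
    """
    return -1 if stones % 4 == 0 else 1


def minimax(stones: int, depth: int) -> int:
    """Search function: look ahead `depth` plies, trying every legal move."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    if depth == 0:
        return evaluate(stones)  # search horizon reached: ask the evaluation function
    # A position is as good for us as the best reply position is bad for the opponent.
    return max(-minimax(stones - m, depth - 1) for m in legal_moves(stones))


def best_move(stones: int, depth: int) -> int:
    """Pick the move whose resulting position scores worst for the opponent."""
    return max(legal_moves(stones), key=lambda m: -minimax(stones - m, depth - 1))


print(best_move(10, depth=4))  # → 2 (leaves 8 stones, a lost position for the opponent)
```

The two functions are deliberately separate: the search knows only the rules of the game, and the evaluation function holds all knowledge about which positions are good.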

This new game spawned the creation of important new algorithms, and not one, but two, paradigm shifts. The first algorithm to upset the world view of games researchers was Monte Carlo Tree Search, in 2006. Since the 1950s generations of game playing researchers, myself included, were brought up with minimax. The essence of minimax is to look ahead as far as you can, to then choose the best move, and to make sure that all moves are tried (since behind seemingly harmless moves deep attacks may hide that you can only uncover if you search all moves). And now Monte Carlo Tree Search introduced randomness into the search, and sampling, deliberately missing moves. Yet it worked in Go, and much better than minimax.

Monte Carlo Tree Search caused a strong increase in playing strength, although not yet enough to beat world champions. For that, we had to wait another ten years.

In 2013 our world view was in for a new shock, because again a new paradigm shook up the conventional wisdom. Neural networks were widely viewed to be too slow and too inaccurate to be useful in games. Many Master’s theses of stubborn students had sadly confirmed this to be the case. Yet in 2013 GPU power allowed the use of a simple neural net to learn to play Atari video games just from looking at the video pixels, using a method called deep reinforcement learning. Two years and much hard work later, deep reinforcement learning was combined with Monte Carlo Tree Search in the program AlphaGo. The level of play was improved so much that a year later finally world champions in Go were beaten, many years before experts had expected that this would happen. And in other games, such as StarCraft and Poker, self-play reinforcement learning also caused breakthroughs.

[1] The search function simulates the kind of look-ahead that many human game players do in their head, and the evaluation function assigns a numeric score to a board position indicating how good the position is.

The AlphaGo wins were widely publicized.
They have had a large impact, on science, on the public perception of AI, and on society. AI researchers everywhere were invited to give lectures. Audiences wanted to know what had happened, whether computers finally had become intelligent, what more could be expected from AI, and what all this would mean for the future of the human race. Many start-ups were created, and existing technology companies started researching what AI could do for them.

The modern history of computer games spans some 70 years. There has been much excitement. Many ideas were tried, some with success. Games research in reinforcement learning has witnessed multiple paradigm shifts, going from heuristic planning, to adaptive sampling, to deep learning, to self-play. The achievements are large, and so is the range of techniques that are used. We are now at a point where the techniques have matured somewhat, and achievements can be documented and put into perspective.

In explaining the technologies, I will tell the story of how one kind of intelligence works, the intelligence needed to play two-person games of tactics and strategy. (As to knowing the future of the human race, surely more is needed than an understanding of heuristics, deep reinforcement learning, and game playing programs.) It will be a story involving many scientists, programmers, and game enthusiasts, all fascinated by the same goal: creating artificial intelligence. Come and join this fascinating ride.
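
The preface contrasts minimax, which tries every move, with search that samples at random and deliberately misses moves. The sketch below illustrates only the sampling idea: plain Monte Carlo evaluation by random playouts, not full Monte Carlo Tree Search. It is not code from the book. The toy game (remove 1, 2, or 3 stones, taking the last stone wins) and the names `playout`, `monte_carlo_value`, and `sampled_best_move` are assumptions made for the example.

```python
import random
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Hypothetical toy game: remove 1, 2, or 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def playout(stones: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end of the game.

    Returns +1 if the player to move at the start wins, otherwise -1.
    """
    player = 0  # 0 is the player to move at the start of the playout
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if player == 0 else -1
        player = 1 - player
    return -1  # no stones left at the start: the previous player already won


def monte_carlo_value(stones: int, n_playouts: int, rng: random.Random) -> float:
    """Estimate a position's value as the average result of random playouts."""
    return sum(playout(stones, rng) for _ in range(n_playouts)) / n_playouts


def sampled_best_move(stones: int, n_playouts: int, rng: random.Random) -> int:
    """Choose the move whose resulting position looks worst for the opponent."""
    return max(
        legal_moves(stones),
        key=lambda m: -monte_carlo_value(stones - m, n_playouts, rng),
    )


rng = random.Random(0)  # fixed seed so that runs are repeatable
print(monte_carlo_value(10, 200, rng))  # a noisy estimate between -1 and 1
print(sampled_best_move(3, 50, rng))    # taking all 3 stones wins immediately
```

No evaluation function is needed here: the estimate comes entirely from game outcomes, at the price of noise that shrinks only as the number of playouts grows.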

Learning to Play: Reinforcement Learning and Games PDF

335 Pages·2020·18.262 MB·English

by Aske Plaat

Detailed Information

Author:	Aske Plaat
Publication Year:	2020
ISBN:	9783030592387
Pages:	335
Language:	English
File Size:	18.262 MB
Format:	PDF