Generating and Solving Imperfect Information Games

Daphne Koller
University of California
Berkeley, CA 94720
[email protected]

Avi Pfeffer
University of California
Berkeley, CA 94720
[email protected]

Abstract

Work on game playing in AI has typically ignored games of imperfect information such as poker. In this paper, we present a framework for dealing with such games. We point out several important issues that arise only in the context of imperfect information games, particularly the insufficiency of a simple game tree model to represent the players' information state and the need for randomization in the players' optimal strategies. We describe Gala, an implemented system that provides the user with a very natural and expressive language for describing games. From a game description, Gala creates an augmented game tree with information sets which can be used by various algorithms in order to find optimal strategies for that game. In particular, Gala implements the first practical algorithm for finding optimal randomized strategies in two-player imperfect information competitive games [Koller et al., 1994]. The running time of this algorithm is polynomial in the size of the game tree, whereas previous algorithms were exponential. We present experimental results showing that this algorithm is also efficient in practice and can therefore form the basis for a game playing system.

1 Introduction

The idea of getting a computer to play a game has been around since the earliest days of computing. The fundamental idea is as follows: When it is the computer's turn to move, it creates some part of the game tree starting at the current position, evaluates the 'leaves' of this partial tree using a heuristic evaluation function, and then does a minimax search of this tree to determine the optimal move at the root. This same simple idea is still the core of most game-playing programs.

This paradigm has been successfully applied to a large class of games, in particular chess, checkers, othello, backgammon, and go [Russell and Norvig, 1994, Ch. 5]. There have been far fewer successful programs that play games such as poker or bridge. We claim that this is not an accident. These games fall into two fundamentally different classes, and the techniques that apply to one do not usually apply to the other.

The essential difference lies in the information that is available to the players. In games such as chess or even backgammon, the current state of the game is fully accessible to both players. The only uncertainty is about future moves. In games such as poker, the players have imperfect information: they have only partial knowledge about the current state of the game. This can result in complex chains of reasoning such as: "Since I have two aces showing, but she raised, then she is either bluffing or she has a good hand; but then if I raise a lot, she may realize that I have at least a third ace, so she might fold; so maybe I should underbid, but ..." It should be fairly obvious that the standard techniques are inadequate for solving such games: no variant of the minimax algorithm duplicates the type of complex reasoning we just described.

In game theory [von Neumann and Morgenstern, 1947], on the other hand, virtually all of the work has focused on games with imperfect information. Game theory is mostly intended to deal with games derived from "real life," and particularly from economic applications. In real life one rarely has perfect information. The insights developed by game theorists for such games also apply to the imperfect information games encountered in AI applications.

It is well-known in game theory that the notion of a strategy is necessarily different for games with imperfect information. In perfect information games, the optimal move for each player is clearly defined: at every stage there is a "right" move that is at least as good as any other move. But in imperfect information games, the situation is not as straightforward. In the simple game of "scissors-paper-stone," any deterministic strategy is a losing one as soon as it is revealed to the other players. Intuitively, in games where there is an information gap, it is usually to my advantage to keep my opponent in the dark. The only way to do that is by using randomized strategies. Once randomized strategies are allowed, the existence of "optimal strategies" in imperfect information games can be proved. In particular, this means that there exists an optimal randomized strategy for poker, in much the same way as there exists an optimal deterministic strategy for chess. Kuhn [1950] has shown for a simplified poker game that the optimal strategy does, indeed, use randomization.

The optimality of a strategy has two consequences: the player cannot do better than this strategy if playing against a good opponent, and furthermore the player does not do worse even if his strategy is revealed to his opponent, i.e., the opponent gains no advantage from figuring out the player's strategy. This last feature is particularly important in the context of game-playing programs, since they are vulnerable to this form of attack: sometimes the code is accessible, and in general, since they always play the same way, their strategy can be deduced by intensive testing. Given these important benefits of randomized strategies in imperfect information games, it is somewhat surprising that none of the AI papers that deal with these games (e.g., [Blair et al., 1993; Gordon, 1993; Smith and Nau, 1993]) utilize such strategies.

In this work, we attempt to solve the computational problem associated with imperfect information games: Given a concise description of a game, compute optimal strategies for that game. Two issues in particular must be addressed. First, how do we specify imperfect information games? Describing the dynamics of the players' information states in a concise fashion is a nontrivial knowledge representation task. Second, given a game tree with the appropriate structure, how do we find optimal strategies for it?

We present an implemented system, called Gala, that addresses both these computational issues. Gala consists of four components. The first is a knowledge representation language that allows a clear and concise specification of imperfect information games. As our examples show, the description of a game in Gala is very similar to, and not much longer than, a natural language description of the rules of the game. The second component of the system generates game trees from a game description in the language. These game trees are augmented with information sets, a standard concept from game theory that captures the information states of the players.

The third component of the system addresses the issue of finding good strategies for such games. Obviously, the standard minimax-type algorithms cannot produce randomized strategies. The game theoretic paradigm for solving games is based on taking the entire game tree, and transforming it into a matrix (called the normal or strategic form of the game). Various techniques, such as linear programming, can then be applied to this matrix in order to construct optimal strategies. Unfortunately, this matrix is typically exponential in the size of the game tree, making the entire approach impractical for most games.

In recent work, Koller, Megiddo, and von Stengel [1994] present an alternative approach to dealing with imperfect information games. They define a new representation, called the sequence form, whose size is linear in the size of the game tree. They show that many of the standard algorithms can be adapted to find optimal strategies using this representation. This results in exponentially faster algorithms for solving a large class of games. In particular, they present an effective polynomial time algorithm for solving two-player fully competitive games (such as poker). We have implemented this algorithm as part of the Gala system, and tested it on large examples of several games. The results are encouraging, suggesting that, in practice, the running time of the algorithm is a small polynomial in the size of the game tree.

The final component of Gala presents the optimal strategies in a way that is comprehensible to the user. For any decision point in the game, it tells the user which actions should be played with which probability. The system also provides other information, such as one player's beliefs about the state of another agent, or the expected value of a branch in the tree. This functionality makes Gala a useful tool for game-theory researchers and educators, as well as for users who wish to use Gala as a game-theory based decision support system. Finally, Gala can also play the game according to the computed strategy, making it a basis for a computer game-playing system for imperfect information games.

2 Some basic game theory

Game theory is the strategic analysis of interactive situations. Several aspects of a situation are modeled explicitly: the players involved, the alternative actions that can be taken by each player at various times, the dynamics of the situation, the information available to players, and the outcomes at the end. Given such a model, game theory provides the tools to formally analyze the strategic interaction and recommend 'rational' strategies to the players.

The standard representation of a game in computer science is a tree, in which each node is a possible state of the game, and each edge is an action available to a player that takes the game to a new state. At each node there is a single player whose turn it is to choose an action. The set of edges leading out of a node are the choices available to that player. The player may be chance or 'nature', in which case the edges represent random events. The leaves of the tree specify a payoff for each player. This representation is inadequate for games with imperfect information, because it does not specify the information states of the players. A player cannot distinguish between states of the game in which she has the same information. Thus, any decision taken by the player must be the same at all such nodes. To encode this constraint, the game tree is augmented with information sets. An information set contains a set of nodes that are indistinguishable to a player at the time she has to make a decision.

Figure 1 presents part of the game tree for a simplified variant of poker described by Kuhn [1950]. The game has two players and a deck containing the three cards 1, 2, and 3. Each player antes one dollar and is dealt one card. The figure shows the part of the game tree corresponding to the deals $(2,1)$, $(2,3)$, and $(1,3)$. The game has three rounds. In the first round, the first player can either bet an additional dollar or pass. After hearing the first player's bet, the second player decides whether to bet or pass. If player 1 passes and player 2 bets, player 1 gets one more opportunity to decide whether or not to bet. If both bet or both pass, the player with the highest card takes the pot. If one player bets and the other passes, then the betting player wins one dollar. Let $(c,d)$ denote the hands dealt to the two players. Initially, player 1 only knows his own card, so for each possible $c$, he has one information set $u_c$ containing two nodes; each node corresponds to one of the two possibilities for player 2's hand. In her turn, player 2 knows $d$ as well as player 1's action at the first round. Hence, she has two information sets for each $d$, namely $v_d^p$ and $v_d^b$, corresponding to player 1's previous action. Finally, player 1 has an information set $\bar{u}_c$ at the third round.

Figure 1: A partial game tree for simplified poker, containing three of the six possible deals. A move to the left corresponds to a pass, a move to the right to a bet. The information sets are drawn as ellipses; some of them extend into other parts of the tree.

Given a game tree augmented with information sets, one can define the notion of strategy. A deterministic strategy, like a conditional plan in AI, is a very explicit "how-to-play manual" that tells the player what to do at every possible point in the game. In the poker example, such a manual for player 1 would contain an entry: "If I hold a 3, and I passed on the first round, and my opponent bets, then bet 1." In general, a deterministic strategy for player $i$ specifies a move at each of her information sets. Since the player cannot distinguish between nodes in the same information set, the strategy cannot dictate different actions at those nodes.

Deterministic strategies are adequate for games with perfect information, where the players always know the current state of the game. In those games the information sets of both players are always single nodes, and a deterministic strategy for player $i$ is a function from those nodes at which it is her turn to move to possible moves at that node. The fact that deterministic strategies suffice for such games is the basis for the standard minimax algorithm (and its variants) used for games such as chess. In such games, called zero-sum games, there are two players whose payoffs always sum to zero, so that one player wins precisely what the other loses. As shown by Zermelo [1913], the strategies produced by the minimax algorithm are optimal in a very strong sense. Player $i$ cannot do better than to play the resulting strategy if the other player is rational. Furthermore, she can publicly announce her intention to do so without adversely affecting her payoffs. A generalized version of the minimax algorithm shows the existence of optimal deterministic strategies for general games of perfect information. The resulting strategy combination $(s_1, \ldots, s_n)$ has the important property of being in equilibrium: for any $i$, player $i$ cannot pick a better strategy than $s_i$ if the other players are all playing their strategy $s_j$. This is a minimal property that we want of a "solution" to a game: Without it, we are drawn back into the web of second guessing that characterizes imperfect information games. (If she plays the "orthodox" strategy, then I should do $x$, but she will figure out that this is better for me, so she'll actually do $y$, but then ...)

It should be fairly obvious that deterministic strategies will in general not have these properties in games with imperfect information. Deterministic strategies are predictable, and predictable play gives the opponent information. The opponent can then find a strategy calculated to take advantage of this information, thereby making the original strategy suboptimal. Unpredictable play, on the other hand, maintains the information gap. Therefore, players in imperfect information games should use randomized strategies.

Randomized strategies are a natural extension of deterministic strategies. Where a deterministic strategy chooses a move at each information set, a randomized strategy (formally called a behavior strategy) specifies a probability distribution over the moves at each information set. In our poker example, a randomized strategy $\mu_1$ for player 1 can be described by defining the probability of betting at each information set $u_c$ and $\bar{u}_c$, $c = 1, 2, 3$. A combination of randomized strategies $(\mu_1, \ldots, \mu_n)$, one for each player, induces a probability distribution on the leaves of the tree, thereby allowing us to define the expected payoff $H_i(\mu_1, \ldots, \mu_n)$ for each player $i$.

In his Nobel-prize winning theorem, Nash showed that the use of randomized strategies allows us to duplicate the successful behavior that we get from deterministic strategies in the perfect information case. In general games, there is always a combination $(\mu_1, \ldots, \mu_n)$ of randomized strategies that is in equilibrium: for any $i$, and any strategy $\mu_i'$,

$$H_i(\mu_1, \ldots, \mu_n) \ge H_i(\mu_1, \ldots, \mu_i', \ldots, \mu_n).$$

That is, no player gains an advantage by diverging from the equilibrium solution, so long as the other players stick to it.

Just as in the case of perfect information games, the equilibrium strategies are particularly compelling when the game is zero-sum. Then, as shown by von Neumann [von Neumann and Morgenstern, 1947], any equilibrium strategy is optimal against a rational player. More precisely, the equilibrium pairs $(\mu_1, \mu_2)$ are precisely those where $\mu_1$ is the strategy that attains $\max_{\mu_1'} \min_{\mu_2'} H_1(\mu_1', \mu_2')$ and $\mu_2$ is the strategy that attains $\max_{\mu_2'} \min_{\mu_1'} H_2(\mu_1', \mu_2')$ (which, since $H_2 = -H_1$, is precisely $\min_{\mu_2'} \max_{\mu_1'} H_1(\mu_1', \mu_2')$). Intuitively, $\mu_1$ is the optimal defensive strategy for player 1: it provides the best worst-case payoff. It is these strategies that we will be most concerned with finding.
3 Gala: a game description language

As we mentioned, the first component of Gala is a knowledge representation language for describing games. This is a Prolog-based language that uses the power of a declarative representation to allow clear and concise specification of games. The idea of a declarative language to specify games was proposed by Pell [1992], who utilizes it to specify symmetric chess-like games, a class of two-player perfect-information board games. Our language is much more general, and can be used to represent a very wide class of games, in particular: one-player, two-player and multi-player games; games where the outcomes are arbitrary payoffs; and games with either perfect or imperfect information. As we will show, the expressive power of Gala allows for clear and concise game descriptions that are generally of similar length to natural language representations of the rules of the game.

To illustrate some of the features of Gala, Figure 2 presents an example of a complete description for "blind tic-tac-toe," an imperfect information version of standard tic-tac-toe. The players take turns placing marks in squares, but in his turn a player can choose to mark either an x or an o; he reveals to his opponent the square in which he makes the mark, but not the type of mark used. As usual, the goal is to complete a line of three squares with the same mark.

    game(blind_tic_tac_toe,
      [players : [a, b],
       objects : [grid_board : array('$size', '$size')],
       params  : [size],
       flow    : (take_turns(mark, unless(full), until(win))),
       mark    : (choose('$player', (X, Y, Mark),
                         (empty(X, Y), member(Mark, [x, o]))),
                  reveal('$opponent', (X, Y)),
                  place((X, Y), Mark)),
       full    : (\+(empty(_, _)) -> outcome(draw)),
       win     : (straight_line(_, _, length = 3, contains(Mark)) ->
                  outcome(wins('$player')))]).

Figure 2: A Gala description of blind tic-tac-toe

A game description in Gala is a list of features, each one describing some aspect of the game. For example, players : [a, b] indicates that the game is to be played between two players named 'a' and 'b'.

The Gala language has several layers: the lower ones provide basic primitives, while the higher layers use those primitives to provide more complex functionality. The lowest layer provides the fundamental primitives for defining the structure of a game. The choose(Player, Move, Constraint) primitive describes the possible moves available to a player at a given point in the game. It allows Player to make any move Move satisfying Constraint. This last argument can be an arbitrary segment of Prolog code. In our example, Move consists of a square, specified by its coordinates X and Y, and a mark Mark; Constraint requires that the square be empty and that Mark be either x or o. The first argument to choose can also be nature, in which case one of a number of events is chosen at random. By default, these random events have uniform probability, but a different probability distribution may be specified. The outcome primitive describes the outcome of the game at the end of a particular sequence of moves. This will often be a list of payoffs, one for each player; but, as the example demonstrates, Gala allows other possibilities. The reveal(Player, Fact) primitive describes the dynamics of the players' information states. It adds Fact to Player's information state. The information added can be simple or an arbitrary Prolog expression. In blind tic-tac-toe, a player chooses both a square and a mark but reveals to his opponent only the square.

At a somewhat higher level, the flow feature describes the course of the game. The game can be divided into phases: some may take place just once, while others can be repeated until a goal is reached. In blind tic-tac-toe, for example, the players take turns executing the sequence of actions specified in the mark feature, until the condition specified in the full or the win feature is satisfied. The unless condition is tested before the turn. Gala also allows game flow to be nested recursively. Each phase can be described by its own series of features, which may include flow. The flow of bridge, for example, can be described as follows:

    flow : (play_phase(bidding), play_phase(take_tricks)), ...
    phase(bidding,
      [flow : (take_turns(bid, until(contract_reached))), ...
    phase(take_tricks,
      [flow : (play_rounds(trick, 13), ...

In order to allow a natural specification of the game, Gala provides a separate representation for the game state, where relevant information about the current state of the game is stored. In blind tic-tac-toe, the game state contains the current board position. This information is accessed, for example, by choose in order to determine which moves are possible: only those squares that are empty are legal moves. The game state is maintained by modifying it appropriately, e.g., by the place operation, when the players make their moves. Much of the functionality in the higher levels of the Gala language is devoted to accessing and manipulating the game state.

The intermediate levels of Gala provide a shorthand for concepts that occur ubiquitously in games. These include locations and their contents, pieces and their movement patterns, and resources that change hands, such as money. In blind tic-tac-toe, the statements that deal with the contents of squares are an instance of locations and their contents. Other examples of functionality supported by this level are move(queen(white), (d,1), (d,8)) and pay(gambler, pot, Bet).

On a more abstract level, we have observed that certain structures and combinations appear in virtually all games. While these are usually sets of one sort or another, they come in many flavors. For example, a flush in poker is a set of five cards sharing a common property; a straight, on the other hand, is a sequence of cards in which successive elements bear a relation to one another; a full house is a partition into equivalence classes based on rank in which the classes are of a specific size. A word in Scrabble and a 21 in Blackjack are another type of combination: a collection of objects bearing no particular relationship to each other but forming an interesting group in totality.

The Prolog language provides a few predicates that describe sets and subsets. We have supplemented these with various predicates that make it easy to describe many of the combinations occurring in games. For example, chain(Predicate, Set) determines whether Set is a sequence in which successive elements are related by Predicate; partition(Relation, Set, Classes) partitions Set into equivalence Classes based on Relation. For a more elaborate example, consider the following code, which concisely tests for all types of poker hand except flushes and straights.

    detailed_partition(match_rank, Hand, Classes, Ranks, Sizes),
    associate(Sizes, Type,
      [([4, 1], four_of_a_kind), ([3, 2], full_house),
       ([3, 1, 1], three_of_a_kind), ([2, 2, 1], two_pairs),
       ([2, 1, 1, 1], one_pair), ([1, 1, 1, 1, 1], nothing)])

The predicate detailed_partition takes two inputs: a set, in this case Hand, and an equivalence relation, in this case match_rank, which relates two cards if they have the same rank. It partitions the set into equivalence classes, and produces three outputs: a list Classes of the equivalence classes in decreasing order of size; a corresponding list of the defining property of the equivalence classes, in this case the Ranks present in the hand; and a list Sizes of the sizes of the different classes. In this example, if Hand is [9♠, 6♥, 9♦, 6♠, 6♣], then Classes would be [[6♥, 6♠, 6♣], [9♠, 9♦]], Ranks would be [6, 9], and Sizes would be [3, 2]. In poker, Sizes contains the relevant structure of the hand, and it is used to classify the hand using an association list. The above hand, for example, is immediately classified as a full house.
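The partition-then-associate idea can be mirrored outside Prolog. The sketch below is a rough Python analogue, not Gala's implementation: the function names follow the Gala predicates only for readability, and the encoding of a card as a (rank, suit) tuple, the suit letters, and the tie-breaking rule for classes of equal size are all assumptions made for illustration.

```python
from collections import defaultdict

# Association list from class sizes to hand types, as in the Gala fragment.
HAND_TYPES = {
    (4, 1): "four_of_a_kind",
    (3, 2): "full_house",
    (3, 1, 1): "three_of_a_kind",
    (2, 2, 1): "two_pairs",
    (2, 1, 1, 1): "one_pair",
    (1, 1, 1, 1, 1): "nothing",
}


def detailed_partition(key, items):
    """Partition items into equivalence classes of the relation
    'key(a) == key(b)', returned in decreasing order of size.

    Returns (classes, properties, sizes). Classes of equal size keep their
    order of first appearance, because sorted() is stable.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    ordered = sorted(groups.items(), key=lambda kv: -len(kv[1]))
    classes = [members for _, members in ordered]
    properties = [prop for prop, _ in ordered]
    sizes = [len(members) for members in classes]
    return classes, properties, sizes


def classify(hand):
    """Classify a five-card hand, ignoring flushes and straights."""
    _, _, sizes = detailed_partition(lambda card: card[0], hand)
    return HAND_TYPES.get(tuple(sizes))


# Cards are (rank, suit) tuples; the suits here are illustrative.
hand = [(9, "s"), (6, "h"), (9, "d"), (6, "s"), (6, "c")]
classes, ranks, sizes = detailed_partition(lambda card: card[0], hand)
print(classes, ranks, sizes, classify(hand))
```

Because only the list of class sizes is consulted, the same lookup covers every rank-based hand type without a separate test per type.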
The high level modules of Gala build on the intermediate levels to provide more specific functionality that is common to certain classes of games, such as boards that form a grid, playing cards, dice, and so on. In the blind tic-tac-toe example, we declare a grid-board object. This makes a whole range of predicates available that depend on the board being rectilinear. The straight_line predicate is an example; it tests for a straight line of three squares containing the same mark. This predicate is defined in terms of chain. In general, high-level predicates are typically very easy to define in terms of the intermediate level concepts, so that adding a module for a new class of games requires little effort.

A useful feature of Gala is that it allows some parameters of the game to be left unspecified in the game description and provided when the game is played. In blind tic-tac-toe, the board size is such a parameter. This makes it very easy to encode a large class of games in a single program. These parameters can actually be code-containing features. Thus, it is possible to provide the movement patterns of pieces in a game at run time. This allows a simple interface between Gala and Pell's Metagame program [Pell, 1992], which generates symmetric chess-like games randomly.

Given a description of a game in the Gala language, Gala generates the corresponding game tree with information sets as described in Section 2. The tree is defined by the choose, reveal and outcome primitives. The Gala interpreter "plays" the game and constructs the game tree as it encounters these operations. When it encounters a choose primitive, a node is added to the tree, and an edge is added for every option available to the player. The interpreter then explores each branch of the tree corresponding to each of the options. If the first argument to choose is a player, the system also adds the node to the appropriate information set of that player: the one that contains all the nodes where the player has the same information state. The information state consists of all facts revealed to the player by the reveal primitive, the list of choices available to the player, and all decisions previously taken by the player. If the first argument to choose is nature, then the node is marked as a chance node, and the probability of each random choice is recorded. When the interpreter encounters the outcome primitive, it adds a leaf to the tree and backtracks to explore other branches.

4 Solving imperfect information games

How do we find equilibrium strategies in imperfect information games? This is, in general, a very difficult problem. Consider the poker example from Section 2. There, we specified a strategy for each of the players using six numbers. When trying to solve a game, we need to find an appropriate set of numbers that satisfies the properties we want. That is, we want to treat the parameters of the strategy as variables, and solve for them. The general computational problem is:

$$\begin{aligned} \text{Maximize}_{x} \ \min_{y}\ & H(x, y) \\ \text{subject to}\ & x \text{ represents a strategy for player 1} \\ & y \text{ represents a strategy for player 2} \end{aligned} \tag{1}$$

where $H(x, y)$ denotes the expected payoff to player 1 if the strategies corresponding to $x, y$ are played.

It turns out that the heart of the problem is finding an appropriate set of variables for representing the strategy. The first attempt is to use the move probabilities in the behavior strategy. In the poker example, we would then have $x = \{x_c, \bar{x}_c : c = 1, 2, 3\}$ representing player 1's strategy, and $y = \{y_d^p, y_d^b : d = 1, 2, 3\}$ representing player 2's strategy. The problem is that this payoff is a nonlinear function of the $x$'s and $y$'s. In order to avoid this problem, which would force us to use nonlinear optimization techniques, the standard solution algorithms in game theory do not use game trees and behavior strategies as their primary representation. Rather, they operate on an alternative representation called the normal form. In the two-player case, the normal form is a matrix $A$ whose rows are all the deterministic strategies of the first player and whose columns are all the deterministic strategies of the second. The entry in the $i$th row and $j$th column is the expected payoff to the players when player 1 plays strategy $s_1^i$ and player 2 plays strategy $s_2^j$. A randomized strategy can now be viewed as a probability distribution over all the deterministic strategies. Hence, $x$ is simply a probability distribution over rows: it has a variable $x_i$ for each row, such that $x_i \ge 0$ for all $i$, and $\sum_i x_i = 1$. If player 1 plays $x$ and player 2 plays $y$, then the expected payoff of the game is simply $x^T A y$. Under this representation of strategies, (1) takes a particularly simple form. It is then fairly easy to show that appropriate vectors $x$ and $y$ can be found from $A$ using standard linear programming methods.

For non-zero-sum games, the normal form also forms the basis for essentially all solution algorithms. Gala provides access to the normal form algorithms using an interface to the GAMBIT system, developed by McKelvey and Turocy [McKelvey, 1992]. GAMBIT provides a toolkit for solving various classes of games, including games with more than two players and games where the interests of the players are not strictly opposing. Since Gala allows a clear and compact specification of such games, the combined system provides both a representation language and solution algorithms for games describing multi-agent interactions.

Unfortunately, the normal-form algorithms are practical only for very small games. The reason is that the normal form is typically exponential in the size of the game tree. This is easy to see: A deterministic strategy must specify an action at each information set. The total number of possible strategies is therefore exponential in the number of information sets, which is usually closely related to the size of the game tree.

Consider our poker example, generalized to a deck with $n$ cards. For each card $c$, player 1 must decide whether to pass or bet, and if he has the option, whether to pass or bet at the third round. There are three courses of action for each $c$, so the total number of possible strategies is $3^n$. Player 2, on the other hand, must decide on her action for each card $d$ and each of the two actions possible for the first player in the first round. The number of different decisions is therefore $2n$, so the total number of deterministic strategies is $2^{2n} = 4^n$. Since the normal form has a row for each strategy of one player and a column for each strategy of the other, it is also exponential in $n$, while the size of the game tree is only $9n(n-1) + 1$. In general, the normal-form conversion is typically exponential in terms of both time and space.

This problem makes the standard solution algorithms an unrealistic option for many games. Due to the large branching factor in many games, even the approach of incrementally solving subtrees would not suffice to solve this problem. (This approach also encounters other difficulties in the context of imperfect information games; see Section 6.)
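The gap between the size of the normal form and the size of the game tree for the n-card poker game can be made concrete with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the strategy counts are the 3^n and 4^n figures given in the text, while the node count is obtained here by direct enumeration, assuming the two cards are dealt without replacement (as in the three-card game with its six deals) and that the root is a single chance node. The function names are hypothetical.

```python
from itertools import permutations

# Betting histories at which the game ends: p = pass, b = bet.
TERMINAL = {"pp", "bp", "bb", "pbp", "pbb"}


def subtree_size(history=""):
    """Number of nodes in the betting subtree below one deal."""
    if history in TERMINAL:
        return 1
    return 1 + sum(subtree_size(history + action) for action in "pb")


def game_tree_nodes(n):
    """Nodes in the n-card poker tree: one chance root plus one betting
    subtree for every ordered deal of two distinct cards."""
    deals = list(permutations(range(1, n + 1), 2))
    return 1 + len(deals) * subtree_size()


def normal_form_entries(n):
    """Rows times columns of the normal form: 3^n strategies for player 1
    and 2^(2n) = 4^n strategies for player 2."""
    return (3 ** n) * (4 ** n)


for n in (3, 5, 10, 20):
    print(f"n={n:2d}  tree nodes={game_tree_nodes(n):5d}  "
          f"normal-form entries={normal_form_entries(n)}")
```

Even at twenty cards the tree stays in the thousands of nodes, while the number of matrix entries has grown past 10^21, which is why a representation tied to the tree rather than to the set of deterministic strategies is attractive.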
quenceform,whichallowsittoavoidtheexponentialblowup Notethatthesequenceformisatmostlinearinthesizeof associatedwiththenormalform. Wewilldescribethemain the game tree, since there is at most one sequence for each ideasbrieflyhere;formoredetailssee[Kolleretal.,1994]. nodeinthegametree,andoneconstraintforeachinformation The sequence form is based on a different representation set. Furthermore,itcanbegeneratedveryeasilybyasingle of the strategic variables. Rather than representing proba- pass over the game tree. The format of the sequence form resemblesthatofthenormalforminmanyways,anditappears bilitiesof individualmoves (asin thenon-linearrepresenta- tion above), or probabilities of full deterministic strategies thatmanynormal-formsolutionalgorithmscanbeconverted (as in the normal form), the variables represent the realiza- to work for the sequence form. The work of [Koller et al., 1994]focusesonthetwo-playercase.Theyprovidesequence- tion weight of different sequences of moves. Essentially, a sequence for a player corresponds to a path down the tree, formvariantsforthebestnormal-formalgorithmsforsolving but it isolates the moves under that player’s direct control, both zero-sum and general two-player games. The result whichisofmostinteresttousisthefollowing: ignoringchancemovesandthedecisionsoftheotherplayers. Inourpokergame,forexample,player1wouldhave4(cid:2)(cid:0) 1 Theorem4.1: The optimal strategies of a two-player zero- (cid:26) sequences. In additiontotheempty sequence(whichcorre- sumgamearethesolutionsofalinearprogrameachofwhose spondstotherootofthegame)hehasfoursequencesforeach dimensionsislinearinthesizeofthegametree. card(cid:3) : [beton(cid:3) ](inwhichcasethereisnothirdround),[pass Thematrixofthelinearprogrammentionedinthetheorem on (cid:3) ], [passon (cid:3) , bet inthelastround],and[passon (cid:3) , pass is essentially the sequence form. The resulting matrix can in the last round]. 
Player 2 also has 4(cid:3)(cid:0) 1 sequences: the (cid:26) thenbesolvedbyanystandardlinearprogrammingalgorithm, emptysequence,andforeachcard ,thefoursequences[bet (cid:5) suchasthe simplex algorithmwhich is knownto work well on afterseeingapass],[passon afterseeingapass],[bet (cid:5) (cid:5) inpractice. Wecanalso useadifferentlinearprogramming on afterseeingabet],[beton afterseeingabet]. Givena (cid:5) (cid:5) algorithmwhoseworst-caserunningtimeisguaranteedtobe randomizedstrategy,therealizationweightofasequencefora polynomial. Hence,thistheoremisthebasisforanefficient playeristheproductoftheprobabilitiesoftheplayer’smoves polynomial time algorithm for finding optimal solutions to encodedinthe sequence. Essentially, therealization weight two-playerzero-sumgames. of the sequence corresponding to a path down the tree is a conditionalprobability: theprobabilitythatthispathistaken 5 Experimentalresults giventhattheotherplayersandnatureallcooperatetomake thispossible. Theprobabilitythatapathisactuallytakenin Thesequence-formalgorithmfortwo-playerzero-sumgames agameisthereforetheproductofthe realizationweightsof hasbeenfullyimplementedaspartoftheGalasystem. The alltheplayers’sequencesonthatpath,timestheprobability systemgeneratesthesequenceform, createsthe appropriate ofallthechancemovesonthepath. linearprogram,andsolvesitusingthestandardoptimization Thesequenceformofatwo-playergameconsistsofapay- library of CPLEX. We compared this algorithm to the tradi- offmatrix andalinearsystemofconstraintsforeachplayer. tionalnormalformalgorithmbyusingGAMBITtoconvertthe (cid:19) Inatwoplayergame, the throwof correspondstoase- gametreesgeneratedbyGalatothenormalform,andCPLEX (cid:1) (cid:18) (cid:19) (cid:5) tosolvetheresultinglinearprogram. Weexperimentedwith quence(cid:4) forplayer1,andthe thcolumntoasequence (cid:4) forplayer12. 
Theentry(cid:5) (cid:1)(cid:5) isthe(cid:20) weightedsumofthepayof2f twogames: thesimplifiedpokergamedescribedinSection2, increasingthenumberofcardsinthedeck;andaninspection attheleaves thatarereachedbythispairofsequences(they game which has received significant attention in the game areweightedbytheprobabilitiesofthechancemovesonthe theorycommunityasamodelofon-siteinspectionsforarms path). If apair ofsequencesis not consistentwith any path controltreaties[Avenhausetal.,1995]. Theresultingrunning toaleaf,thematrixentryiszero. So,forexample,thematrix timesareshowninFigure3. Theyareasonewouldexpectina entryforthepairofsequences[beton2]and[passon1after comparisonbetweenapolynomialandexponentialalgorithm. seeing a bet] is 1. The matrix entry for the pair [bet on 2] Theseresultsare continuedfor thesequenceform inFig- and[passon1afterseeingapass]is0, sincethispairisnot ure4. (Itwasimpossibletoobtainnormal-formresultsforthe consistentwithanyleaf. Wenow solve (cid:0)(cid:11)(cid:10) usingrealizationweights asour strate- largergames.) There,wealsoshowthedivisionoftimebe- (cid:2) (cid:7)(cid:6) tweengeneratingthesequenceformandsolvingtheresulting gicvariables. Wewillhaveavariable(cid:14) foreachsequence (cid:4) 1 of player 1, and a variable (cid:18) (cid:6) 2 for ea1ch sequence (cid:4) 2 of 1Thisformulationrequiresthattheplayersneverforgettheirown player 2. Using the analysis above, we can show that the moves or informationtheyonce had. Thisimpliesthat thereis at expectedpayoffofthegame (cid:13) (cid:0) (cid:6) (cid:1)(cid:9)(cid:8) (cid:2) is (cid:6) (cid:22) (cid:19) (cid:8) . This ispre-1190mostonesequence(cid:19) leadingtothisinformationset. 
linear program. For the poker games, we can see that generating the sequence form takes the bulk of the time. Solving even the largest of these games takes less than 10 seconds. This leads us to believe that these techniques can be made to run considerably faster by optimizing the sequence-form generator. Finally, note that the algorithm is much faster for poker games than for the inspection games. In the full paper, we explain these results, and define certain characteristics of a game that tend to have a significant effect on the running time of the sequence-form algorithm.

[Figure 3: Normal form vs. sequence form running time. Two panels (Poker, Inspection game), each plotting total running time (sec) against the number of nodes in the tree, with one curve for the normal form and one for the sequence form.]

[Figure 4: Time for generating and solving the sequence form. Two panels (Poker, Inspection game), each plotting time (sec) against the number of nodes in the tree, with curves labelled Solve and Total.]

As we remarked above, the final component of the Gala system reads in the strategies computed by this algorithm, and interprets them in a way that is meaningful with respect to the game. In particular, it allows the strategies to be examined by the user, who can then use them as part of the decision-making process. We have discovered that examining these strategies often yields interesting insights about the game. Figure 5 shows the strategies for both players in an eight card simplified poker. Consider the probability that the gambler bets in the first round: it is fairly high on a 1, somewhat lower on a 2, 0 on the middle cards, and then goes up for the high cards. The behavior for the low cards corresponds to bluffing, a characteristic that one tends to associate with the psychological makeup of human players. Similarly, after seeing a pass in the first round, the dealer bets on low cards with very high probability. Psychologically, we interpret this as an attempt to discourage the gambler from 'changing his mind' and betting on the final round. In more complex games, we see other examples where "human" behavior (e.g., underbidding) is game-theoretically optimal.

[Figure 5: Strategies for 8 card poker. Two panels (Dealer, Gambler), each plotting the probability of betting against the card received (1 to 8). Curve labels: first round, second round, after seeing pass, after seeing bet.]

6 Discussion

As in the case of perfect information games, game trees for full-fledged games are often enormous. Although we expect to solve games with hundreds of thousands of nodes in the near future, full-scale poker is much larger than that and it is unlikely we will be able to solve it completely. Of course, chess-playing programs are very successful in spite of the fact that we currently cannot solve full-scale chess. Can we apply the standard game-playing techniques to imperfect information games? We believe that the answer is yes, but the issue is nontrivial. Even the concept of a 'subtree' is not well-defined in such games. For one thing, the program cannot simply create the subtree starting at the current state, since it does not know precisely which node of the game tree is the actual state of the game; it knows only that the node is one of those in a certain information set. In addition, information sets belonging to other players may cross the "subtree boundary," as was the case in Figure 1. It is not obvious how to deal with these problems. We hope to address this issue in future work.

Another approach that may well prove fruitful is based on the observation that there is a lot of regularity in the strategies for small poker games: the player often behaves the same for a variety of different hands. This suggests that in order to solve large games, we could abstract away some features of the game, and solve the resulting simplified game completely. For the game of poker, we could abstract by partitioning the set of possible deals into clusters, and then solve the abstracted game. Our experimental results indicate that the resulting strategies would be very close to optimal.

Most of the techniques we discussed in this paper also apply to more general classes of games. Gala provides the functionality for specifying arbitrary multi-player games. Currently, these can only be solved using the traditional (normal-form) algorithms accessed through our GAMBIT interface, and these are practical only for small games. However, the sequence form can be used to represent any perfect recall game, and the results of [Koller et al., 1994] indicate that many of the standard techniques could carry over from the normal form to the sequence form. We hope to use the sequence form approach for more general games, and show that the resulting exponential reduction in complexity indeed occurs in practice. If so, the resulting system may allow an analysis of multi-player games, a class of games that have been largely overlooked. Perhaps more importantly, the system could also be used to solve games that model multi-agent interactions in 'real life'.

We believe that the Gala system facilitates future research into these and other questions. Its ability to easily specify games of different types and to generate many variants of each game allows any new approach to be extensively tested. We intend to make this system available through a WWW site (http://www.cs.berkeley.edu/~daphne/gala/), in the hope that it will provide the foundation for other work on imperfect information games.

Acknowledgements

We are deeply grateful to Richard McKelvey and Ted Turocy for going out of their way to ensure that the GAMBIT functionality we needed for our experiments was ready on time. We also thank the International Computer Science Institute at Berkeley for providing us access to the CPLEX system. We also wish to thank Nimrod Megiddo, Barney Pell, Stuart Russell, John Tomlin, and Bernhard von Stengel for useful discussions.

References

[Avenhaus et al., 1995] R. Avenhaus, B. von Stengel, and S. Zamir. Inspection games. In Handbook of Game Theory, Vol. 3, to appear. North-Holland, 1995.

[Blair et al., 1993] J.R.S. Blair, D. Mutchler, and C. Liu. Games with imperfect information. In Working Notes AAAI Fall Symposium on Games: Planning and Learning, 1993.

[Gordon, 1993] S. Gordon. A comparison between probabilistic search and weighted heuristics in a game with incomplete information. In Working Notes AAAI Fall Symposium on Games: Planning and Learning, 1993.

[Koller et al., 1994] D. Koller, N. Megiddo, and B. von Stengel. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing, pages 750–759, 1994.

[Kuhn, 1950] H.W. Kuhn. A simplified two-person poker. In Contributions to the Theory of Games I, pages 97–103. Princeton University Press, 1950.

[McKelvey, 1992] R.D. McKelvey. GAMBIT: Interactive Extensive Form Game Program. California Institute of Technology, 1992.

[Pell, 1992] B. Pell. Metagame in symmetric, chess-like games. In Heuristic Programming in Artificial Intelligence 3 - The Third Computer Olympiad. Ellis Horwood, 1992.

[Russell and Norvig, 1994] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1994.

[Smith and Nau, 1993] S.J.J. Smith and D.S. Nau. Strategic planning for imperfect-information games. In Working Notes AAAI Fall Symposium on Games: Planning and Learning, 1993.

[von Neumann and Morgenstern, 1947] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 2nd edition, 1947.

[Zermelo, 1913] E. Zermelo. Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels. In Proceedings of the Fifth International Congress of Mathematicians II, pages 501–504. Cambridge University Press, 1913.
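The definition of realization weights in Section 4 can be made concrete with a small sketch for player 1 of the simplified poker game. It is only an illustration, not the Gala implementation: the function names, the tuple encoding of sequences, and the way a behavior strategy is passed in (one betting probability per card for the first round and one for the last round) are all assumptions made here. The sketch builds the weight of each of player 1's sequences as a product of his own move probabilities and then checks the linear constraints stated in the text.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: a sequence is a tuple of move labels for one player.
Sequence = Tuple[str, ...]


def realization_weights(first_bet: Dict[int, float],
                        last_bet: Dict[int, float]) -> Dict[Sequence, float]:
    """Realization weights of player 1's sequences.

    first_bet[c] is the probability of betting on card c in the first round;
    last_bet[c] is the probability of betting in the last round after passing.
    Each weight is the product of player 1's own move probabilities only.
    """
    weights: Dict[Sequence, float] = {(): 1.0}  # empty sequence has weight 1
    for c in first_bet:
        p, q = first_bet[c], last_bet[c]
        weights[(f"bet on {c}",)] = p
        weights[(f"pass on {c}",)] = 1.0 - p
        weights[(f"pass on {c}", "bet in the last round")] = (1.0 - p) * q
        weights[(f"pass on {c}", "pass in the last round")] = (1.0 - p) * (1.0 - q)
    return weights


def satisfies_constraints(weights: Dict[Sequence, float],
                          cards: Iterable[int],
                          tol: float = 1e-9) -> bool:
    """Check the sequence-form constraints for player 1."""
    # The empty sequence has weight 1 and all weights are non-negative.
    if not math.isclose(weights[()], 1.0, abs_tol=tol):
        return False
    if any(w < -tol for w in weights.values()):
        return False
    for c in cards:
        bet, pas = (f"bet on {c}",), (f"pass on {c}",)
        # First-round information set for card c: reached by the empty sequence.
        if not math.isclose(weights[()], weights[bet] + weights[pas], abs_tol=tol):
            return False
        # Last-round information set for card c: reached by [pass on c].
        children = (weights[pas + ("bet in the last round",)]
                    + weights[pas + ("pass in the last round",)])
        if not math.isclose(weights[pas], children, abs_tol=tol):
            return False
    return True


if __name__ == "__main__":
    w = realization_weights({1: 0.5, 2: 0.0, 3: 1.0}, {1: 0.0, 2: 0.25, 3: 1.0})
    print(len(w), satisfies_constraints(w, [1, 2, 3]))
```

For a deck of n cards the dictionary has 4n + 1 entries, one per sequence, which matches the count given in Section 4; any vector that passes the check corresponds to some randomized strategy of player 1.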

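The size argument of Section 4 can also be tabulated directly. The sketch below uses only the counts stated in the text for the n-card poker example (3^n and 4^n deterministic strategies, 4n + 1 sequences per player); the class and function names are hypothetical and chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormSizes:
    """Matrix dimensions for the n-card simplified poker game (Section 4)."""
    normal_form_rows: int    # 3^n deterministic strategies of player 1
    normal_form_cols: int    # 4^n deterministic strategies of player 2
    sequence_form_rows: int  # 4n + 1 sequences of player 1
    sequence_form_cols: int  # 4n + 1 sequences of player 2

    @property
    def normal_form_entries(self) -> int:
        return self.normal_form_rows * self.normal_form_cols

    @property
    def sequence_form_entries(self) -> int:
        return self.sequence_form_rows * self.sequence_form_cols


def form_sizes(n: int) -> FormSizes:
    """Return the payoff-matrix dimensions of both forms for a deck of n cards."""
    if n < 1:
        raise ValueError("the deck must contain at least one card")
    return FormSizes(
        normal_form_rows=3 ** n,
        normal_form_cols=4 ** n,
        sequence_form_rows=4 * n + 1,
        sequence_form_cols=4 * n + 1,
    )


if __name__ == "__main__":
    for n in (3, 8, 16):
        s = form_sizes(n)
        print(n, s.normal_form_entries, s.sequence_form_entries)
```

Even for the eight card game of Figure 5 the normal-form matrix has hundreds of millions of entries, while the sequence-form matrix has 33 rows and 33 columns.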