Learning Hierarchical Skills for Game Agents from Video of Human Behavior

Nan Li, David J. Stracuzzi, Gary Cleveland, and Pat Langley
School of Computing and Informatics, Arizona State University

Tolga Könik, Dan Shapiro, and Kamal Ali
Computational Learning Laboratory, Stanford University

Matthew Molineaux
Knexus Research, Springfield, VA

David W. Aha
Naval Research Laboratory, Washington, DC

Published in U. Kuter & H. Muñoz-Avila (Eds.), Learning Structural Knowledge from Observations: Papers from the IJCAI Workshop (Technical Report WS-09-21). Pasadena, CA: AAAI Press, 2009. Approved for public release; distribution unlimited (DTIC accession ADA593080).

Abstract

Developing autonomous agents for computer games is often a lengthy and expensive undertaking that requires manual encoding of detailed and complex knowledge. In this paper we show how to acquire hierarchical skills for controlling a team of simulated football players by observing video of college football play. We then demonstrate the results in the Rush 2008 football simulator, showing that the learned skills have high fidelity with respect to the observed video and are robust to changes in the environment. Finally, we conclude with discussions of this work and of possible improvements.

1 Introduction

Constructing intelligent agents for computer games is an important aspect of game development. However, traditional methods are expensive, and the resulting agents only function in narrowly defined circumstances. Agent architectures [Newell, 1990], which are designed to model human-level intelligence, can help simplify this task by reducing the need for developers to explicitly provide agents with intelligent behaviors. Developers need only specify static knowledge about the domain, while the architecture constructs the necessary behavior specifications and handles the reasoning and decision making required for their execution.

Recent research [Pearson and Laird, 2004; Nejati et al., 2006] aims to simplify the knowledge acquisition process. We expand upon this work by learning from video data, by representing and acquiring temporal knowledge, and by reducing the amount of expert input required by the system. In the following, we present a system that learns hierarchical skills for football players by observing videos of college football plays, and then executes those skills in a simulated football environment. The system takes in discrete perceptual traces generated from the video, analyzes these traces with existing knowledge, and learns new skills that can be applied when controlling agents in a football simulator. The skills are acquired in a cumulative manner, with new skills building on those previously learned, and are based on conceptual knowledge that includes temporal constraints. The learned skills also reproduce the observed behavior faithfully, with comparable efficacy, in spite of differences between the observation and demonstration environments.

We begin by introducing the Rush 2008 football simulator, which we use to demonstrate our approach. We then briefly review the ICARUS agent architecture, and present our work on acquiring structured domain knowledge from observed human behavior within this framework. Next, we outline the video preprocessing steps, the application of ICARUS to the processed video, and the application of the acquired skills in Rush. Finally, we conclude with a summary of related work and directions for future development.

2 The Rush Football Simulator

Rush 2008, a research extension of Rush 2005 (http://rush2005.sourceforge.net/), simulates play in an eight-player variant of American football. For the remainder of the paper, we assume a basic familiarity with the rules and objectives of American football. The simulator encodes several types of offensive and defensive formations, along with several distinct plays (coordinated strategies) for each formation. Each player is assigned a role, such as quarterback (QB), running back (RB), or left wide receiver (LWR). Players are controlled either by assigning them a high-level goal such as pass route cross out at yard 15, which instructs a receiver to run 15 yards downfield, make a hard right turn, and then keep running to try to catch a pass, or by assigning specific instructions such as stride forward on each clock tick.
We instrumented Rush so that ICARUS can perceive all players and objects on the field and control the actions of each offensive player on a tick-by-tick basis. Each offensive player shares perception and knowledge with the other offensive players, and carries out actions instructed by ICARUS. Examples of the types of actions available to ICARUS include (throwTo <receiver>), which instructs the QB to pass to a specific teammate, and (stride <direction>), which tells a player to run in one of eight directions for one clock tick. Defensive players are controlled by the simulator, which selects one of several available strategies for each play.
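To make this interface concrete, consider a minimal sketch of the tick-by-tick control loop, written here in Python purely for illustration; Rush and ICARUS expose no such API, and names like perceive_all and issue are hypothetical.

    # Hypothetical sketch of the tick-by-tick control interface; all names
    # are illustrative rather than part of Rush or ICARUS.
    def run_play(simulator, agent, offensive_players, max_ticks=300):
        """Drive one play: perceive everything, act for every offensive player."""
        for _ in range(max_ticks):
            percepts = simulator.perceive_all()      # all players and objects
            for player in offensive_players:         # perception is shared
                action = agent.choose_action(player, percepts)
                simulator.issue(player, action)      # e.g. ("stride", "E")
            if simulator.step():                     # defense moves; play may end
                break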
3 The ICARUS Architecture

Our framework for play observation, execution and learning is the ICARUS architecture, which is an instance of a unified cognitive architecture [Newell, 1990]. ICARUS shares many features with other architectures like Soar [Laird et al., 1986] and ACT-R [Anderson, 1993], including a distinction between short-term and long-term memories, and goal-driven but reactive execution. It also has many novel features, including a commitment to separate storage for conceptual and skill knowledge, the indexing of skills by the goals they achieve, and architectural support for temporal knowledge [Stracuzzi et al., 2009]. In this section, we summarize the basic assumptions of the framework along with its operational procedures to provide background for the new learning methods. We begin by describing the conceptual and belief representations along with the inference process, and then describe skill structures and the execution mechanism.

3.1 Beliefs, Concepts and Inference

Reasoning about the environment is a principal task performed by intelligent agents, and determines which actions the agent must carry out to achieve its goals. ICARUS performs the inference task by matching the generalized conceptual structures in long-term memory against percepts and beliefs stored in short-term memory. On each cycle, an agent receives low-level perceptual information about its environment. Each percept describes the attributes of a single object in the world, and is short-lived. For example, the percepts derived from the football video last only for the duration of one video frame (1/30th of a second) before being replaced with new information.

Intelligent behavior requires more than low-level perceptual information. ICARUS therefore produces higher-level beliefs about the environment based on percepts. Beliefs represent relations among objects in the environment, and have two associated timestamps that indicate the beginning and end of the period in which the belief is known to be true. All of the inferred beliefs for the current episode are retained in a simple episodic belief memory, so that the agent can reason about events over time.

ICARUS beliefs are instances of generalized concepts stated in a hierarchically organized, long-term conceptual memory. Each concept describes a class of environmental situations using a relational language; it includes a head, which consists of a predicate with arguments, and a body, which defines the situations under which the concept is true. The body has a relations field as well as a constraints field. The relations field specifies the subconcepts on which the concept depends along with their associated timestamps, which correspond to the timestamps on beliefs. The constraints field uses these timestamps to describe the temporal relationships among the subconcepts. For example, in Table 1 the constraints field of dropped-back states that ?passer must have possession of the ball until he has finished dropping back.

Table 1: Sample concepts for the football domain.

    ; ?passer dropped back ?n-steps after receiving the snap
    ((dropped-back ?passer ?n-steps)
     :relations   (((snap-completed ?passer ?ball) ?snap-start NOW)
                   ((possession ?passer ?ball) ?poss-start ?poss-end)
                   ((moved-distance ?passer ?n-steps S) ?mov-start ?mov-end))
     :constraints ((≤ ?snap-start ?poss-start)
                   (≤ ?mov-end ?poss-end)
                   (= ?mov-end NOW)))

    ; learned start condition for skill dropped-back
    ((scdropped-back-c1 ?passer)
     :relations   (((possession ?passer ?ball) ?pos-start ?pos-end)
                   ((snap-completed ?passer ?ball) ?snap-start NOW))
     :constraints ((≤ ?snap-start ?pos-start)))

ICARUS updates belief memory by matching the generalized concept definitions to percepts and existing beliefs in a bottom-up manner. Concepts that depend only on percepts are matched against the perceptual buffer first, and the results are placed in belief memory. This triggers matching against higher-level concept definitions. The process continues until the deductive closure of perceptual, belief and conceptual memory has been computed.
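The following is a minimal sketch of this bottom-up matching, assuming a deliberately simplified representation in which each concept is just a head predicate plus the set of subconcepts it depends on; the real definitions also carry arguments, :relations timestamps, and temporal :constraints.

    # Illustrative sketch of bottom-up inference to a deductive closure.
    def infer_beliefs(percepts, concepts):
        """percepts: set of ground facts; concepts: dict head -> dependency set."""
        beliefs = set(percepts)           # percept-level matches seed the process
        changed = True
        while changed:                    # iterate until no new concept fires
            changed = False
            for head, deps in concepts.items():
                if head not in beliefs and deps <= beliefs:
                    beliefs.add(head)     # a higher-level concept now matches
                    changed = True
        return beliefs                    # the deductive closure

    # A propositional echo of Table 1:
    concepts = {"dropped-back": {"snap-completed", "possession", "moved-distance"}}
    print(infer_beliefs({"snap-completed", "possession", "moved-distance"}, concepts))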
3.2 Goals, Skills and Execution

After inferring a set of beliefs about its environment, ICARUS next evaluates its skill knowledge to determine which actions to take in the environment. For this, the architecture uses a goal memory, which stores goals that the agent wants to achieve. It then retrieves skills that are able to achieve its goals from long-term skill memory.

Skill memory contains a set of hierarchical skills indexed by concepts defined in the conceptual memory. Each skill describes a method that the agent can execute to achieve the condition described by the associated concept. Each skill controls only a single agent (player). Coordination among agents is achieved by relying on specific timing events that are observed by multiple agents. Skills consist of a head, which specifies the goal that the agent achieves by carrying out the skill, a set of start conditions that must be satisfied to initiate the skill, and a body that states the ordered subgoals that the agent must achieve to reach the skill's goal. A primitive skill is one that refers only to actions that are directly executable in the environment, while a non-primitive skill refers to other skills as subgoals.

Given skill memory and one or more goals, the execution module first selects an unsatisfied, top-level goal. Since skills can refer to other skills, the system next finds an applicable path through the skill hierarchy, which begins with a skill that can achieve the selected goal and terminates with one that refers to actions the agent can execute in the environment. ICARUS continues executing along this path until the goal is achieved or the selected skills no longer apply, in which case it must determine a new skill path from the current state to the selected goal. This skill selection mechanism enables ICARUS to persist with its previous choice while remaining reactive to the world when unexpected events occur.

Table 2: Sample skills from football.

    ; learned skill for running a short receiver pattern
    ((short-pattern-completed ?agent ?drop-steps ?dir)
     :start    ((scshort-pattern-completed-c1 ?dir))
     :subgoals ((moved ?agent ?dir)
                (pass-blocked-until-dropped-back ?agent ?drop-steps)
                (moved-until-ball-caught ?agent ?dir)))

    ; learned skill for receiving a pass after running a short pattern
    ((short-reception-completed ?receiver ?drop-steps ?dir)
     :start    ((scshort-reception-completed-c1 ?ball))
     :subgoals ((short-pattern-completed ?receiver ?drop-steps ?dir)
                (ran-with-ball-until-tackled ?receiver ?ball)))

Consider the skills shown in Table 2. The first, short-pattern-completed, produces a situation in which a player initially blocks while the passer completes his drop-back, and then runs in direction ?dir until the ball is caught. The receiver pattern ends if and when the ball is caught. Thus, a goal (short-reception-completed FB 7 E) indicates that the fullback (FB) is the intended receiver and runs a short pattern to the right (east) after allowing the quarterback to make a seven-step drop.
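A minimal sketch of this path-finding step, under the same simplifying assumptions as before (skills reduced to dictionaries and beliefs to a flat set, whereas the architecture itself matches relational structures):

    # Illustrative sketch of finding an applicable path through the skill
    # hierarchy, from a top-level goal down to a primitive skill.
    def find_skill_path(goal, skills, beliefs, path=()):
        """skills: dict goal -> list of {start, subgoals, primitive} records."""
        for skill in skills.get(goal, []):
            if not skill["start"] <= beliefs:         # start conditions must hold
                continue
            if skill["primitive"]:                    # bottoms out in an action
                return path + (skill,)
            subgoal = next((g for g in skill["subgoals"] if g not in beliefs), None)
            if subgoal is None:                       # all subgoals already hold
                continue
            deeper = find_skill_path(subgoal, skills, beliefs, path + (skill,))
            if deeper is not None:
                return deeper
        return None                                   # no applicable path; replan

For the goal (short-reception-completed FB 7 E), such a search would descend through short-pattern-completed to whichever primitive movement skill currently applies.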
4 Learning Skills from Video

The goal of this work is to learn hierarchically structured skills from preprocessed videos, in a cumulative manner and in the context of temporal constraints. In this section, we outline our method for acquiring such skills by observing other actors as they achieve similar goals. The input to our learning algorithm consists of a goal, a set of concepts sufficient for interpreting the observed agent's behavior, the set of primitive skills available to the ICARUS agent plus any known non-primitive skills, and the sequence of observed perceptual states.

The algorithm runs in three steps. First, the system observes the entire video-based perceptual sequence of actors achieving the intended goal and infers beliefs about each state (frame). Next, the agent explains how the goal was achieved using existing knowledge. Finally, the algorithm constructs the needed skills along with any supporting knowledge, such as specialized start conditions, based on the explanation generated in the previous step. In the following, we describe the explanation and learning tasks in detail.

4.1 Explaining Observed Behavior

Given a goal and the associated belief memory, the agent must explain how the goal was achieved by the observed actor. Recall that belief memory is episodic and based on hierarchical concepts, so it contains sufficient information to describe the entire sequence at several levels of abstraction. Explanations are generated either by matching inferred beliefs against existing skill knowledge, or by matching inferred beliefs against known concept definitions.

The agent first tries to explain its observations with existing skills by retrieving all skills that can achieve the top-level goal. It then checks that the start conditions and subgoals are consistent with the agent's observations and inferred beliefs. Skills not consistent with these are ignored. The agent then selects a candidate skill and parses the observation sequence into subsequences based on the times when the start conditions and subgoals were achieved. The explanation process then recurses on each subsequence (corresponding to a start condition or subgoal) until the entire observation sequence is explained by a sequence of primitive skills.

If ICARUS fails to explain the observation sequence using skills, it attempts to explain it using concepts. First, it retrieves the belief corresponding to the goal from the belief memory, along with lower-level beliefs that support it. The agent then divides the observed sequence into subsequences based on these supporting lower-level beliefs. As with skill-based explanations, the agent recursively attempts to explain each subgoal. The observation sequence is considered successfully explained if either (1) there exists a skill for the goal that is applicable at the beginning of the trace, in which case there is no need to learn, or (2) all the subgoals of the current trace have already been successfully explained.
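The sketch below illustrates the skill-based branch of this process under the same simplified representation; the segmentation is deliberately crude (each subgoal's segment is simply the trace prefix up to its achievement), whereas the actual system also parses on start conditions.

    # Illustrative sketch of skill-based explanation: find when each subgoal
    # of a candidate skill was achieved, split the trace accordingly, and
    # recurse until primitive skills cover every segment.
    def first_achieved(goal, trace):
        """trace: ordered (tick, beliefs) pairs; index where goal first holds."""
        return next((i for i, (_, b) in enumerate(trace) if goal in b), None)

    def explain(goal, trace, skills):
        for skill in skills.get(goal, []):
            if skill["primitive"]:
                return (goal, skill["name"], [])
            cuts = [first_achieved(g, trace) for g in skill["subgoals"]]
            if None in cuts or cuts != sorted(cuts):   # subgoals must occur in order
                continue
            segments = [trace[: c + 1] for c in cuts]  # crude per-subgoal segments
            children = [explain(g, seg, skills)
                        for g, seg in zip(skill["subgoals"], segments)]
            if all(c is not None for c in children):
                return (goal, skill["name"], children)
        return None                                    # fall back to concepts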
4.2 Constructing New Skills

After successfully explaining an observation sequence, the agent learns skills for the achieved goal based on the explanation generated in the previous step. The constructed skills then allow the agent to achieve these goals in the future under similar conditions, without additional help from human experts. New skills are generated in two ways, depending on how the explanation was constructed.

If the explanation is based on skill knowledge, then the subgoals of the new skill are the chronologically ordered start conditions and subgoals, drawn from the skills that provided the explanation, that were achieved during the observed sequence. The start condition for the new skill includes the start conditions and subgoals that were true before or during the first cycle of the observed subsequence.

If the explanation is based on conceptual knowledge, then the learner retrieves the concept definition corresponding to the goal. The subgoals of the new skill correspond to the subconcepts that were achieved during the observed sequence, in chronological order. The start condition of the new skill includes the generalized lower-level beliefs (subgoals from the explanation, as in Section 4.1) that were true before or during the first cycle of the observed subsequence.

Since the concept definition includes information about the temporal relationships among its subconcepts (and therefore among the skill's subgoals), the learned skill should retain this information in its start condition. The agent therefore constructs a start condition concept by extracting the start condition subconcepts, as well as their associated temporal constraints, from the original concept definition. Note that the learned start condition concepts describe both what must be true in the current state and conditions that must have held in the past to ensure that the skill applies.

Examples of learned concepts and skills in the football domain are shown in Tables 1 and 2. Our learning mechanism is incremental in the sense that constructed concepts and skills can serve as domain knowledge for future trace explanation. This enables our mechanism to acquire increasingly complex domain knowledge over time.
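A minimal sketch of the concept-based construction step, assuming each subconcept's achievement time is available from the explanation; the constraint handling here simply keeps constraints whose variables all belong to retained subconcepts, a simplification of the actual extraction.

    # Illustrative sketch of building a new skill from a concept-based
    # explanation: order subgoals chronologically and keep the subconcepts
    # (plus their temporal constraints) that held by the first observed cycle.
    def construct_skill(goal, concept, achieved_at, first_cycle):
        """achieved_at: dict subconcept -> tick at which it became true."""
        ordered = sorted(concept["relations"], key=lambda s: achieved_at[s])
        start = [s for s in ordered if achieved_at[s] <= first_cycle]
        constraints = [c for c in concept["constraints"]
                       if set(c["over"]) <= set(start)]  # keep the relevant ones
        return {"head": goal,
                "start": {"relations": start, "constraints": constraints},
                "subgoals": [s for s in ordered if achieved_at[s] > first_cycle]}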
5 Demonstration and Evaluation in Rush

Our evaluation objectives are to assess whether the learned skills have both high fidelity with respect to the original video, and utility similar to the observed play with respect to the game of football. To evaluate the former, we visually compared the player tracks produced by executing our learned agents in Rush to the player tracks from the source video. For the latter, we compared the yardage gained by the learned agents to the yardage gained by hand-constructed agents for the same play.

Three steps were required to acquire the skills from video and then execute them in Rush. First, the raw video was preprocessed to produce a sequence of ICARUS perceptual states. Second, we applied the learning system to construct skills from the sequence. Finally, we mapped the learned eleven-player skills onto an eight-player team, and executed the resulting skills in Rush. Below we provide details for this process, along with the results.

5.1 Preprocessing the College Football Video

Our raw video data was obtained from the Oregon State University football team and corresponds to that used by coaches during game analysis and planning. It was shot via a panning and zooming camera that is fixed at a mid-field location on top of the stadium. The video is qualitatively different from typical television footage in that the camera operator attempts to keep as much of the action in view as possible. A typical shot from our video is shown in Figure 1.

[Figure 1: A typical frame from the football video.]

For ICARUS to learn from video, the video must be processed into a sequence of ICARUS perceptions. Specifically, the representation should include: (1) player tracks, which give the 2D field coordinates of each player at each time point during the play; (2) player labels, which describe the functional role of each player (e.g., quarterback, running back); and (3) activity labels, which describe the high-level activity of each player throughout the play.

The tracking data used here is the same as that used by Hess and Fern [2009] for their work on supervised learning. The authors manually provided activity labels for each player, and used an automated technique [Hess et al., 2007] to provide the player labels. From these labeled tracks, we generated the perception sequence for the play.
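As an illustration of the target representation (the field names are invented; the paper does not specify an exact percept format), the labeled tracks can be flattened into one short-lived percept per player per frame:

    # Illustrative sketch: one percept per player per 1/30 s video frame,
    # combining track coordinates, role labels, and activity labels.
    def percepts_for_frame(frame, tracks, roles, activities):
        """tracks: player -> [(x, y), ...]; roles, activities: label dicts."""
        return [{"object": player,
                 "role": roles[player],                  # e.g. "QB", "FB"
                 "x": tracks[player][frame][0],          # 2D field coordinates
                 "y": tracks[player][frame][1],
                 "activity": activities[player][frame]}  # e.g. "block", "run"
                for player in tracks]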
5.2 Mapping College Plays into Rush

As noted, a team in Rush consists of eight players, while the teams in the video have eleven players. We therefore map the eleven-player skills learned from the video into skills for the eight Rush players. In practice, we can accomplish this simply by dropping the goals associated with three of the players. Ideally, we would ignore the players that have the least impact on the play.

Figure 2 shows a play diagram for the specific play observed by ICARUS during testing (this information is not included in the perceptual sequence). Rush plays typically include three offensive linemen (RG, C, and LG) along with some combination of four running backs and receivers. The play shown in Figure 2 has five linemen and five backs/receivers. To map this play into Rush, we therefore dropped two linemen (RT and LT) and one running back (RB, left side), all of which had top-level goals of pass-blocking for the duration of the play. ICARUS used the goals for the eight remaining players to guide execution in Rush.

[Figure 2: Diagram of the pass play observed by ICARUS, with annotations (block, run E, run NE, catch, scramble, throw, and so on) indicating the actions taken by individual players.]
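A minimal sketch of this mapping, assuming a caller-supplied estimate of each player's impact on the play (in the paper the dropped players were chosen by inspection):

    # Illustrative sketch of the eleven-to-eight mapping: keep the top-level
    # goals of the eight players judged most important to the play.
    def map_to_rush(goals, impact, keep=8):
        """goals: player -> top-level goal; impact: player -> numeric score."""
        ranked = sorted(goals, key=lambda p: impact[p], reverse=True)
        return {p: goals[p] for p in ranked[:keep]}

For the observed play, RT, LT, and the left RB (all pure pass-blockers) would receive the lowest impact scores and be dropped.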
5.3 Evaluating Plays in Rush

The learning task in this work is to acquire hierarchical skills for all eleven offensive football players on the field for a given play. To evaluate the learned skills, we let ICARUS control the offensive players in a football simulator using the learned skills. Ideally, the play produced by the learned skills should look similar to the play observed in the video, while advancing the ball on the field.

Figures 3 and 4 show the tracks generated by each offensive player for the observed and simulated plays, respectively. Both figures are plotted on the same scale horizontally and similar scales vertically, and use the same initial ball location.

[Figure 3: Tracks of players observed by ICARUS in the video.]

[Figure 4: Tracks of players generated by ICARUS using learned skills in the Rush simulator.]

The tracks generated by ICARUS represent an idealized version of the tracks from the video. Some of the differences, such as round versus square turns, derive from the simple physics used in Rush. Others, such as differences in the tracks of the quarterback (QB) and blockers (C, LG, RG), stem from the behaviors of the defensive players. For example, after completing a 7-yard drop (the same in both figures), the quarterback simply "scrambles" in any direction necessary to avoid the oncoming defense while waiting for the intended receiver (FB) to become available. This is equally true in collegiate games and in the simulator.

One difference between the observed and simulated plays relates to the fullback (FB), who does not run as far to the right either before or after catching the ball in the simulator. Prior to the catch, this is partly due to differences in the relative speed of the players, and partly due to timing differences between the video (one tick equals 1/30th of a second) and Rush (one tick equals 1/10th of a second). Further investigation into these timing differences may yield a higher-fidelity response. After the catch, the difference is due to insufficient parameterization of the run-with-ball primitive skill. Currently, the skill causes the receiver to run directly downfield regardless of oncoming defenders. To reproduce the play more faithfully, this skill should cause the player to run to the most open part of the field.

A second important divergence involves the tight end (RTE), who does not turn northeast in his simulated route. A closer look at the trace data shows that the ball is caught before he makes the turn, which ends the skill governing his route (run north for 17 yards, then northeast until the ball is caught). This is again an artifact of timing differences between the simulator and the video, which highlights the need to fine-tune plays after they are learned. We return to this issue in the next section.

Finally, both plays produced similar yardage results, but this will not always be the case. Evaluating differences in performance is problematic. Although not done here, we can run the play repeatedly in Rush while varying the characteristics of both the offensive and defensive players (such as speed and agility), along with the defensive strategy (such as formation and patterns), to establish an average. However, we do not have such data for the observed video. Obtaining it would require us to collect and analyze video of the same play from several different offensive teams against several different defensive teams. In practice, this type of play is usually aimed at gaining relatively small distances (up to 10 yards), suggesting that our result is reasonable.

6 Related and Future Work

Acquiring domain knowledge is an important task in building intelligent agents. Pearson and Laird [2004] acquire domain knowledge guided by subject matter experts, while Nejati et al. [2006] learn hierarchical task networks by observing preprocessed expert traces. Likewise, Hogg et al. [2008] describe an algorithm that acquires hierarchical task networks from a set of traces and semantically annotated tasks. Taking a different tack, Forbus and Hinrichs [2004] demonstrate skill learning by analogy. None of these approaches currently support learning from observed behavior or learning coordinated multi-agent strategies. Moreover, the start conditions of the constructed methods only check the current state of the world, while our approach enables the start conditions to describe past states of the world.

Other game-related efforts have aimed at reducing the effort required to construct game agents. For example, Kelley et al. [2008] use offline planning with hierarchical task networks to generate scripts automatically. Changes in the environment do not affect the behavior of these constructed agents, whereas the agents built by our system will perform reactively. A second approach uses reinforcement learning algorithms [Bradley and Hayes, 2005; Nason and Laird, 2005] to allow an agent to learn through experience. Our approach differs because learning is analytical and based on observed video rather than agent experience, which makes our approach much more computationally efficient. Our approach also differs from these in that ICARUS is an integrated architecture that combines observation, inference, learning and execution in a single system.

There are many possible avenues for future development. For example, we need to conduct detailed experiments across many different plays to examine the strengths and weaknesses of our approach. We are also extending our approach to acquire skills for coordinating multiple agents. Ideally, the system would learn knowledge about timing and cooperation among multiple agents. The main differences between the multi-agent system and the current system would be that (1) beliefs, explanations and skills would be generated for all players in a single episode, rather than in multiple single-player episodes, and (2) additional high-level multi-player coordination skills would also be acquired. This would require only minor extensions to the knowledge representation, inference and execution modules.

A second extension involves modifying the learned skills and the top-level goal parameters (such as the number of yards that a receiver runs before changing direction) by learning within Rush. For example, systematic errors such as receivers running into heavy coverage can be detected and revised. This constitutes a structural revision to the learned skills. Similarly, ICARUS can modify the form of learned plays by adjusting the parameters (arguments) in the goals. For example, timing plays that require receivers to run n yards downfield before turning to catch the ball can be adjusted by tuning the value of n.
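As a sketch of what such parameter tuning might look like (the paper proposes but does not implement this step, so the search strategy below is purely an assumption):

    # Illustrative sketch of tuning a goal parameter, such as the depth n of
    # a receiver's route, by repeated simulation in Rush.
    def tune_route_depth(run_play, candidates=(5, 7, 10, 12, 15), trials=20):
        """run_play(n) -> yards gained by one simulated play at depth n."""
        def average_gain(n):
            return sum(run_play(n) for _ in range(trials)) / trials
        return max(candidates, key=average_gain)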
7 Conclusion

Constructing autonomous agents is an essential task in game development. In this paper, we outlined a system that analyzes preprocessed video footage of human behavior in a college football game, constructs hierarchical skills, and then executes them in a football simulator. The learned skills incorporate temporal constraints and provide for a variety of coordinated behavior among the players. Although the level of precision in ICARUS's control needs improvement, the results suggest that our method is a viable and efficient approach to acquiring the complex and structured behaviors required for realistic agents in modern games. Moreover, we pointed out several avenues for extending this work and improving the quality of the learned agents. Additional studies are required, but we view agent architectures as a promising approach for future development.

8 Acknowledgments

This material is based in part on research sponsored by DARPA under agreement FA8750-05-2-0283. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

References

[Anderson, 1993] J. R. Anderson. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.

[Bradley and Hayes, 2005] Jay Bradley and Gillian Hayes. Group utility functions: Learning equilibria between groups of agents in computer games by modifying the reinforcement signal. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 1914–1921, Edinburgh, UK, 2005. IEEE Press.

[Forbus and Hinrichs, 2004] Ken Forbus and Tom Hinrichs. Companion cognitive systems: A step towards human-level AI. In Proceedings of the AAAI Fall Symposium on Achieving Human-level Intelligence through Integrated Systems and Research, Washington, DC, 2004. AAAI Press.

[Hess and Fern, 2009] Robin Hess and Alan Fern. Discriminatively trained particle filters for complex multi-object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009. IEEE Press.

[Hess et al., 2007] Robin Hess, Alan Fern, and Eric Mortenson. Mixture-of-parts pictorial structures for objects with variable part sets. In Proceedings of the Eleventh IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007. IEEE Press.

[Hogg et al., 2008] Chad Hogg, Héctor Muñoz-Avila, and Ugur Kuter. HTN-MAKER: Learning HTNs with minimal additional knowledge engineering required. In Proceedings of the Twenty-Third Conference on Artificial Intelligence, Chicago, IL, 2008. AAAI Press.

[Kelley et al., 2008] John Paul Kelley, Adi Botea, and Sven Koenig. Offline planning with hierarchical task networks in video games. In Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, Stanford, CA, 2008. AAAI Press.

[Laird et al., 1986] John E. Laird, Paul S. Rosenbloom, and Allen Newell. Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1:11–46, 1986.

[Nason and Laird, 2005] Shelley Nason and John E. Laird. Soar-RL: Integrating reinforcement learning with Soar. Cognitive Systems Research, 6(1):51–59, 2005.

[Nejati et al., 2006] Negin Nejati, Pat Langley, and Tolga Könik. Learning hierarchical task networks by observation. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. ACM.

[Newell, 1990] Allen Newell. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, 1990.

[Pearson and Laird, 2004] Douglas Pearson and John E. Laird. Redux: Example-driven diagrammatic tools for rapid knowledge acquisition. In Proceedings of Behavior Representation in Modeling and Simulation, Washington, DC, 2004.

[Stracuzzi et al., 2009] David J. Stracuzzi, Nan Li, Gary Cleveland, and Pat Langley. Representing and reasoning over time in a symbolic cognitive architecture. In Proceedings of the 31st Annual Meeting of the Cognitive Science Society, Amsterdam, Netherlands, 2009. Cognitive Science Society.
