Learning Hierarchical Skills for Game Agents from Video of Human Behavior

Nan Li, David J. Stracuzzi, Gary Cleveland, and Pat Langley
School of Computing and Informatics, Arizona State University

Tolga Könik, Dan Shapiro, and Kamal Ali
Computational Learning Laboratory, Stanford University

Matthew Molineaux
Knexus Research, Springfield, VA

David W. Aha
Naval Research Laboratory, Washington, DC
Abstract

Developing autonomous agents for computer games is often a lengthy and expensive undertaking that requires manual encoding of detailed and complex knowledge. In this paper we show how to acquire hierarchical skills for controlling a team of simulated football players by observing video of college football play. We then demonstrate the results in the Rush 2008 football simulator, showing that the learned skills have high fidelity with respect to the observed video and are robust to changes in the environment. Finally, we conclude with discussions of this work and of possible improvements.
1 Introduction
Constructing intelligent agents for computer games is an important aspect of game development. However, traditional methods are expensive, and the resulting agents only function in narrowly defined circumstances. Agent architectures [Newell, 1990], which are designed to model human-level intelligence, can help simplify this task by reducing the need for developers to explicitly provide agents with intelligent behaviors. Developers need only specify static knowledge about the domain, while the architecture constructs the necessary behavior specifications and handles the reasoning and decision making required for their execution.

Recent research [Pearson and Laird, 2004; Nejati et al., 2006] aims to simplify the knowledge acquisition process. We expand upon this work by learning from video data, by representing and acquiring temporal knowledge, and by reducing the amount of expert input required by the system. In the following, we present a system that learns hierarchical skills for football players by observing videos of college football plays, and then executes those skills in a simulated football environment. The system takes in discrete perceptual traces generated from the video, analyzes these traces with existing knowledge, and learns new skills that can be applied when controlling agents in a football simulator. The skills are acquired in a cumulative manner, with new skills building on those previously learned, and are based on conceptual knowledge that includes temporal constraints. The learned skills also reproduce the observed behavior faithfully, with comparable efficacy, in spite of differences between the observation and demonstration environments.

We begin by introducing the Rush 2008 football simulator, which we use to demonstrate our approach. We then briefly review the ICARUS agent architecture, and present our work on acquiring structured domain knowledge from observed human behavior within this framework. Next, we outline the video preprocessing steps, the application of ICARUS to the processed video, and the application of the acquired skills in Rush. Finally, we conclude with a summary of related work and directions for future development.

2 The Rush Football Simulator

Rush 2008, a research extension of Rush 2005 (http://rush2005.sourceforge.net/), simulates play in an eight-player variant of American football. For the remainder of the paper, we assume a basic familiarity with the rules and objectives of American football. The simulator encodes several types of offensive and defensive formations, along with several distinct plays (coordinated strategies) for each formation. Each player is assigned a role, such as quarterback (QB), running back (RB), or left wide receiver (LWR). Players are controlled either by assigning them a high-level goal such as pass route crossout at yard 15, which instructs a receiver to run 15 yards downfield, make a hard right turn, and then keep running to try to catch a pass, or by assigning specific instructions such as stride forward on each clock tick.
We instrumented Rush so that ICARUS can perceive all players and objects on the field and control the actions of each offensive player on a tick-by-tick basis. Each offensive player shares perception and knowledge with the other offensive players, and carries out actions instructed by ICARUS. Examples of the types of actions available to ICARUS include (throwTo <receiver>), which instructs the QB to pass to a specific teammate, and (stride <direction>), which tells a player to run in one of eight directions for one clock tick. Defensive players are controlled by the simulator, which selects one of several available strategies for each play.
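To make this interface concrete, the following sketch shows how tick-level control might look from the agent's side. The RushGame class and its perceive and act methods are illustrative stand-ins; the paper does not specify the actual instrumentation API.

    # Hypothetical sketch of tick-by-tick control; RushGame and its
    # methods are illustrative, not the actual instrumented interface.
    class RushGame:
        def perceive(self):
            # One short-lived percept per object on the field this tick.
            return [{"type": "player", "id": "QB", "x": 30.0, "y": 10.0},
                    {"type": "ball", "x": 30.0, "y": 10.0}]

        def act(self, player_id, action, *args):
            print(f"{player_id} executes {action} {args}")

    game = RushGame()
    for tick in range(3):
        percepts = game.perceive()      # perception shared by the offense
        game.act("QB", "stride", "N")   # run one of eight directions
    game.act("QB", "throwTo", "FB")     # pass to a specific teammate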
3 The ICARUS Architecture

Our framework for play observation, execution and learning is the ICARUS architecture, which is an instance of a unified cognitive architecture [Newell, 1990]. ICARUS shares many features with other architectures like Soar [Laird et al., 1986] and ACT-R [Anderson, 1993], including a distinction between short-term and long-term memories, and goal-driven but reactive execution. It also has many novel features, including a commitment to separate storage for conceptual and skill knowledge, the indexing of skills by the goals they achieve, and architectural support for temporal knowledge [Stracuzzi et al., 2009]. In this section, we summarize the basic assumptions of the framework along with its operational procedures to provide background for the new learning methods. We begin by describing the conceptual and belief representations along with the inference process, and then describe skill structures and the execution mechanism.

3.1 Beliefs, Concepts and Inference

Reasoning about the environment is a principal task performed by intelligent agents, and determines which actions the agent must carry out to achieve its goals. ICARUS performs the inference task by matching the generalized conceptual structures in long-term memory against percepts and beliefs stored in short-term memory. On each cycle, an agent receives low-level perceptual information about its environment. Each percept describes the attributes of a single object in the world, and is short-lived. For example, the percepts derived from the football video last only for the duration of one video frame (1/30th of a second) before being replaced with new information.

Intelligent behavior requires more than low-level perceptual information. ICARUS therefore produces higher-level beliefs about the environment based on percepts. Beliefs represent relations among objects in the environment, and have two associated timestamps that indicate the beginning and end of the period in which the belief is known to be true. All of the inferred beliefs for the current episode are retained in a simple episodic belief memory, so that the agent can reason about events over time.

ICARUS beliefs are instances of generalized concepts stated in a hierarchically organized, long-term conceptual memory. Each concept describes a class of environmental situations using a relational language; it includes a head, which consists of a predicate with arguments, and a body, which defines the situations under which the concept is true. The body has a relations field as well as a constraints field. The relations field specifies the subconcepts on which the concept depends along with their associated timestamps, which correspond to the timestamps on beliefs. The constraints field uses these timestamps to describe the temporal relationships among the subconcepts. For example, in Table 1 the constraints field of dropped-back states that ?passer must have possession of the ball until he has finished dropping back.

Table 1: Sample concepts for the football domain.

    ; ?passer dropped back ?n-steps after receiving the snap
    ((dropped-back ?passer ?n-steps)
     :relations   (((snap-completed ?passer ?ball) ?snap-start NOW)
                   ((possession ?passer ?ball) ?poss-start ?poss-end)
                   ((moved-distance ?passer ?n-steps S)
                    ?mov-start ?mov-end))
     :constraints ((≤ ?snap-start ?poss-start)
                   (≤ ?mov-end ?poss-end)
                   (=  ?mov-end NOW)))

    ; learned start condition for skill dropped-back
    ((sc-dropped-back-c1 ?passer)
     :relations   (((possession ?passer ?ball) ?pos-start ?pos-end)
                   ((snap-completed ?passer ?ball) ?snap-start NOW))
     :constraints ((≤ ?snap-start ?pos-start)))

ICARUS updates belief memory by matching the generalized concept definitions to percepts and existing beliefs in a bottom-up manner. Concepts that depend only on percepts are matched against the perceptual buffer first, and the results are placed in belief memory. This triggers matching against higher-level concept definitions. The process continues until the deductive closure of perceptual, belief and conceptual memory has been computed.
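As a rough illustration of this bottom-up process, the sketch below computes a deductive closure over a simplified, propositional concept representation; the real ICARUS matcher is relational, binds variables, and tracks timestamps, all of which are omitted here.

    # Minimal sketch of bottom-up belief inference over a simplified,
    # propositional concept representation (an assumption for clarity).
    def infer_beliefs(percepts, concepts):
        """Compute the deductive closure of percepts under concepts.

        concepts maps a concept name to the set of lower-level names
        (percepts or concepts) it depends on.
        """
        beliefs = set(percepts)
        changed = True
        while changed:                           # repeat until closure
            changed = False
            for head, relations in concepts.items():
                if head not in beliefs and relations <= beliefs:
                    beliefs.add(head)            # matched: add the belief
                    changed = True
        return beliefs

    percepts = {"snap-completed", "possession", "moved-distance"}
    concepts = {"dropped-back": {"snap-completed", "possession",
                                 "moved-distance"},
                "pass-ready": {"dropped-back"}}
    print(infer_beliefs(percepts, concepts))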
3.2 Goals, Skills and Execution

After inferring a set of beliefs about its environment, ICARUS next evaluates its skill knowledge to determine which actions to take in the environment. For this, the architecture uses a goal memory, which stores goals that the agent wants to achieve. It then retrieves skills that are able to achieve its goals from long-term skill memory.

Skill memory contains a set of hierarchical skills indexed by concepts defined in the conceptual memory. Each skill describes a method that the agent can execute to achieve the condition described by the associated concept. Each skill controls only a single agent (player). Coordination among agents is achieved by relying on specific timing events that are observed by multiple agents. Skills consist of a head, which specifies the goal that the agent achieves by carrying out the skill, a set of start conditions that must be satisfied to initiate the skill, and a body that states the ordered subgoals that the agent must achieve to reach the skill's goal. A primitive skill is one that refers only to actions that are directly executable in the environment, while a non-primitive skill refers to other skills as subgoals.

Given skill memory and one or more goals, the execution module first selects an unsatisfied, top-level goal. Since skills can refer to other skills, the system next finds an applicable path through the skill hierarchy, which begins with a skill that can achieve the selected goal and terminates with one that refers to actions the agent can execute in the environment. ICARUS continues executing along this path until the goal is achieved or the selected skills no longer apply, in which case it must determine a new skill path from the current state to the selected goal. This skill selection mechanism lets ICARUS persist with its previous choices while remaining reactive to the world when unexpected events occur.
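The sketch below illustrates this path-finding process over a toy skill hierarchy. The dictionary encoding of skills and start conditions is an illustrative simplification of the ICARUS representation shown in Table 2.

    # Sketch of skill-path selection; the dict encoding of skills is an
    # illustrative stand-in for the ICARUS representation.
    skills = {
        "short-reception-completed": {
            "start": "sc-short-reception",
            "subgoals": ["short-pattern-completed",
                         "ran-with-ball-until-tackled"]},
        "short-pattern-completed": {
            "start": "sc-short-pattern",
            "subgoals": ["moved"]},
        "moved": {"start": None, "subgoals": []},   # primitive skill
    }

    def find_path(goal, beliefs, path=()):
        """Return a skill path from goal down to a primitive skill."""
        skill = skills.get(goal)
        if skill is None or (skill["start"] and
                             skill["start"] not in beliefs):
            return None            # no applicable skill for this goal
        if not skill["subgoals"]:
            return path + (goal,)  # primitive: the path is executable
        for sub in skill["subgoals"]:
            if sub not in beliefs:           # first unsatisfied subgoal
                return find_path(sub, beliefs, path + (goal,))
        return None

    beliefs = {"sc-short-reception", "sc-short-pattern"}
    print(find_path("short-reception-completed", beliefs))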
Table 2: Sample skills from football.

    ; learned skill for running a short receiver pattern
    ((short-pattern-completed ?agent ?drop-steps ?dir)
     :start    ((sc-short-pattern-completed-c1 ?dir))
     :subgoals ((moved ?agent ?dir)
                (pass-blocked-until-dropped-back ?agent ?drop-steps)
                (moved-until-ball-caught ?agent ?dir)))

    ; learned skill for receiving a pass after running a short pattern
    ((short-reception-completed ?receiver ?drop-steps ?dir)
     :start    ((sc-short-reception-completed-c1 ?ball))
     :subgoals ((short-pattern-completed ?receiver ?drop-steps ?dir)
                (ran-with-ball-until-tackled ?receiver ?ball)))

Consider the skills shown in Table 2. The first, short-pattern-completed, produces a situation in which a player initially blocks while the passer completes his drop-back, and then runs in direction ?dir until the ball is caught. The receiver pattern ends if and when the ball is caught. Thus, a goal (short-reception-completed FB 7 E) indicates that the fullback (FB) is the intended receiver and runs a short pattern to the right (east) after allowing the quarterback to make a seven-step drop.
4 Learning Skills from Video

The goal of this work is to learn hierarchically structured skills from preprocessed videos in a cumulative manner and in the context of temporal constraints. In this section, we outline our method for acquiring such skills by observing other actors as they achieve similar goals. The input to our learning algorithm consists of a goal, a set of concepts sufficient for interpreting the observed agent's behavior, the set of primitive skills available to the ICARUS agent plus any known non-primitive skills, and the sequence of observed perceptual states.

The algorithm runs in three steps. First, the system observes the entire video-based perceptual sequence of actors achieving the intended goal and infers beliefs about each state (frame). Next, the agent explains how the goal was achieved using existing knowledge. Finally, the algorithm constructs the needed skills along with any supporting knowledge, such as specialized start conditions, based on the explanation generated in the previous step. In the following, we describe the explanation and learning tasks in detail.
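A skeleton of this three-step loop appears below. The helper functions are stubs that stand in for the inference, explanation, and construction processes detailed in Sections 3.1, 4.1, and 4.2; none of them reflects the actual ICARUS implementation.

    # Skeleton of the three-step algorithm; all helpers are stubs.
    def infer_beliefs(frame):
        return set(frame)                  # stub: percepts become beliefs

    def explain(goal, belief_memory):
        return [goal]                      # stub: trivial explanation

    def construct_skills(goal, explanation):
        return {goal: {"subgoals": explanation}}

    def learn_from_video(goal, frames):
        memory = [infer_beliefs(f) for f in frames]   # step 1: observe
        explanation = explain(goal, memory)           # step 2: explain
        return construct_skills(goal, explanation)    # step 3: construct

    frames = [{"snap-completed"}, {"possession"}, {"ball-caught"}]
    print(learn_from_video("short-reception-completed", frames))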
4.1 Explaining Observed Behavior

Given a goal and the associated belief memory, the agent must explain how the goal was achieved by the observed actor. Recall that belief memory is episodic and based on hierarchical concepts, so it contains sufficient information to describe the entire sequence at several levels of abstraction. Explanations are generated either by matching inferred beliefs against existing skill knowledge, or by matching inferred beliefs against known concept definitions.

The agent first tries to explain its observations with existing skills by retrieving all skills that can achieve the top-level goal. It then checks that the start conditions and subgoals are consistent with the agent's observations and inferred beliefs. Skills not consistent with these are ignored. The agent then selects a candidate skill and parses the observation sequence into subsequences based on the times when the start conditions and subgoals were achieved. The explanation process then recurses on each subsequence (corresponding to a start condition or subgoal) until the entire observation sequence is explained by a sequence of primitive skills.

If ICARUS fails to explain the observation sequence using skills, it attempts to explain it using concepts. First, it retrieves the belief corresponding to the goal from the belief memory, along with lower-level beliefs that support it. The agent then divides the observed sequence into subsequences based on these supporting lower-level beliefs. As with skill-based explanations, the agent recursively attempts to explain each subgoal. The observation sequence is considered successfully explained if either (1) there exists a skill for the goal that is applicable at the beginning of the trace, in which case there is no need to learn, or (2) all the subgoals of the current trace have already been successfully explained.
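The following sketch illustrates the skill-based case: it parses the observation sequence at the cycles where subgoals were achieved and recurses on each subsequence. The data structures are simplifications, and both the start-condition checks and the concept-based fallback are omitted.

    # Sketch of skill-based explanation over a simplified belief memory
    # (a list of belief sets, one per observed cycle).
    def achieved_at(subgoal, belief_memory):
        """First cycle at which subgoal appears in memory, else None."""
        for t, beliefs in enumerate(belief_memory):
            if subgoal in beliefs:
                return t
        return None

    def explain(goal, belief_memory, skills):
        skill = skills.get(goal)
        if skill is None:
            return None        # would fall back to concept definitions
        times = [achieved_at(s, belief_memory) for s in skill["subgoals"]]
        if None in times or times != sorted(times):
            return None        # inconsistent with the observations
        # Recurse on each subsequence, up to the achieving cycle.
        return [(s, explain(s, belief_memory[:t + 1], skills)
                 or "primitive")
                for s, t in zip(skill["subgoals"], times)]

    skills = {"dropped-back": {"subgoals": ["snap-completed",
                                            "moved-distance"]}}
    memory = [{"snap-completed"}, {"snap-completed", "moved-distance"}]
    print(explain("dropped-back", memory, skills))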
4.2 Constructing New Skills

After successfully explaining an observation sequence, the agent learns skills for the achieved goal based on the explanation generated in the previous step. The constructed skills then allow the agent to achieve these goals in the future under similar conditions without additional help from human experts. New skills are generated in two ways, depending on how the explanation was constructed.

If the explanation is based on skill knowledge, then the subgoals of the new skill are the chronologically ordered start conditions and subgoals, from the skills that provided the explanation, that were achieved during the observed sequence. The start condition for the new skill includes the start conditions and subgoals that were true before or during the first cycle of the observed subsequence.

If the explanation is based on conceptual knowledge, then the learner retrieves the concept definition corresponding to the goal. The subgoals of the new skill correspond to the subconcepts that were achieved during the observed sequence, in chronological order. The start condition of the new skill includes the generalized lower-level beliefs (subgoals from the explanation, as in Section 4.1) that were true before or during the first cycle of the observed subsequence.

Since the concept definition includes information about the temporal relationships among its subconcepts (and therefore the skill's subgoals), the learned skill should retain this information in its start condition. The agent therefore constructs a start condition concept by extracting the start condition subconcepts, as well as their associated temporal constraints, from the original concept definition. Note that the learned start condition concepts describe both what must be true in the current state and conditions that must have held in the past to ensure that the skill applies.

Examples of learned concepts and skills in the football domain are shown in Tables 1 and 2. Our learning mechanism is incremental in the sense that constructed concepts and skills can serve as domain knowledge for future trace explanation. This enables our mechanism to acquire increasingly complex domain knowledge over time.
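As a simplified illustration of the concept-based case, the sketch below orders achieved subconcepts chronologically to form subgoals and collects the start condition from what held during the first cycle; the extraction of temporal constraints is omitted, and the representation is our own.

    # Sketch of skill construction from a concept-based explanation.
    def construct_skill(goal, achieved):
        """achieved: (subconcept, cycle-achieved) pairs from the
        explanation of the observed sequence."""
        ordered = [c for c, t in sorted(achieved, key=lambda p: p[1])]
        start = [c for c, t in achieved if t == 0]  # held during cycle 0
        return {"head": goal, "start": start, "subgoals": ordered}

    achieved = [("moved-distance", 5), ("snap-completed", 0),
                ("possession", 0)]
    print(construct_skill("dropped-back", achieved))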
Figure 1: A typical frame from the football video.

Figure 2: Diagram of the pass play observed by ICARUS, with annotations indicating actions taken by individual players.
5 Demonstration and Evaluation in Rush

Our evaluation objectives are to assess whether the learned skills have both high fidelity with respect to the original video, and utility similar to the observed play with respect to the game of football. To evaluate the former, we visually compared the player tracks produced by executing our learned agents in Rush to the player tracks from the source video. For the latter, we compared the yardage gained by the learned agents to the yardage gained by hand-constructed agents for the same play.

Three steps were required to acquire the skills from video and then execute them in Rush. First, the raw video was preprocessed to produce a sequence of ICARUS perceptual states. Second, we applied the learning system to construct skills from the sequence. Finally, we mapped the learned eleven-player skills onto an eight-player team, and executed the resulting skills in Rush. Below we provide details for this process, along with the results.

5.1 Preprocessing the College Football Video

Our raw video data was obtained from the Oregon State University football team and corresponds to that used by coaches during game analysis and planning. It was shot via a panning and zooming camera fixed at a mid-field location on top of the stadium. The video is qualitatively different from typical television footage in that the camera operator attempts to keep as much of the action in view as possible. A typical shot from our video is shown in Figure 1.

For ICARUS to learn from video, it must be processed into a sequence of ICARUS perceptions. Specifically, the representation should include: (1) player tracks, which give the 2D field coordinates of each player at each time point during the play; (2) player labels, which describe the functional role of each player (e.g., quarterback, running back); and (3) activity labels, which describe the high-level activity of each player throughout the play.

The tracking data used here is the same as that used by Hess and Fern [2009] for their work on supervised learning. The authors manually provided activity labels for each player, and used an automated technique [Hess et al., 2007] to provide the player labels. From these labeled tracks, we generated the perception sequence for the play.
5.1 PreprocessingtheCollegeFootballVideo
droppedtwolinemen(RTandLT)andonerunningback(RB,
OurrawvideodatawasobtainedfromtheOregonStateUni- leftside),allofwhichhadtop-levelgoalsofpass-blockingfor
versityfootballteamandcorrespondstothatusedbycoaches thedurationoftheplay. ICARUSusedthegoalsfortheeight
duringgameanalysisandplanning. Itwasshotviaapanning remainingplayerstoguideexecutioninRush.
and zooming camera that is fixed at a mid-field location on
5.3 EvaluatingPlaysinRush
top of the stadium. The video is qualitatively different than
typicaltelevisionfootageinthatthecameraoperatorattempts Thelearningtaskinthisworkistoacquirehierarchicalskills
tokeepasmuchoftheactioninviewaspossible. Atypical forallelevenoffensivefootballplayersonthefieldforagiven
shotfromourvideoisshowninFigure1. play. To evaluate the learned skills, we let ICARUS control
ForICARUStolearnfromvideo,itmustbeprocessedintoa theoffensiveplayersinafootballsimulatorusingthelearned
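A minimal sketch of this mapping follows; the goal names are illustrative, and the players dropped are hard-coded to match the observed play rather than chosen by an impact measure.

    # Sketch of the eleven-to-eight player mapping for this play;
    # goal names are illustrative, player labels follow Figure 2.
    goals = {"QB": "throw-completed", "FB": "short-reception-completed",
             "LWR": "pattern-completed", "RWR": "pattern-completed",
             "RTE": "pattern-completed", "C": "pass-blocked",
             "LG": "pass-blocked", "RG": "pass-blocked",
             "LT": "pass-blocked", "RT": "pass-blocked",
             "RB": "pass-blocked"}

    dropped = {"LT", "RT", "RB"}    # pass blockers dropped for this play
    rush_goals = {p: g for p, g in goals.items() if p not in dropped}
    assert len(rush_goals) == 8
    print(rush_goals)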
5.3 Evaluating Plays in Rush

The learning task in this work is to acquire hierarchical skills for all eleven offensive football players on the field for a given play. To evaluate the learned skills, we let ICARUS control the offensive players in a football simulator using the learned skills. Ideally, the play produced by the learned skills should look similar to the play observed in the video, while advancing the ball on the field.

Figures 3 and 4 show the tracks generated by each offensive player for the observed and simulated plays, respectively. Both figures are plotted on the same scale horizontally and similar scales vertically, and use the same initial ball location.
Figure 3: Tracks of players observed by ICARUS in the video.

Figure 4: Tracks of players generated by ICARUS using learned skills in the Rush simulator.
The tracks generated by ICARUS represent an idealized version of the tracks from the video. Some of the differences, such as round versus square turns, derive from the simple physics used in Rush. Others, such as differences in the tracks of the quarterback (QB) and blockers (C, LG, RG), stem from the behaviors of the defensive players. For example, after completing a 7-yard drop (the same in both figures), the quarterback simply "scrambles" in any direction necessary to avoid the oncoming defense while waiting for the intended receiver (FB) to become available. This is equally true in collegiate games and in the simulator.

One difference between the observed and simulated plays relates to the fullback (FB), who does not run as far to the right either before or after catching the ball in the simulator. Prior to the catch, this is partly due to differences in the relative speed of the players, and partly due to timing differences between the video (one tick equals 1/30th of a second) and Rush (one tick equals 1/10th of a second); the sketch below illustrates the conversion. Further investigation into these timing differences may yield a higher fidelity response. After the catch, the difference is due to insufficient parameterization of the run-with-ball primitive skill. Currently, the skill causes the receiver to run directly downfield regardless of oncoming defenders. To reproduce the play more faithfully, this skill should cause the player to run to the most open part of the field.
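For reference, the conversion between the two clock rates is straightforward; the rounding policy in this sketch is our own assumption.

    # Sketch of converting video frames to Rush ticks given the stated
    # clock rates; the rounding policy is an assumption.
    VIDEO_FPS = 30.0   # one video frame = 1/30 s
    RUSH_TPS = 10.0    # one Rush tick = 1/10 s

    def frames_to_ticks(frames):
        """Map a duration in video frames to the nearest Rush tick."""
        return round(frames * RUSH_TPS / VIDEO_FPS)

    print(frames_to_ticks(45))   # 1.5 s of video -> 15 Rush ticks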
A second important divergence involves the tight end (RTE), who does not turn northeast in his simulated route. A closer look at the trace data shows that the ball is caught before he makes the turn, which ends the skill governing his route (run north for 17 yards, then northeast until the ball is caught). This is again an artifact of timing differences between the simulator and the video, which highlights the need to fine-tune plays after they are learned. We return to this issue in the next section.

Finally, both plays produced similar yardage results, but this will not always be the case. Evaluating differences in performance is problematic. Although not done here, we could run the play repeatedly in Rush while varying the characteristics of both the offensive and defensive players (such as speed and agility), along with the defensive strategy (such as formations and patterns), to establish an average. However, we do not have such data for the observed video; obtaining it would require us to collect and analyze video of the same play from several different offensive teams against several different defensive teams. In practice, this type of play is usually aimed at gaining relatively small distances (up to 10 yards), suggesting that our result is reasonable.

6 Related and Future Work

Acquiring domain knowledge is an important task in building intelligent agents. Pearson and Laird [2004] acquire domain knowledge guided by subject matter experts, while Nejati et al. [2006] learn hierarchical task networks by observing preprocessed expert traces. Likewise, Hogg et al. [2008] describe an algorithm that acquires hierarchical task networks from a set of traces and semantically-annotated tasks. Taking a different tack, Forbus and Hinrichs [2004] demonstrate skill learning by analogy. None of these approaches currently supports learning from observed behavior or learning coordinated multi-agent strategies. Moreover, the start conditions of the constructed methods only check the current state of the world, while our approach enables the start conditions to describe past states of the world.

Other game-related efforts have aimed at reducing the effort required to construct game agents. For example, Kelly et al. [2008] use offline planning with hierarchical task networks to generate scripts automatically. Changes in the environment do not affect the behavior of these constructed agents, whereas the agents built by our system perform reactively. A second approach uses reinforcement learning algorithms [Bradley and Hayes, 2005; Nason and Laird, 2005] to allow an agent to learn through experience. Our approach differs because learning is analytical and based on observed video rather than agent experience, which makes it much more computationally efficient. Our approach also differs from these in that ICARUS is an integrated architecture that combines observation, inference, learning and execution in a single system.
There are many possible avenues for future development. For example, we need to conduct detailed experiments across many different plays to examine the strengths and weaknesses of our approach. We are also extending our approach to acquire skills for coordinating among multiple agents. Ideally, the system would learn knowledge about timing and cooperation among multiple agents. The main differences between the multi-agent system and the current system would be that (1) beliefs, explanations and skills would be generated for all players in a single episode, rather than multiple single-player episodes, and (2) additional, high-level multi-player coordination skills would also be acquired. This would require only minor extensions to the knowledge representation, inference and execution modules.

A second extension involves modifying the learned skills and the top-level goal parameters (such as the number of yards that a receiver runs before changing direction) by learning within Rush. For example, systematic errors such as receivers running into heavy coverage can be detected and revised. This constitutes a structural revision to the learned skills. Similarly, ICARUS can modify the form of learned plays by adjusting the parameters (arguments) in the goals. For example, timing plays that require receivers to run n yards downfield before turning to catch the ball can be adjusted by tuning the value of n.

7 Conclusion

Constructing autonomous agents is an essential task in game development. In this paper, we outlined a system that analyzes preprocessed video footage of human behavior in a college football game, constructs hierarchical skills, and then executes them in a football simulator. The learned skills incorporate temporal constraints and provide for a variety of coordinated behavior among the players. Although the level of precision in ICARUS's control needs improvement, the results suggest that our method is a viable and efficient approach to acquiring the complex and structured behaviors required for realistic agents in modern games. Moreover, we pointed out several avenues for extending this work and improving the quality of the learned agents. Additional studies are required, but we view agent architectures as a promising approach for future development.

8 Acknowledgments

This material is based in part on research sponsored by DARPA under agreement FA8750-05-2-0283. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

References

[Anderson, 1993] J. R. Anderson. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.

[Bradley and Hayes, 2005] Jay Bradley and Gillian Hayes. Group utility functions: Learning equilibria between groups of agents in computer games by modifying the reinforcement signal. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 1914–1921, Edinburgh, UK, 2005. IEEE Press.

[Forbus and Hinrichs, 2004] Ken Forbus and Tom Hinrichs. Companion cognitive systems: A step towards human-level AI. In Proceedings of the AAAI Fall Symposium on Achieving Human-level Intelligence through Integrated Systems and Research, Washington, DC, 2004. AAAI Press.

[Hess and Fern, 2009] Robin Hess and Alan Fern. Discriminatively trained particle filters for complex multi-object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009. IEEE Press.

[Hess et al., 2007] Robin Hess, Alan Fern, and Eric Mortenson. Mixture-of-parts pictorial structures for objects with variable part sets. In Proceedings of the Eleventh IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007. IEEE Press.

[Hogg et al., 2008] Chad Hogg, Héctor Muñoz-Avila, and Ugur Kuter. HTN-MAKER: Learning HTNs with minimal additional knowledge engineering required. In Proceedings of the Twenty-Third Conference on Artificial Intelligence, Chicago, 2008. AAAI Press.

[Kelly et al., 2008] John Paul Kelly, Adi Botea, and Sven Koenig. Offline planning with hierarchical task networks in video games. In Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, Stanford, CA, 2008. AAAI Press.

[Laird et al., 1986] John E. Laird, Paul S. Rosenbloom, and Allen Newell. Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1:11–46, 1986.

[Nason and Laird, 2005] Shelley Nason and John E. Laird. Soar-RL: Integrating reinforcement learning with Soar. Cognitive Systems Research, 6(1):51–59, 2005.

[Nejati et al., 2006] Negin Nejati, Pat Langley, and Tolga Konik. Learning hierarchical task networks by observation. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. ACM.

[Newell, 1990] Allen Newell. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, 1990.

[Pearson and Laird, 2004] Douglas Pearson and John E. Laird. Redux: Example-driven diagrammatic tools for rapid knowledge acquisition. In Proceedings of Behavior Representation in Modeling and Simulation, Washington, DC, 2004.

[Stracuzzi et al., 2009] David J. Stracuzzi, Nan Li, Gary Cleveland, and Pat Langley. Representing and reasoning over time in a symbolic cognitive architecture. In Proceedings of the 31st Annual Meeting of the Cognitive Science Society, Amsterdam, Netherlands, 2009. Cognitive Science Society, Inc.