Probabilistic plan recognition for proactive assistant agents

Jean Oh, Felipe Meneguzzi, Katia Sycara
{jeanoh,katia}@cs.cmu.edu, [email protected]

Abstract

Human users dealing with multiple objectives in a complex environment, e.g., military planners or emergency response operators, are subject to a high level of cognitive workload. When this load becomes an overload, it can severely impair the quality of the plans created. To address these issues, intelligent assistant systems have been rigorously studied in both the AI and the intelligent systems research communities. In this chapter, we discuss proactive assistant systems that predict future user activities that can be facilitated by the assistant. We focus on problems in which a user is solving a complex problem with uncertainty, and thus on plan recognition algorithms suitable for the target problem domain. Specifically, we discuss a generative model of plan recognition that represents user activities as an integrated planning and execution problem. We describe a proactive assistant agent architecture and its applications in practical problems including emergency response and military peacekeeping operations.

Keywords: plan recognition, proactive assistants, intelligent agents, prognostic normative assistance

1. Introduction

Plan recognition, which refers to the task of identifying the user's high-level goals (or intentions) by observing the user's current activities, is a crucial capability for intelligent assistant systems that are intended to be incorporated into the user's computing environment.

In this chapter, we discuss how we use plan recognition to develop a software agent that can proactively assist human users in time-stressed environments. Human users dealing with multiple objectives in a complex environment, e.g., military planners or emergency response operators, are subject to a high level of cognitive load. When this load is excessive, it can severely impair the quality of the plans that are created [28]. In order to help users focus on high-priority objectives, we develop a software assistant agent that can recognize the user's goals and plans in order to proactively assist with tedious and time-consuming tasks, e.g., anticipating information needs or reserving resources ahead of user need.

Plan recognition algorithms generally center on a model that describes how a user behaves. Such a model can be built by collecting frequently observed sequences of user actions, e.g., as a plan library [13, 11, 3]. By contrast, a generative approach can be taken to develop a user model that represents how a user makes decisions to solve a problem, e.g., as a planning process [4, 24, 20, 21]. Choosing the right approach requires an understanding of where the target problem's uncertainty originates. Whereas a plan library is suitable for representing a user's activities that may constitute multiple unrelated problems, i.e., the uncertainty lies in the user's objectives, a planning process can succinctly represent a complex decision-making process that may result in a large number of various plans, i.e., the uncertainty lies in the environment.

In this chapter, we focus on the case where a user is solving a domain-specific problem that deals with a high level of complexity and uncertainty, e.g., an emergency response system where a flexible plan is made in advance but the actual course of action is dynamically determined during execution [7]. Thus, our discussion in this chapter focuses on the generative approach, using a planner to represent a user model, and on how this model can be used in intelligent assistant systems.

The rest of this chapter is organized as follows.
After discussing proactive assistant systems generally in Section 2, a generative plan recognition algorithm is described in detail in Section 3, followed by a description of how the results of plan recognition are used within a proactive assistant architecture in Section 4. Section 5 presents two examples of fully implemented proactive assistant systems. Finally, the chapter is summarized in Section 6.

2. Proactive Assistant Agent

A software assistant system, like a human assistant, is expected to perform various tasks on behalf of a user. An assistant's role has a set of desired qualifications, including the ability to learn a user's preferences [17, 22], the ability to assess the current state and to make rational decisions in various situations [8], and the ability to speculate on a user's future activities so that time-consuming actions can be taken proactively [5]. Here, we focus on the assistant's ability to make proactive decisions, of which plan recognition is a crucial part.

The core component of an intelligent assistant system is its decision-making module. For instance, an agent can make decisions according to a set of prescribed rules if complete information about its tasks is available a priori. An assistant's decision making can also be data-driven, i.e., an action is executed whenever its preconditions are changed as new information is propagated, e.g., as with constraint-based planners [2]. Alternatively, a decision-theoretic planner can be adopted; e.g., Electric Elves [8] uses a Markov Decision Process (MDP) to develop a personal assistant, known as Friday, that determines the optimal action given various states. For instance, given an invitation (to which a user is supposed to respond), Friday may wait a little until its user responds or take an action on behalf of the user according to the expected reward for each action.

In order to add the notion of plan recognition to an assistant's decision-making module, a partially observable MDP (POMDP) is generally used in which a user's goals (or intentions) are inserted as unobservable variables [10]. In this approach, plan recognition is tightly coupled with the assistant's action selection. That is, an assistant learns an optimal action to take in response to each user state without having a notion of its own (agent) goals or planning. In other words, traditional (PO)MDP approaches model immediate assistant actions in response to individual user actions, even if they implicitly consider the reward of future user actions for this action selection. This approach does not explicitly "look ahead" within a user's plan, nor does it consider time constraints. For these reasons, the types of support that this approach can provide may be limited to atomic (or single) actions, such as opening a door for a user as in [10], and may not be suitable for time-consuming actions such as information prefetching or more complex jobs that require the planning of multiple actions.

[Figure 1: ANTICO: a proactive assistant agent architecture (an abstract view)]

By contrast, the proactive assistant agent architecture, known here as Anytime Cognition (ANTICO^1), separates plan recognition from the assistant's decision-making module [19]. Figure 1 illustrates an abstract view of the architecture. Here, the user plan represents the assistant's estimation of how a user makes decisions. Based on this user plan, plan recognition is performed to generate sequences of expected user actions. The proactive manager evaluates the predicted user plan to identify potential assistance needs.
Here, the general purpose of the evaluation is to identify a set of unmet preconditions (or prerequisites) of predicted user actions, but the criteria for evaluating user plans are specific to each problem domain, for instance, identifying information needed to execute certain actions [19, 15] or detecting potential norm violations [21]. The set of identified assistance needs is labeled as new tasks for the assistant and is passed to the assistant's planning module.

The ANTICO architecture also supports the notion of an assistant's goals and planning, similar to Friday's planner in Electric Elves [8]. Whereas Friday's actions are triggered upon the receipt of a new request, ANTICO determines a set of assistive tasks according to its prediction of user needs.

^1 In earlier work, an instance of ANTICO is referred to as the ANTicipatory Information and Planning Agent (ANTIPA).

Figure 1 includes a simple example, which can be seen within the dotted lines. By evaluating the user plan, it is predicted with probability .9 that heading toward area A suggests that information about the red zone is needed. The prediction also suggests that information about the blue zone is needed, but this need has a low probability, so the requirement has been pruned. The proactive manager assigns the assistant a new goal of acquiring red zone information. Note that a deadline constraint is imposed on this assignment, as the user will need this information by time step t_2. The assistant plans and schedules the necessary resources to acquire the needed information. For instance, the assistant first selects an information source from which the needed information can be retrieved before the deadline. After the assistant retrieves the information from the selected source, a data post-processing action can be taken to excerpt the information for a user to parse quickly. The information that has been prepared is passed back to the proactive manager to be presented to the user when needed.

Disengaging an assistant's planning from its user's planning has several advantages over approaches based on tight coupling. First, the size of the state space is exponentially reduced, as follows. Let us define a user's planning space in terms of a set of variables, where a subset of those variables can be delegated to an assistant. The size of the state space grows exponentially in the number of variables. Let u and a denote the number of user variables and assistant variables, respectively. Without loss of generality, we add two simplifying assumptions: that user and agent variables are exclusive, and that the domain size for all variables is a constant d. Then, the size of the state space where these variables are tightly coupled is d^(u+a), whereas that of the detached approach is d^u + d^a.

The ANTICO architecture has been flexibly applied to two types of information assistants [19, 15] and to an assistant that supports humans in complying with organizational norms [21], which will be described further in Section 5.

3. Probabilistic plan recognition

In this section, we describe a generative approach to plan recognition [19] and discuss related work.

3.1. Plan recognition as planning

The idea of using AI planning for plan recognition has been gaining interest in various fields including cognitive science, machine learning, and AI planning.

In cognitive science, Baker et al. [4] used a set of Markov Decision Processes (MDPs) to model how a human observer makes predictions when observing other agents' activities. Their results show that the MDP framework resembles how humans make predictions in experiments where human subjects were asked to recognize the goal of an animated agent.

The idea of plan recognition as planning is also closely related to the notion of inverse optimal control in Markov Decision Process (MDP) based planners [26].
Inverse optimal control, which refers to the task of recovering a cost function from observed optimal behavior, has been studied under various names including inverse reinforcement learning [18], apprenticeship learning [1], and imitation learning [29]. These algorithms focus on learning hidden cost functions (as opposed to using predetermined cost functions), and have been specifically designed for the MDP framework.

A series of work by Ramírez and Geffner contributes to bridging AI planning and goal recognition, establishing the notion of plan recognition as planning. Because their main objective is to identify a user's goals, it is more appropriate to refer to their works as "goal recognition." Their initial work used classical planners for goal recognition [24]. In this work, goal prediction worked only when the observed actions precisely matched an expected sequence of actions. In order to overcome this drawback, they adopted a probabilistic model to address uncertainty [24]. This framework has also been applied to the Partially Observable MDP (POMDP) framework [25].

In the following subsections, we describe a probabilistic plan recognition algorithm presented in [20, 21].

3.2. Representing a user plan as an MDP

We use an MDP [6] to represent a user's decision-making model. An MDP is a rich framework that can represent various real-life problems involving uncertainty.

The use of an MDP to represent a user plan is justified for the problem domain of our interest, in which users are strongly motivated to accomplish a set of goals that are clearly defined. Thus, we can assume that a user is executing a sequence of planned actions; that is, the user has planned the observed actions. For instance, in emergency response situations, every major governmental organization has a set of emergency operations plans (EOP) that has been created in advance. The EOP provides a foundation for the creation of specific plans to respond to the actual details of a particular event [7].

In order to model the user's planning process, we consider an AI planner so that we can generate a set of alternative plans by solving the user's planning problem. At the same time, we need a model that can capture the non-deterministic nature of real-life applications. Since an MDP is a stochastic planner, it suits both of our purposes.

Throughout the paper we use Definition 1 to refer to an MDP. We note that the discount factor γ in Definition 1 is an optional component used to ensure that the Bellman equations converge in the infinite horizon. When the discount factor is not specified, it is assumed to be 1. Moreover, given the multiple equivalent ways to render the equations that solve MDPs, in this chapter we use the presentation style of [27, Chapter 17] for clarity.

Definition 1 (MDP). A Markov Decision Process (MDP) is represented as a tuple ⟨S, A, r, T, γ⟩, where S denotes a set of states; A, a set of actions; r : S × A → R, a function specifying the reward of taking an action in a state; T : S × A × S → R, a state transition function; and γ, a discount factor indicating that a reward received in the future is worth less than an immediate reward. Solving an MDP generally refers to a search for a policy that maps each state to an optimal action with respect to a discounted long-term expected reward.

Without loss of generality, we assume that the reward function can be given as r(s), such that each individual state yields a reward when the agent reaches it.^2 Although the MDP literature sometimes refers to a goal state as an absorbing or terminal state, that is, a state s with T(s'|s,a) = 0 for all a and for all s' ∈ S other than s itself (a state that cannot be left), here we mean by a goal state any state with a positive reward, that is, any state s with r(s) > 0.

^2 It is trivial to see that r(s,a) = Σ_{s'∈S} T(s'|s,a) r(s').
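For concreteness, a minimal Python sketch of such an MDP representation is shown below; the container layout, field names, and the goal_states helper are illustrative assumptions for this chapter's exposition, not part of the published system.

from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str

@dataclass
class MDP:
    """The tuple <S, A, r, T, gamma> of Definition 1."""
    states: List[State]
    actions: List[Action]
    reward: Dict[State, float]                                   # r(s); r(s,a) = sum_s' T(s'|s,a) r(s')
    transition: Dict[Tuple[State, Action], Dict[State, float]]   # T(s'|s,a)
    gamma: float = 0.95                                          # optional discount factor

    def goal_states(self) -> List[State]:
        # A goal state is any state with a positive reward, r(s) > 0.
        return [s for s in self.states if self.reward.get(s, 0.0) > 0.0]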
Note that satisfying time constraints is imperative in the target problem domain, i.e., actions must be taken in a timely manner, e.g., in an emergency response case. Here, the discount factor γ is used to manage time constraints in a planner, specifying that a reward decays as a function of time.

Definition 2 (Value Iteration). Given an MDP, denoted by a tuple ⟨S, A, r, T, γ⟩, the value of state s, denoted by V(s), is defined as the discounted long-term expected reward when starting from state s and taking the best action thereafter, which is known as the Bellman equation:

V(s) = max_{a∈A} [ r(s,a) + γ Σ_{s'∈S} V(s') T(s'|s,a) ].

The value iteration algorithm initializes the values of states with some value (e.g., an arbitrary constant), and iteratively updates the values V(s) for all states until they converge. The algorithm is guaranteed to converge when 0 < γ < 1. Value iteration computes a deterministic policy by selecting an optimal action in each state as follows:

π(s) = argmax_{a∈A} [ r(s,a) + γ Σ_{s'∈S} V(s') T(s'|s,a) ].

In addition to optimal actions, there can be "good" actions whose expected values come close to the optimum. It would be too naive for an assistant to assume that a human user will always choose the optimal action. In Definition 3, rather than computing a deterministic policy as in Definition 2, we compute a stochastic policy that ascribes a probability π(s,a) of selecting action a in state s according to the expected value of taking action a. This policy expresses the probability with which an imperfect decision-maker selects an action, relative to the perfectly rational choice. The stochastic policy allows the assistant to prepare for a wider range of user actions that are likely to be chosen in reality. A similar idea of computing a stochastic policy from value iteration can be found in [29].

Definition 3 (Value Iteration for a stochastic policy). Let a ∈ A be an action and s, s' ∈ S be states. We define a stochastic policy π(s,a) denoting the probability of selecting action a in state s. This probability is computed in proportion to the expected reward of selecting action a in state s, such that:

π(s,a) ∝ r(s,a) + γ Σ_{s'∈S} V(s') T(s'|s,a).

Let Φ = ⟨S, A, r, T, γ⟩ denote an MDP representing the user's planning problem. The plan recognition algorithm shown in Algorithm 1 is a two-step process. The agent first estimates which goals the user is trying to accomplish and then predicts a sequence of possible plan steps that the user is most likely to take to achieve those goals.

Algorithm 1: predictFuturePlan
  input: a set of goals G, a set of policies Φ, a sequence of observations O
  output: predicted plan tree t
  plan tree t ← createNewTree()
  node n ← getRootNode(t)
  state s ← getCurrentState()
  foreach goal g ∈ G do
    policy π_g ← getPolicyForGoal(Φ, g)
    weight w_g ← p(g|O_t)    /* Equation (1) */
    buildPlanTree(t, n, π_g, s, w_g, 0)

3.3. Goal recognition

In the first step, the algorithm estimates a probability distribution over a set of possible goals. We use a Bayesian approach that assigns a probability mass to each goal according to how well a series of observed user actions matches the optimal plan toward that goal.

We define the set G of possible goal states as all states with positive rewards, such that G ⊆ S and r(g) > 0 for all g ∈ G. The algorithm initializes the probability distribution over the set G of possible goals, denoted by p(g) for each goal g in G, in proportion to the reward r(g), such that Σ_{g∈G} p(g) = 1 and p(g) ∝ r(g). The algorithm then computes an optimal policy π_g for every goal g in G, considering the positive reward only from the specified goal state g and zero rewards from all other states s ∈ S with s ≠ g; we use the variant of the value iteration algorithm described in Definition 3 to compute each of these policies.
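As an illustration, a minimal Python sketch of this per-goal computation, following Definitions 2 and 3 and reusing the hypothetical MDP container sketched in Section 3.2, is given below; directly normalizing the action values is just one simple way to realize the proportionality in Definition 3 (a softmax over values is another common choice).

def stochastic_policy_for_goal(mdp, goal, sweeps=100):
    """Value iteration with positive reward only at `goal`; returns pi_g(s, a) as in Definition 3."""
    reward = {s: (mdp.reward[goal] if s == goal else 0.0) for s in mdp.states}
    V = {s: 0.0 for s in mdp.states}
    for _ in range(sweeps):   # iterate the Bellman update until (approximately) converged
        V = {s: max(q_value(mdp, reward, V, s, a) for a in mdp.actions) for s in mdp.states}
    policy = {}
    for s in mdp.states:
        q = {a: q_value(mdp, reward, V, s, a) for a in mdp.actions}
        total = sum(q.values())
        policy[s] = {a: (q[a] / total if total > 0 else 1.0 / len(q)) for a in q}
    return policy

def q_value(mdp, reward, V, s, a):
    # r(s,a) + gamma * sum_s' V(s') T(s'|s,a), with r(s,a) = sum_s' T(s'|s,a) r(s') (footnote 2)
    nxt = mdp.transition.get((s, a), {})
    return sum(p * (reward[s2] + mdp.gamma * V[s2]) for s2, p in nxt.items())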
For each potential goal g ∈ G, the algorithm computes a goal-specific policy π_g to achieve goal g. Following the assumption that the user acts more or less rationally, this policy can be computed by solving the MDP to maximize the long-term expected reward. Instead of a deterministic policy that specifies only the best action, the one yielding the maximum reward, we compute a stochastic policy such that the probability p(a|s,g) of taking action a in state s when pursuing goal g is proportional to its long-term expected value v(s,a,g):

p(a|s,g) = β v(s,a,g),

where β is a normalizing constant. Note that this step of computing optimal policies is performed only once and can be done off-line; the resulting policies are also used in the second step, as will be described in Section 3.4.

Let O_t = (s_1, a_1, s_2, a_2, ..., s_t) denote a sequence of observed states and actions from time steps 1 through t, where s_{t'} ∈ S and a_{t'} ∈ A for all t' ∈ {1, ..., t}. Here, the assistant agent must estimate the user's targeted goals.

After observing a sequence of user states and actions, the assistant agent updates the conditional probability p(g|O_t) that the user is pursuing goal g given the sequence of observations O_t. The conditional probability p(g|O_t) can be rewritten using Bayes' rule as:

p(g|O_t) = p(s_1, a_1, ..., s_t | g) p(g) / Σ_{g'∈G} p(s_1, a_1, ..., s_t | g') p(g').    (1)

By applying the chain rule, we can write the conditional probability of observing the sequence of states and actions given goal g as:

p(s_1, a_1, ..., s_t | g) = p(s_1|g) p(a_1|s_1, g) p(s_2|s_1, a_1, g) ··· p(s_t|s_{t-1}, a_{t-1}, ..., s_1, g).

By the MDP problem definition, the state transition probability is independent of the goals. By the Markov assumption, the state transition probability is also independent of any past states except the current state, and the user's action selection depends only on the current state and the specific goal. Using these conditional independence relationships, we get:

p(s_1, a_1, ..., s_t | g) = p(s_1) p(a_1|s_1, g) p(s_2|s_1, a_1) ··· p(s_t|s_{t-1}, a_{t-1}),    (2)

where the probability p(a|s,g) represents the user's stochastic policy π_g(s,a) for selecting action a from state s given goal g, which was computed during the initialization step.

By combining Equations (1) and (2), the conditional probability of a goal given a series of observations can be obtained. We use this conditional probability to assign weights when constructing a predicted plan-tree in the next step.

The algorithmic complexity of solving an MDP using value iteration is quadratic in the number of states and linear in the number of actions. Here, the optimal policies for candidate goals can be precomputed off-line. Thus, the actual runtime complexity of our goal recognition algorithm is linear in the number of candidate goals and the number of observations.
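As a concrete illustration of this first step, a minimal Python sketch of the posterior update of Equations (1) and (2) is given below; it reuses the hypothetical data structures from the earlier sketches (the MDP container and the per-goal stochastic policies), and the flattened observation list is an assumption made only for this example.

def goal_posterior(mdp, policies, observations):
    """p(g | O_t) via Equations (1) and (2).
    observations: [s1, a1, s2, a2, ..., st]; policies[g][s][a] = pi_g(s, a)."""
    goals = mdp.goal_states()
    z = sum(mdp.reward[g] for g in goals)
    prior = {g: mdp.reward[g] / z for g in goals}   # p(g) proportional to r(g)
    post = {}
    for g in goals:
        likelihood = prior[g]
        # The terms p(s_1) and p(s_{k+1} | s_k, a_k) in Equation (2) do not depend on g,
        # so they cancel in the normalization of Equation (1); only the policy terms
        # p(a_k | s_k, g) need to be multiplied in.
        for k in range(0, len(observations) - 1, 2):
            s, a = observations[k], observations[k + 1]
            likelihood *= policies[g][s].get(a, 0.0)
        post[g] = likelihood
    z = sum(post.values()) or 1.0
    return {g: p / z for g, p in post.items()}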
3.4. Plan prediction

Based on the predicted goals from the first step, we now generate a set of possible scenarios that the user may follow. Recall that we solved the user's MDP Φ to obtain stochastic policies for each potential goal. The intuition for using a stochastic policy is to allow the agent to explore multiple likely plan paths in parallel, relaxing the assumption that the user always acts to maximize her expected reward.

Using the MDP model and the set of stochastic policies, we sample a tree of the most likely sequences of user actions and resulting states from the user's current state, known here as a plan-tree.

Algorithm 2: buildPlanTree
  input: plan tree t, node n, policy π, state s, weight w, deadline d
  output: predicted plan tree t
  foreach action a ∈ A do
    weight w' ← π(s,a) w
    if w' > threshold θ then
      s' ← sampleNextState(state s, action a)
      node n' ← createNewNode(s', w', d)
      addChild(n, n')
      buildPlanTree(t, n', π, s', w', d+1)

In a predicted plan-tree, a node contains the resulting state of a predicted user action, associated with two features: priority and deadline. We compute the priority of a node from the probability representing the agent's belief that the user will select the action in the future; that is, the agent assigns higher priorities to assisting those actions that are more likely to be taken by the user. The deadline, on the other hand, indicates the predicted time step at which the user will execute the action; that is, the agent must prepare assistance before the time point by which the user will need help.

The algorithm builds a plan-tree by traversing the actions that, according to the policy generated from the MDP user model, the user is most likely to select from the current state. First, the algorithm creates a root node with probability 1 and no action attached. Then, according to the MDP policy, likely actions are sampled such that the algorithm assigns higher priorities to those actions that lead to a better state with respect to the user's planning objective. Note that the algorithm adds a new node for an action only if the agent's belief that the user will select the action is higher than some threshold θ; actions are pruned otherwise. Note also that the assistant may prepare for all possible outcomes if the problem space is manageably small; however, resources such as time, CPU, and network bandwidth are generally constrained, and it is thus necessary to prioritize assistive tasks according to predicted needs.

The recursive process of predicting and constructing a plan tree from a state is described in Algorithm 2. The algorithmic complexity of plan generation is linear in the number of actions. The resulting plan-tree represents a horizon of sampled actions and their resulting states for which the agent can prepare appropriate assistance.

4. Plan recognition within a proactive assistant system

This section describes how the predicted plan-tree from Section 3 fits inside the ANTICO architecture shown in Figure 1. ANTICO is a scalable model in which the assistant agent dynamically plans and executes a series of actions to manage a set of current tasks as they arise.

4.1. Evaluating predicted user plan

After a user plan is predicted through the process of plan recognition, the proactive manager evaluates each node in the predicted plan-tree according to domain-specific criteria. For example, if a user is solving a maze game that requires a password to enter a certain room, the passwords in the predicted user path are identified as unmet requirements [19]. A user plan can also be evaluated according to a set of regulatory rules such as social norms. In this case, any potential norm violation in the predicted user plan gives rise to a need for assistance [21].

The evaluation of a user plan results in a set of new tasks for the assistant, e.g., acquiring necessary information or resolving norm violations to restore normative states. Since the evaluation of the user plan is not the focus of this chapter, we refer readers to related work for further detail [21].
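For illustration only, the sketch below shows one way the proactive manager might walk a predicted plan-tree and turn unmet preconditions into assistive tasks; the node fields and the domain-specific precondition check are hypothetical, since the actual evaluation criteria are defined per application [19, 21].

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PlanNode:
    state: str
    weight: float               # priority: belief that the user will reach this node
    deadline: int                # predicted time step (node depth)
    children: List["PlanNode"] = field(default_factory=list)

@dataclass
class AssistiveTask:
    goal: str
    priority: float
    deadline: int

def evaluate_plan_tree(root: PlanNode,
                       unmet_preconditions: Callable[[str], List[str]]) -> List[AssistiveTask]:
    """Collect domain-specific unmet preconditions over the predicted plan-tree."""
    tasks, stack = [], [root]
    while stack:
        node = stack.pop()
        for need in unmet_preconditions(node.state):   # e.g., missing information, norm violation
            tasks.append(AssistiveTask(goal=need, priority=node.weight, deadline=node.deadline))
        stack.extend(node.children)
    # Most likely needs first; ties broken by earlier deadline.
    return sorted(tasks, key=lambda t: (-t.priority, t.deadline))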
4.2. Assistive planning

In ANTICO, the assistant is essentially a planning agent that can plan its actions to accomplish a specified goal. The proactive manager formulates an assistive task in terms of the assistant's initial state and its goal state.

The architecture is not bound to any specific type of planner; e.g., a classical planner may be used. Recall that a predicted user plan from the plan recognizer imposes deadline constraints (specified as the node depth) on the agent's planning. An MDP is a preferred choice not only because it is consistent with the user plan model but also because the discount factor can be used to implement ad hoc deadline constraints. A deadline constraint is used to determine the horizon for an MDP plan solver, such that the agent planner can complete the task in time to satisfy the deadline. For more principled time-constraint management, integrated planning and resource scheduling can be considered.

The planning problem formulated by the proactive manager may not always be solvable; for instance, the goal state may only be reachable by modifying variables that the assistant cannot access, or none of the assistant's actions may have effects that can lead to the specified goal state. In such cases, the assistant notifies the user immediately so that the user can take appropriate action on her own. Otherwise, the assistant starts executing its actions according to the optimal policy until it reaches a goal state.

4.3. Cognitively aligned plan execution

Execution of an agent action may change one or more variables. For each newly generated plan (or policy) from the planner module, an executor is created as a new thread. An executor waits for a signal from the variable observer, which monitors changes in the environment variables to determine the agent's current state. When a new state is observed, the variable observer notifies the plan executor to wake up. The plan executor then selects an optimal action for the current state according to the policy and executes that action. After taking an action, the plan executor resumes waiting for a new signal from the variable observer. If the observed state is an absorbing state, then plan execution is terminated; otherwise, an optimal action is executed from the new state.
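A minimal sketch of such an executor loop is given below; the variable-observer interface, the threading model, and the method names are assumptions used only to make the described wake/act/wait cycle concrete.

import threading

class PlanExecutor(threading.Thread):
    """Executes a policy: wait for an observed state, act, and repeat until an absorbing state."""
    def __init__(self, policy, observer, actuator, absorbing_states):
        super().__init__(daemon=True)
        self.policy = policy                   # maps state -> action
        self.observer = observer               # exposes wait_for_state() -> newly observed state
        self.actuator = actuator               # exposes execute(action)
        self.absorbing = set(absorbing_states)

    def run(self):
        while True:
            state = self.observer.wait_for_state()    # sleep until the variable observer signals
            if state in self.absorbing:
                break                                  # plan execution terminates
            self.actuator.execute(self.policy[state])  # optimal action for the current state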