Artificial Intelligence 126 (2001) 139–157

Monitoring and control of anytime algorithms: A dynamic programming approach

Eric A. Hansen a,∗, Shlomo Zilberstein b

a Computer Science Department, Mississippi State University, Mississippi State, MS 39762, USA
b Computer Science Department, University of Massachusetts, Amherst, MA 01002, USA

Abstract

Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in solving time-critical problems such as planning and scheduling, belief network evaluation, and information gathering. To exploit this tradeoff, a system must be able to decide when to stop deliberation and act on the currently available solution. This paper analyzes the characteristics of existing techniques for meta-level control of anytime algorithms and develops a new framework for monitoring and control. The new framework handles effectively the uncertainty associated with the algorithm's performance profile, the uncertainty associated with the domain of operation, and the cost of monitoring progress. The result is an efficient non-myopic solution to the meta-level control problem for anytime algorithms. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Anytime algorithms; Flexible computation; Meta-level control; Monitoring; Resource-bounded reasoning; Real-time deliberation

1. Introduction

From its early days, the field of artificial intelligence (AI) has searched for useful techniques for coping with the computational complexity of decision making. An early advocate of the view that decision makers (both human and artificial) should forgo perfect rationality in favor of limited, economical reasoning was Herbert Simon. In 1958, he claimed that "the global optimization problem is to find the least-cost or best-return decision, net of computational costs" [32]. The statistician I.J. Good advocated a similar approach to decision making that takes deliberation costs into account [9].

∗ Corresponding author.
E-mail address: [email protected] (E.A. Hansen).
By the mid 1980s, AI researchers began to formalize these ideas and produce effective models for trading off computational resources for quality of results (e.g., [19]). At the forefront of these efforts were two, essentially identical, models called "anytime algorithms" (developed by Dean and Boddy [2,4]) and "flexible computation" (developed by Horvitz [13,15,17]). We adopt the phrase "anytime algorithm" in this paper, although our work is equally influenced by both models.

The defining property of an anytime algorithm is that it can be stopped at any time to provide a solution, and the quality of the solution increases with computation time. This property allows a tradeoff between computation time and solution quality, making it possible to compute approximate solutions to complex problems under time constraints. Anytime algorithms are being used increasingly in a range of practical domains that include planning and scheduling [2,38], belief network and influence diagram evaluation [12,16,36], database query processing [31,33], and information gathering [10]. By itself, however, an anytime algorithm does not provide a complete solution to Simon's challenge to make the best-return decision, net of computational costs. To achieve this, a meta-level control procedure is needed that determines how long to run the anytime algorithm, and when to stop and act on the currently available solution.

Horvitz developed an approach to meta-level control of computation based on the expected value of computation (EVC) [17]. The expected value of computation is defined as the expected improvement in decision quality that results from performing a computation, taking into account computational costs. (The concept of the expected value of computation is related to the concept of the expected value of information, although the two are not identical [22].) The meta-level control problem for anytime algorithms is the problem of determining the stopping time for an anytime algorithm that optimizes the expected value of computation.
Meta-level control of an anytime algorithm can be approached in two different ways. One approach is to allocate the algorithm's running time before it starts [2,13]. If there is little or no uncertainty about the rate of improvement of solution quality, or about how the urgency for a solution might change after the start of the algorithm, then this approach works well. Very often, however, there is uncertainty about one or both. For AI problem-solving in particular, variance in the rate at which solution quality improves is common [24]. Because the best stopping time will vary with fluctuations in the algorithm's performance (and/or the state of the environment), a second approach to meta-level control is to monitor the progress of the algorithm (and/or the state of the environment) and determine at run-time when to stop deliberation and act on the currently available solution [3,17,37].

The optimal time allocation for an anytime algorithm depends on several factors: the quality of the available solution, the prospect for further improvement in solution quality, the current time, the cost of delay in action, the current state of the environment, and the prospect for further change in the environment. In this paper, we develop a framework for run-time monitoring and control of anytime algorithms that takes into account these various factors. The solution differs from previously developed approaches to this problem that rely on a myopic estimate of the value of computation. We formalize the meta-level control problem as a sequential decision problem that can be solved by dynamic programming, in order to construct a non-myopic solution. Dynamic programming has been used before for meta-level control of computation.
Einav and Fehling [7] and Russell and Subramanian [27] use dynamic programming to schedule a sequence of solution methods for a real-time decision-making problem; and Zilberstein, Charpillet and Chassaing [40] use dynamic programming to schedule a sequence of contract algorithms to create the best interruptible system given a stochastic deadline. The framework developed in this paper uses dynamic programming to determine the stopping time of an anytime algorithm by monitoring its progress.

The paper is organized as follows. Section 2 briefly reviews a framework for meta-level control of anytime algorithms that does not use monitoring. The rest of the paper extends this framework to include run-time monitoring. Section 3 begins with a review of a myopic approach to run-time monitoring and meta-level control that is based on the work of Horvitz [17] and Russell and Wefald [30]. Then it introduces a non-myopic framework that determines not only when to stop an anytime algorithm, but at what intervals to monitor its progress and re-assess whether to continue or stop. Section 4 extends this framework to include situations in which the state of the environment must be monitored. The paper concludes with a discussion of some extensions of the framework and future work.

2. Meta-level control without monitoring

We begin by briefly reviewing a framework for meta-level control of anytime algorithms that does not use run-time monitoring. This framework provides a foundation for the extensions introduced in the rest of the paper.

A framework for meta-level control of anytime algorithms requires a model of how the quality of a solution produced by the algorithm increases with computation time, as well as a model of the time-dependent utility of a solution. The phrase performance profile was introduced by Dean and Boddy [4] to refer to a model of the performance of an anytime algorithm that specifies expected quality as a function of computation time. Horvitz [13,17] introduced a model that characterizes uncertainty about an algorithm's performance using probabilities, and we follow Zilberstein and Russell [39] in calling this a probabilistic performance profile.

Definition 1.
A probabilistic performance profile of an anytime algorithm, Pr(q_j | t), denotes the probability of getting a solution of quality q_j by running the algorithm for t time units.

Although it is sometimes possible to represent a performance profile by a parameterized continuous function [4,15], a convenient and widely-used representation is a table of discrete values, and this is the representation we assume in this paper.¹ Computation time is discretized into a finite number of time steps, t_0, ..., t_n, where t_0 represents the starting time of the algorithm and t_n its maximum running time. Similarly, solution quality is discretized into a finite number of levels, q_0, ..., q_m, where q_0 is the lowest quality level and q_m is the highest quality level. The fineness of the discretization fixes a tradeoff between the accuracy of the performance profile and the space needed to store it. The values defining the performance profile are collected by statistical analysis of the performance of the algorithm.

¹ A potential advantage of generalizing our results to a functional (rather than tabular) representation of a performance profile is that a functional representation may be more compact, and may be constructed more simply.

In addition to a performance profile that models the behavior of an algorithm, meta-level control requires a model of the time-dependent utility of a solution [2,18]. For now, we assume that utility depends only on the quality of a solution and computation time. (In Section 4, we consider a more general model in which utility also depends on the state of a dynamic environment.)

Definition 2. A time-dependent utility function, U(q, t), represents the utility of a solution of quality q at time t.
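To make the tabular representation concrete, the following sketch estimates a profile Pr(q_i | t) from hypothetical sample runs and then picks the fixed time allocation that maximizes expected utility, anticipating Definition 4 below. The run data, quality levels, and utility function are invented for illustration; nothing here is from the paper's experiments.

```python
import numpy as np

def build_profile(runs, n_time_steps, n_quality_levels):
    """Estimate a tabular probabilistic performance profile Pr(q_i | t)
    from sample runs, where runs[k][t] is the quality level observed at
    time step t of run k (illustrative data, not from the paper)."""
    counts = np.zeros((n_time_steps, n_quality_levels))
    for run in runs:
        for t, q in enumerate(run):
            counts[t, q] += 1
    # Normalize each row into a probability distribution over quality levels.
    return counts / counts.sum(axis=1, keepdims=True)

def optimal_fixed_allocation(profile, utility):
    """t* = argmax_t sum_i Pr(q_i | t) U(q_i, t)  (Definition 4)."""
    n_time, n_quality = profile.shape
    expected = [sum(profile[t, i] * utility(i, t) for i in range(n_quality))
                for t in range(n_time)]
    return int(np.argmax(expected))

# Hypothetical data: 4 runs, 3 time steps, 3 quality levels.
runs = [[0, 1, 2], [0, 1, 2], [0, 1, 1], [0, 0, 1]]
profile = build_profile(runs, 3, 3)

# Toy utility: intrinsic value of quality minus a linear cost of time.
intrinsic = [0.0, 5.0, 8.0]
utility = lambda q, t: intrinsic[q] - 1.0 * t

t_star = optimal_fixed_allocation(profile, utility)
```

In this toy profile the expected utility is highest at the last time step, so t* = 2. The utility U(q, t) used here is an invented time-separable stand-in: an intrinsic value per quality level minus a linear cost of time.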
It is often possible to simplify this function by treating it as the sum of two functions that Horvitz [14] calls object-level utility and inference-related utility. Object-level utility represents the utility of a solution without regard to the costs associated with its computation, and inference-related utility represents these computational costs. We adopt the following terminology of Russell and Wefald [29] to refer to this distinction.

Definition 3. A time-dependent utility function is called time-separable if the utility of a solution of quality q at time t, U(q, t), can be expressed as the difference between two functions,

U(q, t) = U_I(q) − U_C(t),

where U_I(q) is called the intrinsic value function and U_C(t) is called the cost of time.

Given an anytime algorithm, its performance profile, and a time-dependent utility function, the meta-level control problem is the problem of deciding when to stop the algorithm and act on the currently available solution. One approach to this problem is to determine the algorithm's running time before it starts [2,13]. Because the time allocation is not revised after the start of the algorithm, we call this a fixed allocation approach.

Definition 4. Given a time-dependent utility function and a probabilistic performance profile, an optimal fixed allocation, t*, is defined by:

t* = argmax_t Σ_i Pr(q_i | t) U(q_i, t).

Fig. 1 illustrates the selection of an optimal, fixed time allocation for the meta-level control problem. The figure does not illustrate uncertainty about the rate of improvement of solution quality or about the cost of time, however. This framework for meta-level control is best suited for problems for which there is little or no uncertainty about either. The rest of the paper discusses how to extend this framework to compensate for uncertainty by using run-time monitoring.

Fig. 1. A typical view of anytime algorithms [5,17,30].

3.
Monitoring the progress of computation

In this section, we describe how to generalize the framework introduced in the previous section to incorporate run-time monitoring of the progress of computation. If there is uncertainty about the progress of an anytime algorithm, monitoring the algorithm's progress and revising the initial time allocation when progress is faster or slower than expected can improve the utility of the system. This generalization requires a more complex performance profile that allows prediction of the progress of an algorithm to be revised based on intermediate results. It also presents a more complex optimization problem; instead of a single once-and-for-all decision about how long to run an algorithm, a sequence of decisions about whether to continue or stop the algorithm is required.

3.1. Dynamic performance profiles

A performance profile that predicts solution quality as a function of an algorithm's overall running time is suitable for making a one-time decision about how long to run an algorithm, before the algorithm starts. To take advantage of information gathered by monitoring the progress of an anytime algorithm, a more informative performance profile is needed that conditions predicted improvement of solution quality on features of the currently available solution that can be monitored at run-time. These features can be defined broadly enough to include the entire state of the computation. However, it is useful (and, we claim, usually sufficient) to condition a run-time prediction of further improvement on the quality of the currently available solution. We define a dynamic performance profile accordingly.

Definition 5. A dynamic performance profile of an anytime algorithm, Pr(q_j | q_i, Δt), denotes the probability of getting a solution of quality q_j by continuing the algorithm for time interval Δt when the currently available solution has quality q_i.
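A dynamic performance profile can be estimated from execution traces in much the same way as a static one. The sketch below assumes a one-step Δt and the Markov property discussed next, normalizing counts of observed quality transitions into Pr(q_j | q_i); the traces are invented for illustration.

```python
from collections import defaultdict

def build_dynamic_profile(runs):
    """Estimate Pr(q_j | q_i, one time step) from traces, where runs[k][t]
    is the quality level of run k at time step t (Definition 5, with
    Delta-t fixed at a single step; illustrative sketch)."""
    counts = defaultdict(lambda: defaultdict(int))
    for run in runs:
        # Count each observed one-step quality transition q_i -> q_j.
        for qi, qj in zip(run, run[1:]):
            counts[qi][qj] += 1
    # Normalize each row of counts into a conditional distribution.
    return {qi: {qj: c / sum(row.values()) for qj, c in row.items()}
            for qi, row in counts.items()}

# Hypothetical traces of quality levels over time.
runs = [[0, 1, 2], [0, 1, 1], [0, 0, 1]]
dyn = build_dynamic_profile(runs)
```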
The assumption that further improvement of solution quality depends only on the quality of the currently available solution is equivalent to assuming that solution quality satisfies the Markov property for this prediction problem. For many anytime algorithms, conditioning a prediction of future improvement on the quality of the currently available solution works well, particularly if solution quality is easily determined at run-time. But in some cases, the quality of the currently available solution may not be the best (or only) predictor of likely improvement. In other cases, it may not be possible to determine the quality of the currently available solution at run-time.

Whether it is easy to determine solution quality at run-time, or to predict improvement in quality based on the quality of the currently available solution, depends on how solution quality is defined. For example, if solution quality is defined as the value of an objective function that is iteratively improved by an optimization algorithm, solution quality can be easily determined at run-time. In the case of the traveling salesman problem, for example, solution quality defined in this way is simply the length of the currently available tour. However, this may not be the best predictor of subsequent improvement in solution quality. The optimal value of an objective function usually differs from one problem instance to the next. For example, the minimum tour length for one instance of the traveling salesman problem may be 100, while for another instance it may be 250. Solution quality defined as tour length will not distinguish these cases, and thus will not be a good predictor of subsequent improvement. A better approach would take into account how far the current tour length is from the optimal tour length.

For this reason, solution quality is often defined as the difference between the current value of an objective function and its optimal value.
For cost-minimization problems, solution quality is customarily defined as the "approximation ratio":

Cost(Approximate solution) / Cost(Optimal solution).

The inverse ratio is used for value-maximization problems. This definition of solution quality makes possible general claims about the performance of an algorithm on a class of problem instances. However, defining solution quality in this way poses a problem for run-time monitoring: assessing solution quality requires knowing the optimal solution. But a run-time monitor needs to make a decision based on the approximate solution currently available, without knowing what the optimal solution will eventually be. As a result, it cannot know, with certainty, the actual quality of the approximate solution.

This observation holds for other classes of problems besides combinatorial optimization problems. For problems that involve estimating a point value, the difference between the estimated point value and the true point value cannot be known until the algorithm has converged to an exact value [17]. For anytime problem-solvers that rely on abstraction to create approximate solutions, solution quality may be difficult to assess for other reasons. For example, it may be difficult for a run-time monitor to predict the extent of reactive planning needed to fill in the details of an abstract plan [38].

For many problems, it is possible to bound the degree of approximation to an optimal solution. Because a run-time monitor can only estimate where the optimal solution falls within this bound, it is sometimes useful to treat the error bound itself as a measure of solution quality. For example, Horvitz [17] uses an error bound to measure solution quality in meta-level control of an approximation algorithm for belief network evaluation. Similarly, de Givry and Verfaillie [6] use an error bound to measure solution quality in meta-level control of an anytime algorithm for solving constraint optimization problems.
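As a minimal illustration of why this definition resists run-time monitoring, the check below computes the approximation ratio for two hypothetical minimization instances that share the same current cost but have different optima; a run-time monitor would need the (unknown) denominator to tell them apart.

```python
def approximation_ratio(approx_cost, optimal_cost):
    """Approximation ratio for a cost-minimization problem:
    Cost(approximate solution) / Cost(optimal solution) >= 1.
    Computable only in hindsight, once the optimum is known."""
    return approx_cost / optimal_cost

# Two invented traveling-salesman instances with the same current tour
# length but different optima: the ratio, not the raw length, separates them.
ratio_a = approximation_ratio(120.0, 100.0)
ratio_b = approximation_ratio(120.0, 60.0)
```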
For now, we assume that solution quality is defined in such a way that it can be observed at run-time and used to improve prediction of the subsequent progress of the algorithm. When a run-time monitor cannot easily determine the quality of the currently available solution, or when solution quality itself is not the best predictor of subsequent improvement, the success of run-time monitoring depends on being able to design a reliable scheme for estimating/predicting solution quality at run-time. We discuss this issue at greater length in Section 3.5.

3.2. A myopic stopping criterion

Instead of making a single decision about how long to run an anytime algorithm before it starts, a run-time monitor must make a sequence of decisions. Each time it monitors an algorithm's progress, it must decide whether to continue deliberation or stop. It is easy to state an optimal stopping rule: computation should stop as soon as the expected value of computation is no longer positive [17]. However, exact determination of the EVC requires considering all possible decisions about whether to continue or stop that could be made at subsequent time steps. Therefore, a myopic estimate of the EVC is often used. A myopic estimate of the EVC is the expected utility of acting on the result that will be available after continuing the algorithm for exactly one more time step, minus the expected value of acting immediately on the result currently available. Using our representation of performance profiles, a myopic estimate of the expected value of computation is defined as follows.

Definition 6. Suppose that an anytime algorithm produces a solution of quality q_i at time t. Then a myopic estimate of the expected value of computation (MEVC) is:

MEVC(Δt) = Σ_j Pr(q_j | q_i, Δt) U(q_j, t + Δt) − U(q_i, t),

where Δt denotes an additional time step.

A myopic estimate of the value of computation is used by a run-time monitor that periodically checks the progress of the algorithm and decides whether to continue computation or stop.

Definition 7.
The myopic monitoring approach is to continue the computation as long as MEVC(Δt) > 0.

Although a stopping rule that relies on myopic computation of EVC is not optimal under all circumstances, it can be shown to perform optimally under some assumptions. The following theorem gives sufficient conditions for its optimality.

Theorem 1. Given a time-dependent utility function, the myopic monitoring approach is optimal when:
(1) evaluating the MEVC takes a negligible amount of time; and
(2) for every time t and quality level q for which MEVC is non-positive, MEVC is also non-positive for every time t + Δt and quality level q + Δq.

Proof. If MEVC is positive when the currently available solution has quality level q at time t, continuing the algorithm for another time step has positive expected value and is the optimal decision. If MEVC is non-positive when the currently available solution has quality level q at time t, then any policy for continuing cannot improve expected value unless there is some time t + Δt and quality level q + Δq for which MEVC is positive. The second condition guarantees that this cannot happen. Therefore, stopping the algorithm is the optimal decision. ✷

The second condition of Theorem 1 takes a simpler and more intuitive form for time-separable utility functions.

Corollary 1. The second condition in Theorem 1 is met when the expected marginal increase in the intrinsic value of a solution is a non-increasing function of quality and the marginal cost of time is a non-decreasing function of time.

Proof. Under the separability assumption, a myopic estimate of the expected value of computation can be expressed as:

MEVC(q_i, Δt) = Σ_j Pr(q_j | q_i, Δt) [U_I(q_j) − U_I(q_i)] − [U_C(t + Δt) − U_C(t)].

The first term in the expression on the right-hand side of this equation represents the expected marginal increase in the intrinsic value of a solution.
The assumption that this is a non-increasing function of quality, combined with monotonic improvement of quality with computation time, means this quantity cannot increase from one time step to the next. The second term in the expression on the right-hand side of this equation represents the marginal cost of time. By assumption, it is a non-decreasing function of time. Therefore, the future marginal cost can only become larger. As a consequence, the EVC can only become smaller as time and quality progress. In other words, if it is negative at one point it cannot be positive in any future situation. ✷

Because the assumptions on which they rest are reasonably intuitive, Theorem 1 and its corollary suggest that there is a large class of applications for which the myopic approach to meta-level control is optimal. Nevertheless, there are cases in which reliance on MEVC can lead to a sub-optimal stopping decision. Horvitz and Breese [3,17] describe a bounded conditioning algorithm for probabilistic inference in belief networks for which MEVC can mislead because expected improvement is "flat" in the near term but substantial after some number of time steps. Because this can lead to a premature stopping decision, Horvitz suggests various degrees of lookahead to compute EVC more reliably. Horsch and Poole [12] address a similar problem in considering meta-level control of an anytime algorithm for influence diagram evaluation. To counter it, they describe a non-myopic approach to estimating the EVC based on a linear model constructed from empirical data. The approach to meta-level control described in the following section addresses this same problem by extending the framework for meta-level control introduced earlier.

3.3. A non-myopic stopping criterion

Using run-time monitoring to determine when to stop an anytime algorithm presents a sequence of decisions; at each step, the decision is whether to continue the algorithm or stop and act on the currently available solution.
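The myopic rule of the previous section (Definitions 6 and 7) is straightforward to implement given a one-step dynamic performance profile. The sketch below computes MEVC and continues while it is positive; the profile, the utility function, and the simulation assumption that the most probable quality transition actually occurs are all illustrative, not from the paper.

```python
def mevc(profile, utility, qi, t):
    """Myopic expected value of computation for one more step (Definition 6):
    sum_j Pr(q_j | q_i, 1) U(q_j, t + 1) - U(q_i, t)."""
    expected_next = sum(p * utility(qj, t + 1) for qj, p in profile[qi].items())
    return expected_next - utility(qi, t)

def myopic_monitor(profile, utility, q=0, t=0, max_t=10):
    """Continue while MEVC > 0 (Definition 7). For illustration only, the
    simulation assumes the most probable quality transition occurs."""
    while t < max_t and mevc(profile, utility, q, t) > 0:
        q = max(profile[q], key=profile[q].get)
        t += 1
    return q, t

# Toy model: deterministic improvement, diminishing returns, linear time cost.
profile = {0: {1: 1.0}, 1: {2: 1.0}, 2: {2: 1.0}}
intrinsic = [0.0, 6.0, 8.0]
utility = lambda q, t: intrinsic[q] - 1.5 * t
final_q, stop_t = myopic_monitor(profile, utility)
```

With this toy model the monitor stops at t = 2, once further computation can no longer improve quality enough to offset the cost of time.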
In this section, we formalize and solve this sequential decision problem. The solution to the problem is a non-myopic monitoring policy that takes into account the effect of future monitoring decisions.

Definition 8. A monitoring policy, π(q_i, t_k), is a mapping from time step t_k and quality level q_i to a decision whether to continue the algorithm or stop and act on the currently available solution.

To construct a non-myopic monitoring policy, we follow the well-established literature on using dynamic programming for solving optimal stopping problems. If utility only depends on solution quality and computation time, a stopping rule can be found by optimizing the following value function,

V(q_i, t) = max_d { U(q_i, t)                              if d = stop,
                    Σ_j Pr(q_j | q_i, Δt) V(q_j, t + Δt)   if d = continue,

to determine the following policy,

π(q_i, t) = argmax_d { U(q_i, t)                              if d = stop,
                       Σ_j Pr(q_j | q_i, Δt) V(q_j, t + Δt)   if d = continue,

where Δt represents a single time step and d is a binary variable that represents the decision to either stop or continue the algorithm. The optimal policy can be computed off-line using a dynamic programming algorithm with complexity O(m²n), where m is the number of quality levels and n is the number of time steps. Because computation is off-line, this is an example of what Horvitz calls compilation of metareasoning.

Once an optimal monitoring policy is computed, meta-level control is straightforward. At every time step, the current solution quality and the current time are used to determine the best action (stop or continue) by indexing the policy. This leads to a globally optimal (non-myopic) stopping criterion.

Theorem 2. A monitoring policy that maximizes the above value function is optimal when quality improvement satisfies the Markov property and monitoring has no cost.

Proof. This is an immediate outcome of the application of dynamic programming under the Markov assumption [1]. The Markov assumption requires that the probability distribution of future quality depends only on the current "state" of the anytime algorithm, which is taken to be the quality of the currently available solution.
✷

We note that the same monitoring policy may also be computed using the state-space search algorithm AO* [21], if an appropriate heuristic is available. AO* finds an optimal solution for an initial state, which for this problem is (q_start, 0), and may find the same monitoring policy more efficiently than dynamic programming if many states are not reachable from the initial state under an optimal policy.

3.4. Cost-sensitive monitoring

Many existing approaches to monitoring assume continuous or periodic monitoring that incurs negligible cost [30,34,38]. In many situations, however, run-time monitoring may have a significant overhead because of the need to determine the current solution quality (and sometimes also the current state of the environment). If that information is available at no cost, monitoring every time step may be reasonable. But suppose that monitoring incurs a constant cost, C. In this case, we need to construct a more complex type of monitoring policy.

Definition 9. A cost-sensitive monitoring policy, π_c(q_i, t_k), is a mapping from time step t_k and quality level q_i into a monitoring decision (Δt, m) such that Δt represents the additional amount of time to allocate to the anytime algorithm, and m is a binary variable that represents whether to monitor at the end of this time allocation or to stop without monitoring.

In other words, a cost-sensitive monitoring policy specifies, for each time step t_k and quality level q_i, the following two decisions:
(1) how much additional time to run the algorithm; and
(2) whether to monitor at the end of this time allocation and re-assess whether to continue, or to stop without monitoring.
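A policy of this form can be computed off-line by a backward dynamic-programming sweep over (quality, time) pairs, choosing for each pair the allocation Δt and whether to pay C to monitor at its end. The sketch below assumes a one-step Markov transition matrix that is powered up to obtain Pr(q_j | q_i, Δt); the model, horizon, and utility are invented for illustration.

```python
import numpy as np

def cost_sensitive_policy(P, utility, n_time, C):
    """Backward DP for a cost-sensitive monitoring policy (Definition 9).
    P[i, j] = Pr(q_j | q_i) for one time step; Pr(. | ., dt) is obtained by
    repeated multiplication. For each (q, t) the policy stores (dt, m):
    run dt more steps, then either stop blindly or pay C to monitor."""
    n_quality = P.shape[0]
    V = np.zeros((n_time + 1, n_quality))
    policy = [[(0, "stop")] * n_quality for _ in range(n_time + 1)]
    for q in range(n_quality):          # at the deadline we must stop
        V[n_time, q] = utility(q, n_time)
    for t in range(n_time - 1, -1, -1):
        for q in range(n_quality):
            best, choice = utility(q, t), (0, "stop")   # act immediately
            dist = np.eye(n_quality)[q]
            for dt in range(1, n_time - t + 1):
                dist = dist @ P                          # Pr(q_j | q, dt)
                stop_val = dist @ np.array([utility(j, t + dt)
                                            for j in range(n_quality)])
                mon_val = dist @ V[t + dt] - C           # monitor, then follow V
                for val, m in ((stop_val, "stop"), (mon_val, "monitor")):
                    if val > best:
                        best, choice = val, (dt, m)
            V[t, q], policy[t][q] = best, choice
    return V, policy

# Toy model: deterministic quality improvement, linear cost of time.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
intrinsic = [0.0, 6.0, 8.0]
utility = lambda q, t: intrinsic[q] - 1.5 * t
V, policy = cost_sensitive_policy(P, utility, n_time=4, C=0.5)
```

In this deterministic toy model monitoring never repays its cost, so the computed policy allocates two steps and stops blindly; with a stochastic transition matrix the monitor branch can become worthwhile.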
An initial decision, π_c(q_start, t_0), specifies how much time to allocate to the algorithm before monitoring for the first time or stopping without monitoring. Note that the variable Δt makes it possible to control the time interval between one monitoring action and the next; its value can range from 0 to t_n − t_i, where t_n is the maximum running time of the algorithm and t_i is how long it has already run. The binary variable m makes it possible to allocate time to the algorithm without necessarily monitoring at the end of the time interval; its value is either stop or monitor.

Given this formalization, it is possible to use dynamic programming to compute a combined policy for monitoring and stopping. Although dynamic programming is often used to solve optimal stopping problems, the novel aspect of this solution is that dynamic programming is used to determine when to monitor, as well as when to stop and act on the currently available solution. A cost-sensitive monitoring policy is found by optimizing the following value function:

V_c(q_i, t_k) = max_{Δt,m} { Σ_j Pr(q_j | q_i, Δt) U(q_j, t_k + Δt)         if m = stop,
                             Σ_j Pr(q_j | q_i, Δt) V_c(q_j, t_k + Δt) − C   if m = monitor.

Theorem 3. A cost-sensitive monitoring policy that maximizes the above value function is optimal when quality improvement satisfies the Markov property.

The proof is identical to that of Theorem 2 except that the cost of monitoring is taken into account.

An advantage of this framework is that it makes it possible to find an intermediate strategy between continuous monitoring and not monitoring at all. It can recognize whether