Tracking Using an Explanatory Framework

Kamalika Chaudhuri (ITA, UC San Diego), Yoav Freund (CSE Dept., UC San Diego), Daniel Hsu (CSE Dept., UC San Diego)

arXiv:0903.2862v2 [cs.LG] 19 Jan 2010

Abstract

We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, the problem with these solutions is that they are very sensitive to model mismatches.

In this paper, motivated by online learning, we introduce a new framework for tracking, which we call an explanatory framework. We provide an efficient tracking algorithm for this framework. We provide experimental results comparing our algorithm to the Bayesian algorithm on simulated data. Our experiments show that when there are slight model mismatches, our algorithm vastly outperforms the Bayesian algorithm.

1 Introduction

We study the tracking problem, which has numerous applications in AI, control and finance. In tracking, we are given noisy measurements over time, and the problem is to estimate the hidden state of an object. The challenge is to do this reliably, by combining measurements from multiple time steps and prior knowledge about the state dynamics, and the goal of tracking is to produce estimates that are as close to the true states as possible.

The most popular solutions to the tracking problem are the Kalman filter [1], the particle filter [2], and their numerous extensions and variations (e.g. [3, 4]), which are based on a generative framework for the tracking problem. Suppose we want to track the state $x_t$ of an object at time $t$, given only measurement vectors $M(\cdot,t')$ for times $t' \le t$. In the generative approach, we think of the state $X(t)$ and measurements $M(\cdot,t)$ as random variables.
We represent our knowledge regarding the dynamics of the states using the transition process $\Pr(X(t) \mid X(t-1))$ and our knowledge regarding the (noisy) relationship between the states and the observations by the measurement process $\Pr(M(\cdot,t) \mid X(t))$. Then, given only the observations, the goal of tracking is to estimate the hidden state sequence $(x_1, x_2, \ldots)$. This is done by calculating the likelihood of each state sequence and then using as the estimate either the sequence with the highest posterior probability (maximum a posteriori, or MAP) or the expected value of the state with respect to the posterior distribution (the Bayesian algorithm). In practice, one uses particle filters, which are an approximation to the Bayesian algorithm.

The problem with the generative framework is that in practice, it is very difficult to precisely determine the distributions of the measurements. Moreover, the Bayesian algorithm is very sensitive to model mismatches, so using a model which is slightly different from the model generating the measurements can lead to a large divergence between the estimated states and the true states.

To address this, we introduce an online-learning-based framework for tracking. In our framework, called the explanatory framework, we are given a set of state sequences or paths in the state space; but instead of assuming that the observations are generated by a measurement model from a path in this set, we think of each path as a mechanism for explaining the observations. We emphasize that this is done regardless of how the observations are generated. Suppose a path $(x_1, x_2, \ldots)$ is proposed as an explanation of the observations $(M(\cdot,1), M(\cdot,2), \ldots)$. We measure the quality of this explanatory path using a predefined loss function, which depends only on the measurements (and not on the hidden true state). The tracking algorithm selects its own explanatory path by taking a weighted average of the best explanatory paths according to the past observations.
The theoretical guarantee we provide is that the loss of the explanatory path generated in this online way by the tracking algorithm is close to that of the explanatory path with the minimum such loss; here, the loss is measured according to the loss function supplied to the algorithm. Such guarantees are analogous to competitive analysis used in online learning [5, 6, 7], and it is important to note that such guarantees hold uniformly for any sequence of observations, regardless of any probabilistic assumptions.

Our next contribution is to provide an online-learning-based algorithm for tracking in the explanatory framework. Our algorithm is based on NormalHedge [8], which is a general online learning algorithm. NormalHedge can be instantiated with any loss function. When supplied with a bounded loss function, it is guaranteed to produce a path with loss close to that of the path with the minimum loss, from a set of candidate paths. As it is inefficient to directly apply NormalHedge to tracking, we derive a Sequential Monte Carlo approximation to NormalHedge, and we show that this approximation is efficient.

To demonstrate the robustness of our tracking algorithm, we perform simulations on a simple one-dimensional tracking problem. We evaluate tracking performance by measuring the average distance between the states estimated by the algorithms, and the true hidden states. We instantiate our algorithm with a simple clipping loss function. Our simulations show that our algorithm consistently outperforms the Bayesian algorithm, under high measurement noise, and a wide range of levels of model mismatch.
We note that the Bayesian algorithm can also be interpreted in the explanatory framework. In particular, if the loss of a path is the negative log-likelihood (the log-loss) under some measurement model, then the Bayesian algorithm can be shown to produce a path with log-loss close to that of the path with the minimum log-loss. One may be tempted to think that our tracking solution follows the same approach; however, the point of our paper is that one can use loss functions that are different from log-loss, and in particular, we show a scenario in which using other loss functions produces better tracking performance than the Bayesian algorithm (or its approximations).

The rest of the paper is organized as follows. In Section 2, we explain in detail our explanatory model for tracking. In Section 3, we present NormalHedge, on which our tracking algorithm is based. In Section 4, we provide our tracking algorithm. Section 5 presents the experimental comparison of our algorithm with the Bayesian algorithm. Finally, we discuss related work in Section 6.

The detailed bounds and proofs for NormalHedge are provided in the supplementary material. We feel that the algorithm NormalHedge may be of more general interest, and hence these details for NormalHedge have been submitted to NIPS in a companion paper.

2 The explanatory framework for tracking

In this section, we describe in more detail the setup of the tracking problem, and the explanatory framework for tracking. In tracking, at each time $t$, we are given as input measurements (or observations) $M(\cdot,t)$, and the goal is to estimate the hidden state of an object using these measurements and our prior knowledge about the state dynamics.

In the explanatory framework, we are given a set $P$ of paths (sequences) over the state space $X \subset \mathbb{R}^n$. At each time $t$, we assign to each path in $P$ a loss, given by a loss function $\ell$. The loss function has two parts: a dynamics loss $\ell_d$ and an observation loss $\ell_o$.

The dynamics loss $\ell_d$ captures our knowledge about the state dynamics. For simplicity, we use a dynamics loss $\ell_d$ that can be written as

$$\ell_d(p) = \sum_t \ell_d(x_t, x_{t-1})$$

for a path $p = (x_1, x_2, \ldots)$.
In other words, the dynamics loss at time $t$ depends only on the states at time $t$ and $t-1$. A common way to express our knowledge about the dynamics is in terms of a dynamics function $F$, defined so that paths with $x_t \approx F(x_{t-1})$ will have small dynamics loss.

For example, consider an object moving with a constant velocity. Here, if the state $x_t = (p, v)$, where $p$ is the position and $v$ is the velocity, then we would be interested in paths in which $x_t \approx x_{t-1} + (v, 0)$. In these cases, the dynamics loss $\ell_d(x_t, x_{t-1})$ is typically a growing function of the distance from $x_t$ to $F(x_{t-1})$.

The second component of the loss function is an observation loss $\ell_o$. Given a path $p = (x_1, x_2, \ldots)$ and measurements $M = (M(\cdot,1), M(\cdot,2), \ldots)$, the observation loss function $\ell_o(p, M)$ quantifies how well the path $p$ explains the measurements. Again, for simplicity, we restrict ourselves to loss functions $\ell_o$ that can be written as:

$$\ell_o(p, M) = \sum_t \ell_o(x_t, M(\cdot,t)).$$

In other words, the observation loss of a path at time $t$ depends only on its state at time $t$ and the measurements at time $t$.

The total loss of a path $p$ is the sum of its dynamics and observation losses. We note that the loss of a path depends only on that particular path and the measurements, and not on the true hidden state. As a result, the loss of a path can always be evaluated by the algorithm at any given time.

The algorithmic framework we consider in this model is analogous to, and motivated by, the decision-theoretic framework for online learning [6, 5]. At time $t$, the algorithm assigns a weight $w_p^t$ to each path $p$ in $P$. The estimated state at time $t$ is the weighted mean of the states, where the weight of a state is the total weight of all paths in this state. The loss of the algorithm at time $t$ is the weighted loss of all paths in $P$. The theoretical guarantee we look for is that the loss of the algorithm is close to the loss of the best path in $P$ in hindsight (or, close to the loss of the top $\epsilon$-quantile path in $P$ in hindsight).
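
As an illustration of the definitions above, the following is a minimal sketch, not part of the original algorithm description, of how the total loss of a path and the weighted-mean state estimate could be computed for scalar states. The function names `path_loss` and `weighted_state_estimate` are hypothetical, and the per-step losses are passed in as arbitrary callables.

```python
def path_loss(path, measurements, dynamics_loss, observation_loss):
    """Total loss of a path: per-step observation losses plus dynamics losses.

    path[t] is the state x_t, measurements[t] is M(., t). The dynamics loss
    at step t depends only on (x_t, x_{t-1}), so it starts at the second state.
    """
    total = 0.0
    for t, (x_t, m_t) in enumerate(zip(path, measurements)):
        total += observation_loss(x_t, m_t)
        if t > 0:
            total += dynamics_loss(x_t, path[t - 1])
    return total


def weighted_state_estimate(states, weights):
    """Weighted mean of the current states of all paths (scalar states)."""
    norm = sum(weights)
    return sum(w * x for w, x in zip(weights, states)) / norm
```

Note that both functions use only the paths and the measurements, never the true hidden state, which is what allows the algorithm to evaluate losses at any time.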
Thus, if $P$ has a small fraction of paths with low loss, and if the loss functions successfully capture the tracking performance, then the sequence of states estimated by the algorithm will have good tracking performance.

3 NormalHedge

In this section, we describe the NormalHedge algorithm. To present NormalHedge in full generality, we first need to describe the decision-theoretic framework for online learning.

The problem of decision-theoretic online learning is as follows. At each round, a learner has access to a set of $N$ actions; for our purposes, an action is any method that provides a prediction in each round. The learner maintains a distribution $w_{i,t}$ over the actions at time $t$. At each time period $t$, each action $i$ suffers a loss $\ell_{i,t}$ which lies in a bounded range, and the loss of the learner is $\sum_i w_{i,t} \ell_{i,t}$. We notice that this framework is very general: no assumption is made about the nature of the actions and the distribution of the losses. The goal of the learner is to maintain a distribution over the actions, such that its cumulative loss over time is low, compared to the cumulative loss of the action with the lowest cumulative loss. In some cases, particularly when the number of actions is very large, we are interested in achieving a low cumulative loss compared to the top $\epsilon$-quantile of actions. Here, for any $\epsilon$, the top $\epsilon$-quantile of actions are the $\epsilon$ fraction of actions which have the lowest cumulative loss.

Starting with the seminal work of Littlestone and Warmuth [7], the problem of decision-theoretic online learning has been well-studied in the literature [6, 9, 5]. The most common algorithm for this problem is Hedge or Exponential Weights [6], which assigns to each action a weight exponentially small in its total loss. In this paper however, we consider a different algorithm, NormalHedge, for this problem [8], and it is this algorithm that forms the basis of our tracking algorithm. While the Bayesian averaging algorithm can be shown to be a variant of Hedge when the loss function is the log-loss, such is not the case for NormalHedge, and it is a very different algorithm.
A significant advantage of using NormalHedge is that it has no parameters to tune, yet achieves performance comparable to the best performance of previous online learning algorithms with optimally tuned parameters.

In the NormalHedge algorithm, for each action $i$ and time $t$, we use $w_{i,t}$ to denote the NormalHedge weight assigned to action $i$ at time $t$. At any time $t$, we define the regret $R_{i,t}$ of our algorithm to an action $i$ as the difference between the cumulative loss of our algorithm and the cumulative loss of this action. Also, for any real number $x$, we use the notation $[x]_+$ to denote $\max(0, x)$. The NormalHedge algorithm is presented below.

Algorithm 1 NormalHedge
  initialize $R_{i,0} = 0$, $w_{i,1} = 1/N$ $\forall i$
  1: for $t = 1, 2, \ldots$ do
  2:   Each action $i$ incurs loss $\ell_{i,t}$.
  3:   Learner incurs loss $\ell_{A,t} = \sum_{i=1}^{N} w_{i,t} \ell_{i,t}$.
  4:   Update cumulative regrets: $R_{i,t} = R_{i,t-1} + (\ell_{A,t} - \ell_{i,t})$ $\forall i$.
  5:   Find $c_t > 0$ satisfying $\frac{1}{N} \sum_{i=1}^{N} \exp\left(\frac{([R_{i,t}]_+)^2}{2 c_t}\right) = e$.
  6:   Update distribution: $w_{i,t+1} \propto \frac{[R_{i,t}]_+}{c_t} \exp\left(\frac{([R_{i,t}]_+)^2}{2 c_t}\right)$ $\forall i$.
  7: end for

The performance guarantees for the NormalHedge algorithm, as shown by [8], can be stated as follows.

Theorem 1. If NormalHedge has access to $N$ actions, then for all loss sequences, for all $t$, for all $0 < \epsilon \le 1$, the regret of the algorithm to the top $\epsilon$-quantile of the actions is $O\left(\sqrt{t \ln(1/\epsilon)} + \ln^2 N\right)$.

Note that the actions which have total loss greater than the total loss of the algorithm are assigned zero weight. Since the algorithm performs almost as well as the best action, in a scenario where a few actions are significantly better than the rest, the algorithm will assign zero weight to most actions. In other words, the support of the NormalHedge weights may be a very small set, which can significantly reduce its computational cost.

4 Tracking using NormalHedge

To apply NormalHedge directly to tracking, we set each action to be a path in the state space, and the loss of each action at time $t$ to be the loss of the corresponding path at time $t$.
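
The NormalHedge update in Algorithm 1 can be sketched in a few lines. The sketch below is illustrative only and makes several assumptions that are not in the text: the scale $c_t$ is found by bisection, the regrets are first divided by their largest positive value (which rescales $c_t$ but leaves the normalized weights unchanged), and the weights fall back to uniform when no action has positive regret. The function names are hypothetical.

```python
import math


def normalhedge_weights(regrets, iters=60):
    """Weights proportional to ([R]_+ / c) * exp(([R]_+)^2 / (2c)), where c
    solves (1/N) * sum_i exp(([R_i]_+)^2 / (2c)) = e."""
    n = len(regrets)
    pos = [max(0.0, r) for r in regrets]
    top = max(pos)
    if top <= 0.0:
        # Assumption: no action has positive regret, so use uniform weights.
        return [1.0 / n] * n
    # Rescale so the largest positive regret is 1 (normalized weights are
    # unaffected; this only keeps the exponentials well-behaved).
    q = [p / top for p in pos]
    # The average potential is decreasing in c. At c = 1/(2(1 + ln N)) it is
    # at least e, and at c = 1/2 it is at most e, so the root is bracketed.
    lo, hi = 1.0 / (2.0 * (1.0 + math.log(n))), 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        avg = sum(math.exp(v * v / (2.0 * mid)) for v in q) / n
        if avg > math.e:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    raw = [v * math.exp(v * v / (2.0 * c)) for v in q]
    total = sum(raw)
    return [r / total for r in raw]


def normalhedge(loss_rounds):
    """Run NormalHedge on a list of per-round loss vectors.

    Returns the final weights and the learner's loss in each round."""
    n = len(loss_rounds[0])
    regrets = [0.0] * n
    weights = [1.0 / n] * n
    learner_losses = []
    for losses in loss_rounds:
        learner = sum(w * l for w, l in zip(weights, losses))
        learner_losses.append(learner)
        regrets = [r + (learner - l) for r, l in zip(regrets, losses)]
        weights = normalhedge_weights(regrets)
    return weights, learner_losses
```

Actions whose cumulative loss exceeds that of the learner have $[R_{i,t}]_+ = 0$ and therefore receive zero weight in this sketch, matching the remark after Theorem 1.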
To make NormalHedge more robust in a practical setting, we make a small change to the algorithm: instead of using cumulative loss, we use a discounted cumulative loss. For a discount parameter $0 < \alpha < 1$, the discounted cumulative loss of an action $i$ at time $T$ is $\sum_{t=1}^{T} (1-\alpha)^{T-t} \ell_{i,t}$. Using discounted losses is common in reinforcement learning [10]; intuitively, it makes the tracking algorithm more flexible, and allows it to more easily recover from past mistakes.

However, a direct application of NormalHedge is prohibitively expensive in terms of computation cost. Therefore, in the sequel, we show how to derive a Sequential Monte Carlo based approximation to NormalHedge, and we use this approximation in our experiments.

The key observation behind our approximation is that the weights on actions generated by the NormalHedge algorithm induce a distribution over the states at each time $t$. We therefore use a random sample of states in each round to approximate this distribution. Thus, just as particle filters approximate the posterior density on the states induced by the Bayesian algorithm, our tracking algorithm approximates the density induced on the states by NormalHedge for tracking.

The main difference between NormalHedge and our tracking algorithm is that while NormalHedge always maintains the weights for all the actions, we delete an action from our action list when its weight falls to 0. We then replace this action by our resampling procedure, which chooses another action which is currently in a region of the state space where the actions have low losses. Thus, we do not spend resources maintaining and updating weights for actions which do not perform well. Another difference between NormalHedge and our tracking algorithm is that in our approximation, we do not explicitly impose a dynamics loss on the actions.
Instead, we use a resampling procedure that only considers actions with low dynamics loss. This also avoids spending resources on actions which have high dynamics loss anyway.

Algorithm 2 Tracking algorithm
  input $N$ (number of actions), $\alpha$ (discount factor), $\Sigma_*$ (resampling parameter), $F$ (dynamics function)
  1: $A := \{x_{1,1}, \ldots, x_{N,1}\}$ with $x_{i,1}$ randomly drawn from $X$; $R_{i,0} := 0$; $w_{i,0} := 1/N$ $\forall i$
  2: for $t = 1, 2, \ldots$ do
  3:   Obtain losses $\ell_{i,t} = \ell_o(x_{i,t})$ for each action $i$ and update regrets:
         $R_{i,t} := (1-\alpha) R_{i,t-1} + (\ell_{A,t} - \ell_{i,t})$ where $\ell_{A,t} = \sum_{i=1}^{N} w_{i,t-1} \ell_{i,t}$.
  4:   Delete poor actions: let $X = \{i : R_{i,t} \le 0\}$, set $A := A \setminus X$.
  5:   Resample actions: $A := A \cup \mathrm{Resample}(X, \Sigma_*, t)$.
  6:   Compute weight of each action $i$: $w_{i,t} \propto \frac{[R_{i,t}]_+}{c} \exp\left(\frac{([R_{i,t}]_+)^2}{2c}\right)$,
         where $c$ is the solution to the equation $\frac{1}{N} \sum_{i=1}^{N} \exp\left(\frac{([R_{i,t}]_+)^2}{2c}\right) = e$.
  7:   Estimate: $x_{A,t} := \sum_{i=1}^{N} w_{i,t} x_{i,t}$.
  8:   Update states: $x_{i,t+1} := F(x_{i,t})$ $\forall i$.
  9: end for

Algorithm 3 Resampling algorithm
  input $X$ (actions to be resampled), $\Sigma_*$ (resampling parameter), $t$ (current time)
  1: for $j \in X$ do
  2:   Set $\bar{X} := \{i : R_{i,t} > 0\}$.
  3:   If $\bar{X} = \emptyset$: set $\bar{p}_i = 1/N$ $\forall i$. Else: set $\bar{p}_i \propto w_{i,t-1}$ $\forall i \in \bar{X}$ and $\bar{p}_i = 0$ $\forall i \notin \bar{X}$.
  4:   Draw $i \sim (\bar{p}_1, \ldots, \bar{p}_N)$.
  5:   Draw $x_{j,t} \sim N(x_{i,t}, \Sigma_*)$, and set $R_{j,t} := (1-\alpha) R_{i,t-1} + (\ell_{A,t} - \ell_o(x_{j,t}))$.
  6: end for

Figure 1: NormalHedge-based tracking algorithm.

Our tracking algorithm is specified in Algorithm 2. Each action $i$ in our algorithm is a path $(x_{i,1}, x_{i,2}, \ldots)$ in the state space $X \subset \mathbb{R}^n$. However, we do not maintain this entire path explicitly for each action; rather, Step 8 of the algorithm computes $x_{i,t+1}$ from $x_{i,t}$ using the dynamics function $F$, so we only need to maintain the current state of each action. Recall that applying the dynamics function $F$ should ensure that the path incurs no or little dynamics loss (see Section 2).

We start with a set of actions $A$ initially positioned at states uniformly distributed over $X$, and a uniform weighting over these actions.
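
To make Algorithms 2 and 3 concrete, here is a hedged sketch for a one-dimensional state space. It is an illustration under stated assumptions, not the implementation used in the experiments: $\Sigma_*$ is treated as a scalar variance, the scale $c$ is found by bisection on regrets rescaled by their maximum, parents are drawn uniformly from the surviving actions if all of their previous weights are zero, and the names `nh_weights` and `track` are hypothetical.

```python
import math
import random


def nh_weights(regrets, iters=60):
    """NormalHedge weights for the given regrets (Step 6 of Algorithm 2)."""
    n = len(regrets)
    pos = [max(0.0, r) for r in regrets]
    top = max(pos)
    if top <= 0.0:
        return [1.0 / n] * n  # assumption: uniform if no positive regret
    q = [p / top for p in pos]  # rescaling leaves normalized weights unchanged
    lo, hi = 1.0 / (2.0 * (1.0 + math.log(n))), 0.5  # brackets the root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        avg = sum(math.exp(v * v / (2.0 * mid)) for v in q) / n
        lo, hi = (mid, hi) if avg > math.e else (lo, mid)
    c = 0.5 * (lo + hi)
    raw = [v * math.exp(v * v / (2.0 * c)) for v in q]
    total = sum(raw)
    return [r / total for r in raw]


def track(measurements, observation_loss, n_actions=100, alpha=0.02,
          sigma_star=400.0, dynamics=lambda x: x,
          state_range=(-500.0, 500.0), seed=0):
    """Return the state estimate x_{A,t} for each measurement M(., t)."""
    rng = random.Random(seed)
    states = [rng.uniform(*state_range) for _ in range(n_actions)]
    regrets = [0.0] * n_actions               # R_{i,t-1}
    weights = [1.0 / n_actions] * n_actions   # w_{i,t-1}
    std = math.sqrt(sigma_star)               # Sigma_* read as a variance
    estimates = []
    for m_t in measurements:
        # Step 3: losses, learner loss, and discounted regret update.
        losses = [observation_loss(x, m_t) for x in states]
        learner = sum(w * l for w, l in zip(weights, losses))
        new_regrets = [(1.0 - alpha) * r + (learner - l)
                       for r, l in zip(regrets, losses)]
        # Step 4: actions with non-positive regret are deleted.
        poor = [i for i, r in enumerate(new_regrets) if r <= 0.0]
        good = [i for i, r in enumerate(new_regrets) if r > 0.0]
        # Step 5 (Algorithm 3): choose parents by previous-round weight.
        if good:
            parents, probs = good, [weights[i] for i in good]
            if sum(probs) <= 0.0:
                probs = [1.0] * len(good)  # assumption: uniform fallback
        else:
            parents, probs = list(range(n_actions)), [1.0] * n_actions
        old_states, old_regrets = states[:], regrets[:]
        for j in poor:
            i = rng.choices(parents, weights=probs)[0]
            states[j] = rng.gauss(old_states[i], std)
            new_regrets[j] = ((1.0 - alpha) * old_regrets[i]
                              + (learner - observation_loss(states[j], m_t)))
        regrets = new_regrets
        # Steps 6 to 8: weights, estimate, and dynamics update.
        weights = nh_weights(regrets)
        estimates.append(sum(w * x for w, x in zip(weights, states)))
        states = [dynamics(x) for x in states]
    return estimates
```

The default arguments mirror the values reported in Section 5 ($N = 100$, $\alpha = 0.02$, $\Sigma_* = 400$, identity dynamics); the observation loss is supplied by the caller, for instance a bounded distance to a scalar measurement when experimenting with the sketch.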
In each round, like NormalHedge, each action incurs a loss determined by its current state, and the tracker incurs the expected loss determined by the current weighting over actions. Using these losses, we update the cumulative (discounted) regrets to each action. However, unlike NormalHedge, we then delete all actions with zero or negative regret, and replace them using a resampling procedure. This procedure replaces poorly performing actions with actions currently at high density regions of $X$, thereby providing a better approximation to the intended weights.

The resampling procedure is explained in detail in Algorithm 3. The main idea is to sample from the regions of the state space with high weight. This is done by sampling an action proportional to its weight in the previous round. We then choose a state randomly (roughly) from an ellipsoid $\{x : (x - x_t)^\top \Sigma_*^{-1} (x - x_t) \le 1\}$ around the current state $x_t$ of the selected action; the new action inherits the history of the selected action, but has a current state which is different from (but close to) the selected action. This latter step makes the new state distribution smoother than the one in the previous round, which may be supported on just a few states if only a few actions have low losses.

We note that $\Sigma_*$ can be set so that the resampling procedure only samples actions with low dynamics loss (and Step 8 of the algorithm ensures that the remaining actions in the set $A$ do not incur any dynamics loss); thus, our algorithm does not explicitly compute a dynamics loss for each action.

[Figure 2, left panel: $H(x, z_t)$ in blue and $M(x,t)$ in red, with the pulse width $2W$ and the noise scale $2\sigma_o$ marked. Right panel: noise density for $\rho = 0$ in blue and $\rho = 0.2$ in red.]
Figure 2: Plots of the measurements (as a function of $x$) for $\rho = 0$ and $\sigma_o = 1$ and the density of the noise $n_t(x)$.

5 Simulations

For our simulations, we consider the task of tracking an object in a simple, one-dimensional state space. To evaluate our algorithm, we measure the distance between the estimated states and the true states of the object.

Our experimental setup is inspired by the application of tracking faces in videos, using a standard face detector [11].
In this case, the state is the location of a face, and each measurement corresponds to a score output by the face detector for a region in the current video frame. The goal is to predict the location of the face across several video frames, using these scores produced by the detector. The detector typically returns high scores for several regions around the true location of a face, but it may also erroneously produce high scores elsewhere. And though in some cases the detection score may have a probabilistic interpretation, it is often difficult to accurately characterize the distribution of the noise.

The precise setup of our simulations is as follows. The object to be tracked remains stationary or moves with velocity at most 1 in the interval $[-500, 500]$. At time $t$, the true state is the position $z_t$; the measurements correspond to a 1001-dimensional vector

$$M(t) = [M(-500,t), M(-499,t), \ldots, M(499,t), M(500,t)]$$

for locations in a grid $G = \{-500, -499, \ldots, 499, 500\}$, generated by an additive noise process

$$M(x,t) = H(x, z_t) + n_t(x).$$

Here, $H(x, z_t)$ is the square pulse function of width $2W$ around the true state $z_t$: $H(x, z_t) = 1$ if $|x - z_t| \le W$ and 0 otherwise (see Figure 2, left). The additive noise $n_t(x)$ is randomly generated independently for each $t$ and each $x \in G$, using the mixture distribution

$$(1-\rho) \cdot N(0, \sigma_o^2) + \rho \cdot N(0, (10\sigma_o)^2)$$

(see Figure 2, right). The parameter $\sigma_o$ represents how noisy the measurements are relative to the signal, and the parameter $\rho$ represents the fraction of outliers. In our experiments, we fix $W = 50$ and vary $\sigma_o$ and $\rho$. The total number of time steps we track for is $T = 200$.

In the generative framework, the dynamics of the object is represented by the transition model $x_{t+1} \sim N(x_t, \sigma_d^2)$, and the observations are represented by the measurement process $M(x,t) \sim N(H(x, x_t), \sigma_o^2)$. Thus, when $\rho = 0$, the observations are generated according to the measurement process supplied to the generative framework; for $\rho > 0$, a $\rho$ fraction of the observations are outliers.
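
The measurement process just described can be sketched as follows. This is an illustrative reconstruction from the formulas above rather than the code used for the experiments; the function names and the choice of a dictionary keyed by grid location are assumptions made for the example.

```python
import random


def square_pulse(x, z, half_width):
    """H(x, z): 1 if x is within half_width (W) of the true state z, else 0."""
    return 1.0 if abs(x - z) <= half_width else 0.0


def simulate_measurements(z_t, sigma_o, rho, rng, half_width=50,
                          grid=range(-500, 501)):
    """Draw M(x, t) = H(x, z_t) + n_t(x) for every grid location x.

    The noise is N(0, sigma_o^2) with probability 1 - rho and the outlier
    component N(0, (10 * sigma_o)^2) with probability rho, independently
    for each x.
    """
    measurements = {}
    for x in grid:
        scale = 10.0 * sigma_o if rng.random() < rho else sigma_o
        measurements[x] = square_pulse(x, z_t, half_width) + rng.gauss(0.0, scale)
    return measurements
```

For example, `simulate_measurements(0, 1.0, 0.2, random.Random(0))` would produce one 1001-entry measurement vector for a target at position 0 in the low-noise setting with 20% outliers.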
For the explanatory framework, the expected state dynamics function $F$ is the identity function, and the observation loss of a path $p = (x_1, x_2, \ldots)$ at time $t$ is given by

$$\ell_o(x_t, M(\cdot,t)) = -\sum_{x \in [x_t - W,\, x_t + W] \cap G} q(M(x,t))$$

where $q(y) = \min(1 + \sigma_o, \max(y, -\sigma_o))$ clips the measurements to the range $[-\sigma_o, 1 + \sigma_o]$. That is, the observation loss for $x_t$ with respect to $M(\cdot,t)$ is the negative sum of thresholded measurement values $q(M(x,t))$ for $x$ in an interval of width $2W$ around $x_t$.

Table 1: Experimental results. Root-mean-squared errors of the predicted positions over $T = 200$ time steps for our algorithm (NH), the Bayesian algorithm, and the particle filter (PF). The reported values are the averages and standard deviations over 100 simulations.

Low noise ($\sigma_o = 1$)
  ρ      NH            Bayes          PF
  0.00   3.18 ± 0.33   1.17 ± 0.09    1.23 ± 0.11
  0.01   3.21 ± 0.34   1.90 ± 0.25    3.98 ± 1.06
  0.05   3.26 ± 0.34   3.99 ± 0.52    81.70 ± 1.74
  0.10   3.31 ± 0.35   6.40 ± 0.84    81.70 ± 1.74
  0.15   3.42 ± 0.34   8.38 ± 1.10    81.70 ± 1.74
  0.20   3.52 ± 0.41   10.28 ± 1.24   81.70 ± 1.74

High noise ($\sigma_o = 8$)
  ρ      NH             Bayes           PF
  0.00   10.93 ± 2.52   10.98 ± 2.33    14.35 ± 5.16
  0.01   11.26 ± 3.43   12.76 ± 3.07    44.29 ± 16.7
  0.05   12.03 ± 3.47   19.75 ± 6.70    81.70 ± 1.74
  0.10   12.25 ± 2.93   27.33 ± 10.9    81.70 ± 1.74
  0.15   13.38 ± 3.07   32.78 ± 13.1    81.70 ± 1.74
  0.20   14.15 ± 3.88   43.99 ± 26.8    81.70 ± 1.74

Given only the observation vectors $M$, we use three different methods to estimate the true underlying state sequence $(z_1, z_2, \ldots)$. The first is the Bayesian algorithm, which recursively applies Bayes' rule to update a posterior distribution using the transition and observation model. The posterior distribution is maintained at each location in the discretization $G$. For the Bayesian algorithm, we set $\sigma_o$ to the actual value of $\sigma_o$ used to generate the observations, and we set $\sigma_d = 2$. The value of $\sigma_d$ was obtained by tuning on measurement vectors generated with the same true state sequence, but with independently generated noise values. The prior distribution over states assigns probability one to the true value of $z_1$ (which is 0 in our setup) and zero elsewhere. The second algorithm is our algorithm (NH) described in Section 4.
For our algorithm, we use the parameters $\Sigma_* = 400$ and $\alpha = 0.02$. These parameters were also obtained by tuning over a range of values for $\Sigma_*$ and $\alpha$. We also compare our algorithm with the particle filter (PF), which uses the same parameters as the Bayesian algorithm, and predicts using the expected state under the (approximate) posterior distribution. For our algorithm, we use $N = 100$ actions, and for the particle filter, we use $N = 100$ particles. For our experiments, we use an implementation of the particle filter due to [12].

Figure 3 shows the true state and the states predicted by our algorithm (blue) and the Bayesian algorithm (red) for two different values of $\sigma_o$ for 5 independent simulations. Table 1 summarizes the performance of these algorithms for different values of the parameter $\rho$, for two different values of the noise parameter $\sigma_o$. We report the average and standard deviation of the RMSE (root-mean-squared error) between the true state and the predicted state. The RMSE is computed over the $T = 200$ state predictions for a single simulation, and these RMSE values are averaged over 100 independent simulations.

Our experiments show that the Bayesian algorithm performs well when $\rho = 0$, that is, when it is supplied with the correct noise model; however, its performance degrades rapidly as $\rho$ increases, and becomes very poor even at $\rho = 0.2$. On the other hand, the performance of our algorithm does not suffer appreciably when $\rho$ increases. The degradation of performance of the Bayesian algorithm is even more pronounced when the noise is high with respect to the signal ($\sigma_o = 8$). The particle filter suffers an even higher degradation in performance, and has poor performance even when $\rho = 0.01$ (that is, when 99% of the observations are generated from the correct likelihood distribution supplied to the particle filter). Our results indicate that the Bayesian algorithm is very sensitive to model mismatches. On the other hand, our algorithm, when equipped with a clipped-loss function, is extremely robust to model mismatches. In particular, our algorithm provides an RMSE value of 19.6 even under high noise ($\sigma_o = 8$), when $\rho$ is as high as 0.4.
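
For readers who want to reproduce the evaluation, the clipped observation loss used by our algorithm and the per-simulation RMSE can be sketched as below. This is an illustrative sketch under assumptions: measurements are stored in a dictionary keyed by integer grid location, and the function names are hypothetical.

```python
import math


def clip(y, sigma_o):
    """q(y) = min(1 + sigma_o, max(y, -sigma_o))."""
    return min(1.0 + sigma_o, max(y, -sigma_o))


def clipped_observation_loss(x_t, m_t, sigma_o, half_width=50):
    """Negative sum of clipped measurements over grid points within
    half_width (W) of the state x_t; m_t maps grid location to M(x, t)."""
    first = math.ceil(x_t - half_width)
    last = math.floor(x_t + half_width)
    return -sum(clip(m_t[x], sigma_o)
                for x in range(first, last + 1) if x in m_t)


def rmse(estimates, truth):
    """Root-mean-squared error between estimated and true state sequences."""
    squared = [(e - z) ** 2 for e, z in zip(estimates, truth)]
    return math.sqrt(sum(squared) / len(squared))
```

Clipping bounds the contribution of any single grid location to the loss, so an occasional outlier measurement cannot dominate the comparison between paths; this is the property the robustness results above rely on.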
Some additional experiments with our algorithm are included in the supplementary appendix; they illustrate how the performance of our algorithm varies with the parameters $\Sigma_*$ and $\alpha$, and tabulate the performance of our algorithm for higher values of $\rho$.

[Figure 3: six panels plotting position against time $t$ (0 to 200) for NH and Bayes. Rows: $\rho = 0$, $\rho = 0.1$, $\rho = 0.2$. Left column: $\sigma_o = 1$. Right column: $\sigma_o = 8$.]
Figure 3: Predicted paths in five simulations. First column: low noise ($\sigma_o = 1$). Second column: high noise ($\sigma_o = 8$). The blue lines correspond to our algorithm, the red lines correspond to the Bayesian algorithm, and the dashed black line represents the true states.

6 Related work

The generative approach to tracking has roots in control and estimation theory, starting with the seminal work of Kalman [1]. The most popular generative method used in tracking is the particle filter [2], and its numerous variants. The literature here is vast, and there have been many exciting developments in recent years (e.g. [4, 13]); we refer the reader to [14] for a detailed survey of the results.

The suboptimality of the Bayesian algorithm under model mismatch has been investigated in other contexts such as classification [15, 16]. The view of the Bayesian algorithm as an online learning algorithm for log-loss is well-known in various communities, including information theory / MDL [17, 18] and computational learning theory [19, 20]. In our work, we look beyond the Bayesian algorithm and log-loss to consider other loss functions and algorithms that are more appropriate for our task.

There has also been some work on tracking in the online learning literature (see, for example, [21, 22]); that work, however, studies a very different model for tracking.
7 Conclusions

In this paper, we introduce an explanatory framework for tracking based on online learning, which broadens the space for designing algorithms that need not conform to the standard Bayesian approach to tracking. We propose a new algorithm for tracking in this framework that deviates significantly from the Bayesian approach. Experimental results show that our algorithm significantly outperforms the Bayesian algorithm, even when the observations are generated by a distribution deviating just slightly from the model supplied to the Bayesian algorithm. Our work reveals an interesting connection between decision-theoretic online learning and Bayesian filtering.

References

[1] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(D):35–45, 1960.
[2] A. Doucet, N. de Freitas, and N. J. Gordon. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.
[3] M. Isard and A. Blake. Condensation: conditional density propagation for visual tracking. International Journal on Computer Vision, 28(1):5–28, 1998.
[4] R. van der Merwe, A. Doucet, N. de Freitas, and E. Wan. The unscented particle filter. In Advances in Neural Information Processing Systems, 2000.
[5] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning and Games. Cambridge University Press, 2006.
[6] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.
[7] N. Littlestone and M. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.
[8] A. Anonymous. Anonymous submission, 2009.
[9] Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. In STOC, pages 382–391, 1993.
[10] Leslie Pack Kaelbling, Michael L. Littman, and Andrew P. Moore. Reinforcement learning: A survey. J. Artif. Intell. Res. (JAIR), 4:237–285, 1996.
[11] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features.
In Conference on Computer Vision and Pattern Recognition, 2001.
[12] N. de Freitas. Matlab codes for particle filtering, 2000. www.cs.ubc.ca/~nando/software/upf demos.tar.gz.
[13] M. Klaas, N. de Freitas, and A. Doucet. Toward practical N² Monte Carlo: The marginal particle filter. In UAI, 2005.
[14] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. Technical report, 2008. www.cs.ubc.ca/~arnaud/doucet johansen tutorialPF.pdf.
[15] P. Domingos. Bayesian averaging of classifiers and the overfitting problem. In ICML, 2000.
[16] P. Grünwald and J. Langford. Suboptimal behavior of Bayes and MDL in classification under misspecification. Machine Learning, 66(2–3):119–149, 2007.
[17] N. Merhav and M. Feder. Universal prediction. IEEE Transactions on Information Theory, 39(4):1280–1292, 1993.
[18] Peter D. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.
[19] Yoav Freund. Predicting a binary sequence almost as well as the optimal biased coin. In COLT, pages 89–98, 1996.
[20] Sham M. Kakade and Andrew Y. Ng. Online bounds for Bayesian algorithms. In NIPS, 2004.
[21] M. Herbster and M. Warmuth. Tracking the best expert. Machine Learning, 32(2):151–178, 1998.
[22] W. Koolen and S. de Rooij. Combining expert advice efficiently. In COLT, 2008.

