Table Of Content

JMathImagingVis(2007)28:1–18 DOI10.1007/s10851-007-0007-8 Interacting and Annealing Particle Filters: Mathematics and a Recipe for Applications JürgenGall·JürgenPotthoff·ChristophSchnörr· BodoRosenhahn·Hans-PeterSeidel Publishedonline:14July2007 ©SpringerScience+BusinessMedia,LLC2007 Abstract Interactingandannealingaretwopowerfulstrate- 1 Introduction gies that are applied in different areas of stochastic mod- ellinganddataanalysis.Interactingparticlesystemsapprox- 1.1 Motivation imateadistributionofinterestbyafinitenumberofparticles wheretheparticlesinteractbetweenthetimesteps.Incom- Many real-world applications require the estimation of an puter vision, they are commonly known as particle filters. unknownstateofasystemfromgivenobservationsateach Simulated annealing, on the other hand, is a global opti- time step. An example from signal processing is shown on mization method derived from statistical mechanics. A re- theleftinFig.1wherethesolidlinerepresentsthetruesig- centheuristicapproachtofusethesetwotechniquesformo- nalandthecrossesrepresentthemeasurements.Theclassi- tioncapturinghasbecomeknownasannealedparticlefilter. calfilteringproblemconsistsinestimatingtheunknownsig- Inordertoanalyzethesetechniques,werigorouslyderivein nal from the observed measurements under some assump- this paper two algorithms with annealing properties based tions on the signal and on the observations. In computer on the mathematical theory of interacting particle systems. vision the observations are usually image sequences cap- Convergenceresultsandsufficientparameterrestrictionsen- tured by one or more cameras, and the discrete time steps ableustopointoutlimitationsoftheannealedparticlefilter. aregivenbytheframerateofthecameras.Inhumanmotion Moreover,we evaluatetheimpactof theparameterson the capturing for example, one estimates the state parameters performanceinvariousexperiments,includingthetracking such as joint angles and position of the human body in a ofarticulatedbodiesfromnoisymeasurements.Ourresults givenimagesequence.Theestimatedstateisdisplayedbya provideageneralguidanceonsuitableparameterchoicesfor meshmodelofthebodyasdepictedinFig.2. differentapplications. During the last years, particle filters have become very popular for solving these problems. Reasons for their pop- Keywords Interactingparticlesystems·Particlefiltering· ularityincludethattheyareeasytoimplementandtheydo Annealing·Motioncapture notassumethatthesignalandobservationscanbewellap- proximatedbylinearorGaussianmodelslikeotherfilters[1, (cid:2) 16–18].Foranoverviewandnumerousapplicationswerefer J.Gall( )·B.Rosenhahn·H.-P.Seidel Max-Planck-InstituteforComputerScience,Saarbrücken, theinterestedreaderto[8]. Germany Whilethemathematicalfundamentalsincludingconver- e-mail:[email protected] gence results have been developed further by Pierre del Moral in [25, 26, 29], a number of improved particle fil- J.Potthoff Dept.M&CS,StochasticsGroup,UniversityofMannheim, ters[8]havebeenproposed.Aheuristicallyjustifiedmodi- Mannheim,Germany fication,theannealedparticlefilter (APF),wasintroduced forarticulatedbodymotiontrackingbyJonathanDeutscher C.Schnörr etal.[6].Theydemonstratedsuperiorperformanceinexper- Dept.M&CS,CVGPRGroup,UniversityofHeidelberg, Heidelberg,Germany imentsbut,inviewofthemathematicaltheory,didnotgain 2 JMathImagingVis(2007)28:1–18 Fig.1 Filteringproblem.Left: Observations(crosses)ofan unknownsignal(solidline). Right:Estimationerrorofthe signalbyagenericparticlefilter (GPF)andanannealedparticle filter(APF).Therootmean squareerrorsaregivenbythe horizontallines.TheGPF(solid line)outperformstheAPF (dash-dotline) Fig.2 Motioncapturing.Left: EstimatebyaGPFwith2750 particles.Right:Estimatebyan APFwith250particles.The APFoutperformstheGPF furtherinsightintoboundsofthequalityofestimatesorre- fittest[19],andinteractingparticleapproximations[26],de- strictions necessary for the stability of the algorithm. As a pendingontheareaofresearch. result, it is not clear if the convergence results as stated in ConvergenceresultshavebeenestablishedbyPierreDel thesurvey[3]arevalidfortheAPF.Suchresults,however, Moral using discrete Feynman–Kac models [26]. These wouldbehelpfulforfurtherimprovements,simplifyingthe Feynman–Kacmodellingtechniquesarepowerfultoolsthat parameterchoiceinapplications,andforcomparisonswith canbeappliedinvariousdomainsofresearch. alternativeapproaches. Motivated by the experiments and the need for a math- Furthermotivationtorelatethedesignofparticlefiltersto ematical discussion of the heuristics based on the particle theavailablemathematicaltheoryisprovidedbytworepre- filter, we combine these heuristics developed for computer sentative experimental results: While the APF outperforms visiontaskswithFeynman–Kacmodelsknownfromapplied abasicparticlefilter(definedinSect.2.2)inthedomainof probabilityandquantumtheory.Inthepresentpaper,were- motiontrackingasillustratedinFig.2,theAPFfallsshortof strictourselvestotwomodelswithannealingpropertiesthat theparticlefilterforafilteringproblemasshowninFig.1. arerelatedtotheannealedparticlefilter(APF)[7].Thefirst model uses a principle similar to simulated annealing [20] 1.2 RelatedWorkandContribution which is a Markov process based method for optimization. Thesecondmodelisinspiredbyannealedimportancesam- Particle filters [8] are recursive Bayesian filters that are pling [31]. It is an importance sampling method [13] that based on Monte Carlo simulations [13]. They provide a convenient approach to approximate the distribution of in- usesasequenceofdensitiesforinterpolationbetweenapro- terest. This technique is known as bootstrap filtering [12], posal density and the density of a complex target distrib- condensation [15], Monte Carlo filters [21], survival of the ution. According to these models, we derive the two algo- JMathImagingVis(2007)28:1–18 3 rithms interacting simulated annealing (ISA) and interact- 2.1 Notations ingannealingsampling(IAS). Thoughthealgorithmsareverysimilar,theyhavediffer- Let (E,τ) be a topological space, and let B(E) denote its ent mathematical properties and are therefore suitable for Borel σ-algebra. B(E), C (E) and P(E) denotethesetof b different applications, namely optimization and sampling, bounded measurable functions, bounded continuous func- respectively.Indeed,weshowthattheAPFintegratesaspe- tionsandprobabilitymeasures,respectively.δx istheDirac cialcaseofISAintoagenericparticlefilterandthattheAPF measure concentrated in x ∈ E, and (cid:3)·(cid:3)∞ is the supre- convergesundersomeassumptionstotheglobalmaximum mum norm. Let f ∈ B(E), μ ∈ P(E), an(cid:2)d let K be a foreachtimestepasthenumberofiterationsgoestoinfin- Markov kern(cid:2)el on E. We write (cid:5)μ,f(cid:6) = Ef(x)μ(dx), ity.SinceISAcannotbeusedforsamplingfromaposterior (cid:5)(cid:2)K,f(cid:6)(x)= Ef(y)K(x,dy) for x∈E and (cid:5)μ,K(cid:6)(B)= distributionincontrasttoISA,theAPF isapplicabletoop- K(x,B)μ(dx) for B ∈B(E). The Dobrushin contrac- E timization problems but not to filtering problems, while a tioncoefficient[10]isdefinedby genericparticlefiltersolvesfilteringproblemsbutnotopti- β(K):= sup sup |K(x ,B)−K(x ,B)|. mizationproblems.Thisnovelresulthasasignificantimpact 1 2 x1,x2∈EB∈B(E) onapplicationsasmotiontrackingasweshowinourexper- iments.Werevealthatregardingvisualtrackingasanopti- Note that β(K) ∈ [0,1], and β(K K ) ≤ β(K )β(K ). 1 2 1 2 mization problem that is solved by ISA gives better results A family of transition kernels (Kt)t∈N0 is said to satisfy than regarding it as a filtering problem and solving it with the Feller property [33] if (cid:5)K ,f(cid:6)∈C (E) for all t and t b a generic particle filter. Moreover, our detailed experimen- f ∈C (E). b talcomparisonsprovideinformationfor suitableparameter settingsformotioncapturingandshowtherobustnessinthe 2.2 Definition presenceofnoise. Let X=(Xt)t∈N0 bean Rd-valuedMarkovprocess,called 1.3 Outline signalprocess,withafamilyoftransitionkernels(Kt)t∈N0 satisfyingtheFellerpropertyandinitialdistributionη .Let 0 We begin with the fundamentals of particle filters and dis- Y =(Yt)t∈N0 beanRm-valuedstochasticprocess,calledob- servationprocess,definedas cussconvergenceresultsundervariousassumptionsaswell astheirimpactonapplications,particularlyonmotioncap- Y =h (X )+W fort>0, Y =0, t t t t 0 turing. Section 3 reveals the coherence between Feynman– Kacmodelsandtheannealedparticlefilterandexplainsthe where for each t ∈ N, h : Rd → Rm is a continuous t resultsshowninFigs.1and2. function, (W ,t ∈N) are independent m-dimensional ran- t Specifically,theflowsoftheFeynman–Kacdistributions dom vectors and their distributions possess densities g ∈ t and a particle approximation of these flows by the inter- C (Rm),t∈N.Thefilteringproblemconsistsincomputing b acting annealing algorithm are given in Sect. 3.1. We state theconditionaldistribution convergencepropertiesofthealgorithminSect.3.2.While Sect.3.3presentsaninteractingversionofsimulatedanneal- ηt(B):=P(Xt ∈B|Yt,...,Y0), (2.1) ingthatconvergestotheregionsofglobalminima,aninter- for all B ∈ B(Rd) or, alternatively, (cid:5)η ,ϕ(cid:6) = E[ϕ(X ) | actingversionofannealedimportancesamplingisderivedin t t Y ,...,Y ]forallϕ∈B(Rd). Sect. 3.4. We validate the conclusions from the theory and t 0 assesstheirimpactonapplicationsinSect.4usingatrack- Algorithm1 Genericparticlefilter ingandafilteringexample,respectively.Weconcludewith adiscussionandindicatefurtherworkinSect.5. Requires: number of particles n, η0, (Kt)t∈N0, (gt)t∈N, (ht)t∈N,andobservations(yt)t∈N 1. Initialization 2 ParticleFilter • Samplex(i) fromη ∀i 0 0 2. Prediction In this section, we introduce a basic particle filter for solv- ingafilteringproblemasdescribedin[8],Chap.2and[4]. • Samplex¯t(+i)1 fromKt(xt(i),·)∀i Furthermore,wediscussitsmathematicalpropertiesandex- 3. Updating plain the poor performance in the human motion capturing experiment,seeFig.2. • Setπt(+i)1←gt+1(yt+1−ht+1(x¯t(+i)1))∀i 4 JMathImagingVis(2007)28:1–18 • Setπt(+i)1← (cid:3)njπ=t(1+i)π1t(+j)1 ∀i seqIuteinscceleoafrrathnadtomeveprryobwabeiiglihtytemdepaasrutircelsebsyet determines a 4. Resampling (cid:4)n • Set x(i) ←x¯(j) with probability π(j) ∀i and go to (cid:9)(i)δX(i) forn∈N. t+1 t+1 t+1 i=1 step2 Theideanowistoapproximatetheconditionaldistribu- The generic particle filter (GPF) is a commonly used tion η (2.1) by the distribution of an appropriate weighted t particlefilterforthesolutionofthefilteringproblem,which particle set. We note that each step of the generic particle providesabasisforfurtherdevelopmentsandmodifications filterdefinesaparticlesetandconsequentlyarandomprob- for other applications. The algorithm consists of the four abilitymeasure: steps “Initialization”, “Prediction”, “Updating” and “Re- (cid:4)n (cid:4)n sampling”. During the initialization, we sample n times 1 from the initial distribution η0. By saying that we sample ηˆtn:= n i=1δX¯t(i); η¯tn:=i=1(cid:9)(ti)δX¯t(i); x(i)fromadistributionμ,wemeanthatwesimulateninde- (cid:4)n pendentrandomsamples,alsonamedparticles,accordingto 1 ηn:= δ . μ.Hence,thenrandomvariables(X(i))areindependentand t n Xt(i) 0 i=1 identically distributed (i.i.d.) according to η . Afterwards, 0 the values of the particles are predicted for the next time With this notation, the algorithm is illustrated by the three stepaccordingtothedynamicsofthesignalprocess.During separatesteps the“Updating”step,eachpredictedparticleisweightedby the likelihood function gt(yt −ht(·)), which is determined ηtn−−Pr−ed−i−ct−io→n ηˆtn+1−−U−p−da−ti−n→g η¯tn+1−R−e−sa−m−p−li→ng ηtn+1. (2.2) bytheobservationy .Theresamplingisdonebydrawingn t times with replacement from the set (x¯t(+j)1)j=1,...,n accord- 2.3 Convergence (j) ingtotheprobabilitiesπ . t+1 Fortheparticlefilteralsoother“Resampling”stepsthan Theproofofthefollowingconvergenceresultcanbefound theonedescribedinAlgorithm1havebeenemployed,e.g. in[24]. branching procedures [4, 5, 8]. A detailed discussion can be found in [26], Chap. 11.8. The particle system is also Theorem2.2 Forallt ∈N ,thereexistsc independentof 0 t calledinteractingparticlesystem[25]sincetheparticlesare nsuchthat (obviously)notindependentafterresampling. For the case of a one-dimensional signal process, the E(cid:5)((cid:5)ηn,ϕ(cid:6)−(cid:5)η ,ϕ(cid:6))2(cid:6)≤c (cid:3)ϕ(cid:3)2∞ ∀ϕ∈B(Rd). (2.3) operation of the algorithm is illustrated in Fig. 3, where t t t n the grey circles represent the unweighted particles after Inequality(2.3)showsthattherateofconvergenceofthe the “Prediction” step and the black circles represent the meansquareerrorisoforder1/n.However,c dependson weightedparticlesafterthe“Updating”step.Whilethehor- t t and, without any additional assumption, c actually in- izontalpositionsoftheparticlesindicatetheirvaluesinthe t creases over time. This is not very satisfactory in applica- statespaceofthesignalprocess,thediametersoftheblack tions as this implies that one needs an increasingly larger circlesindicatetheparticleweights,thatisthelargerthedi- number of particles as time t increases to ensure a given ameter the greater the weight. As illustrated, the particles precision. We will state below a recent convergence result with large weight generate more offsprings than particles (Theorem2.6)whichisuniformintimeunderadditionalas- with lower weight during the “Resampling” step. In order sumptions on the filtering problem. The idea of preventing todiscussthemathematicalpropertiesofthealgorithm,we an increasing error is to ensure that any error is forgotten usethefollowingnotions(cf.also[22]). fastenough.Forthispurpose,wedefineaso-calledmixing conditioninaccordancewith[11]and[28]. Definition 2.1 A weighted particle is a pair (x,π) where x ∈ Rd and π ∈ [0,1]. A weighted particle set S is a Definition2.3 AkernelonEiscalledmixingifthereexists sequence of finite sets of random variables whose val- aconstant0<ε≤1andameasureμonE suchthat ues are weighted particles: the nth member of the se- quenceisasetof n rando(cid:3)mvariables S(n)={(X(1),(cid:9)(1)), 1 ...,(X(n),(cid:9)(n))},where n (cid:9)(n)=1. εμ(B)≤K(x,B)≤ μ(B) ∀x∈E,B∈B(E). (2.4) i=1 i ε JMathImagingVis(2007)28:1–18 5 Fig.3 Operationofthegeneric particlefilter This strong assumption means that the measure K(x,·) independentofnsuchthat depends only “weakly” on x. It can typically only be es- tablishedwhen E⊂Rd isaboundedsubset—which,how- E(cid:5)((cid:5)ηn,ϕ(cid:6)−(cid:5)η ,ϕ(cid:6))2(cid:6)≤c(ε)(cid:3)ϕ(cid:3)2∞ t t n ever,isthecaseinmanyapplications.Wegivetwoexamples wherethekernelsarenotmixing. ∀t∈N , ϕ∈B(Rd). 0 Example 2.4 Let E = {a,b} and K(x,B) := δ (B). As- Thismeansthataslongas themixingcondition(2.4) is x sume that K is mixing. From inequality (2.4) we get the satisfied there exists an upper bound of the error that is in- dependentofthetimeparameter.Hence,thenumberofpar- followingcontradiction ticles,thatensuresagivenprecisioninanapplication,does K(a,{b})=δ ({b})=0⇒μ({b})=0, not increase over time. An example that demonstrates the a impact of the condition is given in Sect. 4.2. The mixing K(b,{b})=δ ({b})=1⇒μ({b})>0. b condition can furthermore be relaxed such that the density dK(x,·)/dμisnotμ-almostsurelygreaterthanorequalto Example2.5 LetE=Rand ε>0butmayvanishonapartofthestatespace,asshown (cid:7) (cid:8) (cid:9) in[2]. 1 −(x−y)2 K(x,B):= √ exp dy. Itisimportanttonotethattheresultsaboveareonlyvalid 2π B 2 whenthesignalandtheobservationprocessareknownand satisfy the assumptions stated at the beginning. Since this Suppose there exists an ε >0 and a measure μ such that israrelythecaseforapplications,goodapproximationsare the inequality (2.4) is satisfied. Note that for all x ∈R and needed.Inapplicationslikemotioncapture,itisverydiffi- all intervals I =[a,b], a <b, we have K(x,I)>0. Our culttomodelthenoiseoftheobservationprocessinanap- assumptionentailsthatμ(I)>0.Butthenεμ(I)<K(x,I) propriatewaywhereasaweightfunctiong ,whichmeasures t cannotholdforallx∈R,sinceK(x,I)→0as|x|→+∞. the“quality”ofaparticlebasedonsomeimagefeatures,can beeasilydesignedsuchthatthemaximumisattainedforthe The uniform convergence of the generic particle filter truevalueof thesignal.In this caseparticlefiltersperform with respect to the time parameter was first proved by Del poorlyaswewillshowinSect.4andasillustratedinFig.2. Moral and Miclo [29] assuming that the mixing condition for(Kt)t∈N0 issatisfied.LeGlandandOudjane[11]showed also the uniform convergence (Theorem 2.6) by using the 3 InteractionandAnnealing mixingconditionforthefamilyofrandomkernels Beforewegointodetail,wesketchtheideaofannealing.As (cid:7) seen in the top left image of Fig. 4, it may happen that the Rt(x,B):= gt+1(Yt+1−ht+1(y))Kt(x,dy). predicted particles differ significantly from the “true” state B resulting in a poor estimate for the signal. This could be Theorem 2.6 If the family of random kernels (Rt)t∈N0 is caused by a rare event in the context of the filtering prob- mixing with ε ≥ε >0, then there exists a constant c(ε) lem or by a fast movement of the observed object in the t 6 JMathImagingVis(2007)28:1–18 Fig.4 Left:withoutan annealingeffect,theparticles getstuckinthelocalmaximum. Right:theannealingeffect ensuresthattheparticlesescape fromthelocalmaximum context of tracking. In order to obtain a better estimate in not necessarily given by a signal and observation process this situation, the idea is to move the particles towards the and we regard t as an iteration parameter and not anymore globalmaximumoftheweightfunction.Oneapproachisto asatimeparameter. repeattheprocedureforeachobservationorforeachframe, thatmeanstolettheparticlesundergodiffusion,toattribute 3.1 Feynman–KacModel weightstotheparticles,andtoresampleseveraltimesbefore the next time step. However as seen on the left hand side of Fig. 4, the particles might get stuck near a local maxi- Let (Xt)t∈N0 be an E-valued Markov process with family mum.Toavoidthismisbehavior,theparticlesarepreviously oftransitionkernels(Kt)t∈N0 andinitialdistributionη0.We weighted by smoothed versions of the weighting function, denote by Pη0 the distribution of the Markov process, i.e., wheretheinfluenceofthelocalmaximaisreducedfirstbut fort∈N0, increases gradually. This approach helps to overcome the problem with the local maxima, as depicted on the right Pη0(dx0×dx1×···×dxt) hand side of Fig. 4. In the following sections, we discuss =Kt−1(xt−1,dxt)···K0(x0,dx1)η0(dx0), Feynman–Kacmodelswithannealingpropertiesandreveal relationstotheannealedparticlefilter[7]thatalsorelieson andbyE [·]theexpectationwithrespecttoP .Moreover, thisannealingeffect.Notethatfromnowon,wedonotre- η0 η0 strictourselvestoafilteringproblemasintheprevioussec- let(gt)t∈N0 beafamilyofnonnegative,boundedmeasurable functionssuchthat tion but consider also an optimization problem. Moreover, (cid:10) (cid:12) the two algorithms for optimization and sampling, which (cid:11)t areintroducedinSects.3.3and3.4,applyonlyforonetime E g (X ) >0 ∀t∈N . η0 s s 0 step. Therefore, we use more general terms. Kt and gt are s=0 JMathImagingVis(2007)28:1–18 7 Definition3.1 Thesequenceofdistributions(ηt)t∈N0 onE Example 3.3 We continue Example 3.2. The selection ker- definedforanyϕ∈B(E)as nelbecomes (cid:10) (cid:12) (cid:5)η ,ϕ(cid:6):= (cid:5)γt,ϕ(cid:6), (cid:5)γ ,ϕ(cid:6):=E ϕ(X )(cid:11)t−1g (X ) St,ηt(xt,dyt)=(cid:13)texp(−βtVt(xt))δxt(dyt) t (cid:5)γt,1(cid:6) t η0 t s=0 s s +(1−(cid:13)texp(−βtVt(xt)))(cid:12)t(ηt)(dyt), ∀t∈N , (3.1) 0 where (cid:5) (cid:15) (cid:3) (cid:16)(cid:6) is called the Feynman–Kac model associated with the pair E exp − t−1β V(X ) (cid:12) (η )(dy )= η0(cid:5) (cid:15) (cid:3)s=0 s s (cid:16)(cid:6) (gt,Kt). t t t E exp − t β V(X ) η0 s=0 s s ×exp(−β V (y ))η (dy ). Example 3.2 Since we regard only models with anneal- t t t t t ing properties, the functions (gt)t∈N0 are unnormalized Algorithm2 Interactingannealingalgorithm Boltzmann–Gibbsmeasuresg (x)=exp(−β V (x)).Insta- t t t tisticalmechanics,V ≥0isinterpretedasenergyandβ ≥0 Requires:parameters((cid:13)t)t∈N0,numberofparticlesn,initial t as inverse temperature. These measures are used for simu- distributionη0,weightingfunctions(gt)t∈N0 andtransitions lated annealing [20] to obtain the global minimum of V, (Kt)t∈N0 whereβ isslowlyincreasedwithrespecttot.Equation(3.1) 1. Initialization t thenbecomes • Samplex(i) fromη ∀i (cid:10) (cid:13) (cid:14)(cid:12) 0 0 (cid:4)t−1 2. Selection (cid:5)γ ,ϕ(cid:6):=E ϕ(X )exp − β V(X ) . t η0 t s s s=0 • Setπt(i)←gt(xt(i))∀i • Fori from1ton: It is straightforward to check that the Feynman–Kac Sampleκ fromU[0,1] modelasdefinedabovesatisfiestherecursionrelation Ifκ≤(cid:13) π(i) then t t (cid:15) Setxˇ(i)←x(i) ηt+1=(cid:5)(cid:12)t(ηt),Kt(cid:6), (3.2) Else t t (cid:15) Setxˇ(i)←x(j) withprobability (cid:3)πt(j) wheretheBoltzmann–Gibbstransformation(cid:12)t isdefinedby t t nk=1πt(k) 3. Mutation 1 (cid:12) (η )(dx ):= g (x )η (dx ). (3.3) t t t (cid:5)η ,g (cid:6) t t t t • Samplex(i) fromK (xˇ(i),·)∀i andgotostep2 t t t+1 t t The particle approximation of the flow (3.2) depends on a The interacting annealing algorithm (IAA) describes chosenfamilyofMarkovtransitionkernels(Kt,ηt)t∈N0 sat- theapproximationbyaparticlesetusing(3.4).Theparticle isfyingthecompatibilitycondition (i) system is initialized by n i.i.d. random variables X with 0 (cid:5)(cid:12)tt(ηt),Kt(cid:6)=(cid:5)ηt,Kt,ηt(cid:6). csuormemη0non:=la(cid:3)wniη=01δdXet(ie)r/mni.nSiningcethKetr,ηatndcaonmbperroebgaabrdileitdyamstehae- 0 compositionofapairofselectionandmutationMarkovker- Thefamily(Kt,ηt)t∈N0 ofkernelsisnotuniquelydetermined nels,wesplitthetransitionsintothefollowingtwosteps bytheseconditions.Forexample,wecanchoose,asin[26], Sect.2.5.3, ηn−−S−el−ec−ti−o→n ηˇn−−M−u−ta−ti−o→n ηn , (3.6) t t t+1 K =S K , (3.4) t,ηt t,ηt t where where (cid:4)n (cid:4)n 1 1 St,ηt(xt,dyt)=(cid:13)tgt(xt)δxt(dyt) ηtn(ω):= n i=1δXt(i)(ω), ηˇtn(ω):= n i=1δXˇt(i)(ω). +(1−(cid:13) g (x ))(cid:12) (η )(dy ), (3.5) t t t t t t (i) During the selection step each particle X evolves ac- t with(cid:13)t ≥0and(cid:13)t(cid:3)gt(cid:3)∞≤1.Itisinterestingtoremarkthat cording to the Markov transition kernel St,ηtn(Xt(i),·). That (i) (i) theparameters(cid:13)t areallowedtodependonthecurrentdis- means Xt isacceptedwithprobability (cid:13)tgt(Xt ),andwe tributionη . set Xˇ(i)=X(i). Otherwise, Xˇ(i) is randomly selected with t t t t 8 JMathImagingVis(2007)28:1–18 distribution Example 3.6 If M(x,·) is independent of x, e.g., if it is equaltotheLebesguemeasureλ,wehaveβ(M)=0. (cid:4)n (i) g (X ) (cid:3) t t δ . i=1 nj=1gt(Xt(j)) Xt(i) Example 3.7 Suppose that M := KsKs+1···Kt, where (Kk)s≤k≤t are Markov kernels and s ≤t. Furthermore, we The mutation step consists in letting each selected parti- assume that there exists for all s ≤k≤t some ε ∈(0,1) ˇ(i) k cle Xt evolve according to the Markov transition kernel satisfyingforallx ,x ∈E K (Xˇ(i),·). 1 2 t t Algorithm 2 approximates generally any Feynman–Kac K (x ,·)≥ε K (x ,·), (3.8) k 1 k k 2 model associated with a pair (g ,K ). However, we regard t t only models with annealing properties where the IAA ap- i.e. the mixing condition (2.4). Let x1,x2 ∈ E and B ∈ proximatesaflowthatissuitableforglobaloptimizationor B(E). Then w(cid:17)e get |Kk(x1,B)−Kk(x2,B)|≤1−εk and for sampling depending on the choice of (g ,K ) given in thusβ(M)≤ t (1−ε ). t t k=s k Sects.3.3and3.4. Notethattherighthandsideof(3.7)isminimizedifwe 3.2 Convergence are able to choose Markov kernels K such that β(M ) is s s small. However, if we compare the examples, we see that In this section the asymptotic behavior of the particle ap- this can be interpreted as if we do not “trust the particles”. proximation model determined by the IAA is studied. Del Inpractice,itwouldbepreferabletoselecttheMarkovker- Moralestablishedthefollowingconvergencetheorem([26], nelsbymeansofthe“quality”oftheparticlesintheprevious Theorem7.4.4). step.Oneapproachistoselectkernelsthatdependonaset ofparameters,forexampleGaussiankernelswiththeentries Theorem3.4 Foranyϕ∈B(E), ofthecovariancematrixasparameters.Thevaluesofthepa- (cid:13) (cid:14) (cid:4)t rametersarethendeterminedautomaticallybytheparticles, 2osc(ϕ) Eη0[|(cid:5)ηtn+1,ϕ(cid:6)−(cid:5)ηt+1,ϕ(cid:6)|]≤ √n 1+ rsβ(Ms) , e.g.,thevarianceissetproportionaltothesamplingvariance s=0 oftheparticles.Thiscanberealizedbyadynamicvariance schemeaswewillexplaininSect.3.3. where It is worth to mention two special cases of the selec- (cid:8)(cid:17) (cid:9) t g (x) tion kernel (3.5) that defines the resampling procedure in rs :=xs,yu∈pE (cid:17)rtr==ssgrr(y) , the interacting annealing algorithm. If (cid:13)t =0 for all t, we get the resampling step of the generic particle filter. The Ms :=KsKs+1···Kt, second special case occurs when we set the parameters (cid:13) (η ):=(cid:13)(cid:16)/(cid:5)η ,g (cid:6),where0<(cid:13)(cid:16)≤1/gand for 0 ≤ s ≤ t. Moreover, osc(ϕ) := sup{|ϕ(x) − ϕ(y)|; t t t t t t x,y∈E}. (cid:8) (cid:8) (cid:9)(cid:9) g (x) g:= sup sup t <∞, (3.9) Thistheoremgivesusaroughestimateforthenumberof t∈N0 x,y∈E gt(y) particles asproposedin[27].Theselectionkernelbecomes (cid:13) (cid:14) 4osc(ϕ)2 (cid:4)t 2 g (x ) (cid:8) g (x ) (cid:9) n≥ δ2 1+ rsβ(Ms) (3.7) St,ηt(xt,·)=(cid:13)t(cid:16)(cid:5)ηt,gt (cid:6)δxt+ 1−(cid:13)t(cid:16)(cid:5)ηt,gt (cid:6) (cid:12)t(ηt). (3.10) s=0 t t t t needed to achieve a mean error less than a given δ>0. In Note that the necessary condition (cid:3)((cid:13)t(cid:16) gt)/(cid:5)ηt,gt(cid:6)(cid:3)∞ ≤1 ordertoevaluatetherighthandside,wemustcalculatethe is satisfied since gt/(cid:5)ηt,gt(cid:6)≤g. If we set the number of DobrushincontractioncoefficientoftheMarkovkernel M. particles n≥g, then we can choose (cid:13)(cid:16) =1/n. For some t The coefficient lies in the range 0 to 1, and the more the rando(cid:3)mvariables Xt(i) and therandomprobabilitymeasure probabilitymeasureM(x,·)“depends”onx∈E thehigher ηn= n δ /n,wethushave t j=1 X(i) thecoefficientwillbe.Wewillillustratethispropertyinthe t followingthreeexampleswherewealwaysassumethatE= (i) (i) g (X ) g (X ) [0,1]. (cid:13)t(cid:16) (cid:5)ηttn,gtt(cid:6) = (cid:3)nj=t1gtt(Xt(j)). Example3.5 IfM(x,·):=δ andx ,x ∈E withx (cid:15)=x , x 1 2 1 2 (i) tβh(eMn)w=e1g.et supB∈B(E)|δx1(B)−δx2(B)|=1. This yields Tplhaicsemdebaynπst(tih)a/t(cid:3)thnke=e1xπpt(rke)s.sion (cid:13)tπt inAlgorithm2isre- JMathImagingVis(2007)28:1–18 9 Pierre del Moral showed in [26], Sect. 9.4 that for any withrespecttot isslow,sinceittakesmanyiterationsuntil t∈N0 andϕ∈B(E)thesequenceofrandomvariables n is increased. Hence, kernels with ε close to 1 are prefer- √ able,e.g.Gaussiankernelsonaboundedsetwithaveryhigh n((cid:5)ηn,ϕ(cid:6)−(cid:5)η ,ϕ(cid:6)) t t varianceasdiscussedinSect.2.3.However,wecannotsam- plefromthemeasureηˇ directly,insteadweapproximateit converges in law to a Gaussian random variable W when t by n particles. Now the following problem arises. On one theselectionkernelin(3.5)isusedtoapproximatetheflow (3.2). It turns out that when we use (cid:13)(cid:16) =1/n, the variance hand the mass of the measure concentrates on a small re- t of W is strictly smaller than in the case with (cid:13) =0. This gion of E, and on the other hand the particles are spread t seems to indicate that it is preferable to use the selection over E if ε is large. As a result we get a degenerated sys- kernel(3.10). temwheretheweightsofmostoftheparticlesarezeroand thus the global minima are estimated inaccurately, particu- 3.3 InteractingSimulatedAnnealingAlgorithm larlyforsmalln.Ifwechooseakernelwithsmallεincon- trast,theconvergencerateisveryslow.Sinceneitherofthem In the preceding section, we discussed how a Feynman– is suitable for applications, we suggest a dynamic variance Kac model associated with a pair (gt,Kt) can be approxi- schemeinsteadofafixedkernelK asalreadymentionedin matedbythe IAA withoutgivingdetailson g and on K . t t Sect.3.2. However,wealreadyintroducedunnormalizedBoltzmann– LetK beafamilyofGaussiankernelsonaboundedset t Gibbsmeasuresexp(−β V)inExamples3.2and3.3.Inthe t E with covariance matrices (cid:18) proportional to the sample t following we outline an interacting algorithm that can be covariance after the “Resampling” step. That is, for a con- regarded as an interacting simulated annealing algorithm stantc>0, (ISA). WesupposethatKisaMarkovkernelsatisfyingthemix- (cid:4)n ingcondition(3.8)foranε∈(0,1)andosc(V)<∞.Atime (cid:18) := c (x(i)−μ ) (x(i)−μ )T, t n−1 t t ρ t t ρ meshisdefinedby i=1 (3.12) t(n):=n(1+(cid:17)c(ε)(cid:18)), μ := 1(cid:4)n x(i), (3.11) t n t c(ε):=(1−ln(ε/2))/ε2 forn∈N . i=1 0 Let 0≤β ≤β ≤··· be an annealing scheme such that where ((x) ) =max(x ,ρ) for a ρ >0. The value ρ en- 0 1 ρ k k βt =βt(n+1) isconstantintheinterval(t(n),t(n+1)].Fur- suresthatthevariancedoesnotbecomezero.Theelements thermore, we denote by ηˇt the Feynman–Kac distribution off the diagonal are usually set to zero, in order to reduce aftertheselectionstep,i.e.ηˇt =(cid:12)t(ηt).Accordingto[26], computationtime. Proposition6.3.2,cf.also[30],wehave WeremarkthattheAPFisaparticlefilterwherethe“Up- dating” and “Resampling” steps are replaced by the inter- Theorem 3.8 Let b∈(0,1) and βt(n+1) =(n+1)b. Then actingsimulatedannealingalgorithmwith(cid:13) =0.Thealgo- t foreachδ>0 rithmisillustratedsimilarlyasin(2.2)by lim ηˇ ({x∈E; V(x)≥V +δ})=0, n→∞ t(n) (cid:15) ηn−−Pr−ed−i−ct−io→n ηˆn −I−S→A ηn . (3.13) t t+1 t+1 whereV =sup{v≥0; V ≥va.e.}. (cid:15) ˆ(i) The ISA is initialized by the predicted particles X and Therateofconvergenceis d/n(1−b) where d isincreas- t+1 performs M times the selection and mutation steps. After- ingwithrespectto b and c(ε) butdoesnotdependon n as (i) wards the particles X are obtained by an additional se- givenin[26],Theorem6.3.1.Thistheoremestablishesthat t+1 lection. This shows that the annealed particle filter uses a theflowoftheFeynman–Kacdistributionηˇ becomescon- t centrated in the region of global minima as t →+∞. The simulatedannealingprincipletolocatetheglobalminimum of a function V at each time step. Hence, it is suitable for flowcanbeapproximatedbytheinteractingannealingalgo- rithmwithg =exp(−β V)andK =K. applications like motion capturing as illustrated in Fig. 2 t t t The mixing condition is not only essential for the con- and demonstrated in Sect. 4. However, it also reveals that vergenceresultbutalsoinfluencesthetimemeshbythepa- theconditionaldistribution(2.1)isnolongerapproximated rameter ε. When ε is small, c(ε) becomes large and thus in the way of the generic particle filter, and therefore the theintervalsofthetimemesh,whereηˇ andnareconstant, arguments in Sect. 2 cannot be applied without modifica- t arelargeaccordingto(3.11).Itentailsthattheconvergence tions.Inthenextsection,wepresentamodelthatapproxi- 10 JMathImagingVis(2007)28:1–18 matesagivendistributionbytheinteractingannealingalgo- Proof Letϕ∈B(E).From(3.15)and(3.16)weobtain rithm. (cid:18) (cid:19) (cid:11)t−1 E ϕ(X ) g (X ) μ0 t s s s=0 3.4 InteractingAnnealedSamplingAlgorithm (cid:13) (cid:14) (cid:7) (cid:7) (cid:7) (cid:11)t−1 = ··· ϕ(xt) Ks(xs,dxs+1)gs(xs) μ0(dy0) Ourmethodcanberegardedasanannealedimportancesam- E E E s=0 (cid:13) (cid:14) (cid:7) (cid:7) pling [31] for particles. In contrast to annealed importance (cid:11)t−1 sampling for a single particle, it allows additional interac- = ··· ϕ(xt) Ks(xs,dxs+1)gs(xs) tion of the particles during the steps and can be integrated E E s=1 (cid:7) inaninteractingparticlesystem.Theapproachismotivated × K (x ,dx )μ (dx ) 0 0 1 1 0 by [8], Chap. 7 where the combination of annealed impor- E (cid:13) (cid:14) tance sampling with the generic particle filter is suggested. (cid:7) (cid:7) (cid:11)t−1 LetusconsiderafinitesequenceofBoltzmann–Gibbsmea- = ··· ϕ(xt) Ks(xs,dxs+1)gs(xs) μ1(dx1) sures E E s=1 . . μ (dx):= 1 exp(−β V(x))μ (dx) (3.14) . (cid:7) t (cid:5)μ0,exp(−βtV)(cid:6) t 0 = ϕ(xt)μt(dxt). (cid:2) E according to some schedule 0=β0 <β1 <···<βT−1 < Note that the constant term of g in (3.15) is unimpor- β =1,whereμ ∈P(E).Incontrasttosimulatedanneal- t T 0 tant for the algorithm since it is compensated by ε of the t ing and ISA that converge to the global minima of V, an- selectionkernel(3.5)andbythenormalizationfactorofthe nealed importance sampling approximates the distribution Boltzmann–Gibbstransformation(3.3).Theresultinginter- μT. actingalgorithmcanberegardedasaninteractingannealing We use a Feynman–Kac model associated with a pair samplingalgorithm(ISA)thatconvergestoμ accordingto T (gt,Kt) as introduced in Sect. 3.1 to describe the mathe- Theorem3.4. maticalframework.Forany0≤t<T,wedefine Inthecontextoffilteringμ0 isthepredictedconditional distribution,exp(−V) isthelikelihoodfunction,and μ is T (cid:5)μ ,exp(−β V)(cid:6) theposteriordistributionapproximatedbytheweightedpar- g (x ):= 0 t t t (cid:5)μ0,exp(−βt+1V)(cid:6) ticleset.Hence,itwouldbedesirabletocombine ISA with thegenericparticlefilterassuggestedin[8],Chap.7.How- ×exp(−(βt+1−βt)V(xt)). (3.15) ever, we must pay attention to the crucial assumption that the transitions Kt leave the measures μt+1 invariant. This The Markov kernels (Kt)0≤t<T are chosen such that Kt meansthatthetransitionsdependonμ0 andthusontheun- leavesthemeasureμt+1 invariant,i.e., known signal. On account of this limitation of the ISA, we believe that the ISA is more relevant for applications, par- (cid:7) ticularlyformotioncapturing.Wewillthereforerestrictthe μt+1(B)= Kt(xt,B)μt+1(dxt), (3.16) evaluationinSect.4totheISA. E Another important consequence of this result is that the for all B ∈ B(E). Metropolis–Hastings updates [14, 23, annealedparticlefilterdoesnotapproximatetheconditional 32], for instance, are suitable choices for Markov transi- distribution (2.1), since it diffuses the particles by kernels thatdonotsatisfy(3.16).Hence,theAPFisnotsuitablefor tions that leave a measure invariant. The following lemma filteringproblemsasshowninFig.1. reveals that (μt)0≤t≤T are the Feynman–Kac distributions associatedwiththepair(g ,K ). t t 4 Evaluation Lemma3.9 Forany0≤t≤T,wehave In Sect. 3.3, we observed that the APF uses ISA for each (cid:18) (cid:11)t−1 (cid:19) timestepandthusperformswellformotioncapturing.For E ϕ(X ) g (X ) =(cid:5)μ ,ϕ(cid:6) ∀ϕ∈B(E). an exhaustive experimental evaluation, we track an articu- μ0 t s s t s=0 latedarmwithlessDOFthanintheexamplegiveninFig.2.

Description:

tured by one or more cameras, and the discrete time steps are given by the . 2 and [4]. Furthermore, we discuss its mathematical properties and ex-.

Interacting and Annealing Particle Filters PDF

18 Pages·2007·1.11 MB·English

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Interacting and Annealing Particle Filters

Description:

tured by one or more cameras, and the discrete time steps are given by the . 2 and [4]. Furthermore, we discuss its mathematical properties and ex-.

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.