Analysis of stochastic approximation schemes with set-valued maps in the absence of a stability guarantee and their stabilization Vinayaka G. Yaji and Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore. [email protected], [email protected] 7 January 27, 2017 1 0 2 n Abstract a Inthispaper,weanalyzethebehaviorofstochasticapproximationschemeswithset-valuedmaps J in the absence of a stability guarantee. We prove that after a large number of iterations if the 6 stochastic approximation process enters the domain of attraction of an attracting set it gets locked 2 into the attracting set with high probability. We demonstrate that the above result is an effective ] instrumentforanalyzingstochasticapproximationschemesintheabsenceofastabilityguarantee,by Y usingitobtainanalternatecriteriaforconvergenceinthepresenceofalocally attractingsetforthe S mean field andby usingit toshow that afeedback mechanism, which involvesresetting theiterates . at regular time intervals, stabilizes the scheme when the mean field possesses a globally attracting s set,therebyguaranteeing convergence. Theresults inthispaperbuild on theworks ofV.S. Borkar, c [ C. Andrieu and H. F. Chen , byallowing for thepresence of set-valued drift functions. 1 v 1 Introduction 0 9 It is well knownthat severaloptimization andcontroltasks canbe castas a rootfinding problem. That 5 7 is, given f : Rd → Rd, one needs to find x∗ ∈Rd, such that f(x∗)= 0 (given such a point exists). Due 0 to practical considerations, one usually has access to noisy measurements/estimations of the function . whose rootneeds to be determined. An approachto solving sucha problemwith noisymeasurements of 1 0 f, is given by the recursion, 7 X −X −a(n)M =a(n)f(X ), (1) n+1 n n+1 n 1 : where {Mn}n≥1, denotes the noise arisingin the measurementoff and havingfixed an initial condition v (X ∈Rd), theiterates{X } aregeneratedaccordingtorecursion(1). [1]undercertainassumptions i 0 n n≥1 X which include the Lipschitz continuity of the function f, boundedness of the iterates along almost every r sample path (that is P(supn≥0kXnk < ∞) = 1) and a condition which ensures that the eventual a contribution of the additive noise terms is negligible, showed that the linearly interpolated trajectory of recursion (1) tracks the flow of the ordinary differential equation (o.d.e.) given by, dx =f(x). (2) dt Suchatrajectoryiscalledanasymptotic pseudotrajectory fortheflowofo.d.e.(2)(foraprecisedefinition see [1]). Suppose the set of zeros of f is a globally asymptotically stable set for the flow of o.d.e. (2), then it was shown that the limit set of an asymptotic pseudotrajectory was contained in such a set and hence the iterates {X } converge in the limit to a root of the function f. n n≥0 In order to analyze recursion (1) when the function f is no longer Lipschitz continuous or even continuous, but is just measurable satisfying the linear growth property, that is for every x ∈ Rd, kf(x)k ≤ K(1+kxk) for some K > 0, or when there is a non-additive noise/control component taking values in a compact set whose law is not known (in which case the recursion(1) takes the form X − n+1 X − a(n)M = a(n)f(X ,U ), where U denotes the noise/control), the above mentioned o.d.e. n n+1 n n n 1 method needed to be extended to recursions with much weaker requirements on the function f. This was accomplished in [2], where the asymptotic behavior of the recursion given by, X −X −a(n)M ∈a(n)F(X ), (3) n+1 n n+1 n wasstudied,whereF isaset-valuedmapsatisfyingsomeconditions(whiletheotherquantitieshavesame interpretationasin(1)). Undertheassumptionofstabilityofiterates(thatisP(sup kX k<∞)=1) n≥0 n andappropriateconditionsontheadditivenoiseterms,in[2],itwasshownthatthelinearlyinterpolated trajectory of recursion (3) tracks the flow of the differential inclusion (d.i.) given by, dx ∈F(x). (4) dt Wereferthereaderto[3,Ch.5.3]foradetailedargumentastohowthemeasurablecaseandthecasewith unknown noise/control be recast in the form of recursion (3). For a brief summary of the convergence analysis of recursion (3) we refer the reader to section 3.1 of this paper. Common to the analysis of both recursion (1) and (3) is the assumption on the stability of the iterates,thatisP(sup kX k<∞)=1. Theconditionofstabilityishighlynon-trivialanddifficultto n≥0 n verify. Overthe yearssignificant efforthas gone into providing sufficient conditions for stability (see [4], [5]). In [6], it was shown that for recursion (1), in the absence of stability guarantee, the probability of convergingto an attracting set of o.d.e. (2) given that the iterates lie in a neighborhood of it converged to one as the index (n) in which the iterate entered the neighborhood of the attracting set increased to infinity. This probability of the iterates converging to an attracting set given that the iterate lies in a neighborhood of it is called the lock-in probability and in [6] a lower bound for the same was used to obtain sample complexity bounds for recursion (1). Further a tighter lower bound for the lock-in probabilitywasderivedin[7]under a slightly strongernoise assumptionandusedto obtainconvergence guaranteewhen the law of the iterates are tight. In this paper we extend the results in [6] to the case of stochastic approximationschemes with set-valued maps as in recursion (3). 1.1 Contributions and organization of the paper We first provide a lower bound for the lock-in probability of stochastic approximation schemes with set-valued maps as in recursion (3). The bound is derived under an assumption on the additive noise terms which is stronger than the corresponding in [6], which is necessitated due to the lack of Lipschitz continuity of the drift function F. We establish that, P(X →A as n→∞|X ∈O′)≥1−2de−K˜/b(n0), n n0 for n large, where, A ⊆ Rd, denotes an attracting set of DI 4, O′ is an open neighborhood of A with 0 compact closure, K˜ is some positive constant and {b(n)} is a sequence of reals converging to zero, n≥0 which are step size dependent. Having summarized the convergence analysis under stability in section 3.1, we state the lock-in probabilityboundinsection3.2andprovideafewimplicationsofthesame. Usingthelock-inprobability result we provide an alternate criteria for convergence in the presence of a locally attracting set which removesthe needto verifystability. Adetailedcomparisonbetweenthe obtainedconvergenceguarantee and the corresponding in the presence of stability is also provided. Proofofthelock-inprobabilityresultispresentedinsection5. Theproofreliesheavilyontheinsights obtained from the analysis in [6] for single-valued maps. From the analysis in [6], it is evident that the Lipschitz continuity of the drift function f plays a crucial role in obtaining events and decoupling error contributions which in turn are necessary to obtain the bound in the inequality above. But in the recursion studied in this paper (that is recursion (3)), the drift function F is set-valued and the assumptions under which we study the said recursion (which are summarized in section 2), the drift function F is not even continuous. We overcome this problem by first obtaining a sequence of locally Lipschitz continuous set-valued maps which approximate the drift function F from above and then parameterizing them using the Stiener selection procedure. The associated results are summarized in section5.1. Thisenables usto write recursion(3)inthe formofrecursion(1),butwith locallyLipschitz continuous drift functions. Further the relation between the solutions of differential inclusions with the approximating set-valued maps as their vector field and those of DI (4), is established in section 5.2. Havingwrittenrecursion(3)intheformofrecursion(1),wethencollectsamplepathsofinterestinsection 5.3. Alongthe samplepaths thatarecollectedthe iteratesaresuchthat, havingenteredaneighborhood 2 ofthe attractingsetatiterationn ,the iterates willinfinitely oftenenterthe saidneighborhoodandthe 0 time elapsed between successive visits to the neighborhood of the attracting set can be upper bounded byaconstantwhichismeanfielddependent. Furtherweshowthatthe probabilityofoccurrenceofsuch samplepathscanbelowerboundedbyerrorcontributionsdue toadditivenoisetermsaloneafteralarge number of iterations. Using the concentration inequality for martingale sequences we obtain the lock-in probability bound in section 5.4. Usingthe lock-inprobabilityresultwedesignafeedbackmechanismwhichenablesustostabilizethe stochastic approximation scheme in the presence of a globally attracting set for DI (4). The feedback mechanism involves resetting the iterates at regular time intervals if they are found to be lying outside a certain compact set. This approach to stabilization has been studied in various forms for stochastic approximation schemes with single-valued drift functions as in recursion (1), in [8], [9], [10] and [11] to name a few. We extend the same to the case of set-valued drift functions. The main idea in the analysis of such a scheme is to show that along almost every sample path of the modified recursion,the numberofresetsthatareperformedisfinite,therebyguaranteeingthateventuallytheiteratesliewithin a compact set. We observe that the lock-in probability result (to be precise the approach adopted to obtainthe lock-inprobabilityresult)playsacentralroleinshowingthatthe numberofresetsperformed remainfinite. Havingshownthattheiterateseventuallyliewithinacompactset,weusetheconvergence argumentsfrom [2] to argue that the iterates generatedby the modified scheme convergeto the globally attracting set of DI (4). The modified scheme is presented and explained in detail in section 4. The proof of the finite resets theorem is presented in section 6. The procedure employed to collect sample paths in the proof of the lock-in probability result can be used to collect sample paths where only finite numberofresetshaveoccurredinthemodifiedschemeandthis inturnenablesusshowthatthenumber of resets are finite almost surely. Finally, we conclude by providing a few directions for future work in section 7. 2 Recursion and assumptions Let (Ω,F,P) be a probability space and {X } be a sequence of Rd-valued random variables on Ω, n n≥0 such that for every n≥0, X −X −a(n)M ∈a(n)F(X ), (5) n+1 n n+1 n where, (A1) F :Rd → subsets of Rd is a set-valued map which for every x∈Rd satisfies the following: (i) F(x)(cid:8)is a convex an(cid:9)d compact subset of Rd, (ii) there exists K >0 (independent of x) such that sup kyk≤K(1+kxk), y∈F(x) (iii) foreveryRd-valuedsequence{x } convergingtoxandforeverysequence{y ∈F(x )} n n≥1 n n n≥1 converging to y ∈Rd, we have that y ∈F(x). (A2) {a(n)} is a sequence of positive real numbers satisfying, n≥0 (i) ∞ a(n)=∞, n=0 (ii) P∞ (a(n))2 <∞. n=0 (A3) {M }P is a Rd-valued, martingale difference sequence with respect to the filtration {F := n n≥1 n σ(X ,M , m≤n)}. Furthermore, {M } are such that, m m n n≥1 kM k≤K(1+kx k) a.s., n+1 n for every n≥0, for some constant K >0. Assumption (A1) ensures that the set-valued map F is a Marchaud map. The condition (A1)(ii) is calledthe linear growth property since itensuresthatthe sizeofthe sets F(x) growlinearlywithrespect to the distance fromthe origin. The condition (A1)(iii) is calledthe closed graph property since it states that the graph of the set-valued map F, defined as, (x,y)∈R2d :x∈Rd, y ∈F(x) , (cid:8) (cid:9) 3 is a closedsubset of R2d. The map F being a Marchaudmap ensures that the differential inclusion (DI) given by, dx ∈F(x), (6) dt possesses at least one solution through every initial condition. By a solution of DI (6) with initial condition x ∈Rd, we mean an absolutely continuous function x:R →Rd such that x(0)=x and for 0 0 almost every t ∈ R, dx(t) ∈ F(x(t)). DI (6) is the mean field of recursion (5) and its dynamics play an dt important role in describing the asymptotic behavior of recursion (5). Assumption (A2) states the conditions to be satisfied by the step size sequence {a(n)} . Square n≥0 summability(thatis(A2)(ii))isneededlaterintheanalysisforobtainingaprobabilityboundoncertain tail events associated with the additive noise terms {M } . n n≥1 Assumption (A3), defines the martingale noise model. These terms denote the noise arising in the measurementofF(·). Thisconditionholdsinseveralreinforcementlearningapplications(see[3,Ch.10]) Remark Clearly when {M } are i.i.d. zero mean and bounded, assumption (A3) is satisfied. Fur- n n≥1 ther,sincethedriftfunctioninrecursion(5)isaset-valuedmap,scenarioswherethemeasurementnoise terms possess a bounded bias can be recast in the form of recursion (5) as explained below. Consider the recursion given by, X −X −a(n)M −a(n)η =a(n)f(X ), n≥0, (7) n+1 n n+1 n+1 n where f : Rd → Rd is a single-valued Lipschitz continuous map, for every n ≥ 0, η denotes the bias n+1 in the measurement noise. Let the bias terms {η } be bounded by a positive constant, say ǫ > 0 n n≥1 (that is, for every n ≥ 1, kη k ≤ ǫ). Then, recursion (7) can be written in the form of recursion (5) n with set-valued map F, given by, F(x) = {f(x)+η : kηk ≤ ǫ}, for every x ∈ Rd. We refer the reader to [3, ch. 5.3] for several other variants of the standard stochastic approximation scheme which can be analyzed with the help of recursion (5). 3 Lock-in probability for stochastic recursive inclusions In order to state the main result of this paper, definition of the flow of a DI, an attracting set for such a dynamical system are needed. We recall these notions below and we state them with respect to the mean field of recursion (5) (for a detailed description and associated results see [2]). The flow of DI (6) is given by the set-valued map Φ : R×Rd → {subsets of Rd}, where for every (t,x)∈R×Rd, Φ(t,x):= x(t)∈Rd :x(·) is a solution of DI (6) with x(0)=x . (8) AcompactsetA⊂Rd isa(cid:8)nattracting set fortheflowofDI (6),ifthereexistsa(cid:9)nopenneighborhood of A, say O, with the property that for every ǫ > 0, there exists a time T > 0 (depending on ǫ and O) suchthatforeveryt≥T andforeveryx∈U,Φ(t,x)∈Nǫ(A),whereNǫ(A)denotestheǫ-neighborhood of A. Such a neighborhood O of an attracting set A is called the fundamental neighborhood of A. The set of initial conditions in Rd from which the flow is attracted to an attracting set A is called the basin of attraction and is denoted by B(A). Formally, B(A):= x∈Rd :∩ {Φ(q,x):q ≥t}⊆A . t≥0 n o An attracting set A is said to be globally attracting if, B(A)=Rd. 3.1 Summary of the asymptotic analysis under stability Let t(0) := 0 and for every n ≥ 1, t(n) := n−1a(k). The linearly interpolated trajectory of recursion k=0 (5), is given by the stochastic process X¯ :Ω×R→Rd, where for every (ω,t)∈Ω×[0,∞), P t−t(n) t(n+1)−t X¯(ω,t):= X (ω)+ X (ω), (9) n+1 n t(n+1)−t(n) t(n+1)−t(n) (cid:18) (cid:19) (cid:18) (cid:19) where n is such that t∈[t(n),t(n+1)) and for every (ω,t)∈Ω×(−∞,0), X¯(ω,t):=X (ω). 0 4 For ω ∈Ω, the limit set map of X¯ is given by, λ:Ω→{subsets of Rd} where for every ω ∈Ω, λ(ω):=∩ X¯(ω,q):q ≥t . (10) t≥0 In [2], under assumptions (A1)−(A3) along with(cid:8)the additional(cid:9)assumption of stability of the iterates (that is P(sup kX k<∞)=1), it was shown that for almost every ω ∈Ω, the linearly interpolated n≥0 n trajectory of recursion (5), X¯(ω,·), is an asymptotic pseudotrajectory for the flow of DI (6). More precisely, for almost every ω ∈Ω, X¯(ω,·) was shown to satisfy the following: (a) The family of shiftedtrajectoriesgivenby {X¯(ω,·+t)} is relativelycompactinC(R,Rd)where t≥0 C(R,Rd)denotes the metric spaceofallcontinuousfunctions onR takingvalues inRd with metric D, which for every z, z′ ∈C(R,Rd) is given by ∞ 1 D(z,z′)= min{kz−z′k ,1}, (11) 2k [−k,k] k=1 X where kz−z′k :=sup kz(t)−z′(t)k. [−k,k] t∈[−k,k] (b) Every limit point of the shifted trajectories {X¯(ω,·+t)} is a solution of the DI (6). t≥0 From [2, Thm. 4.3], it follows that for almost every ω ∈ Ω, the limit set of the linearly interpolated trajectory X¯(ω,·), λ(ω), is a non-empty, compact and an internally chain transitive (ICT) set for the flow of DI (6) (see [2, Defn. VI] for definition of an ICT set). Now using [2, Thm. 3.23] the main convergence result of [2] follows and is stated below. Theorem 3.1 Let A⊆Rd be an attracting set for the flow of DI (6). Under assumptions (A1)−(A3), (a) for almost every ω ∈{ω ∈Ω:sup kX (ω)k<∞}∩{ω ∈Ω:λ(ω)∩B(A)6=∅}, λ(ω)⊆A and n≥0 n therefore as n→∞, X (ω)→A. n (b) if B(A) = Rd (that is A is a globally attracting set), then for almost every ω ∈ {ω ∈ Ω : sup kX (ω)k<∞}, λ(ω)⊆A and therefore as n→∞, X (ω)→A. n≥0 n n The assumption of stability of the iterates used to obtain the above convergence result is highly non-trivial and difficult to verify. Moreover the proof method used to prove the above convergence result cannot be modified in a straightforward manner to obtain a similar convergence guarantee. This warrantsanalternateapproachtostudythebehaviorofrecursion(5)intheabsenceofstabilityguarantee andwe accomplishthis by extending the lock-inprobabilityresult from[6] to the set-valuedcase. Using the obtained lock-in probability bound we recover convergence guarantee similar to Theorem 3.1 while eliminating the need to verify stability. 3.2 Main result and its implications Before we state the main result, we state an assumption which fixes the attracting set of interest. (A4) Let A ⊆ Rd, be an attracting set of DI (6) (the mean field of recursion (5)) with O ⊆ Rd as its fundamental neighborhood of attraction. Let O′ be an open neighborhood of the attracting set A (as in (A4)) such that O¯′ is compact and O¯′ ⊆O. Then the main result of the paper can be stated as follows. Theorem 3.2 (Lock-in probability) Under assumptions (A1)−(A4), there exists a constant K˜ > 0 (depending on the attracting set A and O′) and an N ≥ 1 such that, for every n ≥ N , for every 0 0 0 E ∈F satisfying E ⊆{ω ∈Ω:X (ω)∈O′} and P(E)>0, we have that, n0 n0 P(X →A as n→∞|E)≥1−2de−K˜/b(n0), n where, for every n≥0, b(n):= ∞ (a(k))2. k=n There are two immediate impliPcations of the above result and are stated below, one of which serves as an alternate convergence result in the absence of stability guarantees, that is it allows us to obtain the convergence guarantee in Theorem 3.1(a) without the need to verify whether a given sample path satisfies sup kX (ω)k<∞. n≥0 n 5 (1) As aconsequenceofassumption(A2)(ii), we havethatlim b(n)=0. ThereforefromTheorem n→∞ 3.2,ifthe observationthatiteratelies inaneighborhoodofthe attractingsetismade laterintime (n ), the probabilityofconvergingtothe attractingsetincreasesandconvergestooneasn →∞. 0 0 Formally, lim P(X →A as n→∞|X ∈O′)=1. n0→∞ n n0 (2) Suppose P(∩ ∪ {X ∈O′})>0 (if P(∩ ∪ {X ∈O′})=0 then the iterates almost N≥0 n≥N n N≥0 n≥N n surely do not converge to the attracting set A). Then for every N ≥ 0, P(∪ {X ∈ O′}) > 0 n≥N n and ∪ {X ∈O′}={X ∈O′}∪(∪ {X ∈/ O′, for N ≤k ≤n−1,X ∈O′}), n≥N n N n>N k n where, the union in the R.H.S. is disjoint. Then by Theorem 3.2, for every N ≥N , 0 P({X →A as n→∞}∩(∩ ∪ {X ∈O′})) n N≥0 n≥N n ≥ P({X →A as n→∞}∩{X ∈/ O′, for N ≤k ≤n−1,X ∈O′}) n k n n≥N X = P({X →A as n→∞|{X ∈/ O′, for N ≤k≤n−1,X ∈O′}) n k n n≥N(cid:20) X P({X ∈/ O′, for N ≤k ≤n−1,X ∈O′}) k n (cid:21) ≥ 1−2de−K˜/b(n) P({X ∈/ O′, for N ≤k≤n−1,X ∈O′}) k n nX≥N(cid:16) (cid:17) ≥ 1−2de−K˜/b(N) P({X ∈/ O′, for N ≤k ≤n−1,X ∈O′}) k n (cid:16) (cid:17)nX≥N = 1−2de−K˜/b(N) P(∪ {X ∈O′}) n≥N n ≥(cid:16)1−2de−K˜/b(N)(cid:17)P(∩ ∪ {X ∈O′}). N≥0 n≥N n (cid:16) (cid:17) TheaboveinequalityistrueforeveryN ≥N . Takinglimitandusingthefactthatlim b(n)= 0 n→∞ 0, we get that, P({X →A as n→∞}∩(∩ ∪ {X ∈O′}))=P(∩ ∪ {X ∈O′}). n N≥0 n≥N n N≥0 n≥N n Therefore from the above we can conclude that, Corollary 3.3 Under assumptions (A1)−(A4), for almost every ω ∈ ∩ ∪ {X ∈ O′}, N≥0 n≥N n X (ω)→A as n→∞. n Remark In comparison with Theorem 3.1(a), the condition that ω ∈ ∩ ∪ {X ∈ O′} is N≥0 n≥N n strongerthan the requirementthat ω ∈{λ(ω)∩B(A)6=∅} because the former requiresthe iterate sequenceto enteranopenneighborhoodofAwithcompactclosureinfinitely often while the latter requires the iterates to enter the basin of attraction of A infinitely often which is larger than O′. But in the presence of stability we have that, {supkX k<∞}∩{λ(·)∩B(A)6=∅}⊆∩ ∪ {X ∈O′}. n N≥0 n≥N n n≥0 Further, as a consequence of Corollary 3.3, we have that, P({supkX k<∞}∩{λ(·)∩B(A)6=∅})=P(∩ ∪ {X ∈O′}), n N≥0 n≥N n n≥0 or in other words, the sample paths which visit O′ infinitely often and are unstable, occur with zero probability. 6 4 Application: Stabilization via resetting In this section we modify recursion (5) in such a way that the modified procedure yields sample paths which are stable (that is lie in a compact set almost surely) which in turn allows us to recover the convergenceresultasinTheorem3.1(b)withoutthe needto verifystability,inthe presenceofaglobally attracting set for the mean field. That is, we replace assumption (A4) with the following stronger requirement. (A4)’ Let A⊆Rd be a globally attracting set for the flow of DI (6). The modification that we propose involves resetting the iterates at regular time intervals if they are found to be lying outside a certain compact set. Let the initial condition X (ω) = x ∈ Rd for every 0 0 ω ∈Ω and {r ∈(0,∞)} be such that, n n≥0 (1) kx k<r , 0 0 (2) for every n≥0, r <r , n n+1 (3) lim r =∞. n→∞ n The modified scheme, henceforth referred to as stabilized stochastic recursive inclusion (SSRI) is where every sample path is generated as outlined in Algorithm 1. Algorithm 1 SSRI given x and {r } 0 k k≥0 n←0 ⊲ Initialize iteration count k ←0 ⊲ Initialize reset count t ←0 ⊲ Initialize time elapsed since last check e T >0 ⊲ Initialize window length W n ←1 ⊲ Initialize window count W X′ (ω)←x ⊲ Initialize initial condition 0 0 while n≥0 do X (ω)−X′(ω)−a(n)M (ω)∈a(n)F(X′(ω)) ⊲ Obtain X (ω) n+1 n n+1 n n+1 t ←t +a(n) ⊲ Update the time elapsed e e if t ≥T then ⊲ Is time elapsed greater than the window length? e W if n =1 then ⊲ Have sufficient number of windows elapsed? W if kX (ω)k>r then ⊲ Is the iterate lying outside a compact set? n+1 k X′ (ω)=x ⊲ Perform reset n+1 0 k ←k+1 ⊲ Increment reset count else X′ (ω)←X (ω) ⊲ Perform no reset n+1 n+1 end if n =2k ⊲ Reset window count W else n ←n −1 ⊲ Decrement window count W W X′ (ω)=X (ω) ⊲ Perform no reset n+1 n+1 end if t ←0 ⊲ Reset time elapsed e else X′ (ω)←X (ω) ⊲ Perform no reset n+1 n+1 end if n←n+1 ⊲ Increment iteration count end while A flowchart depicting the flow of control in Algorithm 1 is presented in Figure 1. In order to understandthealgorithmletusconsiderthescenariowherethekth resethasbeenperformedatiteration indexn . Thenthealgorithmcheckswhethertheiterateliesinthecompactsetr U (closedballofradius 0 k r centeredattheorigin)afterapproximately2kT amountoftimehaselapsed(fortherelationbetween k W time and iteration index see section 3.1). Now either a reset occurs or the iterate is left unchanged. (a) If the iterate is left unchanged then the next reset check is performed after 2kT amount of time W has elapsed. 7 (b) If the iterate is reset, then, the next check is performed after 2k+1T amount of time has elapsed. W In fact it would suffice if the time between successive reset checks were set to be greater than a certain threshold which is determined by the minimum time needed by the flow of the mean field (that is DI (6)) to reach the attracting set A from any initial condition in a compact neighborhood of it. But in practicalscenariosonemaynotbe ableto compute sucha time andhence maynotbe ableto determine the required threshold. This approach of increasing time duration between successive reset checks with increasingresetcountallowsustobypassthisproblem. Thechoiceofexponentiallyincreasingdurations is one of convenience as it simplifies notations involved in proving certain results later. Start n←0(iterationcount) k←0(resetcount) te←0(timesincelastresetcheck) TW >0(windowlength) nw←1(windowcount) X0(ω)←x0 (initialcondition) Whilethe iterationcountis no Stop non-negative,i.e., Isn≥0? yes ObtainXn+1(ω)suchthat, Xn+1(ω)−Xn′(ω)−a(n)Mn+1(ω)∈a(n)F(Xn′(ω)) and updatetimeelapsedsincethelastcheck,i.e., te←te+a(n). Isthe timeelapsedsince Havesufficient Istheiterate thelastcheckgreater yes numberofwindows yes lyingoutsidetheball yes thanthewindow elapsed?,i.e., ofradiusrk?,i.e., length?,i.e., IsnW =1? IskXn+1(ω)k>rk? Iste≥TW? no no no Performnoreset,i.e., Performreset,i.e., Performnoreset,i.e, Xn′+1(ω)←Xn+1(ω) Performnoreset,i.e, Xn′+1(ω)←x0 Xn′+1(ω)←Xn+1(ω) wiannddowdeccoruemnte,nit.e., Xn′+1(ω)←Xn+1(ω) raesnedticnocurnemt,ei.net., nW ←nW−1 k←k+1 Incrementiteration Resettime Resetwindow count,i.e., elapsed,i.e., count,i.e., n←n+1 te←0 nW ←2k Figure 1: Flowchart depicting the flow of control in Algorithm 1 For every n≥1, define the indicator random variable χ :Ω→{0,1} such that, for every ω ∈Ω, n 0 if X (ω)=X′(ω), χ (ω)= n n (12) n (1 if Xn(ω)6=Xn′(ω). We assume that the noise terms {M } satisfy the following version of assumption (A3). n n≥1 8 (A3)’ {M } is a martingale difference sequence with respect to the filtration {F } , where, for every n n n≥1 n≥1, F denotes the smallestσ-algebrageneratedby the iterates X (that is the iterates before n m the reset operation) and noise terms M , for 0 ≤ m ≤ n (then it is easy to show that for every m n≥1, X′ and hence χ are F measurable). Since for every n≥1, M denotes the noise arising n n n n inthe estimation(ormeasurement)ofF atX′ , we assume thatthe energyofthe noisedepends n−1 on X′ . That is for every n≥0, kM k≤K(1+kX′k) a.s.. n−1 n+1 n The next theorem says that, for almost every sample path generated by Algorithm 1, the total number of resets is finite, thereby guaranteeingstability. The proof of this theorem (provided in section 6) crucially hinges on a lower bound for the probability of the event that there are no future resets given that there are a certain number of resets up until iteration n for some large n . Specifically it 0 0 requires the probability of the abovementioned event to convergeto one as n tends to infinity andthis 0 is guaranteed by Theorem 3.2. Theorem 4.1 (Finiteresets)Underassumptions(A1),(A2),(A3)′ and(A4)′,P({ω ∈Ω: ∞ χ (ω)<∞}) n=1 n =1. P As a consequence of the above theorem, we have the following. (a) Let ω ∈ {ω ∈ Ω : ∞ χ (ω) < ∞}. Then there exists an N ≥ 1 and R > 0 (depend- n=1 n ing on ω) such that, for every n ≥ N, X (ω) = X′(ω) and sup kX (ω)k ≤ R. Therefore P n n n≥N n E[(a(n))2kM k2|F ](ω)≤ (a(n))2K2(1+kX′(ω)k)2 ≤K2(1+R)2 (a(n))2 n≥N n+1 n n≥N n n≥N <∞, where the last inequality follows from assumption (A2)(ii). Therefore, P P P ∞ {ω ∈Ω: χ (ω)<∞}⊆{ω ∈Ω: E[(a(n))2kM k2|F ](ω)<∞}. n n+1 n n≥1 n=0 X X Therefore by Theorem 4.1, we have that the P( ∞ E[(a(n))2kM k2|F ] < ∞) = 1 and by n=0 n+1 n martingaleconvergencetheorem(see[3,Section11.3,Thm.11])wehavethat,thesquareintegrable martingale { n−1 a(m)M ,F } convergePs almost surely. m=0 m+1 n n≥1 (b) Thusforω lyPinginaprobabilityoneset,thereexistsN ≥1andR>0(dependingonω)suchthat along this sample path the iterates {X (ω)} , can be viewed as being generated by recursion n n≥N (5) with initial condition X (ω), their norms are bounded by R uniformly and the additive noise N terms {M (ω)} satisfy the hypothesis of [2, Prop.1.3]. Then by arguments similar to those of n n≥N Theorem 3.1(b) we have that, Corollary 4.2 Under assumptions (A1)−(A3) and (A4)′, for almost every ω, the iterates gener- ated by Algorithm 1, {X′(ω)} , are such that X′(ω)→A as n→∞. n n≥0 n 5 Proof of the lock-in probability theorem (Thm. 3.2) Proofofthelock-inprobabilityresultfollowsasaconsequenceofaseriesoflemmas. Theoverallstructure can be summarized as follows. (a) Ourfirstaimistoreplacetheset-valuedmapinrecursion(5)withanequivalentsingle-valuedlocally Lipschitz continuous function with an additional parameter. In order to accomplish this, we first embedthegraphoftheset-valuedmapF inthegraphofasequenceoflocallyLipschitzcontinuous set-valued maps. These maps are then parametrized using the Stiener selection procedure which preserves the modulus of continuity. (b) The relation between the solutions of DI (6) and that of differential inclusions with continuous set-valued maps which approximate F (as in (a) above) is established. (c) An ordinarydifferential equation(o.d.e) is defined using an appropriatesingle-valued parametriza- tion of F (as in (a) above). The existence of solutions to such an o.d.e. and further its uniqueness follow from Caratheodary’s existence theorem and locally Lipschitz nature of the vector field re- spectively. The solutions of this o.d.e. aide in separating the probability contributions due to the additive noise terms and the set-valued nature of the drift function. Using the results from part (b) above,we conclude that after a large number of iterations, the probability contributionis only due to the additive noise terms. 9 (d) We finally review the standard probability lower bounding procedure for the additive noise terms from [3, Ch. 4]. Using this bound in the result obtained in part (c) above gives us the desired lock-in probability bound. Throughout, we use U to denote the closed unit ball in Rd centered at the origin. Further, for every Y ,Y ⊆Rd and r ∈R, define, 1 2 • Y +Y :={y +y :y ∈Y and y ∈Y }, 1 2 1 2 1 1 2 2 • rY :={ry :y ∈Y }. 1 1 1 1 5.1 Upper semicontinuous set-valued maps and their approximation First we recall definitions of continuous set-valued maps and locally Lipschitz continuous set-valued maps. These notions are taken from [12, Ch. 1]. Definition A set-valued map F :Rd →{compact subsets of Rd} is, • upper semicontinuous (u.s.c.) if, for every x∈Rd, for every ǫ>0, there exists a δ >0 (depending on x and ǫ) such that, for every x′ ∈Rd satisfying kx′−xk<δ, we have that F(x′)⊆F(x)+ǫU, where F(x)+ǫU :={y+ǫu:y ∈F(x), u∈U}. • lower semicontinuous(l.s.c.) if,foreveryx∈Rd,foreveryRd-valuedsequence{x } converging n n≥1 to x, for every y ∈F(x), there exists a sequence {y ∈F(x )} converging to y. n n n≥1 • continuous if, it is both u.s.c. and l.s.c. • locally Lipschitz continuous if, for every x ∈Rd, there exists δ > 0 and L> 0 (depending on x ) 0 0 such that for every x,x′ ∈x +δU, we have that F(x)⊆F(x′)+Lkx−x′kU. 0 LetK(Rd)denotethefamilyofallnon-emptycompactsubsetsofRd. LetH:K(Rd)×K(Rd)→[0,∞) be defined such that, for every S , S ∈K(Rd), 1 2 H(S ,S ):=max sup inf ks −s k, sup inf ks −s k . (13) 1 2 1 2 1 2 (cid:26)s1∈S1s2∈S2 s2∈S2s1∈S1 (cid:27) With H as defined above, (K(Rd),H) is a complete metric space (for a proof see [13, Thm. 1.1.2]). The notions of continuity and local Lipschitz continuity of a set-valued map can be restated using the metric defined above and is stated as a lemma below for easy reference (for a proof see [12, Ch. 1, section 5, Cor. 1]). Lemma 5.1 A set-valued map F :Rd →K(Rd) is (a) Continuous, if and only if, for every x ∈Rd, for every ǫ>0, there exists δ >0 (depending on x 0 0 and ǫ), such that for every x∈x +δU, H(F(x),F(x ))<ǫ. 0 0 (b) locally Lipschitz continuous, if and only if, for every x ∈ Rd, there exists δ > 0 and L > 0 0 (depending on x ), such that for every x, x′ ∈x +δU, H(F(x),F(x′))≤Lkx−x′k. 0 0 Before we proceed further we look at a certain form of locally Lipschitz continuous set-valued maps thatariselater. The next lemmadefines suchmaps andalsostates thatthe sumoftwo locallyLipschitz continuous set-valued maps is again a locally Lipschitz continuous set-valued map, a result needed later to obtain locally Lipschitz continuous single-valued parametrization of map F in recursion (5). Lemma 5.2 (a) If f : Rd → R is a locally Lipschitz continuous map and C ∈ K(Rd), then the set- valued map F : Rd → K(Rd), given by F(x) := f(x)C for every x ∈ Rd, is a locally Lipschitz continuous set-valued map. (b) If for every i∈ {1,2}, F : Rd →K(Rd) is a locally Lipschitz continuous set-valued map, then the i set-valued map F : Rd → K(Rd), given by F(x) := F (x)+F (x) for every x ∈ Rd, is a locally 1 2 Lipschitz continuous set-valued map. 10