AN ANALYTICAL FRAMEWORK FOR A CONSENSUS-BASED GLOBAL OPTIMIZATION METHOD JOSE´ A.CARRILLO,YOUNG-PILCHOI,CLAUDIATOTZECK,ANDOLIVERTSE 7 1 0 Abstract. In this paper we provide an analytical framework for investigating the efficiency 2 of a consensus-based model for tackling global optimization problems. This work should shed lightontheprospectsofjustifyingtheefficacyoftheoptimizationalgorithminthemean-field n sense. Extensions of the mean-field equation to include nonlinear diffusion of porous medium a J type is introduced. Theoretical results on decay estimates are then underlined by numerical simulations. 4 2 Keywords: Global optimization, opinion dynamics, consensus formation, agent-based models, ] stochastic dynamics, mean-field limit P A 1. Introduction . h Over the last decades, individual-based models have been widely used in the investigation of t a complexsystemsthatmanifestself-organizationorcollectivebehavior. Examplesofsuchcomplex m systems include swarming behavior, crowd dynamics, opinion formation, synchronization, and [ many more, that are present in the field of mathematical biology, ecology and social dynamics, see for instance [5, 6, 13, 14, 18, 22, 26, 28, 34] and the references therein. 2 In the field of global optimization, metaheuristics, such as evolutionary algorithms [4, 1, 32] v 0 and swarm intelligence [20, 24], have played an increasing role in the design of fast algorithms 2 to provide sufficiently good solutions in tackling hard optimization problems, which includes the 2 traveling salesman problem that is known to be NP hard. Metaheuristics, in general, may be 0 considered as high level concepts for exploring search spaces by using different strategies, chosen 0 in such a way, that a dynamic balance is achieved between the exploitation of the accumulated . 2 search experience and the exploration of the search space [9]. Notable metaheuristics for global 0 optimization include, for example, the Ant Colony Optimization, Genetic Algorithms, Particle 6 1 Swarm Optimization and Simulated Annealing, all of which are stochastic in nature [7]. Unfor- : tunately, despite having to stand the test of time, a majority of metaheuristical methods lack the v proper justification of its efficacy in the mathematical sense. The universal intent of research in i X the field is to ascertain whether a given metaheuristic is capable of finding an optimal solution r whenprovidedwithsufficientinformation. Duetothestochasticnatureofmetaheuristics,answers a to this question are non-trivial, and they are always probabilistic. Recently, theuseofopiniondynamicsandconsensusformationinglobaloptimizationhasbeen introducedin[31],wheretheauthorsshowedsubstantialnumericalandpartialanalyticalevidence of its applicability to solving multi-dimensional optimization problems of the form min f(x), Ω⊂Rd a domain, x∈Ω foragivencostfunctionf ∈C (Rd),whichisassumedtobenon-negativewithoutlossofgenerality. b The optimization algorithm involves the use of multiple agents located within the domain Ω to dynamically establish a consensual opinion amongst themselves in finding the global minimizer to the minimization problem, while taking into consideration the opinion of all active agents. First order models for consensus have been studied in the mathematical community interested in granular materials and swarming leading to aggregation-diffusion and kinetic equations, which have nontrivial stationary states or flock solutions, see [11, 16, 17, 12] and the references therein. Date:January26,2017. 1 2 CARRILLO,CHOI,TOTZECK,ANDTSE They are also common tools in control engineering to establish consensus in graphs [29, 37] for instance among many others. In order to achieve the goal of optimizing a given function f(x), we consider an interacting stochastic system of N ∈ N agents with position Xi ∈ Rd, described by the system of stochastic t differential equations (1a) dXi =−λ(Xi−v )dt+σ|Xi−v |dWi, t t t t t t (1b) v =(cid:88)N Xi (cid:32) ωfα(Xti) (cid:33), t t (cid:80)N ωα(Xi) i=1 i=1 f t withλ,σ >0,whereωαisaweight,whichwetakeasωα(x)=exp(−αf(x))forsomeappropriately f f chosenα>0. Noticethat(1)resemblesageometricBrownianmotion,whichdriftsinthedirection v ∈Rd. This system is a simplified version of the algorithm introduced in [31], while keeping the t essentialingredientsandmathematicaldifficulties. Thefirsttermin(1)imposesaglobalrelaxation towards a position determined by the behavior of the normalized moment given by v , while the t diffusiontermtriestoconcentrateagainaroundthebehaviorofv . Infact, agentswithaposition t differing a lot from v are diffused more and then they explore a larger portion of the landscape t of the graph of f(x), while the explorer agents closer to v diffuse much less. The normalized t moment v is expected to dynamically approach the global minimum of the function f, at least t when α is large enough, see below. This idea is also used in simulated annealing algorithms. The well-posedness of this system will be thoroughly investigated in Section 2. Formal passage to the mean-field limit, N →∞, for this system yields the nonlinear process (2a) dX¯ =−λ(X¯ −v [ρ ])dt+σ|X¯ −v [ρ ]|dW t t f t t f t t (cid:90) (2b) v [ρ ]= xdηα, ηα =ωαρ /(cid:107)ωα(cid:107) , ρ =law(X¯ ), f t t t f t f L1(ρt) t t subject to the initial condition law(X¯ )=ρ . We call ηα the α-weighted measure. 0 0 t The measure ρ =law(X¯ )∈P(Rd) is a Borel probability measure, which describes the evolu- t t tionofaone-particlemean-fielddistribution. Inthecasethatρ isabsolutelycontinuousw.r.t.the t Lebesgue measure dx, i.e., dρ = u dx, for some u ∈ L1(Rd;dx), we recall from [31, Remark 1], t t t see also [19], that ωαρ satisfies the well-known Laplace principle: f t (cid:18) 1 (cid:18)(cid:90) (cid:19)(cid:19) lim − log e−αfdρ =inff ≥0. α→∞ α t Therefore, if f attains a single minimum x ∈supp(ρ ), then the α-weighted measure ηα ∈P(Rd) ∗ t t approximatesaDiracdistributionδ atx ∈Rd forlargeα(cid:29)1. Consequently,thefirstmoment x∗ ∗ of ηα, given by v [ρ ], provides a good estimate of x =argminf. t f t ∗ The (infinitesimal) generator corresponding to the nonlinear process (2a) is given by (3) Lϕ=κ∆ϕ−µ·∇ϕ, for ϕ∈C2(Rd), c with drift and diffusion coefficients µ =λ(x−v [ρ ]), κ =(σ2/2)|x−v [ρ ]|2, t f t t f t respectively. Therefore, the Fokker–Planck equation reads (4) ∂ ρ =∆(κρ )+∇·(µρ ), lim ρ =ρ , t t t t t→0 t 0 where ρ ∈P(Rd) for t≥0 satisfies (4) in the weak sense. t Notice that the Fokker–Planck equation (4) is a nonlocal, nonlinear degenerate drift-diffusion equation, which makes its analysis a nontrivial task. Its well-posedness will be the topic of Sec- tion3. Furthermore,usingestablishedideastakenfrom[10,33],wemakethepassagetomean-field rigorous in Section 4. Having justified the validity of the Fokker–Planck equation (4) as a mean- field limit of the microscopic system (1), we proceed to give justifications for the applicability of the microscopic system (1) as a tool for solving global optimization problems, via its mean-field counterpart. CONSENSUS-BASED OPTIMIZATION 3 Morespecifically,wewillshowinSection5thatundercertainassumptionsonthecostfunction f, one obtains a uniform consensus as the limiting measure (t→∞) corresponding to (4), i.e., ρ −→δ as t→∞, t xˆ forsomexˆ∈Rd possiblydependingontheinitialdensityρ . Itisalsoshownthatthisconvergence 0 happensexponentiallyintime. Moreover,underthesameassumptionsonf,thepointofconsensus xˆ may be made sufficiently close to x =argminf by choosing α (cid:29)1 sufficiently large, which is ∗ the main goal for global optimization. WeconcludethepaperwithanextensionoftheFokker–Planckequation(4)toincludenonlinear diffusion of porous medium type and provide numerical evidence for consensus formation in the one dimensional case. For this reason, we introduce an equivalent formulation of the mean-field equationintermsofthepseudo-inversedistributionχ (η)=inf{x∈R|ρ ((−∞,x])>η}. Wealso t t comparethemicroscopicapproximationcorrespondingtotheporousmediumtypeFokker–Planck equation with the original consensus-based microscopic system (1) and the proposed algorithm in [31], showcasing the exponential decay rate of the error in suitable transport distances towards the global minimizers. 2. Well-posedness of the microscopic model In this section we study the existence of a unique process {(X(1,N),...,X(N,N))|t≥0}, which t t satisfiesourconsensus-basedoptimizationscheme(1). WedenoteX(N) :=(X(1,N),...,X(N,N))(cid:62), t t t and write, for an arbitrary but fixed N ∈N, system (1) as (5) dX(N) =−λF (X(N))dt+σM (X(N))dW(N), t N t N t t where W :=(W(1,N),...,W(N,N))(cid:62) denotes the standard Wiener process in RNd, and t t t (cid:80) (Xi−Xj)ωα(Xj) F (X)=(F1(X),...,FN(X))(cid:62) ∈RNd, Fi (X)= j(cid:54)=i f ∈Rd, N N N N (cid:80) ωα(Xj) j f M (X)=diag(|F1(X)|I ,...,|FN(X)|I )∈RNd×Nd. N N d N d At this point, we make smoothness assumptions regarding our cost function f. Assumption 1. The cost function f: Rd → R is locally Lipschitz continuous and nonnegative. In short, f ∈Lip (Rd), f ≥0. loc Under these conditions on f, we easily deduce that Fi , 1 ≤ i ≤ N, is locally Lipschitz N continuous and has linear growth. Consequently, F and M are locally Lipschitz continuous N N and have linear growth. To be more precise, we obtain the following result. Lemma 2.1. Let N ∈N, α,k >0 be arbitrary. Then for any X,Xˆ ∈RNd with |X|,|Xˆ|≤k: (cid:18) 2c (cid:113) (cid:19) |Fi (X)−Fi (Xˆ)|≤|Xi−Xˆi|+ 1+ k N|Xˆi|2+|Xˆ|2 |X−Xˆ|, N N N |Fi (X)|≤|Xi|+|X|, N with constant c =α(cid:107)∇f(cid:107) exp(α(cid:107)f(cid:107) ), where B ={x∈Rd ||x|≤k}. k L∞(Bk) L∞(Bk) k Proof. Let X,Xˆ ∈RNd with |X|,|Xˆ|≤k for some k ≥0. Then Fi (X)−Fi (Xˆ)= (cid:80)j(cid:54)=i(Xi−Xj)ωfα(Xj) − (cid:80)j(cid:54)=i(Xˆi−Xˆj)ωfα(Xˆj) =(cid:88)3 I , N N (cid:80) ωα(Xj) (cid:80) ωα(Xˆj) (cid:96)=1 (cid:96) j f j f where the terms I , (cid:96)=1,2,3, are given by (cid:96) (cid:16) (cid:17) (cid:80) (Xi−Xˆi+Xˆj −Xj)ωα(Xj) (cid:80) (Xˆi−Xˆj) ωα(Xj)−ωα(Xˆj) I = j(cid:54)=i f , I = j(cid:54)=i f f , 1 (cid:80) ωα(Xj) 2 (cid:80) ωα(Xj) j f j f (cid:16) (cid:17) (cid:80) ωα(Xˆj)−ωα(Xj) I =(cid:88) (Xˆi−Xˆj)ωα(Xˆj) j f f , 3 j(cid:54)=i f (cid:80) ωα(Xj)(cid:80) ωα(Xˆj) j f j f 4 CARRILLO,CHOI,TOTZECK,ANDTSE that may easily be estimated by c (cid:113) |I |≤|Xi−Xˆi|+|X−Xˆ|, |I |≤ k|X−Xˆ| N|Xˆi|2+|Xˆ|2, 1 2 N c (cid:113) |I |≤ k|X−Xˆ| N|Xˆi|2+|Xˆ|2. 3 N Putting all these terms together yields the required estimate. As for the estimate of |Fi (X)|, we easily obtain N (cid:12)(cid:12) (cid:80) Xjωα(Xj)(cid:12)(cid:12) |Fi (X)|=(cid:12)Xi− j f (cid:12)≤|Xi|+|X|, N (cid:12) (cid:80) ωα(Xj) (cid:12) (cid:12) j f (cid:12) thereby concluding the result. (cid:3) Due to Lemma 2.1, we may invoke standard existence results of strong solutions for (5) [21]. Theorem 2.1. For each N ∈ N, the stochastic differential equation (5) has a unique strong solution {X(N)|t≥0} for any initial condition X(N) satisfying E|X(N)|2 <∞. t 0 0 Proof. As mentioned above, we make use of a standard result on existence of a unique strong solution. To this end, we show the existence of a constant b >0, such that N (6) −2λX·F (X)+σ2trace(M M(cid:62))(X)≤b |X|2. N N N N Indeed, since the following inequalities hold: (cid:80) (Xi−Xj)ωα(Xj) −Xi·Fi (X)=−Xi· j(cid:54)=i f ≤−|Xi|2+|Xi||X|, N (cid:80) ωα(Xj) j f (cid:12)(cid:12)(cid:80) (Xi−Xj)ωα(Xj)(cid:12)(cid:12)2 |Fi (X)|2 =(cid:12) j(cid:54)=i f (cid:12) ≤2(cid:0)|Xi|2+|X|2(cid:1), N (cid:12) (cid:80) ωα(Xj) (cid:12) (cid:12) j f (cid:12) we conclude that −2λ(cid:104)X,F (X)(cid:105)+σ2trace(M M(cid:62))(X)=(cid:88) (cid:0)−2λ(cid:104)Xi,Fi (X)(cid:105)+dσ2|Fi (X)|2(cid:1) N N N N N i ≤(cid:88) 2λ(cid:0)−|Xi|2+|Xi||X|(cid:1)+2dσ2(cid:0)|Xi|2+|X|2(cid:1) i (cid:16) √ (cid:17) ≤2 λ N +2dσ2N |X|2 =:b |X|2. N AlongwiththelocalLipschitzcontinuityandlineargrowthofF andM ,weobtaintheassertion N N by applying [21, Theorem 3.1]. (cid:3) Remark 2.1. Infact,theestimate(6)yieldsauniformboundonthesecondmomentofX . Indeed, t by application of the Itˆo formula, we obtain d E|X(N)|2 =−2λE[(cid:104)X ,F (X(N))(cid:105)]+σ2E[trace(M M(cid:62))(X(N))]≤b E|X(N)|2. dt t t N t N N t N t Therefore, the Gronwall inequality yields E|X(N)|2 ≤ebNtE|X(N)|2 for all t≥0, t 0 i.e., the solution exists globally in time for each fixed N ∈N. Unfortunately, for the mean-field limit (N → ∞) we lose control of the previous bound, since b →∞ as N →∞. Therefore, we will need a finer moment estimate on X(N). In particular, we N need an N independent moment estimate for X(N). Consider the evolution of ξij :=X(i,N)−X(j,N) given by t t t (cid:16) (cid:17) dξij =−λξijdt+σ |X(i,N)−v(N)|−|X(j,N)−v(N)| dWi+σ|X(j,N)−v(N)|dWij, t t t t t t t t t t CONSENSUS-BASED OPTIMIZATION 5 where Wij =Wi−Wj is again a standard Wiener process. Applying the Itˆo formula gives t t t d E|ξij|2 ≤−2λE|ξij|2+dσ2E[(|X(i,N)−v(N)|−|X(j,N)−v(N)|)2]+2dσ2E[|X(j,N)−v(N)|2] dt t t t t t t t t +dσ2E[(|X(i,N)−v(N)|−|X(j,N)−v(N)|)|X(j,N)−v(N)|] t t t t t t (cid:18) 3 (cid:19) 5dσ2 ≤− 2λ− dσ2 E|ξij|2+ E[|X(j,N)−v(N)|2], 2 t 2 t t where we used the fact that (cid:12) (cid:12) (cid:12)|X(i,N)−v(N)|−|X(j,N)−v(N)|(cid:12)≤|X(i,N)−X(j,N)|=|ξij|. (cid:12) t t t t (cid:12) t t t In order to estimate the last term on the right hand side, we first observe that |X(j,N)−v(N)|≤ (cid:80)(cid:96)ωfα(Xt((cid:96),N))|Xt(j,N)−Xt((cid:96),N)| ≤(cid:88) β((cid:96),N)|ξ(cid:96)j|=(cid:104)βN,ζj(cid:105), t t (cid:80) ωα(X((cid:96),N)) (cid:96) t t t t (cid:96) f t where βN =(β(1,N),...,β(N,N))(cid:62), β((cid:96),N) =ωα(X((cid:96),N))(cid:46)(cid:88) ωα(X((cid:96),N)), f t f t (cid:96) and ζj =(|ξ1j|,...,|ξNj|)(cid:62). We further note that βN ≥0, (cid:88) β((cid:96),N) =1=(cid:88) e((cid:96),N), eN =(cid:18) 1 ,..., 1 (cid:19)(cid:62). t (cid:96) t (cid:96) N N Moreover, majorization theory tells us that eN ≺βN, i.e., βN strongly majorizes eN [30]. In this t t case, Horn’s lemma provides the existence of an orthostochastic matrix A=(Q)2, Q orthogonal, such that eN =AβN [23]. Consequently, (cid:104)βN,ζj(cid:105)=(cid:104)Q(cid:62)eN,Q ζj(cid:105)≤|Q(cid:62)eN| |Q ζj| =|eN| |ζj| = √1 (cid:16)(cid:88) |ξ(cid:96)j|2(cid:17)12 , t t t t t t 2 t t 2 2 t 2 t N (cid:96) and therefore (7) |X(j,N)−v(N)|2 ≤ 1 (cid:88) |ξ(cid:96)j|2. t t N (cid:96) t Finally, we obtain d (cid:18) 3 (cid:19) 5dσ2 (cid:88) (8) E|ξij|2 ≤− 2λ− dσ2 E|ξij|2+ E|ξ(cid:96)j|2. dt t 2 t 2N (cid:96) t We now introduce the functions S(N) := 1 (cid:88) E|ξij|2, R(N) :=max E|ξij|2. t N i,j t t 1≤i,j≤N t Then it follows from (8) that (9) S(N) ≤exp(−2γt)S(N), R(N) ≤exp(−2γt)R(N), t 0 t 0 with γ =λ−2dσ2. From these observations, we have the following theorem. Theorem 2.2. Let {X(N)|t > 0} be a strong solution to the stochastic differential equation (5) t with the initial data X(N) satisfying E|X(N)|2 <∞. Then we have 0 0 E|X(N)|2 ≤eλtE|X(N)|2+ λ+dσ2S(N)(cid:0)eλt−e−2γt(cid:1), t 0 λ+2γ 0 with γ =λ−2dσ2. Furthermore, if γ >0, then we have max E[|X(i,N)−v(N)|2]→0 as t→∞, 1≤i≤N t t exponentially fast with decay rate γ >0. 6 CARRILLO,CHOI,TOTZECK,ANDTSE Proof. Similarly as before, we get d E|X(N)|2 =−2λE[X(N)·F (X(N))]+dσ2E[|F (X(N))|2] dt t t N t N t ≤λE|X(N)|2+(λ+dσ2)E[|F (X(N))|2]. t N t On the other hand we have from (9) that E[|F (X(N))|2]≤(cid:88) E[|X(i,N)−v(N)|2]≤ 1 (cid:88) E|ξ(cid:96)i|2 ≤S(N) ≤S(N)e−2γt. N t i t t N i,(cid:96) t t 0 This yields E|X(N)|2 ≤eλtE|X(N)|2+ λ+dσ2S(N)(cid:0)eλt−e−2γt(cid:1). t 0 λ+2γ 0 Now let γ >0. Notice that (7) gives E[|X(i,N)−v(N)|2]≤ 1 (cid:88) |ξij|2 ≤R(N). t t N j t t Together with (9), this gives max E[|X(i,N)−v(N)|2]≤R(N)e−2γt, 1≤i≤N t t 0 as t→∞. This completes the proof. (cid:3) Remark 2.2. Since (cid:96)p-norms are equivalent in finite dimensions for any 1 ≤ p ≤ ∞, the initial condition E|X(N)|2 <∞ implies S(N) ≤4E|X(N)|2 and R(N) ≤4E|X(N)|2. 0 0 0 0 0 3. Well-posedness of the mean-field equation Inthissection,weprovidethewell-posednessofthenonlocal,nonlinearFokker–Planckequation (4). Since we will be working primarily with Borel probability measures on Rd with finite second moment, we provide its definition for the readers convenience. We denote by (cid:26) (cid:90) (cid:27) P (Rd):= µ∈P(Rd) such that |z|2µ(dz)<∞ 2 Rd to be the space of Borel probability measures on Rd with finite second moment. This space may be equipped with the 2-Wasserstein distance W defined by 2 (cid:20)(cid:90) (cid:21) W2(µ,µˆ)= inf |z−zˆ|2π(dz,dzˆ) , µ,µˆ ∈P (Rd), 2 2 π∈Π(µ,µˆ) Rd×Rd where Π(µ,µˆ) denotes the collection of all Borel probability measures on Rd×Rd with marginals µ and µˆ on the first and second factors respectively. The set Π(µ,µˆ) is also known as the set of all couplings of µ and µˆ. Equivalently, the Wasserstein distance may be defined by (cid:104) (cid:105) W2(µ,µˆ)=infE |Z−Zˆ|2 , 2 where the infimum is taken over all joint distributions of the random variables Z and Zˆ with marginals µ and µˆ respectively. It is known that (P (Rd),W ) is Polish, where W metricizes the 2 2 2 weak convergence in P (Rd), as well as, provides convergence of the first two moments [2, 36]. 2 The main result of this section is provided by the following theorem. Theorem 3.1. Let f ∈Lip (Rd), f ≥0 and ρ ∈P (Rd). Then there exists a unique nonlinear loc 0 2 process X¯ ∈C([0,T],Rd) satisfying (2) in the strong sense, and its law ρ =law(X¯ ) satisfies the t t Fokker–Planck equation (4) with ρ∈C([0,T],P (Rd)), lim ρ =ρ ∈P (Rd). 2 t→0 t 0 2 CONSENSUS-BASED OPTIMIZATION 7 Proof. Forsomegivenu∈C([0,T],Rd),wemayuniquelysolvethestochasticdifferentialequation (10) dX =−λ(X −u )dt+σ|X −u |dW , law(X )=ρ , t t t t t t 0 0 for some fixed initial measure ρ ∈P (Rd), which induces ρ =law(X ). Since X ∈C([0,T],Rd), 0 2 t t we obtain ρ∈C([0,T],P (Rd)), which satisfies the following Fokker–Planck equation 2 d (cid:90) (cid:90) (cid:16) (cid:17) ϕdρ = (σ2/2)|x−u |2∆ϕ−λ(x−u )·∇ϕ dρ for all ϕ∈C2(Rd). dt t t t t b We then compute v [ρ]∈C([0,T],Rd) according to (2b). This provides the self-mapping property f of the map T : u (cid:55)→ v , which we now show to be a contraction in the metric space C([0,T],Rd), f endowed with the equivalent weighted norm (cid:107)u(cid:107) =sup |u |eβt, exp t∈[0,T] t for some β ∈R to be chosen appropriately later. We begin by establishing an estimate of the second moment of ρ =law(X¯ ) satisfying (10): t t d (cid:90) (cid:90) (cid:16) (cid:17) |x|2dρ = dσ2|x−u |2−2λ(cid:104)x−u ,x(cid:105) dρ dt t t t t (cid:90) (cid:16) (cid:17) = (dσ2−2λ)|x|2+(2λ−2dσ2)(cid:104)x,u (cid:105)+dσ2|u |2 dρ t t t (cid:90) ≤−(2λ−dσ2−|γ|) |x|2dρ +(dσ2+|γ|)|u |2, t t with γ :=λ−dσ2. From Gronwall’s inequality we deduce (cid:90) (cid:18)(cid:90) (cid:19) dσ2+|γ| (cid:16) (cid:17) (11) |x|2dρ ≤ |x|2dρ e−(2λ−dσ2−|γ|)t+ 1−e−(2λ−dσ2−|γ|)t (cid:107)u(cid:107)2 . t 0 2λ−dσ2−|γ| ∞ Consequently,wehavesup (cid:82) |x|2dρ ≤M forsomeM >0,dependingonlyonthecoefficients t∈[0,T] t λ, σ, the second moment of ρ and (cid:107)u(cid:107) . We then define the stopping time τ := inf{t > 0 : 0 ∞ k |X |≥k}whichreachesT fork largeenoughsincethesecondmomentisuniformlyboundedink. t Now let u,uˆ ∈ C([0,T],Rd) be given and X,Xˆ ∈ C([0,T],Rd) their corresponding solutions to (10)respectivelywithρ =law(X ),ρˆ =law(Xˆ )andlaw(X )=law(Xˆ )=ρ . Setτ¯ :=τ ∧τˆ . t t t 0 0 0 0 k k k Taking the difference z :=X −Xˆ and applying the Itˆo formula on |z |2 gives t t t t dE[|z |2]=−2λE[|z |2]+2λE[(cid:104)z ,u −uˆ (cid:105)]+dσ2E(cid:104)(cid:0)|X −u |−|Xˆ −uˆ |(cid:1)2(cid:105) dt t t t t t t t t t ≤−(λ−2dσ2)E[|z |2]+(λ+2dσ2)|u −uˆ |2. t t t From the estimate above, we obtain via Gronwall’s inequality (cid:90) t E[|z |]2 ≤E[|z |2]≤(λ+2dσ2)e−(λ−2dσ2)t e(λ−2dσ2)s|u −uˆ |2ds t t s s 0 (cid:90) t(cid:16) (cid:17)2 ≤(λ+2dσ2)e−(λ−2dσ2)t e(λ−2dσ2−δ)s/2|u −uˆ | eδsds s s 0 ≤c e−2βt(cid:107)u−uˆ(cid:107)2 , δ exp where c :=(λ+2dσ2)/δ and we set β :=(λ−2dσ2−δ)/2. This yields δ √ (12) E[|z |]eβt ≤ c (cid:107)u−uˆ(cid:107) . t δ exp We next provide a stability estimate. Note that (cid:82) xωαdρ E[X¯ ωα(X¯ )] v [ρ ]= f t = t f t . f t (cid:107)ωα(cid:107) E[ωα(X¯ )] f L1(ρt) f t Using the above equality together with taking care of the stopping time τ¯ yields k |ωfα(x)|≤1, E[ωfα(X¯t)]≥e−αfk, and E[|ωfα(X¯t)−ωfα(Xˆt)|]≤αfkE[|X¯t−Xˆt|], 8 CARRILLO,CHOI,TOTZECK,ANDTSE for 0≤t≤τ¯ , where f :=(cid:107)f(cid:107) . This allows us to obtain k k W1,∞(B2k) (cid:12)(cid:12)E[X¯ (ωα(X¯ )−ωα(Xˆ ))](cid:12)(cid:12) (cid:12)(cid:12)E[(X¯ −Xˆ )ωα(Xˆ )](cid:12)(cid:12) |v [ρ ]−v [ρˆ]|≤(cid:12) t f t f t (cid:12)+(cid:12) t t f t (cid:12) f t f t (cid:12) E[ωα(X¯ )] (cid:12) (cid:12) E[ωα(X¯ )] (cid:12) (cid:12) f t (cid:12) (cid:12) f t (cid:12) (cid:12)(cid:12)E[ωα(Xˆ )−ωα(X¯ )] (cid:12)(cid:12) +(cid:12) f t f t E[Xˆ ωα(Xˆ )](cid:12) (cid:12)(cid:12)E[ωfα(X¯t)]E[ωfα(Xˆt)] t f t (cid:12)(cid:12) ≤αkfkeαfkE[|X¯t−Xˆt|]+eαfkE[|X¯t−Xˆt|]+αkfke2αfkE[|X¯t−Xˆt|] ≤(cid:0)eαfk +2αkfke2αfk(cid:1)E[|zt|] for 0≤t≤τ¯k. Since τ¯ =T for k sufficiently large, we find that for some k k 0 |v [ρ ]−v [ρˆ]|≤c E[|z |] for 0≤t≤T, f t f t k0 t where ck0 :=eαfk0 +2αk0e2αfk0 >0. This together with (12) gives √ (cid:107)v [ρ ]−v [ρˆ](cid:107) ≤c c (cid:107)u−uˆ(cid:107) for 0≤t≤T. f t f t exp k0 δ exp √ Wefinallychooseδ >0largeenoughsuchthatc c <1tohavethecontractionofthemapping. k0 δ Applying the Banach fixed point theorem on the mapping T concludes the proof. (cid:3) The following result appears as a simple consequence of Theorem 3.1. Corollary 3.1. Let ρ ∈ C([0,T],P (Rd)) be the solution of the Fokker–Planck equation (4) with 2 initial data ρ ∈P (Rd) provided by Theorem 3.1. Then v [ρ]∈C([0,T],Rd) and 0 2 f (cid:90) (cid:90) dσ2+|γ| sup |x|2dρ ≤ |x|2dρ + (cid:107)v [ρ](cid:107)2 , t∈[0,T] t 0 2λ−dσ2−|γ| f ∞ i.e., the second moment of ρ is uniformly bounded in the interval [0,T]. t 4. Propagation of chaos and the mean-field limit Having the existence of a unique solution to the nonlinear process (2) at hand, we proceed to showaquantitativeestimate(inN ∈N)ofthedifferencebetweenthetwosolutionsXi andX¯i for t t any 1≤i≤N, and consequently also the difference between their respective laws. Unfortunately, the result holds under an additional assumption on the solution of the nonlinear process. Assumption 2. The solution ρ=law(X¯)∈C([0,T],P(Rd)) of (4) satisfies (cid:107)ωα(cid:107) ≥c for all f L1(ρt) t∈[0,T] for some constant c>0. Remark 4.1. In the next section, we provide a sufficient condition on f ∈ Lip (Rd) that will loc imply (cid:107)ωα(cid:107) ≥(cid:107)ωα(cid:107) , and therefore Assumption 2 (cf. Proposition 5.1). f L1(ρt) f L1(ρ0) Proposition 4.1. Let f ∈ Lip (Rd), f ≥ 0, and ρ ∈ P (Rd). Consider the solution X(N) of loc 0 2 t (1) and the solution X¯(i,N), 1 ≤ i ≤ N ∈ N of (2) for t ∈ [0,T] satisfying Assumption 2, with t mutually independent ρ -distributed initial data X(i,N) =X¯(i,N), 1≤i≤N. Then the estimate 0 0 0 (13) sup E[|X(i,N)−X¯(i,N)|2]≤CN−1, 1≤i≤N, t∈[0,T] t t holds for some constant C >0, independent of N ∈N. Proof. Without loss of generality, we may take the same Wiener processes for both X(i,N) and t X¯(i,N) and define the sequence {τ } of stopping times t k k≥0 τ :=inf{t>0||X¯(1,N)|≥k}∧inf{t>0||X(N)|≥k}, k ≥0. k t t Since the second moment for both systems are uniformly bounded in k and N, we obtain τ ≥T k for k sufficiently large. Denoting the difference of the solutions by Z(i,N) := X(i,N)−X¯(i,N), we t t∧τk t∧τk CONSENSUS-BASED OPTIMIZATION 9 apply Itˆo’s formula to obtain (cid:34)(cid:18)(cid:90) t∧τk (cid:19)2 (cid:18)(cid:90) t∧τk (cid:19)2(cid:35) E[|Z(i,N)|2]≤4λ2E |X(i,N)−X¯(i,N)|ds + |v(N)−v [ρ ]|ds t s s s f s 0 0 (cid:20)(cid:90) t∧τk (cid:90) t∧τk (cid:21) +4σ2E |X(i,N)−X¯(i,N)|2ds+ |v(N)−v [ρ ]|2ds s s s f s 0 0 (cid:18)(cid:90) t∧τk (cid:90) t∧τk (cid:19) ≤4(λ2T +σ2) E[|Z(i,N)|2]ds+ E[|v(N)−v [ρ ]|2]ds . s s f s 0 0 We remind the reader that (cid:16)(cid:88) (cid:17)(cid:46)(cid:16)(cid:88) (cid:17) v(N) = X(i,N)ωα(X(i,N)) ωα(X(i,N)) . s s f s f s i i In order to estimate the last term on the right-hand side, we write A+B−C+D v(N)−v [ρ ]= , s f s N(cid:107)ωα(cid:107) f L1(ρs) and estimate each term on the right-hand side separately, where A=(cid:88) (X(j,N)−X¯(j,N))ωα(X¯(j,N)), s s f s j B =(cid:88) (X(j,N)−v(N))(cid:0)ωα(X(j,N))−ωα(X¯(j,N))(cid:1), s s f s f s j C =v(N) (cid:88) (cid:0)ωα(X¯(j,N))−E[ωα(X¯(j,N))](cid:1), s f s f s j D =(cid:88) (cid:0)X¯(j,N)ωα(X¯(j,N))−E[X¯(j,N)ωα(X¯(j,N))](cid:1). s f s s f s j We begin by computing the second moment of A. Simple computations lead to (cid:88) E[|A|2]≤N E[|Z(j,N)|2], s j whereweusedJensen’sinequality. Notethatfors∈(0,t∧τ ),wehavethat|X¯(1,N)|∨|X(N)|≤k k s s for some k ≥0. Therefore, the estimate for B reads (cid:104)(cid:16)(cid:88) (cid:17)(cid:16)(cid:88) (cid:17)(cid:105) E[|B|2]≤c (k)2E |X(j,N)−v(N)|2 |Z(j,N)|2 ω s s s j j (cid:88) ≤2(1+N)k2c (k)2 E[|Z(j,N)|2], ω s j where c (k) = (cid:107)ωα(cid:107) . Now define κ (X¯(i,N)) = ωα(X¯(i,N))−E[ωα(X¯(i,N))]. Obviously ω f W1,∞(Bk) 1 s f s f s E[κ (X¯(i,N))]=0 for all 1≤i≤N. Moreover, 1 s E[κ (X¯(j,N))κ (X¯(k,N))]=E[κ (X¯(j,N))]E[κ (X¯(k,N))]=0 for j (cid:54)=k, 1 s 1 s 1 s 1 s since the processes X¯(j,N) and X¯(k,N) are independent for j (cid:54)=k. Consequently, s s E[|C|2]≤k2(cid:88) E[κ (X¯(j,N))κ (X¯(k,N))]≤Nk2E[κ (X¯(1,N))2] 1 s 1 s 1 s j,k Similarly,wedefineκ (X¯(i,N))=X¯(i,N)ωα(X¯(i,N))−E[X¯(i,N)ωα(X¯(i,N))]. Asforκ wehavethat 2 s s f s s f s 1 E[κ (X¯(i,N))]=0 for all 1≤i≤N and E[κ (X¯(j,N))κ (X¯(k,N))]=0 for j (cid:54)=k, which yields 2 s 2 s 2 s E[|D|2]≤(cid:88) E[κ (X¯(j,N))κ (X¯(k,N))]=NE[κ (X¯(1,N))2]. 2 s 2 s 2 s j,k 10 CARRILLO,CHOI,TOTZECK,ANDTSE Note that E[κ (X¯(1,N))2] and E[κ (X¯(1,N))2] are uniformly bounded in time. Indeed, we find 1 t 2 t (cid:90) E[κ (X¯(1,N))2]= |xωα(x)−E[X¯(1,N)ωα(X¯(1,N))]|2dρ 1 s f s f s s (cid:90) = |xωα(x)|2dρ −|E[X¯(1,N)ωα(X¯(1,N))]|2 f s s f s (cid:90) (cid:90) ≤ |xωα(x)|2dρ (x)≤ |x|2dρ (x). f s s Therefore,theuniformboundedness(ins)followsfromCorollary3.1. Weobtainasimilarestimate for κ . By assumption (cid:107)ωα(cid:107) ≥c for all s∈[0,T]. Hence, we obtain 2 f L1(ρs) (14) E[|z(i,N)|2]≤c (cid:90) t∧τkE[|z(i,N)|2]ds+c k2 1 (cid:88) (cid:90) t∧τkE[|z(i,N)|2]ds+c 1 t t 1 s 2 N j s 3N 0 0 with constants c ≥0. Averaging over 1≤i≤N gives i y(N) := 1 (cid:88) E[|z(i,N)|2]≤c (1+k2)(cid:90) t∧τky(N)ds+c 1 t, t N i t 0 s 0N 0 for some nonnegative constant c ≥0. An application of the Gronwall inequality yields 0 y(N) ≤(c /N)tec0(1+k2)t for all t∈[0,T]. t 0 Finally, we insert this estimate into (14) and use Gronwall’s inequality again to obtain c 1 (cid:16) (cid:17) E[|X(i,N)−X¯(i,N)|2]=E[|z(i,N)|2]≤ 0 T k2ec0(1+k2)T +1 ec0T. t∧τk t∧τk t (1+k2)N Since the estimate above holds for any k ∈N, and τ ≥T for some k ∈N, we finally obtain k0 0 c 1 (cid:16) (cid:17) E[|Xt(i,N)−X¯t(i,N)|2]≤ (1+0k2)NT k02ec0(1+k02)T +1 ec0T =:CN−1, 0 which is precisely the required estimate for any 1≤i≤N. (cid:3) Remark 4.2. The moment estimate (13) ensures (i) the so-called propagation of chaos property [33], as well as (ii) the convergence of the stochastic empirical measure 1 (cid:88) ρN(A)= δ (A), A∈B(Rd), t N i Xti towards the deterministic mean-field probability measure ρ ∈P (Rd) [35]. t 2 Indeed, for (i) we have that W2(ρ((cid:96),N),ρ⊗(cid:96))≤E[|(X1,...,X(cid:96))−(X¯1,...,X¯(cid:96))|2]≤(cid:96)CN−1, 1≤(cid:96)≤N, 2 t t t t t t where ρ((cid:96),N) ∈P (R(cid:96)d) denotes the (cid:96)-th marginal of the N-particle joint distribution on RNd. t 2 As for (ii), we have for any ϕ∈Lip (Rd) the estimate b E(cid:34)(cid:12)(cid:12)(cid:12)(cid:12)N1 (cid:88)iϕ(Xti)−(cid:90) ϕdρt(cid:12)(cid:12)(cid:12)(cid:12)2(cid:35) ≤2E(cid:34)N1 (cid:88)i|ϕ(Xti)−ϕ(X¯ti)|2+(cid:12)(cid:12)(cid:12)(cid:12)N1 (cid:88)iϕ(X¯ti)−(cid:90) ϕdρt(cid:12)(cid:12)(cid:12)(cid:12)2(cid:35)≤C˜N−1, for some constant C˜, independent of N and t∈[0,T]. Note that the estimate for second term in the first inequality follows from the law of large numbers, which holds since {X¯i} are mutually t independent and identically ρ -distributed. t Summarizing the above discussion, we have the following result.