SIAM J. CONTROL OPTIM.
Vol. 36, No. 5, pp. 1539-1575, September 1998
© 1998 Society for Industrial and Applied Mathematics

SOLVING SCHEDULING PROBLEMS BY SIMULATED ANNEALING*

OLIVIER CATONI†

Abstract. We define a general methodology for dealing with a large family of scheduling problems. We consider the case where some of the constraints are expressed through the minimization of a loss function. We study in detail a benchmark example consisting of a jigsaw puzzle problem with additional constraints. We discuss some algorithmic issues typical of scheduling problems, such as the appearance of small unused gaps or the representation of proportionality constraints. We also carry out an experimental comparison between the Metropolis algorithm, simulated annealing, and the iterated energy transformation method, to see whether asymptotic theoretical results are a good guide towards practically efficient algorithms.

Key words. scheduling problems, Metropolis algorithm, simulated annealing, IET algorithm

AMS subject classifications. 60J10, 90C42, 82C80, 65C05

PII. S0363012996307813

* Received by the editors August 5, 1996; accepted for publication (in revised form) August 20, 1997; published electronically June 3, 1998. http://www.siam.org/journals/sicon/36-5/30781.html
† D.I.A.M. - Laboratoire de Mathématiques de l'Ecole Normale Supérieure, U.A. 762 du C.N.R.S., 45 rue d'Ulm, 75005 Paris, France ([email protected]).

Introduction. The aim of this paper is to describe a general strategy for dealing with scheduling problems and to illustrate its use on the resolution of jigsaw puzzles. We will assume that we can put our scheduling problem in the form of a task assignment problem, and we will turn it into the minimization of a cost function defined on a suitable search space. This cost function will be minimized by a Monte Carlo algorithm of the Metropolis kind: either simulated annealing or our recently introduced iterated energy transformation (IET) method. We have already studied some of the theoretical aspects of these two methods in previous papers (see [6], [7]).

We have chosen to experiment on a jigsaw puzzle problem with rectangular pieces because it is a typical instance of the kind of difficulties encountered when building timetables, and because it is in itself a difficult problem (it is NP-complete) which deserves special attention. In the course of this experimentation, we will compare four algorithms: a randomized descent algorithm (the Metropolis dynamic at temperature zero), the Metropolis algorithm, simulated annealing, and the iterated energy transformation algorithm.

1. An abstract task assignment framework. Let $B$ be a finite set of tasks. Let $E$ be a set of resources needed to perform these tasks. The set $E$ may be any kind of set: a finite set, a domain in $\mathbb{R}^n$, etc. In applications it can represent various things, such as a set of people who are to perform the tasks, in which case it is natural to see it as a finite set; or it can represent the space and time needed for the tasks, in which case it is sometimes natural to see it as a domain in $\mathbb{R}^n$. More often it is a product space of both kinds. In any case, we will only consider a finite collection of subsets of $E$; therefore it will always be possible to consider that $E$ is a finite set from the theoretical point of view. This is reasonable, because a computer can only handle a finite number of possible ways to allocate resources, and also because, in many problems of the timetable type, continuous quantities, such as time, are discretized (for instance, when one tries to schedule lectures, they are usually constrained to start at full hours). Still, the reader should think of $E$ as a large set; our methods, inherited from statistical mechanics, are precisely meant to cope with a large state space.
The abstract scheduling problem we will consider is to allocate to each task in $B$ a set of resources in a way which satisfies a set of constraints. At this level of generality, we will not represent the constraints by equations or logical relations; we will merely view them as a subset $S$ of $P(B \times E)$ (where $P(A)$ denotes the set of subsets of the set $A$). We will call $S$ the "solution space." A solution $x \in S$ is a subset of the product space $B \times E$. We will use the notations $\pi_B$ and $\pi_E$ for the projections on $B$ and $E$. The fact that $(b,e) \in x$ means that the task $b$ uses the resource $e$. The set of resources used by $b$ is $\pi_E(\pi_B^{-1}(b) \cap x)$, for which we will use the functional notation $x(b)$.

We will assume that each solution $x \in S$ is a complete assignment, in the sense that all the tasks are scheduled:
$$\pi_B(x) = B \quad \text{for any } x \in S.$$
Our scheduling problem is to construct a solution $x$ belonging to the solution space $S$.

The idea of considering scheduling problems as putting objects into boxes in a multidimensional space is not new and can be found, for instance, in Abramson [1], where specialized simulated annealing hardware is described for handling some generic types of cost functions.

2. The jigsaw puzzle example. This example is meant to be a benchmark in which the main algorithmic issues of scheduling problems are present. The set of resources $E$ will be a discretized rectangular frame
$$E = \{0,\ldots,M-1\} \times \{0,\ldots,N-1\} \subset \mathbb{Z}^2.$$
The set of tasks $B$ will be the set of pieces of the jigsaw puzzle. Each piece $r$ has a rectangular shape defined by its width $w_r \in \mathbb{N}^*$ and its height $h_r \in \mathbb{N}^*$. The constraint is that pieces should not overlap. Thus the solution space is
$$S = \bigl\{x \subset B \times E \;:\; x(r) = [a_r, a_r + w_r[\; \times \;[b_r, b_r + h_r[,\ (a_r, b_r) \in E,\ r \in B,\ \text{and } x(r) \cap x(r') = \emptyset,\ r \neq r' \in B\bigr\}.$$
The problem is to build the jigsaw puzzle, that is, to construct $x \in S$.

Although the shape of the pieces is very simple, this problem can be seen to be very complex. In fact, it is easy to see that it is NP-complete, because it contains the partition problem among its instances (see [14]). Indeed, partitioning given integers $\{c_1,\ldots,c_N\}$ into two sets $I$ and $J$ such that
$$\sum_{i \in I} c_i = \sum_{j \in J} c_j$$
can be viewed as a jigsaw puzzle with $N$ pieces of respective widths $c_i$ and height 1, and a frame of width $\frac{1}{2}\sum_{i=1}^N c_i$ and height 2 (see Fig. 2.1).

Fig. 2.1.
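To make this reduction concrete, here is a minimal Python sketch (the representation and the names, such as `partition_instance` and `is_solution`, are illustrative choices of ours, not prescribed by the paper): a candidate assignment is stored as a map from piece indices to lower-left corners, and membership in the solution space $S$ amounts to the frame and non-overlap checks below.

```python
from itertools import combinations

def partition_instance(c):
    """Build the jigsaw instance of section 2 from integers c_1,...,c_N:
    N pieces of width c_i and height 1, a frame of width sum(c)/2 and height 2."""
    assert sum(c) % 2 == 0, "the partition instance requires an even total"
    pieces = [(ci, 1) for ci in c]   # (width, height) of each piece
    frame = (sum(c) // 2, 2)         # (M, N)
    return pieces, frame

def overlaps(p, q):
    """True if two placed rectangles p = (a, b, w, h) and q intersect."""
    (a1, b1, w1, h1), (a2, b2, w2, h2) = p, q
    return a1 < a2 + w2 and a2 < a1 + w1 and b1 < b2 + h2 and b2 < b1 + h1

def is_solution(placement, pieces, frame):
    """Membership in S: every piece is placed inside the frame and no two overlap.
    `placement` maps the piece index r to its lower-left corner (a_r, b_r)."""
    M, N = frame
    if set(placement) != set(range(len(pieces))):
        return False                 # pi_B(x) = B: all tasks must be scheduled
    rects = []
    for r, (a, b) in placement.items():
        w, h = pieces[r]
        if not (0 <= a and a + w <= M and 0 <= b and b + h <= N):
            return False             # x(r) must lie inside E
        rects.append((a, b, w, h))
    return not any(overlaps(p, q) for p, q in combinations(rects, 2))

# Example: {3, 1, 1, 2, 2, 1} admits the balanced split {3, 2} / {1, 1, 2, 1}.
pieces, frame = partition_instance([3, 1, 1, 2, 2, 1])
x = {0: (0, 0), 3: (3, 0), 1: (0, 1), 2: (1, 1), 4: (2, 1), 5: (4, 1)}
print(is_solution(x, pieces, frame))  # True
```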
3. A method of resolution based on the Metropolis dynamic. In this section we will sketch a methodology for solving the abstract problem of section 1. The general idea is to perform a random search for a solution in a state space larger than the solution space. This search space should be easy to describe and easy to explore by a Markov chain performing a succession of elementary moves. Of course, we will not use a Markov chain which uniformly samples the search space because, usually, the search space we will be able to build will be very large when compared to the solution space, and drawing points at random in the search space would seldom lead to discovering a solution. Instead, we will use a Markov chain with rare transitions, whose invariant measure is concentrated in a neighborhood of the solution space. This optimization technique is well known, but its improvement is still a subject of active research. The prototype algorithm we will start from is the Metropolis dynamic at low temperature. The Metropolis dynamic was designed to simulate statistical mechanics systems, not for optimization purposes. In order to improve its performance as an optimization algorithm, several speed-up techniques have been proposed. The most famous one is simulated annealing [15], [18]. We have also recently proposed another technique, which we call the iterated energy transformation (IET) method [7]. We will describe and use both of these.

3.1. Choice of a search space. The first step of the method is to choose a search space $\tilde S$ containing the solution space $S$. The most popular way to construct $\tilde S$ is to relax some constraints on the solution and to measure, instead, how much the constraints have been violated by a score function which one afterwards has to minimize. For instance, in circuit placement applications (one of the earliest applications of simulated annealing), the constraint that circuits should not overlap is often relaxed, and the overlapping of circuits is instead merely discouraged by some score function of the surface of the overlap.

Our strategy will be somewhat of the same kind, with the difference that we will not relax a constraint which is specific to the problem. Instead, we will allow partial solutions, where only some proportion of the tasks have been scheduled. Defining partial solutions is usually very easy and very natural. Most of the time, this is how the problem is posed from the beginning. Indeed, the constraints usually come from incompatibilities between tasks, such as sharing the same resource or needing to be performed in a given order, and can be expressed without assuming that all the tasks are already scheduled.

From the technical point of view, we will assume that the search space (the space of partial solutions) satisfies the following properties:
• The empty solution is in the search space: $\emptyset \in \tilde S$.
• There is a path from the empty solution leading to any partial solution $x \in \tilde S$ along which tasks are scheduled one after the other. This can be expressed in the following way: for any $x \in \tilde S$, $x \neq \emptyset$, there is $b \in \pi_B(x)$ such that $x \setminus \pi_B^{-1}(b) \in \tilde S$.
• All complete solutions in the search space satisfy the constraints. In other words, the solution space is exactly made of the complete solutions of the search space. This is expressed by the following equation:
$$S = \{x \in \tilde S : \pi_B(x) = B\}.$$

Let us notice that the "best" choice for $\tilde S$ would be $\{x \cap \pi_B^{-1}(C) : x \in S,\ C \subset B\}$, the set of all partial solutions contained in global solutions. However, this set is, in practical situations, never defined by simple relations, because when one has scheduled some of the tasks, it is never possible (except for trivial problems) to foretell whether there will remain suitable resources to schedule the remaining ones. Therefore the search space $\tilde S$ is, most of the time, much broader than $S$ and contains many dead ends.
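For the puzzle, the natural search space $\tilde S$ is the set of non-overlapping placements of any subset of the pieces. Below is a small sketch of the membership test, again with illustrative names and the conventions of the previous sketch. The three requirements above are easy to check for this choice: the empty placement is in $\tilde S$, removing any placed piece keeps us in $\tilde S$, and the complete elements of $\tilde S$ are exactly the solutions of section 2. The last call also exhibits a dead end.

```python
from itertools import combinations

def in_search_space(partial, pieces, frame):
    """Membership test for the search space S~ of the puzzle: a partial placement
    (a dict mapping a subset of the piece indices to lower-left corners) belongs to
    S~ when every placed piece fits in the frame and no two placed pieces overlap."""
    M, N = frame
    rects = []
    for r, (a, b) in partial.items():
        w, h = pieces[r]
        if not (0 <= a and a + w <= M and 0 <= b and b + h <= N):
            return False
        rects.append((a, b, w, h))
    return not any(
        a1 < a2 + w2 and a2 < a1 + w1 and b1 < b2 + h2 and b2 < b1 + h1
        for (a1, b1, w1, h1), (a2, b2, w2, h2) in combinations(rects, 2)
    )

pieces, frame = [(3, 1), (1, 1), (2, 1), (2, 1)], (4, 2)   # areas sum to the frame area
print(in_search_space({}, pieces, frame))                      # True: the empty solution
print(in_search_space({0: (0, 0), 2: (0, 1)}, pieces, frame))  # True, extendable to a complete solution
print(in_search_space({2: (1, 0), 3: (1, 1)}, pieces, frame))  # True, yet a dead end:
                                                               # the 3-wide piece no longer fits on either row
```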
3.2. Building the dynamic: Constructions and destructions. The next idea is to define on $\tilde S$ two kinds of dynamics, a constructive dynamic and a destructive dynamic. These two random dynamics are characterized by two Markov matrices $q_C$ and $q_D$,
$$q_C : \tilde S \times \tilde S \longrightarrow [0,1], \qquad q_D : \tilde S \times \tilde S \longrightarrow [0,1].$$
We will assume that the transitions allowed by $q_C$ consist of either keeping the current partial solution or scheduling one more task. In a similar way, the transitions allowed by $q_D$ consist of unscheduling a given number of tasks. We allow unscheduling of more than one task at a time, because in some situations it is more sensible to do so. For instance, if many tasks have to share the same resource, it may sometimes speed up the allocation process to unschedule all of them at the same time (think of students sharing the same teacher).

This conception of constructions and destructions can be expressed by the following conditions, where $|A|$ is the number of elements in the finite set $A$:

(1)
• $\{(x,y) : q_C(x,y) > 0,\ x \neq y\} \subset \{(x,y) : y \cap \pi_B^{-1}(\pi_B(x)) = x,\ |\pi_B(y)| = |\pi_B(x)| + 1\}$,
• $\{(x,y) : q_D(x,y) > 0\} \subset \bigcup_{n=1}^{+\infty} \{(x,y) : q_C^{\,n}(y,x) > 0\}$.

Let us remark that, usually, constructions will decompose into two steps: one is to choose an unscheduled task $b \in B \setminus \pi_B(x)$, and the second is to try to allocate a set of resources to it. This second step is sometimes unsuccessful (either because it is impossible or because the proper allocation has not been discovered); therefore, as a rule, we have $q_C(x,x) > 0$ for a substantial number of partial solutions. On the contrary, destructions are simple moves, where one only has to choose a scheduled task $b \in \pi_B(x)$ and remove it. Therefore, as a rule, we will have $q_D(x,x) = 0$, except when $x = \emptyset$, for which $q_D(\emptyset,\emptyset) = 1$.

When the two above assumptions are satisfied, the whole search space $\tilde S$ can be constructed by $q_C$ starting from the empty solution $\emptyset$, and conversely, any solution can be shrunk to the empty solution by successive applications of $q_D$. More precisely, the following proposition holds.

Proposition 3.1.
$$\tilde S = \bigcup_{n=0}^{+\infty} \{y : q_C^{\,n}(\emptyset, y) > 0\} = \bigcup_{n=0}^{+\infty} \{x : q_D^{\,n}(x, \emptyset) > 0\}.$$

3.3. Building the cost function. Now we will build a cost, or energy, function defined on the search space which penalizes partial solutions; we will call it $U : \tilde S \longrightarrow \mathbb{R}$. Namely, we will require the following properties to hold:

(2)
• $\operatorname{arg\,min}_{x \in \tilde S} U(x) = S$.
• There is a positive constant $\gamma$ such that $U(y) \geq U(x) + \gamma$ whenever $x \neq y$ and $q_D(x,y) > 0$.

A typical example for $U$ is
$$U(x) = \mu(B \setminus \pi_B(x)),$$
where $\mu$ is a positive measure on $B$. In this case the assumptions on $U$ are satisfied, and the largest choice of $\gamma$ is
$$\gamma = \min_{b \in B} \mu(b).$$
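Continuing the puzzle example, the following sketch gives one possible pair of kernels $q_C$ and $q_D$ together with the energy $U(x) = \mu(B \setminus \pi_B(x))$, taking for $\mu$ the counting measure, so that $U$ counts the unscheduled pieces and $\gamma = 1$. The sampling choices (uniform unscheduled piece, uniform corner) are illustrative ones of our own; any other proposal satisfying (1), for instance one that only tries corners adjacent to already placed pieces, would fit the same framework.

```python
import random

def energy(partial, pieces):
    """U(x) = mu(B minus pi_B(x)) with mu the counting measure: the number of unscheduled pieces."""
    return len(pieces) - len(partial)

def fits(partial, pieces, frame, r, corner):
    """True if piece r placed at `corner` stays in the frame and overlaps no placed piece."""
    M, N = frame
    a, b = corner
    w, h = pieces[r]
    if not (0 <= a and a + w <= M and 0 <= b and b + h <= N):
        return False
    for s, (a2, b2) in partial.items():
        w2, h2 = pieces[s]
        if a < a2 + w2 and a2 < a + w and b < b2 + h2 and b2 < b + h:
            return False
    return True

def sample_construction(partial, pieces, frame, rng=random):
    """One draw from q_C(x, .): pick an unscheduled piece and a corner uniformly;
    if the placement is infeasible, the move is the identity (hence q_C(x, x) > 0)."""
    unscheduled = [r for r in range(len(pieces)) if r not in partial]
    if not unscheduled:
        return partial
    r = rng.choice(unscheduled)
    M, N = frame
    corner = (rng.randrange(M), rng.randrange(N))
    if fits(partial, pieces, frame, r, corner):
        return {**partial, r: corner}
    return partial                       # unsuccessful construction: stay put

def sample_destruction(partial, pieces, frame, rng=random):
    """One draw from q_D(x, .): unschedule one piece chosen uniformly
    (on the empty solution, the move is the identity)."""
    if not partial:
        return partial
    r = rng.choice(sorted(partial))
    return {s: c for s, c in partial.items() if s != r}
```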
3.4. Building a Metropolis dynamic. From Proposition 3.1, we see that any Markov matrix of the form $\lambda q_C + (1-\lambda) q_D$ with $\lambda \in\, ]0,1[$ is irreducible. Therefore a straightforward way to build a Metropolis dynamic would be to consider the Markov matrix
$$p_T(x,y) = \begin{cases} \bigl(\lambda q_C(x,y) + (1-\lambda) q_D(x,y)\bigr)\, e^{-(U(y)-U(x))^+/T}, & x \neq y,\\[4pt] 1 - \displaystyle\sum_{z \neq x} p_T(x,z), & x = y. \end{cases}$$
In fact, we can do better, because we know in advance that during a construction the energy will decrease, and that during a destruction the energy will increase by a quantity at least equal to $\gamma$. This avoids uselessly applying the kernel $q_D$ at low temperatures in situations where we know that it will, most of the time, generate a move that gets rejected. More precisely, we will use the following Markov matrix:

(3)
$$p_T(x,y) = \lambda\, e^{-\gamma/T}\, q_D(x,y)\, e^{-(U(y)-U(x)-\gamma)^+/T} + q_C(x,y)\Bigl(1 - \lambda \sum_{z \in \tilde S} q_D(x,z)\, e^{-(U(z)-U(x)-\gamma)^+/T - \gamma/T}\Bigr),$$

where $\lambda$ is again a positive parameter in the interval $0 < \lambda \leq 1$. A choice of $\lambda < 1$ prevents destructions from being systematically chosen at high temperatures. Usually we will take $\lambda = 1/2$ or $\lambda = 1$. The positive part in $(U(y)-U(x)-\gamma)^+$ is needed only to cover the case where $x = y$.

The computer implementation of this Metropolis dynamic is the following (a sketch in code is given after the list): starting from the state $x$,
• first flip a coin with odds $\lambda e^{-\gamma/T}$ and $1 - \lambda e^{-\gamma/T}$ to decide whether or not to try a destruction;
• in case a destruction is tried,
  - choose a transition $(x,y)$, drawing $y$ according to the probability distribution $q_D(x,\cdot)$,
  - then flip a second coin with odds $\exp\bigl(-(U(y)-U(x)-\gamma)^+/T\bigr)$ and $1 - \exp\bigl(-(U(y)-U(x)-\gamma)^+/T\bigr)$ to decide whether or not to apply this move;
• if the answer to one of the two previous tosses was no, then choose a transition $(x,y)$, where $y$ is drawn according to the distribution $q_C(x,\cdot)$, and apply it.
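A direct transcription of these three steps, written with the samplers and the energy passed as arguments so that it matches the abstract setting (the function and parameter names are ours):

```python
import math
import random

def metropolis_step(x, U, sample_qD, sample_qC, T, gamma, lam=0.5, rng=random):
    """One step of the dynamic p_T of (3).

    U          -- energy function on the search space
    sample_qD  -- draws y from q_D(x, .)
    sample_qC  -- draws y from q_C(x, .)
    T, gamma   -- temperature and the minimal destruction increase of hypothesis (2)
    lam        -- the parameter lambda in ]0, 1]
    """
    # First coin: try a destruction with probability lambda * exp(-gamma / T).
    if rng.random() < lam * math.exp(-gamma / T):
        y = sample_qD(x)
        # Second coin: accept the destruction with probability exp(-(U(y) - U(x) - gamma)^+ / T).
        if rng.random() < math.exp(-max(U(y) - U(x) - gamma, 0.0) / T):
            return y
    # Otherwise perform a construction move, which is always applied
    # (it may leave x unchanged when the proposed placement is infeasible).
    return sample_qC(x)

# Illustrative use with the puzzle kernels sketched above:
# x = {}
# for _ in range(10000):
#     x = metropolis_step(
#         x,
#         U=lambda p: energy(p, pieces),
#         sample_qD=lambda p: sample_destruction(p, pieces, frame),
#         sample_qC=lambda p: sample_construction(p, pieces, frame),
#         T=0.3, gamma=1.0,
#     )
```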
The hypotheses we made about $q_C$, $q_D$, and $U$ are exactly what is needed to prove the following proposition.

Proposition 3.2. For any temperature $T > 0$, the matrix $p_T$ is an irreducible Markov matrix. Considering the rate function $V : \tilde S \times \tilde S \to \mathbb{R}_+ \cup \{+\infty\}$ defined by
$$V(x,y) = \begin{cases} (U(y) - U(x))^+ & \text{if } q_D(x,y) + q_C(x,y) > 0 \text{ and } x \neq y,\\ +\infty & \text{otherwise}, \end{cases}$$
we see that there is a positive constant $\kappa$ such that, whenever $x, y \in \tilde S$, $x \neq y$,
$$\kappa\, e^{-V(x,y)/T} \leq p_T(x,y) \leq \frac{1}{\kappa}\, e^{-V(x,y)/T}.$$
Moreover, $V$ satisfies the weak reversibility condition of Hajek-Trouvé with respect to $U$. More precisely, if $\Gamma_{x,y}$ is the set of paths from $x$ to $y$, we put for any $\gamma = (\gamma_1 = x, \ldots, \gamma_r = y) \in \Gamma_{x,y}$
$$H(\gamma) = \max_{i=1,\ldots,r-1}\, U(\gamma_i) + V(\gamma_i, \gamma_{i+1})$$
and
$$H(x,y) = \min_{\gamma \in \Gamma_{x,y}} H(\gamma).$$
The weak reversibility condition of Hajek-Trouvé states that for any $x, y \in \tilde S$,
$$H(x,y) = H(y,x).$$
Due to this reversibility property, $U$ is a quasi-potential for $p_T$. We mean by this statement that the (unique) invariant probability measure $\mu_T$ of $p_T$ satisfies, for some positive constant $\alpha$ (independent of $T$) and for any $x \in \tilde S$,
$$\alpha \leq \mu_T(x)\, e^{(U(x) - \min U)/T} \leq 1/\alpha.$$

Corollary 3.1. We can build optimization algorithms based on $p_T$ following the results of Catoni [7] and Trouvé [22]. More precisely, for any fixed value of the temperature $T$, the homogeneous Markov chain with transition matrix $p_T$ is a generalized Metropolis algorithm with quasi-potential function $U$. In the same way, for any decreasing sequence of temperatures $(T_n)_{n \in \mathbb{N}}$, the nonhomogeneous Markov chain $(X_n)_{n \in \mathbb{N}}$ on $\tilde S$ with transitions
$$P(X_n = y \mid X_{n-1} = x) = p_{T_n}(x,y)$$
is a generalized simulated annealing algorithm. Its behavior has been studied in [22] and [24] and is very similar to the behavior of classical simulated annealing as studied in [6]. We can also apply to $p_T$ the iterated energy transformation method, which is described in a later section of this paper and studied in [7].

Proof. The only nonstraightforward point to check is the Hajek-Trouvé weak reversibility condition. Let us consider $x, y \in \tilde S$ and $\gamma \in \Gamma_{x,y}$. We build a path from $y$ to $x$ in the following way. Replace any edge $(z,t) \in \gamma$ by the edge $(t,z)$ if $q_C(z,t) > 0$ or $p_T(z,t) = 0$. If neither of the above two conditions is true, this means that $q_D(z,t) > 0$; then there is a path $\varphi \in \Gamma_{t,z}$ such that $q_C(u,v) > 0$ for any edge $(u,v) \in \varphi$, and we replace $(z,t)$ by $\varphi$. The path $\varphi$ is such that $H(\varphi) = U(z) + V(z,t) = U(t)$, because for any $(u,v) \in \varphi$ we have $U(v) < U(u)$ and $V(u,v) = 0$. Therefore, by concatenating all these reversed edges and paths in reverse order, we get a path $\psi \in \Gamma_{y,x}$ such that $H(\psi) = H(\gamma)$. Taking $\gamma$ to achieve the minimum defining $H(x,y)$, and exchanging the roles of $x$ and $y$, we conclude that $H(y,x) = H(x,y)$.

Remarks.
• In the search space we consider, there is a natural starting point for optimization algorithms, which is the empty schedule $\emptyset$.
• In many scheduling problems, it is not known in advance whether a complete solution exists or whether one can possibly be found within the available computer time. Our method has the advantage of finding at least a partial solution, where some proportion of the tasks are scheduled in a coherent way. This is not the case if other constraints are relaxed, as is usually done. For instance, if the aim is to schedule the lectures at a university, a solution where some lectures share the same room at the same time has no practical interest, whereas a solution where some proportion of the lectures are scheduled in a coherent way can be applied.
• A slight variant of the present setup is the case where the search space satisfies condition (1), but one does not know whether it is possible to schedule all the tasks, and one wants instead to schedule as many tasks as possible. In this situation the energy can weigh (through a positive measure) the relative importance of the tasks.

In the three following sections, we briefly recall some theoretical results about the speed of convergence of three optimization algorithms.

3.5. Rate of convergence of the Metropolis algorithm. In this section we consider the canonical process $(X_n)_{n \in \mathbb{N}}$ on the canonical space $(\tilde S^{\mathbb{N}}, \mathcal{B})$, where $\mathcal{B}$ is the sigma field generated by the events depending on a finite number of coordinates.

For any temperature $T \in \mathbb{R}_+$, $P_T$ will be the probability distribution on $(\tilde S^{\mathbb{N}}, \mathcal{B})$ of a Markov chain with transition matrix $p_T$ (where $p_T$ is as in Proposition 3.2). Under this distribution, $(X_n)_{n \in \mathbb{N}}$ is a Metropolis algorithm and has the following convergence speed.

Proposition 3.3. There exists a positive constant $d$, depending only on the choice of the search space $\tilde S$, of the constructive and destructive dynamics $q_C$ and $q_D$, and of the parameter $\lambda$, $0 < \lambda < 1$, such that for any energy function $U$ satisfying hypothesis (2) of section 3.3 and any positive constant $\eta$,
$$\max_{x \in \tilde S} P_T\bigl(U(X_N) \geq U_{\min} + \eta \mid X_0 = x\bigr) \leq d\Bigl(\exp\Bigl(-\frac{N}{d}\, e^{-H_1/T}\Bigr) + e^{-\eta/T}\Bigr),$$
where $H_1(V)$ is the first critical depth of the rate function $V$ defined in Proposition 3.2. The exponent $H_1(V)/T$ is optimal when $\eta$ is small and when $T$ tends to $0$ and $N$ tends to $+\infty$. With the notations of Proposition 3.2,
$$H_1(V) = \max_{x \notin S}\, \min_{y \in S}\, H(x,y) - U(x).$$
As a consequence, considering $1/T = (1/H_1)\log\bigl(N H_1/(d\,\eta \log N)\bigr)$, we see that there is a constant $d$ (independent of $U$ and $\eta$) such that
$$\inf_{T \in \mathbb{R}_+}\, \max_{x \in \tilde S} P_T\bigl(U(X_N) \geq U_{\min} + \eta \mid X_0 = x\bigr) \leq d\left(\frac{d\,\eta \log N}{H_1(V)\, N}\right)^{\eta/H_1(V)}.$$
Moreover, the exponent $\eta/H_1(V)$ is optimal for small enough values of $\eta$ in the set of energy gaps $U(\tilde S) - U_{\min}$.

For a proof see, for instance, Cot and Catoni [9].

This proposition does not give a quantitative upper bound for the probability of failure after $N$ iterations, since it does not give an estimate for the constant $d$; nevertheless, the fact that this constant is independent of $U$ allows us to compare the probabilities of failure for different energy functions $U$ for a large finite number of iterations $N$, and not only when $N$ tends to infinity.
More precisely, it shows that the convergence speed of the Metropolis algorithm is slow when there are states with energies close to $U_{\min}$. Indeed, if one wants to study the convergence to $S$, one has to choose
$$\eta = \min\{U(x) - U_{\min} \;:\; x \in \tilde S \setminus S\}.$$
If $\eta$ is small, this will show in the exponent $\eta/H_1(V)$. This is a theoretical justification for the introduction of simulated annealing, which does not suffer from this drawback when proper robust cooling schedules are used.

3.6. Rate of convergence of simulated annealing. We now consider a nonincreasing triangular sequence of temperatures $T_1^N \geq T_2^N \geq \cdots \geq T_N^N$ and the measure $P_{(T_1^N,\ldots,T_N^N)}$ on $\tilde S^{\mathbb{N}}$ of the nonhomogeneous Markov chain with transitions
$$P_{(T_1^N,\ldots,T_N^N)}(X_n = y \mid X_{n-1} = x) = p_{T_n^N}(x,y).$$
The rate of convergence of such an algorithm has been studied in [6], [8], and [22] (see also [24] in English, translated from [22]). We give here a simple result; for more precise estimates, we refer to the original papers.

Proposition 3.4 ([6], [22], [24]). There is a positive constant $K$ such that
$$K^{-1} N^{-1/D} \leq \inf_{T_1^N \geq \cdots \geq T_N^N}\; \max_{x \in \tilde S} P_{(T_1^N,\ldots,T_N^N)}\bigl(U(X_N) > U_{\min} \mid X_0 = x\bigr) \leq K\, N^{-1/D},$$
where the constant $D = D(V)$ is the difficulty of the rate function $V$. With the notations of Proposition 3.2, the definition of $D(V)$ is
$$D(V) = \max_{x \in \tilde S \setminus S}\; \min_{y \in S}\; \frac{H(x,y) - U(x)}{U(x) - \min U}.$$
For any $A > 0$, there is a positive constant $K$ such that the triangular exponential schedule
$$T_n^N = \frac{1}{A}\left(\frac{A}{(\log N)^2}\right)^{n/N}$$
gives a convergence speed of
$$\max_{x \in \tilde S} P_{(T_1^N,\ldots,T_N^N)}\bigl(U(X_N) > U_{\min} \mid X_0 = x\bigr) \leq K\left(\frac{\log N\, \log\log N}{N}\right)^{1/D(V)}$$
for $N$ large enough.

In the case of simulated annealing, we thus have a probability of failure after $N$ iterations of order $(1/N)^{1/D}$ up to logarithmic equivalence (meaning that the logarithms of both sides are equivalent when $N$ tends to infinity). The important feature of this theoretical result is that the exponent $1/D$ is independent of the precision with which we want to reach $U_{\min}$; it depends, on the contrary, only on the structure of the local minima of $U$. Another interesting point is that the exponential triangular cooling schedule $T_n^N = A^{-1}(A/(\log N)^2)^{n/N}$ is robust: it gives a convergence rate with the optimal exponent $1/D(V)$ for any energy function $U$.
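The robust schedule of Proposition 3.4 is straightforward to put in code; the sketch below (ours, reusing the metropolis_step routine of the earlier sketch) runs one annealing trajectory of length $N$ with $T_n^N = A^{-1}(A/(\log N)^2)^{n/N}$.

```python
import math

def triangular_schedule(N, A):
    """Temperatures T_n^N = (1/A) * (A / (log N)^2)^(n/N), n = 1..N (Proposition 3.4)."""
    base = A / (math.log(N) ** 2)
    return [(base ** (n / N)) / A for n in range(1, N + 1)]

def simulated_annealing(x0, U, sample_qD, sample_qC, N, A, gamma, lam=0.5):
    """Run the nonhomogeneous chain X_n with transitions p_{T_n^N}(x, y)."""
    x = x0
    for T in triangular_schedule(N, A):
        x = metropolis_step(x, U, sample_qD, sample_qC, T, gamma, lam)
    return x
```

With $A < (\log N)^2$ this schedule decreases from roughly $1/A$ at the first iteration down to $1/(\log N)^2$ at the last one.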
3.7. Rate of convergence of the energy transformation method. We introduced in [7] the iterated energy transformation method as another means of discouraging uphill moves from low energy states more than from high energy states. In simulated annealing this effect is produced by an exogenous control of the temperature parameter: in "typical" successful runs of simulated annealing, the energy of the current state moves downwards on average, and at the same time uphill moves are more and more discouraged. In the iterated energy transformation method, a temporary hypothesis is made about the value of $U_{\min}$, and a concave transformation is applied to $U$ on the basis of this hypothesis. Then the algorithm is run at constant temperature using the transformed energy. This produces the desired effect of discouraging uphill moves more from low energy states. Of course, in the beginning, the hypothesis about $U_{\min}$ is necessarily a gross underestimate, so that the energy transform is not very efficient; but after some iterations it can be improved (this will work with a probability close to one), depending on the values of the energies of the explored states. The convergence of the lower bound estimate for $U_{\min}$ towards the true value of $U_{\min}$ is exponentially fast (with a probability close to one), and therefore the energy transformation is quickly tuned to an efficient value.

The iterated energy transformation method applied to our problem is described as follows. For any strictly concave, strictly increasing energy transformation $F : \mathbb{R} \longrightarrow \mathbb{R} \cup \{-\infty\}$, we consider the Markov matrix
$$p_F(x,y) = \mathbf{1}_{(F(U(x)) > -\infty)}\Bigl\{\lambda\, e^{F(U(x)) - F(U(x)+\gamma)}\, q_D(x,y)\, e^{(F(U(x)+\gamma) - F(U(y)))^-} + q_C(x,y)\Bigl(1 - \lambda \sum_z q_D(x,z)\, e^{F(U(x)) - F(U(x)+\gamma) + (F(U(x)+\gamma) - F(U(z)))^-}\Bigr)\Bigr\} + \mathbf{1}_{(F(U(x)) = -\infty)}\,\mathbf{1}_{(x=y)}.$$

Consider, for any positive constant $\alpha$, any real shift $\tau$, and any positive temperature $T$, the transformation
$$F_{\alpha,T,\tau}(U) = \begin{cases} \alpha U + \dfrac{1}{T}\log(U + \tau) & \text{if } U + \tau > 0,\\ -\infty & \text{otherwise}. \end{cases}$$
Let us introduce the simplified notation $p_{\alpha,T,\tau} = p_{F_{\alpha,T,\tau}}$.

Given parameters $M \in \mathbb{N}$ (the number of iterations performed with each energy transform), two real numbers $\rho > 0$ and $\eta_0 \geq 0$ (two parameters for the update of the shift $\tau$), and an initial lower bound $\delta < U_{\min}$, we consider the canonical process $(X_n)_{n \in \mathbb{N}}$ on $\tilde S^{\mathbb{N}}$ with probability distribution $P_{\alpha,T,M,\rho,\eta_0}$ defined by the following conditional distributions:
$$P_{\alpha,T,M,\rho,\eta_0}\bigl(X_n = y \mid (X_0,\ldots,X_{n-1}) = (x_0,\ldots,x_{n-1})\bigr) = p_{\alpha,T,\tau_n(x_0,\ldots,x_{n-1})}(x_{n-1}, y),$$
with
$$\tau_r = \eta_0 - \delta, \qquad 0 < r \leq M,$$
$$\tau_{kM+r} = \tau_{kM} - \frac{1}{1+\rho}\Bigl(\min_{n \,:\, n \leq kM} U(X_n) + \tau_{kM} + \eta_0\Bigr), \qquad 0 < k,\; 0 < r \leq M.$$

We have proved in [7] the following theorem.

Theorem 3.1 (Catoni). For any fixed $\alpha > 0$, the family of processes described above satisfies, for some positive constants $B$ and $K$ and for any choice of $r \in \mathbb{N}$, $\eta_0 \geq 0$, $\rho > 0$, and $\epsilon > 0$,
$$\max_{x \in \tilde S} P_\theta\bigl(U(X_{rM}) \geq \eta \mid X_0 = x\bigr) \leq \epsilon,$$
where $\theta = (\alpha, T, M, \rho, \eta_0)$,
$$T = \frac{\log(1+\rho)}{\log(Kr/\epsilon)},$$
$$M = B\left(\frac{\epsilon}{Kr}\right)^{-\log(1+\tilde D\eta_0)/\log(1+\rho)}\log\frac{Kr}{\epsilon},$$
$$\eta = U_{\min} + \left(\frac{\rho}{1+\rho}\right)^{r-1}(U_{\min} - \delta + \eta_0) + \eta_0\,\rho(1+\rho),$$
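A compact sketch of this scheme, with illustrative names and the same sampler/energy signature as the earlier sketches: the transformation $F_{\alpha,T,\tau}$, one step of the dynamic with the transformed energy $F \circ U$ (the same two-coin scheme as in section 3.4), and the periodic update of the shift $\tau$.

```python
import math
import random

def F(u, alpha, T, tau):
    """The transformation F_{alpha,T,tau}(u) = alpha*u + (1/T)*log(u + tau) when u + tau > 0."""
    return alpha * u + math.log(u + tau) / T if u + tau > 0 else -math.inf

def iet_step(x, U, sample_qD, sample_qC, alpha, T, tau, gamma, lam=0.5, rng=random):
    """One step with the transformed energy F o U (two-coin scheme of section 3.4)."""
    f = lambda u: F(u, alpha, T, tau)
    if f(U(x)) == -math.inf:
        return x                         # outside the domain of F: stay put
    if rng.random() < lam * math.exp(f(U(x)) - f(U(x) + gamma)):
        y = sample_qD(x)
        if rng.random() < math.exp(min(f(U(x) + gamma) - f(U(y)), 0.0)):
            return y
    return sample_qC(x)

def iet(x0, U, sample_qD, sample_qC, alpha, T, M, rho, eta0, delta, rounds, gamma=1.0):
    """Run `rounds` blocks of M iterations, updating the shift tau between blocks."""
    x, best = x0, U(x0)
    tau = eta0 - delta                   # initial shift, from the lower bound delta < U_min
    for _ in range(rounds):
        for _ in range(M):
            x = iet_step(x, U, sample_qD, sample_qC, alpha, T, tau, gamma)
            best = min(best, U(x))
        tau = tau - (best + tau + eta0) / (1.0 + rho)   # shift update of section 3.7
    return x
```

In an actual run, $\delta$ can be any crude lower bound on $U_{\min}$; for the puzzle with the counting measure, $\delta = -1$ works, since $U \geq 0$.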