Parallel Continuous Simulated Annealing
for Global Optimization
Beidi HAMMA*
Parallel Algorithm Team, CERFACS, 31057 Toulouse Cedex, France.
Sami VIITANEN and Aimo TÖRN
Computer Science Department, Åbo Akademi University, 20520 Åbo, Finland.
In this paper a parallel algorithm for simulated annealing (S.A.) in the continuous case, the Multiple Trials and Adaptive Supplementary Search (MTASS) algorithm, is presented. It is based on a combination of multiple trials, local improvement searches and an adaptive cooling schedule. The results in optimizing some standard test problems are compared with a sequential S.A. algorithm and another parallel probabilistic algorithm.
KEYWORDS: Simulated annealing, global optimization, parallel algorithms
1 INTRODUCTION
The simulated annealing method was introduced by Kirkpatrick, Gelatt and Vecchi in [10] and by Černý in [6] for solving large combinatorial optimization problems using algorithms that simulate the annealing of solids. Such an approach was first proposed in [12]. For an introduction to simulated annealing see for example [11]. Its principle, which can be summarized as Move-Evaluate-Accept, has been successfully applied to problems such as the Traveling Salesman Problem (TSP). More generally, it has been used successfully in special situations such as "black box" problems.
The method has also been adapted to the continuous case, starting with the works of [15, 5]. An application where continuous simulated annealing is applied to a set of standard test functions in global optimization is found in [7]. Because applications of the algorithm to large combinatorial problems may use considerable computer time, interest in parallel versions of the algorithm arose early in [1, 2] and is continuing in [16].
* This work was completed while this author was visiting the Computer Science Department of Åbo Akademi University, whom he wishes to thank for their kind hospitality. He also thanks the Université Paul Sabatier of Toulouse, France, for its financial support (bourse ATUPS).
The convergence of the algorithm in the continuous case is treated in [3]. The convergence in probability of the simulated annealing algorithm to the global minimum f* means that the probability that the working point X will end up in a set B_f*(ε), for which f(X) − f* < ε, approaches 1 for arbitrarily small ε > 0. A necessary condition for this is that the Move procedure is such that any region of positive measure will be visited in the long run. This condition is also a sufficient condition for the primary task to be solved, namely to find the global minimum, but is not sufficient for convergence. Knowing that the algorithm converges is not very helpful when applying it to a real problem. There is neither a guarantee that the global minimum will be found nor that such convergence will occur, because the algorithm must be stopped after a finite number of steps. In practice the stopping condition and also some other entities are fixed based on heuristics, and their influence on the result is not exactly known. This fact has led to algorithms with additional heuristics introduced (e.g. local optimizations) to gain efficiency and to make the Move step, which originally favored local moves, cover the decision space uniformly in order to facilitate global exploration ([7]; [16]).
In this paper we will present a heuristic parallel simulated annealing algorithm for the continuous case based on the idea of heating cycles [4]. Heating cycles means an algorithm where several applications of the simulated annealing procedure are made with different, possibly decreasing, initial temperatures.
2 CONTINUOUS SIMULATED ANNEALING
The general simulated annealing algorithm is a Monte Carlo technique for finding the global minimum of a real-valued function f on a set A. We are given a selection Markov kernel R(·, ·) for selecting trial points. Now, given a cooling schedule consisting of temperatures T0, T1, T2, ..., we construct a sequence of states X0, X1, X2, ... and a sequence of trial points Y0, Y1, Y2, ... as follows. Starting from an initial configuration X0, T0 and having constructed ((X1, ..., Xk), (Y1, ..., Yk), (T1, ..., Tk)), the next trial point Y_{k+1} is generated according to the probability distribution R(Xk, ·).
Then the next state is
X_{k+1} = Y_{k+1}   with probability p(Xk, Y_{k+1}, Tk)
X_{k+1} = Xk        with probability 1 − p(Xk, Y_{k+1}, Tk)
where the probability p(Xk, Y_{k+1}, Tk) of accepting a trial point Y_{k+1} when the current state is Xk and the current temperature is Tk is called the acceptance probability, and the acceptance criterion (usually referred to as the Metropolis criterion) is given by
p(x, y, t) = exp(−(f(y) − f(x)) / t)   if f(y) > f(x)
p(x, y, t) = 1                          if f(y) ≤ f(x)
This paper deals with continuous simulated annealing over a compact feasible region A ⊂ R^n. We assume that the global minimum of f is attained in the interior
of A and that B_f*(ε) has positive measure. We also assume that when selecting the next trial point the whole of A is reachable in one step. This can be done, for example, by using a normal distribution centered at the current point Xk, as in this paper, or by using uniform sampling in A. It has been shown ([3]) that convergence to a point X ∈ B_f*(ε) is assured from every starting point X0 for any cooling schedule such that the temperature converges in probability to 0, i.e., for every set of initial conditions (X0, T0) the sequence of function values (f(Xk), k = 0, 1, 2, ...) converges in probability to the global minimum f*. For the proof of this see [3].
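For concreteness, the following minimal Python sketch shows one iteration of this scheme, assuming a box-shaped feasible region A = [lower, upper] and a normal proposal with standard deviation sigma; the helper names and the clipping of proposals back into A are illustrative choices, not part of the original algorithm.

    import numpy as np

    def metropolis_accept(f_old, f_new, temperature, rng):
        # Metropolis criterion: always accept descent; accept ascent with
        # probability exp(-(f_new - f_old) / temperature).
        if f_new <= f_old:
            return True
        return rng.random() < np.exp(-(f_new - f_old) / temperature)

    def sa_step(f, x, temperature, sigma, lower, upper, rng):
        # Propose a trial point from a normal distribution centered at the
        # current point, clipped to the compact feasible region A.
        y = np.clip(x + rng.normal(scale=sigma, size=x.shape), lower, upper)
        return y if metropolis_accept(f(x), f(y), temperature, rng) else x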
3 MULTIPLE TRIALS WITH ADAPTIVE SUPPLEMENTARY SEARCH
This section describes a parallel algorithm for simulated annealing. The main algorithm is based on the heating cycles of Bonnemoy and Hamma [4]. Their algorithm can be roughly stated as follows:
1. Initialize T^0(0) = C_R and the variance V = ν
2. Perform S.A. until the classical tests for stopping are fulfilled
3. Change V and re-heat: T^c(0) = R · T^{c-1}(0)
4. Continue (for a user-fixed number of cycles) from step 2

where T^{c-1}(0) is the starting temperature of the previous cycle and R is a reduction factor.
The principle is to increase the temperature in the S.A. algorithm after the classical tests for stopping are satisfied. Without this, k (the number of iterations) is usually high at that point and thus the acceptance probability is near zero. This means that there will not be any more accepted ascent points. At this stage, if the algorithm has found a global minimizer, all is well. But according to numerical experiments, for difficult problems (high dimension, many local minimizers, ...) the temperature function T decreases to zero before the algorithm has escaped from all difficult local minimizer states. So at this stage of running the algorithm, one can "re-heat" to try to find a better minimum. If a better minimum is not found, then only some more function evaluations have been wasted.
So, to sum up, in the heating cycles algorithm S.A. is performed but the result is not immediately accepted as is. Instead a (user-fixed) number of supplementary cycles of S.A. are performed. Every new cycle starts with the last cycle's best point and with an initial temperature lower than the previous cycle's. A cycle ends only when a classical test is satisfied.
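A minimal Python sketch of these cycles, assuming a run_sa routine (hypothetical, not the authors' code) that runs S.A. from a given start point, temperature and variance until the classical stopping tests are satisfied:

    def heating_cycles(f, x0, t0, variance, reduction, num_cycles, run_sa):
        # run_sa(f, x_start, t_start, variance) returns the best point found
        # in one S.A. run and its function value.
        best_x, best_f = x0, f(x0)
        start_temp = t0
        for cycle in range(num_cycles):
            x, fx = run_sa(f, best_x, start_temp, variance)
            if fx < best_f:
                best_x, best_f = x, fx
            start_temp *= reduction   # re-heat: T^c(0) = R * T^(c-1)(0)
            variance *= 0.5           # example change of V; the paper leaves this choice open
        return best_x, best_f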
Continuing on the same idea of a supplementary stopping criterion we introduce the MTASS algorithm. In the MTASS algorithm the variances and temperatures are also changed on a heating basis. The difference is that the MTASS algorithm is essentially a double loop. At every start of the outer loop the variances are reset to their original values and the initial temperatures are re-heated (as in the heating cycles algorithm described above). The inner loop (called a supplementary search)
consists of a sequence of trials terminated when a new best point is found. The variances are changed at the start of the inner loop and the initial temperature is set to a value less than the value at the start of the previous iteration of the outer loop. In case a new best point is found (the supplementary search is a success) a new iteration of the outer loop is started. In the opposite case (one iteration of the inner loop fails) the variances are again changed and a new iteration of the inner loop starts with the temperature increased to a value less than the value at the start of the previous iteration of the inner loop. After a number of failures of the inner loop the supplementary search is considered a failure and the whole algorithm terminates.
It is assumed that P processors π1, ..., πP are available. The generation of trial points is done using a normal distribution centered at the current point. To obtain more efficient local exploration of A some local improvement searches will be done. These will be done after the termination of the main algorithm such that each processor will perform a supplementary search.
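As a simple serial stand-in for the P parallel processors (the actual implementation runs on transputers), the P trial points of one iteration could be generated as follows; the helper name is hypothetical:

    import numpy as np

    def generate_trials(x_k, variances, rng):
        # One trial point per processor, each drawn from a normal distribution
        # centered at the current point with that processor's own variance
        # (standard deviation = square root of the variance).
        return [x_k + rng.normal(scale=np.sqrt(v), size=x_k.shape) for v in variances]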
3.1 Multiple Trials
Definition 1
At iteration k, a point X generated from Xk is called:
a point of descent if f(X) ≤ f(Xk);
a point of accepted ascent if f(X) > f(Xk) but X is accepted according to the criterion of Section 2;
a point of refused ascent otherwise.
In order to cover a larger region (not only in different directions), every processor uses its own variance. The processors are divided into two sets as follows:
The set QC = {π1, π2, ..., πj} of processors with variances Vi < ν (i = 1, ..., j−1) and Vj = ν.
The set QE = {π_{j+1}, π_{j+2}, ..., πP} of processors with variances Vm > ν (m = j+1, ..., P),
where ν is user-defined.
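As an illustration only (the paper does not prescribe how the individual variances are spread around ν), the two processor sets could be given geometrically spaced variances:

    def make_variances(num_procs, j, nu):
        # Processors 1..j form QC: variances below nu, with processor j getting
        # exactly nu.  Processors j+1..num_procs form QE: variances above nu.
        qc = [nu / 2.0 ** (j - i) for i in range(1, j + 1)]              # V_i <= nu, V_j = nu
        qe = [nu * 2.0 ** (m - j) for m in range(j + 1, num_procs + 1)]  # V_m > nu
        return qc + qe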
At iteration k, we initialize all P processors to be in the state Xk. They then determine P points X_{k,i} (each X_{k,i} is drawn by processor πi). Every X_{k,i} is then said to belong to one of three sets depending on whether it is a point of descent, of accepted ascent or of refused ascent. Denote by GD the set of points of descent, by GA the set of points of accepted ascent and by GR the set of points of refused ascent. The next point X_{k+1} is then chosen as follows:
1. There are some points of descent (GD ≠ ∅):
Whenever there is at least one point of descent, we take
X_{k+1} = X_{k,d}
where X_{k,d} ∈ GD and f(X_{k,d}) ≤ f(X_{k,j}) for all X_{k,j} ∈ GD, i.e. we take the point of largest descent.
2. There are no points of descent but some points of accepted ascent (GD = ∅ and GA ≠ ∅):
In this case we take
X_{k+1} = X_{k,a}
where X_{k,a} is the point of GA with the biggest acceptance probability.
3. There are only points of refused ascent (GD = ∅ and GA = ∅):
In this last situation we stay at the same point, i.e.
X_{k+1} = Xk.
These iterations of Multiple Trials we call the impulse speed search (the analogy is taken from the Star Trek series). After the classic tests for stopping are fulfilled the algorithm enters warp speed for the Adaptive Supplementary Search.
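A sketch of this selection rule in Python, assuming each processor has already drawn its trial point and that acceptance is decided with the Metropolis criterion of Section 2; note that the largest acceptance probability among accepted ascents corresponds to the smallest function value.

    import math

    def choose_next_point(f, x_k, trial_points, temperature, rng):
        fx = f(x_k)
        descent = [y for y in trial_points if f(y) <= fx]
        ascent  = [y for y in trial_points if f(y) > fx]
        accepted_ascent = [y for y in ascent
                           if rng.random() < math.exp(-(f(y) - fx) / temperature)]
        if descent:            # case 1: the point of largest descent
            return min(descent, key=f)
        if accepted_ascent:    # case 2: the accepted ascent with the highest
            return min(accepted_ascent, key=f)  # acceptance probability (smallest f)
        return x_k             # case 3: only refused ascents, stay at the same point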
3.2 Cooling schedule
One of the key issues in global optimization is the question of verification, i.e., given a candidate X̂*, is it true that X̂* ∈ B_f*(ε)? For a sufficiently large number of trial points N this will indeed be the case, but in many practical applications one can only afford a moderately large N. Therefore the trials making up N have to be chosen in a heuristically sound way. Since a multimodal function f has many local minimizers one needs a mechanism that explores these minimizers in order to arrive in B_f*(ε). That is, ideally, given a set of local minima, visit each of them and take the deepest as the global solution.
An efficient algorithm then is such that it finds the attraction region of a local minimizer, proceeds to find this minimizer, and then leaves this region of attraction as quickly as possible to find another region of attraction.
We notice that the sequential S.A. algorithm is not very well suited for the above task. In the early stages the acceptance probability is high and the algorithm proceeds by "bouncing" around in an "aimless" manner. In the later stages the effect is the opposite: since the acceptance probability is low the algorithm is "stuck" for long times at one local minimizer before being able to jump to another. What we propose here is an adaptive cooling schedule that allows the S.A. algorithm to quickly home in on a local minimizer, and once there, allows it to quickly leave it. This is achieved with an iterative rapid cooling schedule where the algorithm does not "bounce" around but is forced to "climb down" towards the local minimizer due to the rapidly decreasing acceptance probability. Once at the local minimizer, the temperature is increased (re-heated), allowing a possible quick "climb out" of the local minimizer, and so on.
The initial value of the temperature should be chosen such that most trial points will be accepted at that value. This can be achieved for instance by generating a small sample of S trial points x1, ..., xS and then setting T0 = 10(fM − fm), where fM = max_{i=1..S} f(xi) and fm = min_{i=1..S} f(xi), i.e., 10 times the difference between the maximum and minimum value of the cost function in the initial sample. This gives a probability of e^{-1/10} (approximately 0.9) of accepting a point with function value equal to fM as the next point if currently at a point with function value equal to fm.
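A minimal sketch of this initialization, assuming the preliminary sample is drawn uniformly inside box constraints (the sample size is a user choice and the helper name is hypothetical):

    import numpy as np

    def initial_temperature(f, lower, upper, num_samples, rng):
        # T0 = 10 * (f_M - f_m): even the worst sampled point would then be
        # accepted from the best one with probability exp(-1/10), roughly 0.9.
        xs = rng.uniform(lower, upper, size=(num_samples, len(lower)))
        values = np.array([f(x) for x in xs])
        return 10.0 * (values.max() - values.min())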
In order to encourage a rapid convergence towards a local minimum the temperature should decrease rapidly. Therefore the temperature will be decreased after each iteration. The number of processors used should affect the temperature reduction, so that the more processors used the faster the reduction in temperature, i.e., the more trial points examined the faster the temperature drop. Therefore the temperature decrement will be of the form T_{k+1} = τ^P · Tk, where τ is a user-defined parameter determining the speed of decrement.
To encourage quick escape out of the vicinity of a local minimizer the temperature will be increased (re-heated) at regular intervals. This will happen whenever the current temperature Tk drops below some user-defined threshold Tmin. A re-heating can raise the temperature either to T0 or to some fraction of the temperature of the previous re-heating cycle. Theoretically the temperature should converge to 0 before the algorithm terminates, but in practice some limit temperature must be used. We use a termination condition of the form Tr ≤ T0/L, where Tr is the start temperature of the r-th re-heating and L is a user-defined parameter intended to reflect the complexity of the problem to be solved. If the initial temperature of each re-heating cycle is chosen as T0, the termination criterion could instead be some maximum number of re-heatings.
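Combining the per-iteration decrement, the re-heating rule and the termination test, a sketch might look as follows; re-heating to a fraction of the previous re-heating temperature is one of the two options mentioned above, and the function names are illustrative only.

    def update_temperature(t_k, tau, num_procs, t_min, last_reheat, fraction):
        # One multiplicative decrement per iteration, raised to the number of
        # processors because P trial points are examined per iteration.
        t_next = (tau ** num_procs) * t_k
        if t_next < t_min:
            last_reheat *= fraction   # re-heat to a fraction of the previous re-heat
            t_next = last_reheat
        return t_next, last_reheat

    def should_terminate(last_reheat, t0, L):
        # Terminate once the start temperature of a re-heating drops to T0 / L.
        return last_reheat <= t0 / L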
3.3 Adaptive Supplementary Search
The Multiple Trials with heating cycles will favor global exploration of the area of interest. This will hopefully find the region of attraction of promising local minimizers, but in order to explore these locally in an efficient way local supplementary search steps are needed. On the other hand, for deep local minimum states big steps are needed to enable the algorithm to escape. So, to avoid the (relative) contradiction of needing both small and big steps, this phase of supplementary search is performed.
This phase can be seen as a supplementary (safe) stopping test. It is done after at least one of the classic tests (minimum temperature, number of acceptances, maximal number of iterations, ...) has been successful. The supplementary safe test runs a sequence of supplementary searches closer and closer by some processors and, at the same time, further and further away by the other processors. So, even if a classic test is successful, the algorithm will stop only when these supplementary searches fail.
The supplementary searches are done by a series of contractions and expansions of the variances and then by performing S.A. on each processor with its new variance (contracted or expanded), hoping to find a point of descent outside the regions which are otherwise investigated by the S.A. algorithm. This is done as follows:
• The variances on those processors with variances Vj ≤ ν are divided by α (α > 1).
• The variances on those processors with variances Vj > ν are multiplied by β (β > 1),
where ν (reference variance), α and β are user-defined.
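In Python, one round of this contraction/expansion could be sketched as below (a minimal illustration, not the authors' implementation):

    def adapt_variances(variances, nu, alpha, beta):
        # Contract the variances of the QC processors (Vj <= nu) and expand those
        # of the QE processors (Vj > nu): some processors search ever closer to
        # the candidate while the others search ever further away.
        return [v / alpha if v <= nu else v * beta for v in variances]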
Let us assume that, after the classical tests occur, X* is a candidate to be the global minimum. We then call the Adaptive Supplementary Search. This means that the algorithm has now entered Warp 1. X* will now become a new X0 and the Multiple Trials continue. Nevertheless, at warp speed the Multiple Trials behave differently from "normal speed" (also known as "impulse speed"). If at any iteration one of the processors with variances Vj > ν finds a point of descent (better than the new X0) the algorithm will immediately revert to impulse speed again. This means that the new best point will become the new X0, all variances will be reset to their original values, and the whole algorithm starts all over again.
In the opposite case (another K iterations without finding a better point) the algorithm will enter Warp 2, i.e. reiterate Warp 1 (by dividing, once more, the contracted variances and multiplying the expanded ones), and so on, up to a (user-defined) maximum warp.
For some number of warps the consequence will be either a confirmation of the classic stopping tests by exploring closer and closer to the candidate (with the processors using contracted variances), giving improved precision by finding close points of descent, or an invalidation of the classic tests whenever a (far) point of descent is found (with the processors using expanded variances). In the former case MTASS terminates; in the latter case the variances are reset to their original values and the point of descent becomes the new continuation point for the whole algorithm. In this latter case we say that the supplementary search is a success.
Procedure MTASS-Simulated-Annealing;
begin
  initializations;  (* X0, T0, P, ... *)
label 1:
  variance-setting;
  while classical-stopping-conditions are false do
    call procedure Multiple-Trials;
  endwhile;  (* stopping-conditions false *)
  success-of-supplementary-search := false;
  call procedure Adaptive-Supplementary-Search;
  if success-of-supplementary-search = true then goto label 1;
end.  (* procedure MTASS-Simulated-Annealing *)

Procedure variance-setting;
begin
  for j := 1 to P do
    assign a variance Vj to processor πj;
end;  (* procedure variance-setting *)

Procedure Multiple-Trials;
begin
  for j := 1 to P do
    assign Xk to processor πj;
    perform the Metropolis dynamics on each πj;
  choose X_{k+1} := X_{k,j0};  (* according to the rules of Section 3.1 *)
  reactualize X* (if necessary);
end;  (* procedure Multiple-Trials *)

Procedure Adaptive-Supplementary-Search;
begin
  m := 1;
  repeat
    divide the variances Vj of the processors in QC by α;
    multiply the variances Vj of the processors in QE by β;
    for j := 1 to P do
      assign Xk to processor πj;
      perform the Metropolis dynamics on each πj;
    m := m + 1;
  until (success-of-supplementary-search = true) or (m > max-search);
  reactualize X_{k+1} (* if necessary *);
end;  (* procedure Adaptive-Supplementary-Search *)
Definition 2
A Supplementary Search is said to be a success when at least one point of descent is obtained.
4 NUMERICAL RESULTS
The MTASS algorithm was implemented in OCCAM-2 on Hathi-2, a system of 100 IMS T800 transputers. Some tests were then done using 8 transputers (i.e., P = 8).
We were basically interested in two things.
- First, looking only at the Multiple Trials, how much would the precision improve over sequential S.A., given a fixed set of stopping criteria?
- Secondly, what is the cost of the Adaptive Supplementary Search, i.e., how much will the results improve over Multiple Trials compared to the added number of function evaluations?
In the following we give results for some standard test functions taken from Towards Global Optimization 2 [8]. These test functions are RCOS (Branin), Shekel 5 (SQRIN) and Hartman 6, of dimensions n = 2, 4 and 6, respectively.
4.1 Improvement due to Multiple Trials
For the first test case MTASS was compared to a plain sequential S.A. algorithm. Table 1 shows the precision obtained with the two algorithms using a fixed set of stopping criteria. Here MTASS was run without the Adaptive Supplementary Search and the following set of parameters was used for both algorithms:
• T0 = 8.0 (initial temperature)
• ν = 4 (reference variance)
• L = 3 (temperature reduction factor)
• The r-th re-heating will increase the temperature to T0/ln(r+2)
• Tmin = T0/100 (minimum temperature for stopping test)
• X0 (initial point) chosen randomly within the constraints
• Stopping criterion 1: max number of iterations = 10000
• Stopping criterion 2: max 500 iterations without finding a better solution
In the tables, S% is the success rate, measured as the percentage of runs that reached the domain of the global optimum. ΔX is the Euclidean distance from the known optimum (averaged over those runs that reached the domain of the global optimum), and Δf is the numerical precision obtained, given as f(X̂*) − f(X*), where f(X̂*) is the result obtained by the algorithm and f(X*) is the known global value (averaged over those runs that reached the domain of the global optimum).
                         MTASS                            Plain S.A.
f            FE    S%    ΔX       Δf          FE    S%    ΔX       Δf
RCOS         810   100   10^-2    9*10^-2     1625  80    4*10^-1  9*10^-1
SQRIN        778   100   10^-2    6*10^-1     949   97    2*10^-1  10^-1
Hartman 6    800   59    6*10^-2  10^-1       562   37    8*10^-1  0.9

Table 1: Numerical precision (MTASS is run without the Supplementary Search)
As expected, the Multiple Trials of MTASS give a clear improvement in the numerical precision. Nevertheless, the use of only Multiple Trials is not enough to always guarantee numerical convergence (e.g. Hartman 6, which is a very difficult global optimization problem). This is why we introduce the Supplementary Search, which is adapted to the state of the algorithm and which can confirm or invalidate any candidate to be a global minimizer.
4.2 Improvement due to the Adaptive Supplementary Search
To test the cost and benefits of the Adaptive Supplementary Search, the full MTASS algorithm was also compared against the heating cycles algorithm of Bonnemoy and Hamma [4], in addition to the sequential S.A. algorithm. The scheme was the following:
1. Run the full MTASS to the end and note the number of iterations.
2. Run the heating cycles and plain S.A. algorithms for the same number of iterations and note the difference in results.
The following set of parameters was used for MTASS:
• T0 = 8; α = β = 2 (contraction, expansion factors); ν = 4 (reference variance)
• The temperature reduction factor R = 1
• X0 random within the constraints
• Enter Warp speed after 500 iterations without finding a better solution
• Go from Warp n to Warp n+1 after 500 iterations at Warp n without finding a better solution
• Stopping criterion: max Warp = 3
For the heating cycles algorithm the following set of parameters was used:
• T0 = 8
• ν = 4 in all dimensions
• X0 random within the constraints
• Re-heat after 500 iterations without finding a better solution.