Parallel Continuous Simulated Annealing for Global Optimization*

Beidi HAMMA
Parallel Algorithm Team, CERFACS, 31057 Toulouse-cedex, France.

Sami VIITANEN and Aimo TÖRN
Computer Science Department, Åbo Akademi University, 20520 Åbo, Finland.

In this paper a parallel algorithm for simulated annealing (S.A.) in the continuous case, the Multiple Trials and Adaptive Supplementary Search (MTASS) algorithm, is presented. It is based on a combination of multiple trials, local improvement searches and an adaptive cooling schedule. The results in optimizing some standard test problems are compared with a sequential S.A. algorithm and another parallel probabilistic algorithm.

KEYWORDS: Simulated annealing, global optimization, parallel algorithms

1 INTRODUCTION

The simulated annealing method was introduced in [10] by Kirkpatrick, Gelatt and Vecchi and by Černý in [6] for solving large combinatorial optimization problems by using algorithms simulating the annealing of solids. Such an approach was first proposed in [12]. For an introduction to simulated annealing see for example [11]. Its principle, which can be summarized as Move-Evaluate-Accept, has been successfully used in solving problems such as the Traveling Salesman Problem (TSP). More generally, it has been successfully used in special situations like "black box" problems. The method has also been adapted to the continuous case, starting with the works of [15, 5]. An application where continuous simulated annealing is applied to a set of standard test functions in global optimization is found in [7]. Because applications of the algorithm to large combinatorial problems may use considerable computer time, interest in parallel versions of the algorithm arose early in [1, 2] and is continuing in [16].

* This work was completed while this author was visiting the Computer Science Department of Åbo Akademi University, whom he wishes to thank for their kind hospitality. He also thanks the Université Paul Sabatier of Toulouse, France, for its financial support (bourse ATUPS).

The convergence of the algorithm in the continuous case is treated in [3]. Convergence in probability of the simulated annealing algorithm to the global minimum f* means that the probability that the working point X ends up in a set B_f*(ε) for which f(X) − f* < ε approaches 1 for arbitrarily small ε > 0. A necessary condition for this is that the Move procedure is such that any region of positive measure will be visited in the long run. This condition is also a sufficient condition for the primary task to be solved, namely to find the global minimum, but is not sufficient for convergence. Knowing that the algorithm converges is not very helpful when applying it to a real problem. There is neither a guarantee that the global minimum will be found nor that such convergence will occur, because the algorithm must be stopped after a finite number of steps. In practice the stopping condition and also some other entities are fixed based on heuristics, and their influence on the result is not exactly known. This fact has led to algorithms with additional heuristics introduced (e.g. local optimizations) to gain efficiency, and to making the Move step, which originally favored local moves, cover the decision space uniformly in order to facilitate global exploration ([7]; [16]).
In this paper we present a heuristic parallel simulated annealing algorithm for the continuous case based on the idea of heating cycles [4]. Heating cycles means an algorithm where several applications of the simulated annealing procedure are made with different, possibly decreasing, initial temperatures.

2 CONTINUOUS SIMULATED ANNEALING

The general simulated annealing algorithm is a Monte Carlo technique for finding the global minimum of a real-valued function f on a set A. We are given a selection Markov kernel R(·,·) for selecting trial points. Given a cooling schedule consisting of temperatures T_0, T_1, T_2, ..., we construct a sequence of states X_0, X_1, X_2, ... and a sequence of trial points Y_0, Y_1, Y_2, ... as follows. Starting from an initial configuration X_0, T_0 and having constructed ((X_1,...,X_k), (Y_1,...,Y_k), (T_1,...,T_k)), the next trial point Y_{k+1} is generated according to the probability distribution R(X_k, ·). Then the next state is

X_{k+1} = \begin{cases} Y_{k+1} & \text{with probability } p(X_k, Y_{k+1}, T_k) \\ X_k & \text{with probability } 1 - p(X_k, Y_{k+1}, T_k) \end{cases}

where the probability p(X_k, Y_{k+1}, T_k) of accepting a trial point Y_{k+1} when the current state is X_k and the current temperature is T_k is called the acceptance probability, and the acceptance criterion (usually referred to as the Metropolis criterion) is given by

p(x, y, t) = \begin{cases} \exp\!\left(-\dfrac{f(y) - f(x)}{t}\right) & \text{if } f(y) > f(x) \\ 1 & \text{if } f(y) \le f(x) \end{cases}

This paper deals with continuous simulated annealing over a compact feasible region A ⊂ R^n. We assume that the global minimum of f is attained in the interior of A and that B_f*(ε) has positive measure. We also assume that when selecting the next trial point the whole of A is reachable in one step. This can be done for example by using a normal distribution centered at the current point X_k, as in this paper, or by using uniform sampling in A. It has been shown ([3]) that convergence to a point X ∈ B_f*(ε) is assured from every starting point X_0 for any cooling schedule such that the temperature converges in probability to 0, i.e., for every set of initial conditions (X_0, T_0) the sequence of function values (f(X_k), k = 0, 1, 2, ...) converges in probability to the global minimum f*. For the proof see [3].
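To make the above concrete, the following is a minimal Python sketch of a single continuous S.A. chain with a normal selection kernel centered at the current point and the Metropolis acceptance criterion. The helper names, the box-shaped feasible region with clipping, and the simple geometric cooling used here are illustrative assumptions, not part of the paper.

```python
import math
import random

def metropolis_accept(f_x, f_y, t):
    # Metropolis criterion: always accept descent, accept ascent with
    # probability exp(-(f(y) - f(x)) / t).
    if f_y <= f_x:
        return True
    return random.random() < math.exp(-(f_y - f_x) / t)

def sa_chain(f, x0, t0, bounds, variance=1.0, tau=0.99, n_iter=1000):
    # One sequential S.A. chain: trial points are drawn from a normal
    # distribution centered at the current point (the kernel R(X_k, .)),
    # clipped to the box 'bounds'; the temperature is lowered geometrically
    # as a simple stand-in for the cooling schedules discussed later.
    x, fx, t = list(x0), f(x0), t0
    best, f_best = list(x), fx
    for _ in range(n_iter):
        y = [min(max(random.gauss(xi, math.sqrt(variance)), lo), hi)
             for xi, (lo, hi) in zip(x, bounds)]
        fy = f(y)
        if metropolis_accept(fx, fy, t):
            x, fx = y, fy
            if fy < f_best:
                best, f_best = list(y), fy
        t *= tau
    return best, f_best
```

For instance, sa_chain(lambda x: x[0]**2 + x[1]**2, [3.0, -2.0], 8.0, [(-5, 5), (-5, 5)]) drives the chain towards the origin of a simple quadratic test function.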
3 MULTIPLE TRIALS WITH ADAPTIVE SUPPLEMENTARY SEARCH

This section describes a parallel algorithm for simulated annealing. The main algorithm is based on the heating cycles of Bonnemoy and Hamma [4]. Their algorithm can be roughly stated as follows:

1. Initialize T^0(0) = CR and the variance V = ν
2. Perform S.A. until the classical stopping tests are fulfilled
3. Change V and re-heat: T^c(0) = R · T^{c-1}(0)
4. Continue (for a user-fixed number of cycles) from step 2

where T^{c-1}(0) is the starting temperature of the previous cycle and R is a reduction factor.

The principle is to increase the temperature in the S.A. algorithm after the classical stopping tests are satisfied. Without this, usually, k (the number of iterations) is high and thus the acceptance probability is near zero. This means that there will not be any more accepted ascent points. At this stage, if the algorithm has found a global minimizer, all is well. But according to numerical experiments, for difficult problems (high dimension, many local minimizers, ...) the temperature function T decreases to zero before the algorithm has escaped from all difficult local minimizer states. So at this stage of the run one can "re-heat" to try to find a better minimum. If a better minimum is not found, then only some more function evaluations have been wasted.

To sum up, in the heating cycles algorithm S.A. is performed but the result is not immediately accepted as is. Instead a (user-fixed) number of supplementary cycles of S.A. are performed. Every new cycle starts with the last cycle's best point and with an initial temperature lower than the previous cycle's. A cycle ends only when a classical test is satisfied.

Continuing the same idea of a supplementary stopping criterion, we introduce the MTASS algorithm. In the MTASS algorithm the variances and temperatures are also changed on a heating basis. The difference is that the MTASS algorithm is essentially a double loop. At every start of the outer loop the variances are reset to their original values and the initial temperatures are re-heated (as in the heating cycles algorithm described above). The inner loop (called a supplementary search) consists of a sequence of trials terminated when a new best point is found. The variances are changed at the start of the inner loop and the initial temperature is set to a value less than the value at the start of the previous iteration of the outer loop. In case a new best point is found (the supplementary search is a success) a new iteration of the outer loop is started. In the opposite case (one iteration of the inner loop fails) the variances are again changed and a new iteration of the inner loop starts with the temperature increased to a value less than the value at the start of the previous iteration of the inner loop. After a number of failures of the inner loop the supplementary search is considered a failure and the whole algorithm terminates.

It is assumed that P processors π_1, ..., π_P are available. The generation of trial points is done using a normal distribution centered at the current point. To obtain more efficient local exploration of A some local improvement searches will be done. These are done after the termination of the main algorithm such that each processor performs a supplementary search.

3.1 Multiple Trials

Definition 1. At iteration k, a point X generated from X_k is called:
- a point of descent if f(X) ≤ f(X_k),
- a point of accepted ascent if f(X) > f(X_k) but X is accepted according to Section 2,
- a point of refused ascent otherwise.

In order to cover a larger region (and not only in different directions), every processor uses its own variance. The processors are divided into two sets as follows: the set Q_C = {π_1, π_2, ..., π_j} of processors with variances V_i < ν (i = 1..j−1) and V_j = ν, and the set Q_E = {π_{j+1}, π_{j+2}, ..., π_P} of processors with variances V_m > ν (m = j+1..P), where ν is user-defined.

At iteration k, we initialize all P processors to be in the state X_k. They then determine P points X_{k,i} (each X_{k,i} is drawn by processor π_i). Every X_{k,i} is then said to belong to one of three sets depending on whether it is a point of descent, of accepted ascent or of refused ascent. Denote by G_D the set of points of descent, by G_A the set of points of accepted ascent and by G_R the set of points of refused ascent. The next point X_{k+1} is then chosen as follows (a short code sketch of this selection step follows the list):

1. There are some points of descent (G_D ≠ ∅): whenever there is at least one point of descent, we take X_{k+1} = X_{k,d}, where X_{k,d} ∈ G_D and f(X_{k,d}) ≤ f(X_{k,j}) for all X_{k,j} ∈ G_D, i.e. we take the point of largest descent.
2. There are no points of descent but some points of accepted ascent (G_D = ∅ and G_A ≠ ∅): in this case we take X_{k+1} = X_{k,a}, where X_{k,a} is the point of G_A with the biggest acceptance probability.
3. There are only points of refused ascent (G_D = ∅ and G_A = ∅): in this last situation we stay at the same point, i.e. X_{k+1} = X_k.
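As a concrete reading of the three cases, the sketch below implements one Multiple Trials iteration in Python: P workers each draw a trial point with their own variance and the next state is chosen by the rules just listed. Simulating the processors with a sequential loop and the helper names are assumptions made for illustration; the original implementation is a parallel OCCAM-2 program.

```python
import math
import random

def multiple_trials_step(f, x_k, variances, t_k, bounds):
    # Each "processor" i draws a trial point from a normal distribution
    # centered at x_k with its own variance V_i (simulated sequentially here).
    trials = []
    for v in variances:
        y = [min(max(random.gauss(xi, math.sqrt(v)), lo), hi)
             for xi, (lo, hi) in zip(x_k, bounds)]
        trials.append((y, f(y)))

    f_k = f(x_k)
    descents = [(y, fy) for y, fy in trials if fy <= f_k]          # G_D
    if descents:
        # Case 1: take the point of largest descent.
        return min(descents, key=lambda p: p[1])[0]

    # Case 2: among ascent points, keep those accepted by the Metropolis
    # test (G_A) and take the one with the biggest acceptance probability,
    # i.e. the smallest function value.
    accepted = [(y, fy) for y, fy in trials
                if random.random() < math.exp(-(fy - f_k) / t_k)]
    if accepted:
        return min(accepted, key=lambda p: p[1])[0]

    # Case 3: only refused ascents (G_D and G_A empty): stay at x_k.
    return x_k
```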
These iterations of Multiple Trials we call impulse speed search. (The analogy is taken from the Star Trek series.) After the classic stopping tests are fulfilled the algorithm enters warp speed for the Adaptive Supplementary Search.

3.2 Cooling schedule

One of the key issues in global optimization is the question of verification, i.e., given a candidate X̂*, is it true that X̂* ∈ B_f*(ε)? For a sufficiently large number of trial points N this will indeed be the case, but in many practical applications one can only afford a moderately large N. Therefore the trials making up N have to be chosen in a heuristically sound way. Since a multimodal function f has many local minimizers, one needs a mechanism that explores these minimizers in order to arrive in B_f*(ε). That is, ideally, given a set of local minima, visit each of them and take the deepest as the global solution. An efficient algorithm is then one that finds the attraction region of a local minimizer, proceeds to find this minimizer, and then leaves this region of attraction as quickly as possible to find another region of attraction.

We notice that the sequential S.A. algorithm is not very well suited for the above task. In the early stages the acceptance probability is high and the algorithm proceeds by "bouncing" around in an "aimless" manner. In the later stages the effect is the opposite: since the acceptance probability is low, the algorithm is "stuck" for long times at one local minimizer before being able to jump to another. What we propose here is an adaptive cooling schedule that allows the S.A. algorithm to quickly home in on a local minimizer and, once there, allows it to quickly leave it. This is achieved with an iterative rapid cooling schedule where the algorithm does not "bounce" around but is forced to "climb down" towards the local minimizer due to the rapidly decreasing acceptance probability. Once at the local minimizer, the temperature is increased (re-heated), allowing a possible quick "climb out" of the local minimizer, and so on.

The initial value of the temperature should be chosen such that most trial points will be accepted at that value. This can be achieved for instance by generating a small sample of S trial points x_1, ..., x_S and then setting T_0 = 10(f_M − f_m), where f_M = max_{i=1..S} f(x_i) and f_m = min_{i=1..S} f(x_i), i.e., 10 times the difference between the maximum and minimum value of the cost function in the initial sample. This gives a probability of e^{−1/10} (approximately 0.9) of accepting a point with function value equal to f_M as the next point when currently at a point with function value equal to f_m.
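The initial-temperature rule translates directly into a few lines of code. This is a minimal sketch assuming the small sample is drawn uniformly from a box-shaped feasible region; the function name and the sampling choice are illustrative and not prescribed by the paper.

```python
import random

def initial_temperature(f, bounds, sample_size=20):
    # Draw a small uniform sample from the feasible box and set
    # T_0 = 10 * (f_M - f_m), so that a move from the best sampled point
    # to the worst one is accepted with probability exp(-1/10) ~ 0.9.
    sample = [[random.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(sample_size)]
    values = [f(x) for x in sample]
    return 10.0 * (max(values) - min(values))
```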
In order to encourage rapid convergence towards a local minimum the temperature should decrease rapidly. Therefore the temperature is decreased after each iteration. The number of processors used should affect the temperature reduction so that the more processors used, the faster the reduction in temperature, i.e., the more trial points examined, the faster the temperature drops. The temperature decrement will therefore be of the form T_{k+1} = τ^P · T_k, where τ is a user-defined parameter determining the speed of the decrement.

To encourage a quick escape out of the vicinity of a local minimizer the temperature will be increased (re-heated) at regular intervals. This happens whenever the current temperature T_k drops below some user-defined threshold T_min. A re-heating can raise the temperature either to T_0 or to some fraction of the temperature of the previous re-heating cycle. Theoretically the temperature should converge to 0 before the algorithm terminates, but in practice some limit temperature must be used. We use a termination condition of the form T_r ≤ T_0/L, where T_r is the start temperature of the r-th re-heating and L is a user-defined parameter intended to reflect the complexity of the problem to be solved. If the initial temperature of each re-heating cycle is chosen as T_0, the termination criterion could instead be some maximum number of re-heatings.
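The following sketch ties the decrement, re-heating and termination rules together as a temperature generator. It is an illustrative Python fragment assuming each re-heating restores a fixed fraction of the previous re-heating temperature (one of the two options mentioned above); the default parameter values are placeholders, not those used in the experiments.

```python
def cooling_schedule(t0, p, tau=0.95, t_min_frac=0.01, reheat_frac=0.8, l=3,
                     max_steps=100000):
    # Generates the temperature sequence T_{k+1} = tau**P * T_k,
    # re-heating whenever T_k falls below T_min = t_min_frac * T_0,
    # and stopping once the start temperature of a re-heating cycle
    # satisfies T_r <= T_0 / L.
    t_min = t_min_frac * t0
    t_reheat = t0          # start temperature of the current re-heating cycle
    t = t0
    for _ in range(max_steps):
        yield t
        t *= tau ** p                  # faster drop with more processors
        if t < t_min:                  # time to re-heat
            t_reheat *= reheat_frac    # fraction of the previous cycle's start
            if t_reheat <= t0 / l:     # termination test T_r <= T_0 / L
                return
            t = t_reheat
```

For example, `for t in cooling_schedule(8.0, p=8): ...` iterates over the temperatures of a run with eight processors until the termination condition is met.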
3.3 Adaptive Supplementary Search

The Multiple Trials with heating cycles favor global exploration of the area of interest. This will hopefully find the regions of attraction of promising local minimizers, but in order to explore these locally in an efficient way, local supplementary search steps are needed. On the other hand, for deep local minimum states big steps are needed to enable the algorithm to escape. So, to avoid the (relative) contradiction of needing both small and big steps, this phase of supplementary search is performed.

This phase can be seen as a supplementary (safe) stopping test. It is carried out after at least one of the classic tests (minimum temperature, number of acceptances, maximal number of iterations, ...) has been satisfied. The supplementary safe test runs a sequence of supplementary searches closer and closer to the candidate on some processors and, at the same time, further and further away from it on the other processors. So, even if a classic test is satisfied, the algorithm stops only when these supplementary searches fail.

The supplementary searches are done by a series of contractions and expansions of the variances, followed by performing S.A. on each processor with its new variance (contracted or expanded), hoping to find a point of descent outside the regions which are otherwise investigated by the S.A. algorithm. This is done as follows:

• The variances on those processors with variances V_j ≤ ν are divided by α (α > 1).
• The variances on those processors with variances V_j > ν are multiplied by β (β > 1),

where ν (the reference variance), α and β are user-defined.

Let us assume that, after the classical tests are satisfied, X* is a candidate to be the global minimum. We then call the Adaptive Supplementary Search. This means that the algorithm has now entered Warp 1. X* now becomes a new X_0 and the Multiple Trials continue. Nevertheless, at warp speed the Multiple Trials behave differently from "normal speed" (also known as "impulse speed"). If at any iteration one of the processors with variances V_j > ν finds a point of descent (better than the new X_0), the algorithm immediately reverts to impulse speed again. This means that the new best point becomes the new X_0, all variances are reset to their original values, and the whole algorithm starts all over again.

In the opposite case (another K iterations without finding a better point) the algorithm enters Warp 2, i.e. it reiterates Warp 1 (by dividing, once more, the contracted variances and multiplying the expanded ones), and so on, up to a (user-defined) maximum warp.

After some number of warps the consequence will be either a confirmation of the classic stopping tests by exploring closer and closer to the candidate (with the processors using contracted variances), giving improved precision by finding close points of descent, or an invalidation of the classic tests whenever a (far) point of descent is found (with the processors using expanded variances). In the former case MTASS terminates; in the latter case the variances are reset to their original values and the point of descent becomes the new continuation point for the whole algorithm. In this latter case we say that the supplementary search is a success.

The algorithm can be outlined in pseudocode as follows:

  Procedure MTASS-Simulated-Annealing;
  begin
    initialisations;  (* X0, T0, P, ... *)
    label 1:
    variances-setting;
    while classical-stopping-conditions are false do
      call procedure Multiple-Trials;
    endwhile;  (* stopping-conditions false *)
    success-of-supplementary-search := false;
    call procedure Adaptive-Supplementary-Search;
    if success-of-supplementary-search = true then go to label 1;
  end.  (* procedure MTASS-Simulated-Annealing *)

  Procedure variances-setting;
  begin
    for j := 1 to P do assign a Vj to processor Pj;
  end;  (* procedure variances-setting *)

  Procedure Multiple-Trials;
  begin
    for j := 1 to P do assign Xk to processor Pj;
    perform the Metropolis dynamics on each Pj;
    choose Xk+1 := Xk,j0 (the selected trial point);
    update X* (if necessary);
  end;  (* procedure Multiple-Trials *)

  Procedure Adaptive-Supplementary-Search;
  begin
    m := 1;
    repeat
      divide the variances Vj of the processors in QC by α;
      multiply the variances Vj of the processors in QE by β;
      for j := 1 to P do assign Xk to processor Pj;
      perform the Metropolis dynamics on each Pj;
      m := m + 1;
    until (success-of-supplementary-search = true) or (m > max-search);
    update Xk+1 (if necessary);
  end;  (* procedure Adaptive-Supplementary-Search *)

Definition 2. A Supplementary Search is said to be a success when at least one point of descent is obtained.
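To make the warp mechanism concrete, here is a minimal Python sketch of the supplementary search loop. It is one interpretation of the pseudocode above under simplifying assumptions (a single shared temperature, a fixed trial budget per warp, processors simulated sequentially) and is not the authors' OCCAM-2 implementation.

```python
import math
import random

def adaptive_supplementary_search(f, x_cand, base_variances, nu, t, bounds,
                                  alpha=2.0, beta=2.0, max_warp=3,
                                  trials_per_warp=500):
    # Contract the variances of the Q_C processors (V_j <= nu) and expand
    # those of the Q_E processors (V_j > nu), one extra factor per warp.
    # A descent found with an expanded variance invalidates the candidate
    # (success); descents found with contracted variances only refine it.
    variances = list(base_variances)
    x_best, f_best = list(x_cand), f(x_cand)
    for warp in range(1, max_warp + 1):
        variances = [v / alpha if v <= nu else v * beta for v in variances]
        for _ in range(trials_per_warp):
            for v in variances:
                y = [min(max(random.gauss(xi, math.sqrt(v)), lo), hi)
                     for xi, (lo, hi) in zip(x_best, bounds)]
                fy = f(y)
                if fy < f_best:
                    if v > nu:
                        # Far point of descent: revert to impulse speed
                        # from this new best point (success).
                        return True, y
                    # Close point of descent: improved precision, the
                    # classic stopping tests are being confirmed.
                    x_best, f_best = y, fy
    # No far descent found in any warp: MTASS terminates.
    return False, x_best
```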
4 NUMERICAL RESULTS

The MTASS algorithm was implemented in OCCAM-2 on Hathi-2, a 100 IMS T800 transputer system. Some tests were then done using 8 transputers (i.e., P = 8). We were basically interested in two things:

- First, looking only at the Multiple Trials, how much the precision would improve over sequential S.A. given a fixed set of stopping criteria.
- Secondly, what the cost of the Adaptive Supplementary Search is, i.e., how much the results improve over Multiple Trials compared to the added number of function evaluations.

In the following we give results for some standard test functions taken from Towards Global Optimization 2 [8]. These test functions are RCOS (Branin), Shekel 5 (SQRIN) and Hartman 6, for dimensions n = 2, 4 and 10.

4.1 Improvement due to Multiple Trials

For the first test case MTASS was compared to a plain sequential S.A. algorithm. Table 1 shows the precision obtained with the two algorithms using a fixed set of stopping criteria. Here MTASS was run without the Adaptive Supplementary Search, and the following set of parameters was used for both algorithms:

• T_0 = 8.0 (initial temperature)
• ν = 4 (reference variance)
• L = 3 (temperature reduction factor)
• the r-th re-heating increases the temperature to T_0/ln(r+2)
• T_min = T_0/100 (minimum temperature for the stopping test)
• X_0 (initial point) chosen randomly within the constraints
• stopping criterion 1: maximum number of iterations = 10000
• stopping criterion 2: at most 500 iterations without finding a better solution

In the tables, FE is the number of function evaluations and S% is the success rate measured as the percentage of runs that reached the domain of the global optimum. ΔX is the Euclidean distance from the known optimum (averaged over those runs that reached the domain of the global optimum), and Δf is the numerical precision obtained, given as f(X̂*) − f(X*), where f(X̂*) is the result obtained by the algorithm and f(X*) is the known global value (again averaged over those runs that reached the domain of the global optimum).

                     MTASS                              Plain S.A.
  f            FE    S%    ΔX       Δf          FE    S%    ΔX       Δf
  RCOS         810   100   10^-2    9*10^-2     1625  80    4*10^-1  9*10^-1
  SQRIN        778   100   10^-2    6*10^-1     949   97    2*10^-1  10^-1
  Hartman 6    800   59    6*10^-2  10^-1       562   37    8*10^-1  0.9

  Table 1: Numerical precision (MTASS is run without the Supplementary Search)

As expected, the Multiple Trials of MTASS give a clear improvement in numerical precision. Nevertheless, the use of only Multiple Trials is not enough to always guarantee numerical convergence (e.g. Hartman 6, which is a very difficult global optimization problem). This is why we introduce the Supplementary Search, which is adapted to the state of the algorithm and which can confirm or invalidate any candidate for the global minimizer.

4.2 Improvement due to the Adaptive Supplementary Search

To test the cost and benefits of the Adaptive Supplementary Search, the full MTASS algorithm was also compared against the heating cycles algorithm of Bonnemoy and Hamma [4], in addition to the sequential S.A. algorithm. The scheme was the following:

1. Run the full MTASS to the end and record the number of iterations.
2. Run the heating cycles and plain S.A. algorithms for the same number of iterations and note the difference in results.

The following set of parameters was used for MTASS:

• T_0 = 8; α = β = 2 (contraction and expansion factors); ν = 4 (reference variance)
• temperature reduction factor R = 1
• X_0 random within the constraints
• enter warp speed after 500 iterations without finding a better solution
• go from Warp n to Warp n+1 after 500 iterations at Warp n without finding a better solution
• stopping criterion: maximum Warp = 3

For the heating cycles algorithm the following set of parameters was used:

• T_0 = 8
• ν = 4 in all dimensions
• X_0 random within the constraints
• re-heat after 500 iterations without finding a better solution.
