ebook img

Dominance-Based Multi-Objective Simulated Annealing PDF

21 Pages·2008·1.98 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Dominance-Based Multi-Objective Simulated Annealing

ORE Open Research Exeter TITLE Dominance-Based Multiobjective Simulated Annealing AUTHORS Smith, Kevin I.; Everson, Richard M.; Fieldsend, Jonathan E.; et al. JOURNAL IEEE Transactions on Evolutionary Computation DEPOSITED IN ORE 05 March 2013 This version available at http://hdl.handle.net/10871/15260 COPYRIGHT AND REUSE Open Research Exeter makes this work available in accordance with publisher policies. A NOTE ON VERSIONS The version presented here may differ from the published version. If citing, you are advised to consult the published version for pagination, volume/issue and date of publication IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 1 Dominance-Based Multi-Objective Simulated Annealing Kevin I. Smith, Richard M. Everson, Jonathan E. Fieldsend Member, IEEE, Chris Murphy and Rashmi Misra Abstract—Simulated annealing is a provably convergent opti- thereareseveralwelldevelopedgeneticalgorithmsandevolu- miser for single-objective problems. Previously proposed multi- tionaryschemestoaddresssuchmulti-objectiveproblems(see, objective extensions have mostly taken the form of a single- [4], [5] for recent reviews), simulated annealing does not, in objective simulated annealer optimising a composite function of its usual formulation, provide a method for optimising more the objectives. We propose a multi-objective simulated annealer utilisingtherelativedominanceofasolutionasthesystemenergy than a single objective.Simulated annealinghasbeen adapted foroptimisation,eliminatingproblemsassociatedwithcomposite to multi-objective problems by combining the objectives into objective functions. We also propose a method for choosing a single objective function [6]–[10]; however, these methods perturbationscalingspromotingsearch bothtowardsandacross either damage the proof of convergence, or are limited (po- the Pareto front. tentially severely)in theirability to fullyexplorethe trade-off We illustrate the simulated annealer’s performance on a suite of standard test problems and provide comparisons with surface. another multi-objective simulated annealer and the NSGA-II Weproposeamodifiedsimulatedannealingalgorithmwhich genetic algorithm. The new simulated annealer is shown to maps the optimisation of multiple objectives to a single- promote rapid convergence to the true Pareto front with a good objective optimisation using the true trade-off surface, main- coverage of solutions across it comparing favourably with the taining the convergence properties of the single-objective an- other algorithms. An application of the simulated annealer to an industrial nealer while encouragingexplorationof the full trade-offsur- problem, the optimisation of a Code Division Multiple Access face. A method of practical implementation is also described, (CDMA) mobile telecommunications network’s air interface, is usingtheavailablenon-dominateddatapointsfromthecurrent presented and the simulated annealer is shown to generate optimisation to overcome the limitation that the true trade-off non-dominated solutions with an even and dense coverage that surface is unavailable for most real-world problems. outperform single objective genetic algorithm optimisers. In this paper, following some introductory material in sec- Index Terms—Multiple objectives, simulated annealing, dom- tion II, we start by briefly discussing methods that combine inance, CDMA networks. objectives into a single composite objective. In section III we describe our dominance-based SA algorithm and, in section I. INTRODUCTION IV, methods are described for improving the quality of the A popularandrobustalgorithmforsolvingsingle-objective optimisation energy measure when the available data points optimisation problems (those in which the user cares only are few. Choosing an efficient scale for perturbations is an about a single dependant variable of the system) is simulated important component of scalar SA algorithms. The issue is annealing (SA) [1], [2]. Geman & Geman [3] provided a further complicated in multi-objective algorithms because a proofthatsimulatedannealing,ifannealedsufficientlyslowly, perturbation may not only move the current state closer to or converges to the global optimum, and although the required furtherfromtheParetofront,butalsotransversally(i.e.,across cooling rate is infeasibly slow for most purposes, simulated the front). In section VI we describe a method for setting the annealing often gives well convergedresults when run with a scale of perturbations and other run-time parameters. Results fastercoolingschedule.Itisfrequentlythecaseinoptimisation showing that the algorithm converges on a range of standard problems, however, that there are several objectives of the test problems are given in section VII, and we show that system which the user is interested in optimising simultane- the algorithm compares favourably with both the popular ously.Clearly,simultaneousoptimisationofseveralobjectives NSGA-II multi-objective genetic algorithm [11] and a multi- is usually impossible and the curve (for two objectives) or objective simulated annealer suggested by Nam & Park [8]. surface (forthree or more objectives)thatdescribesthe trade- InsectionVIIIwepresentresultsdemonstratingthesimulated offbetweenobjectivesis knownas thePareto-front.Although annealer’sperformanceontheoptimisationoftheairinterface of a Code Division Multiple Access (CDMA) network in Duringthegenerationofthismanuscript,KevinSmithwassupportedbythe the mobile telecommunicationsdomain.We draw conclusions EngineeringandPhysicalSciencesResearchCouncil(EPSRC)andMotorola in section IX. A preliminary report on this work appeared andJonathan FieldsendwassupportedbytheEPSRC,grantGR/R24357/01. KIS, RME and JEF are with the Department of Computer in [12]; here we provide additional detail on the theoretical Science, University of Exeter, Exeter, EX4 4QF, UK. (e-mail: foundations of the algorithm and present extensive empirical {K.I.Smith, R.M.Everson, J.E.Fieldsend}@exeter.ac.uk); CM and RM results comparing the algorithm with the NSGA-II genetic are with Motorola, Thamesdown Drive, Swindon, SN25 4XY, UK. (e-mail: {Chris.Murphy,Rashmi.Misra}@motorola.com) algorithm and the Nam & Park simulated annealer, together 0000–0000/00$00.00 (cid:13)c 2005IEEE IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 2 with the application to CDMA networks. Algorithm 1 Simulated annealing Inputs: II. BACKGROUND {Lk}Kk=1 Sequence of epoch durations A. Dominance and Pareto Optimality {Tk}Kk=1 Sequence of temperatures, Tk+1 <Tk x Initial feasible solution In multi-objective optimisation we attempt to simultane- ously maximise or minimise D objectives, y , which are i 1: for k :=1,...,K functionsof P variable parametersor decision variables, x= 2: for i:=1,...,L (x1,x2,...,xP): 3: x′ :=perturb(x)k y =f (x), i=1,...,D. (1) 4: δE(x′,x):=E(x′)−E(x) i i 5: u:=rand(0,1) Withoutlossofgenerality,weassumethattheobjectivesareto 6: if u<min(1,exp(−δE(x′,x)/T )) beminimised,sothatthemulti-objectiveoptimisationproblem k 7: x:=x′ may be expressed as: 8: end Minimise y=f(x)≡(f1(x),...,fD(x)). (2) 9: end 10: end The idea of dominance is generally used to compare two solutions f and g. If f is no worse for all objectives than g and wholly better for at least one objective it is said that f dominates g, written f ≺g. Thus f ≺g iff: the simulation accordingto an annealingschedule. At each T the SA algorithm aims to draw samples from the equilibrium f ≤g ∀i=1,...,D and i i (3) distribution π (x)∝exp{−E(x)/T}. As T →0 sufficiently f <g for at least one i. T i i slowlyanincreasingproportionoftheprobabilitymassofπ , T By a slight abuse of notation, dominance in objective space is concentratedin the region of the global minimum of E, so is extended to parameter space; thus it is said that a≺ b iff eventually, assuming a sufficiently slow annealing schedule f(a)≺f(b). is used, any sample from π will almost surely lie at the T Thedominatesrelationisnotatotalorderandtwosolutions minimum of E. aremutuallynon-dominatingif neitherdominatesthe other.A Sampling from the equilibrium distribution π (x) at any T set F of solutions is said to be a non-dominating set if no particular temperature is usually achieved by Metropolis- element of the set dominates any other: Hastings sampling [2], which involves making proposals x′ that are accepted with probability a6≺b ∀ a,b∈F (4) A solution is said to be globally non-dominated, or Pareto- A=min(1,exp{−δE(x′,x)/T}) (5) optimal, if no other feasible solution dominates it. The set of all Pareto-optimalsolutionsis knownas the Pareto-optimal where front,ortheParetoset,P;solutionsintheParetosetrepresent the possible optimal trade-offsbetween competing objectives. δE(x′,x)≡E(x′)−E(x). (6) A human operator can select a solution with a knowledge of the trade-offs involved once this set has been revealed. Intuitively, when T is high perturbations from x to x′ which Heuristic procedures, such as multiple objective evolutionary increase the energy are likely to be accepted (in addition algorithms and the multi-objective simulated annealing algo- to perturbations which decrease the energy, which are al- rithms discussed here, yield sets of mutually non-dominating ways accepted) and the samples can explore the state space. solutions which will be only an approximation to the true Subsequently, as T is reduced, only perturbations leading Paretofront.Somecarewithterminologyisthereforerequired, to small increases in E are accepted, so that only limited and in this paper the set produced by such an algorithm is explorationispossibleasthesystemsettleson(hopefully)the referred to as the estimated Pareto front, which we denote by global minimum. The algorithm is summarised in Algorithm F. 1:duringeach of K epochs,the computationaltemperatureis fixed at T and L samples are drawn from π before the k k Tk B. Simulated Annealing temperature is lowered in the next epoch. Each sample is a Simulated annealing, introduced by Kirkpatrick et al. [1] perturbation (‘mutation’ in the nomenclature of evolutionary may be thought of as the computational analogue of slowly algorithms) of the current state from a proposal density (line cooling a metal so that it adopts a low-energy, crystalline 3); the perturbed state x′ is accepted with probability given state. At high temperatures particles are free to move around, by (5), as shown in lines 4-8. whereas as the temperature is lowered they are increasingly As already alluded to, convergence is guaranteed if and confined due to the high energy cost of movement. It is only if the cooling schedule is sufficiently gradual [3], but physically appealing to call the function to be minimised the experience has shown SA to be a very effective optimisation energy, E(x), of the state x, and to introduce a parameter T, technique even with relatively rapid cooling schedules [13], the computational temperature, which is lowered throughout [14]. IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 3 C. Multi-Objective SA with Composite Objective Functions is re-seeded with another solution from the non-dominated archive to promote a more even coverage. Suppapitnarm et An attractive approachto multi-objectivesimulated anneal- al. promote exploration along the front by unconditionally ing (MOSA), adopted by several investigators[7]–[10],[15]– accepting proposals that are not dominated by any member [18], is to combine the objectives as a weighted sum: of F, otherwise using (5). D Of the multi-objective simulated annealing techniques in E(x)= w f (x). (7) X i i the current literature, perhaps the most promising is that of i=1 Nam & Park [8] due to their use of dominance in state The composite objective is then used as the energy to be change probabilities. In this algorithm the relative dominance minimised in a scalar SA optimiser. An equivalentalternative of the current and proposed solutions is tested and when the [6] is to sum logfi(x), and others (e.g., [8], [16]) have proposedsolution dominatesthe currentsolution the proposal investigated a number of non-linear and stochastic composite is accepted; this is analogous to the automatic acceptance energies. of proposals with a lower state energy in single-objective Itisclearthatsimulatedannealingwitha compositeenergy simulated annealing. In addition to the widespread practise (7) will converge to points on the Pareto optimal front where of employing a state change probability which guarantees the objectives have ratios given by w−1, if such points exist. acceptance of strictly superior perturbations, Nam & Park i However, it is unclear how to choose the weights in advance, modify the acceptance rule so that proposals are accepted indeed, one of the principal advantages of multi-objective with probability given by (5) and (7) if they are dominated optimisation is that the relative importance of the objectives by x, but unconditionally accepted if x′ ≺ x or if x′ and can be decided with the estimated Pareto front on hand. x are mutually non-dominating. This promotes exploration Perhaps more importantly, parts of the front are inaccessible of the search space and escape from local fronts but as the with fixed weights [19]. Recognising this, investigators have dimensionality increases so does the proportion of all moves proposed a variety of schemes for adapting the wi during the which are accepted unconditionally.This limits the behaviour annealing process to encourage exploration along the front. of the algorithm to that of a random walk through the search See for example [20]. space when dealing with problems with high dimensionality. Itisnaturalto keepan archive,F, of allthenon-dominated When the proposed solution is dominated by the current solutions found so far, and this archive may be utilised to solution,Nam& Parkdefinetheenergydifferencecontrolling furtherexplorationbyperiodicallyrestartingtheannealerfrom acceptanceas the average differencein objectivevalues. Nam a randomly chosen element of F [10]. & Park also employ 100 separate agents during optimisation, A proposal x′ in scalar SA is either better or worse than whereeachagentisanindependentcopyofthealgorithm;this thecurrentstatexdependingonthesignofδE(x′,x);except serves a similar function to Suppapitnarm etal.’s return-to- for pathological problems the probability that δE = 0 is base approach to promoting diversity of the solutions located vanishingly small. In multi-objective SA, however, x′ may by the algorithm. dominate x or x′ may be dominated by x or they may Although it is clear that the assurance of a convergence be mutually non-dominating: in fact, the probability that a proofcanbeprovidedforamulti-objectivesimulatedannealer pair of randomly chosen points in D-dimensional space are using a scalar objective function and fixed weights (7), such mutually non-dominatingis 1−2 1 D, so the mutually non- annealers are fundamentally limited in their coverage of the (cid:0)2(cid:1) dominating case becomes increasingly common with more Pareto front. On the other hand, it is difficult to see how objectives.However,energiessuchas(7)mayleadtox′ being proofs of convergence might be obtained with the heuristic accepted unconditionally (δE < 0) even though x′ 6≺ x, modifications designed to promote exploration transversal to because a large negative energy change from one objective the front. Given these difficulties, defining a multi-objective may outweigh small positive changes on the other objectives. simulated annealer which utilises a composite objective func- Each multi-objective simulated annealing algorithm which tion is undesirable. With this in mind, we investigate the utilisesacompositeobjectivefunctionmustthereforedealwith efficacy of an energy function based on the defining notion this behaviour in some manner. of dominance. The aim is the definition of a single energy A good example of a composite objective function ap- functionappropriatetoallcasesofrelativedominancebetween proach to multi-objective simulated annealing is given by x and x′ withoutrequiringspecialcases forwhere x′ ≺x, or Suppapitnarm etal.[10]. Instead of weighting and summing where x′ and x are mutuallynon-dominating,as has beenthe the objectives to produce a composite energy difference for case in previous algorithms. the acceptance criteria, this algorithm uses a multiplicative function with individual temperatures for each objective each III. ADOMINANCE BASED ENERGY FUNCTION of which is adjusted independently by the algorithm. This negates the need for a priori weighting of the objectives, In single objective optimisation problems the sign of the and can be considered to function as a weighted compos- difference in energy δE(x′,x) tells us whether the proposal ite sum approach with algorithmically controlled weightings. x′ is a better, worse or (very rarely) equally good solution This is vulnerable to the concentrated search properties of as the current solution x. Likewise the dominance relation other composite objective techniques and Suppapitnarmetal. can be used to compare the relative merit of x′ and x in employ a return-to-base scheme whereby the current solution multi-objectiveproblems,butnotethatitgivesessentiallyonly IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 4 P then E(x)=0, and solutionsmore distant fromthe frontare, in general, dominated by a greater proportion of P and so 2 have a higher energy; in Figure 1 the solution marked by an e v open circle has a greater energy than the one marked by a cti e filled circle. bj O Clearly this formulation of an energy does not rely on an a priori weighting of the objectives and the assurances of convergence [3] for uni-objective SA continue to hold in this case. Since all solutions lying on the front have equal minimum energy, we would anticipate that a simulated annealer using this energy would, having reached the front, perform a random walk exploration of the front. We note that Fleischer [21] has proposed an alternative measure of a non-dominated set, which may be loosely char- Objective 1 acterised as being based on the volume dominated by the set rather than the area of the dominating set. Fig.1. EnergyfromareaofthetruePareto frontP dominating asolution. Unfortunately,the true Pareto frontP is unavailableduring SolutionsaremarkedbycirclesandlinesindicatetheregionsofPdominating the course of an optimisation.We therefore proposeto use an eachone. energy defined in terms of the current estimate of the Pareto front,F,whichisthesetofmutuallynon-dominatingsolutions P foundthus far in the annealing.Denote byF˜ the unionof the 2 F, the currentsolutionx and the proposedsolution x′, that is e ctiv F˜ =F ∪{x}∪{x′}. (10) e bj Then, in a similar manner to (8), let F˜ be the elements of F˜ O x that dominate x: F F˜ ={y∈F˜|y≺x}. (11) x We note that |F˜ | is a quantity similar to one used in the x rankingmethodproposedbyFonseca&Fleming[22],namely the number of solutions in a search population that dominate x plus 1. Using F˜ we obtain an energy difference between x the current and proposed solutions of Objective 1 1 δE(x′,x)= |F˜|(cid:16)|F˜x′|−|F˜x|(cid:17). (12) Fig.2. EnergyfromproportionoftheestimatedParetofrontF dominating pointsdominatingasolution.ElementsofF areshownassmallgreycircles, Division by |F˜| ensures that δE is always less than unity and solutions areshownaslargeropenorfilledcircles. provides some robustness against fluctuations in the number of solutions in F. If F˜ is a non-dominating set the energy difference between any two of its elements is zero. Note also three values of quality—better, worse, equal—in contrast to that δE(x′,x) = −δE(x,x′). The inclusion of the current the energydifferencein uni-objectiveproblemswhichusually solution and the proposal in F˜ means that δE(x′,x) < 0 if gives a continuum. x′ ≺x, which ensures that proposalsthat move the estimated If the true Pareto frontP were available, we coulddefine a fronttowardsthetruefrontarealwaysaccepted.Proposalsthat simple energyof x as the measureof the frontthatdominates aredominatedbyoneormoremembersofthecurrentarchive f(x). Let P be the portion of P that dominates f(x): x are acceptedwith a probabilitydependinguponthe difference P ={y∈P|y≺f(x)}. (8) in the number of solutions in the archive that dominate x′ x and x. We emphasise that this probability does not depend Then we define uponmetricinformationinobjectivespace;weputnoapriori weighting on the objectives and the acceptance probability is E(x)=µ(P ) (9) x unaffected by rescalings of the objectives. where µ is a measure defined on P. We shall be principally Afurtherbenefitofthisenergymeasureisthatitencourages interested in finite sets approximating P and so shall take explorationofsparselypopulatedregionsofthefront.Imagine µ(P )tobesimplythecardinalityofP .IfP isacontinuous two proposals, each dominated by some solutions in F; for x x set, we can take µ to be the Lebesgue measure (informally, example, the solutions illustrated by the filled and unfilled thelength,areaorvolumefor2,3or4objectives);wefurther circles in Figure 2. The solution that is dominated by fewer discussmeasuresinducedonP insectionVII-E.Asillustrated elements (the unfilled circle) has the lower energy and would in Figure1, thisenergyhasthepropertieswe desire:if x∈P therefore be more likely to be accepted as a proposal. IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 5 Defining the energy in this manner, unlike some pro- A. Conditional Removal of Dominated Points posed multi-objective enhancements to simulated annealing A straightforward method for increasing the size of the discussedinsectionII-C,providesasingleenergyfunctionen- archive is not to delete solutions known to be dominated couragingbothconvergencetoandcoverageoftheParetofront if deleting them would reduce |F| below some predefined without requiring other modifications to the single-objective minimum. However, the existence of old solutions in F, may simulated annealing algorithm (beyond the obvious storage lead to desirable proposals(i.e., notnon-dominatedsolutions) of an archive of the estimated Pareto front). In particular no being rejected. In addition the old solutions may bias the additionalrulesarerequiredforcasesinwhichthecurrentand search away from regions of the front that were previously proposed solutions are mutually non-dominating. well populated. Convergence to the true Pareto front is no longer an A further disadvantage of this method is that the retained immediate consequence of Geman & Geman’s work [3], solutionsmaybe positionedso thattheyaredominatedbythe because the energy based on F is only an approximation archive and indeed by the current point and the vast majority to (9). However, Greening [23] offers proof of convergence, of proposals.In this case they serve to increase the resolution albeit more slowly, even when the energy contains errors. of the energy at the expense of the range. By contrast the Current work is investigating the application of this work to interpolation method using the attainment surface that we MOSA and in section VII we offer empirical evidence of the proposebelowinsiststhatinterpolatingpointsareonlyweakly convergence. dominated by the archive. An energy function based on (12) is straightforward to calculate;countingthenumberofelementsofF˜ thatdominate B. Linear Interpolation x and x′ can be achieved in logarithmic time [24], [25]. Another apparently suitable method of augmenting F is Our proposed multi-objective algorithm closely follows the linear interpolation (in objective space) between the solutions standardSA algorithm(Algorithm1),theonlyadditionthatis in F. In this method, when the archive is smaller than some necessary is to maintain an archive, F of the currentestimate predefined size, new points in objective space are generated of the Pareto front and to calculate the energy difference on the simplices defined by an element of F and its D − using (12). However, we postpone detailed description of the 1 nearest neighbours in F. This overcomes the limitations algorithm until methods of increasing the empirical energy of the previous method: Since new solutions are generated resolution have been discussed. ‘on’ the current estimated Pareto front, the problems which could occur with using old, dominated elements of F in the energy calculations are avoided. The interpolated points IV. INCREASING ENERGY RESOLUTION generated can also be evenly distributed between the current estimated Pareto-optimal solutions, which is beneficial as it doesnot deter the algorithmfrom exploringany region of the As mentioned earlier, the true Pareto-optimal front of so- estimated front which is not already densely populated. The lutions is, in general, unavailable to us. While using the principaldisadvantageofthismethodisthatproposalsmaybe archive of the estimated Pareto front F provides an estimate dominatedby an interpolatedpoint,but notby anyof the real of solution energy, when F is small the resolution in the elementsof F, meaning that the proposalmay erroneouslybe energies can be very coarse. In fact, the difference in energy betweentwosolutionsisanintegermultipleof1/|F˜|between disregarded. 0 and 1. Since the acceptance criterion (5) for new solutions is determined by the difference in energy δE(x,x′) between C. Attainment Surface Sampling the current solution and the proposed solution, low resolu- Consideration of the previous two methods of augmenting tion of the energies leads to a low resolution in acceptance theestimatedParetofrontsuggeststhattheaugmentingpoints probabilities. At low computational temperatures and with should have the following properties. small archives it will become increasingly likely that this 1) The augmentingpoints must be sufficiently close to the granularity will make it almost impossible for even slightly currentestimationoftheParetofrontthattheycanaffect detrimental moves (i.e., moves that increase E(x)) to be the energy of solutions generated near to the current made. This is undesirable as, at its most severe, this effect estimated Pareto front. reduces the algorithm to behaviour similar to a greedy search 2) Theymustbeevenlydistributedacrossthecurrentlyesti- optimiser,andpreventstheexploratorybehaviourprovidedby matedParetofrontsoastonotdiscouragethealgorithm detrimental moves. fromacceptingproposalsinpoorlypopulatedregionsof For this reason, and because a limited archive may inhibit the front. convergence [24], [26], we do not constrain the size of the 3) Theymustnotdominateanyproposalwhichisnotdom- archive. In fact, in order to increase the energy resolution we inatedbyanymemberofF, so thatpotentialentrantsto examinemethodsforusingalargersetforenergycalculations. the archive are not discarded. A consequence of this is There are a couple of straightforward, but ultimately inade- that they must all be dominatedby at least one member quate, methods for artificially increasing the size of F which of F. we now briefly discuss before describing a method using the The attainment surface, which has previously been used attainment surface. for estimated Pareto front visualisation [27] and is closely IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 6 2 U e v cti 1 bje 0.9 O 0.8 0.7 S Objective 300..56 0.4 H 0.3 0 0.2 0.5 0.1 0 0.2 0.4 0.6 0.8 1 1 Objective 1 Objective 1 Objective 2 Fig. 4. 10000 samples from the attainment surface for an archive of 10 Fig.3. Attainment surfaceSF istheboundaryoftheregion, U dominated points,whicharemarkedwithheavydots. bythenon-dominatedsetF,whoseelementsaremarkedasdots.Dashedlines denoteH theminimumrectangle containing F. Algorithm 2 Sampling a point from the attainment surface Then,asillustratedinFigure3,weinterpolateF withrandom samples uniformly distributed on S ∩ H , the attainment Inputs: F F {L }D Elements of F, sorted by surface restricted to HF. From the definition of SF it is d d=1 apparentthat interpolatedpointsare dominatedby an element increasing coordinate d of F, thus satisfying the third criterion. Uniform random sampling ensures that the second criterion is met, as is the 1: for i:=1,...,D Generate a random point, v 2: v :=rand(min(L ),max(L )) first criterion because SF interpolates F. i i i Sampling from S may be performed using Algorithm 2, 3: end F 4: d:=randint(1,D) Choose a dimension, d which works by sampling a point from a uniform distribution on the surface of H and then restricting one coordinate 5: for i:=1,...,|F| Find smallest v s.t. v is F d so that the point is dominated by an element of F. This is 6: u:=L dominated by an element of F d,i facilitatedbytheuseoflistsL ,d=1,...,Dwhichcomprise 7: v :=u d d d the elements of F sorted in increasing order of coordinate d. 8: if F ≺v Determining whether an element of F dominates v on line 8 9: return v may be efficiently implemented using binary searches of the 10: end listsL ,inwhichcase thealgorithmrequiresO(|F|log(|F|)) 11: end d time for the generation of each sample. Figure 4 illustrates the sampled attainment surface for a set of ten 3-dimensional points; 10000 samples are shown for visualisation. In the related to the attainment function [28], is an interpolating experiments reported in section VII F was augmented with surface between the elements of F that has the requisite 100 samples from S before calculating the energy of the properties. The attainment surface, S , corresponding to F F F proposal. With more objectives the energy resolution can be is a conservative interpolation of the elements of F so that beneficially increased by sampling more interpolating points. every point of S is dominated by an element of F. The F It is important to note that the purpose of attainment surface attainment surface for an F comprising four two-dimensional samplingis to uniformlyincrease the resolution of the energy elementsissketchedinFigure3.Moreformally,theattainment function across F and that, if performedextensively,this will surfaceistheboundaryoftheregioninobjectivespacewhich partially negate the benefit of the energy function guiding is dominated by elements of F. If u,v ∈ RD, we say that u properly dominates v (denoted u⊳v) iff u < v ∀i = search towardslesser-populatedregionsof F. For this reason, i i the number of sampled points should not be too high and it 1,...,D. Then if is advisable to only sample when it is necessary to increase F ={y|u≺y for some u∈F} (13) the resolution. The results presented here, where 100 samples U ={y|u⊳y for some u∈F} (14) from SF are always taken, demonstrate that sampling when not strictly necessary does not prevent convergence. the attainment surface is S =F \U =∂U. F Let H be the minimum axis-parallel hyper-rectanglecon- F taining F; that is, the hyper-rectangle defined by V. MULTI-OBJECTIVESIMULATEDANNEALING ALGORITHM HF =[min(y1),max(y1)]×...×[min(yD),max(yD)]. y∈F y∈F y∈F y∈F Having discussed sampling from the attainment surface to (15) increase the energy resolution, we are now in a position to IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 7 summarise the main points of our proposed multi-objective Algorithm 3 Multi-objective simulated annealing simulated annealing algorithm. As shown in Algorithm 3, Inputs: the multi-objective algorithm differs from the uni-objective {L }K Sequence of epoch durations k k=1 algorithm in that an archive F of non-dominated solutions {Tk}Kk=1 Sequence of temperatures, Tk+1 <Tk foundso far is maintained,and the energydifferencebetween x Initial feasible solution the proposed and current solution is calculated based on the current archive or its attainment surface. 1: F :={x} Initialise archive The archive is initialised with the initial feasible point 2: for k :=1,...,K (line 1 of Algorithm 3). At each stage the current solution 3: for i:=1,...,L k x is perturbed to form the proposed solution x′. In the work 4: x′ :=perturb(x) reported here, in which the parameters x are continuous and 5: if |F|<S If F is small real valued, we perturb each element of x singly, drawing Construct attainment surface the perturbationsfrom a Laplacian distribution centred on the 6: S :=interpolate(F) F current value. 7: F˜ :=S ∪F ∪{x}∪{x′} F IftherearesufficientlymanysolutionsinF,theaugmented 8: else archiveF˜ isconstructedbyaddingxandx′ (line9)toF and 9: F˜ :=F ∪{x}∪{x′} the energy difference between x′ and x is calculated using 10: end (12). If there are fewer than S solutions in then additional Energy difference base on F˜ samples are drawn from the attainment surface SF using 11: δE(x′,x):=E(x′)−E(x) Algorithm 2 (line 6); the energy difference is then calculated 12: u:=rand(0,1) based on the sampled attainment surface, x and x′. In the 13: if u<min(1,exp(−δE(x′,x)/T )) k work reported here, we always augment F with 100 samples 14: x:=x′ Accept new current point from SF. As even when there are a largenumberof solutions If x is not dominated by any element of F in the archive of the estimated Pareto front it is worthwhile 15: if z6≺x ∀z∈F sampling from SF as this samples evenly across the front, Remove dominated points from F providinggreaterresolutionin sparsely populatedareasof the 16: F :={z∈F |x⊀z} front. 17: F :=F ∪x Add x to F If the proposal is accepted (line 14), the archive must be 18: end updated.Ifxisnotdominatedbyanyofthearchivalsolutions, 19: end allarchivalsolutionsthataredominatedbyxaredeletedfrom 20: end the archive (line 16) and x is added to the archive (line 17). 21: end Clearly F is always a non-dominated set, although note that x′ may be dominated by members of F. IntheworkreportedhereallepochsL areofequallength, k VI. REALTIMEALGORITHMPARAMETER OPTIMISATION Lk = 100 and we adjust the temperature according to Tk = The performance of this algorithm, in common with other βkT0, where β is chosen so that Tk is 10−5 after two thirds of the evaluations are completed . simulatedannealingsystems,dependsuponparametersforthe initial temperature, the annealing schedule and the size of B. Perturbation Scalings perturbationsmade to solutions when generatingnew propos- als. Here we give details of methods which permit automatic Forsimplicitya proposalisgeneratedfromxbyperturbing setting of the initial temperature, and which adjust the scale only one parameter or decision variable of x. The parameter of perturbations made to maximise the quality of proposed to be perturbed is chosen at random and perturbed with solutions. a random variable ǫ drawn from a Laplacian distribution, p(ǫ) ∝ e−|σǫ|, where the scaling factor σ sets magnitude of theperturbation.TheLaplaciandistributionhastailsthatdecay A. Annealing Schedule relativelyslowly,thusensuringthatthere isa highprobability If the initial computational temperature is set too high, of exploring regions distant from the current solutions. all proposed solutions will be accepted, irrespective of their We maintain two sets of scaling factors, since the per- relative energies, and if set too low proposals with a higher turbations generating moves to a non-dominated proposal energythan the currentsolutionwill neverbe accepted,trans- within a front (we call these traversals) may potentially be forming the algorithm into a greedy search. As a reasonable very different from those required to locate a front closer starting point we set the initial temperature to achieve an to P, which we call location moves. We maintain a scaling initial acceptance rate of approximately 50% on derogatory factor for each dimension of parameter space for each of proposals.Thisinitialtemperature,T0,canbeeasilycalculated the location perturbations and the traversal perturbations, and by using a short ‘burn-in’ period during which all solutions adjust these independently to increase the probability of such are accepted and setting the temperature equal to the average a move being generated. When perturbing a solution, it is positive change of energy divided by ln(2). chosen randomly with equal probability whether the location IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 8 TABLEI scaling set will be used, or the traversal scaling set. Statistics TESTPROBLEMDEFINITIONOFDTLZ1–DTLZ7OF[29]FOR3 are kept on perturbations generating traversal and location OBJECTIVES(USINGTHESUGGESTEDPARAMETERSIZES).(DEFINITION moves; clearly these can be updated only after the proposal OFDTLZ5CORRECTED.) has been generated so that the type of move is known. The scalings are adjusted throughoutthe optimisation, whenevera Problem Definition suitably large statistic set is available to reliably calculate an appropriatescalingfactor.Thesescalingsareinitiallysetlarge f1(x)= 12x1x2(1+g(x)) eno1u)gThratovesrasmalpSlecaflrionmg:thTeheenttriarveefresaalsirbelsecaspliancge.for a particu- DTLZ1 fgf23(((xxx)))===1122100x((11|x(−1|−x−12x)(21)(+1g+(gx)()x)) 5la0rtdreacviesrisoanlvpaerritaubrbleatxiojnisshpaevrefobrmeeendmwahdeenetovexrapsipnrcoexitmheatlealsyt 0≤+xPi≤Pi=13,(fxoir−i=0.51),22,−..c.o,sP(,2P0π=(x7i−0.5))) j f1(x)=cos(x1π/2)cos(x2π/2)(1+g(x)) rescaling. f2(x)=cos(x1π/2)sin(x2π/2)(1+g(x)) In order to ensure wide coverage of the front we wish to DTLZ2 f3(x)=sin(x1π/2)(1+g(x)) mtraavxeirmsaislse ttoheendsiusrtaentchee(einntiorebjfercotnivteisspeavceen)lycocvoevreerdedb.yGethne- fg01(≤(xx)x)=i=≤Pc1oPi,s=(f3oxr(1xπii/=−2)10c,.o25s,).2(.x.2,πP/,2P)(=1+12g(x)) erating traversals travelling a small distance will concentrate f2(x)=cos(x1π/2)sin(x2π/2)(1+g(x)) theestimatedfrontaroundthepointatwhichthecurrentfront DTLZ3 f3(x)=sin(x1π/2)(1+g(x)) g(x)=100(|x|−2 was discovered, an effect we aim to avoid. We seek to generate proposals on approximately the scale 0≤+xPi≤Pi=13,(fxoir−i=0.51),22,−..c.o,sP(,2P0π=(x1i2−0.5))) atttrhrbiaassvteoechlrutasetasdelps.srineizTveoiooruodasfeclrhy,pieebgrveitveeunirntbhsgauitscit,ohcnertehseseinfugplrpeoiarnutruapgrmseb,neaeotterinoaretnissnopgfaacwrteehi,deseao-snrrmtdaenadgtlhliebnesyngt DTLZ4 fffg0123(≤(((xxxx)x)))=i===≤Pccsi1oonPi,ss=```f3oxxxr(α1α1α1xiπππi=///−2221´´´0,(cs.215ion,)s+.2``.xxg.α2,α2(Pπxπ,/)/)2P2´´=((111++2gg((xx)))) third of perturbations, the largest third of perturbations, and f1(x)=cos(θ1π/2)cos(θ2)(1+g(x)) theremainingperturbations.Foreachgroupthemeantraversal f2(x)=cos(θ1π/2)sin(θ2)(1+g(x)) ssiizzee icsaumseeadsubryedthaes ptheertuErubcaltiidoenasnidsisctaalnccuelattreadv.elTlehdeitnraovbejresca-l DTLZ5 fgθ13((x=x))x==1PsinPi=(3θ1(πx/i2−)(01.5+)2g(x)) tivespacewhenthecurrentsolutionandtheproposedsolution θ2= 4(1+πg(x))(1+2g(x)x2) aremutuallynon-dominating.Ifaperturbationandthecurrent 0≤xi≤1,fori=1,2,...,P,P =12 solutionarenotmutuallynon-dominating,thetraversalsize is ff21((xx))==xx21 counted as being 0. The traversal perturbationscaling for this DTLZ6 f3(x)=(1+g(x))h(f1,f2,f3,g) decision variableis then set to the averageperturbationof the g(x)= P−92PPi=3xi group which generated the largest average traversal. h(f1,f2,f3,g)=3−P3i=1“1f+ig(1+sin(3πfi))” This heuristic is open to the criticism that it depends 0≤xi≤1,fori=1,2,...,P,P =22 uwatpeeoitgnhhimtsinedgaifsfioucfriutnlhgtey,dDhisotowanbecjveeecsrt,iivntheoefbuojnbeccjettiicovtnievssepissamcuenaykwnbhoeiwlrenetn.hoTeromreaallallietsivveide- Ds.tT.LZ7 fgff3112((((xxxx))))====f1111113000(PPPx)1i2i3i+===000112411xfxx1iii(x)−1≥0 during optimisation so the front has approximately the same s.t. g2(x)=f3(x)+4f2(x)−1≥0 extent in each objective. We emphasise that, of course, the s.t. g3(x)=2f3(x)+f1(x)+f2(x)−1≥0 use of metric informationfor setting the approximatescale of 0≤xi≤1,fori=1,2,...,P,P =30 perturbations does not affect the dominance-based energy. 2) Location Scaling: Drawing from methods widely used in evolutionary algorithms (see [30]–[32] for recent work in and so the scalings are kept at the most recent valid value. this area),we aim to adjustthe scale of locationperturbations Counting only moves generated from perturbations to a to keep the acceptance rate for x′ that have a higher energy particular dimension of parameter space, the acceptance rate thanxtoapproximately1/3,sothatexploratoryproposalsare of derogatory moves α is the fraction of proposals to a made and accepted at all temperatures. greater energy which are accepted. If σ denotes the location The location perturbation scaling is recalculated for each perturbation scaling for a particular dimension, the new σ is parameter for which 20 proposals having δE(x′,x) > 0 set as: have been generated, after which the count is reset. Location perturbation rescaling is omitted in two cases: Firstly, when σ(1+2(α−0.4)/0.6) if α>0.4 the archive of the estimated Pareto front F has fewer than 10  members.Secondly,when the combinedsize of F augmented σ :=σ if 0.3≤α≤0.4 (16) by the samples from the attainment surface when multiplied σ/(1+2(0.3−α)/0.3) if α<0.3 by the temperature does not exceed 1. This is because we  adjust the scalings to attempt to keep the acceptance rate of This update works because, in general, smaller perturbations derogatory moves approximately a third; when this value is in parameter space are more likely to generate small changes too small, it becomes impossible to generate such a scaling, in objective space, resulting in smaller changes in energy. IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION, DRAFT UNDERREVIEW 9 MOSA − DTLZ1 Nam & Park − DTLZ1 NSGA−II − DTLZ1 f3 f3 f3 1 300 300 200 200 0.5 100 100 0 0 0 0 1 0 200 0 300 100 f0.15 0.5 f2 f1200 100 f2 2f010 100 200 f2 1 0 300 0 400 0 Fig.5. Archives ontestproblemDTLZ1after5000function evaluations foreachofthethreealgorithms. 102 7000 101 6000 100 5000 ce10−1 size4000 Distan10−2 chive 3000 Ar 10−3 2000 10−4 1000 10−5 0 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 Iteration x 104 Iteration x 104 Fig. 6. Left: Distance of current point, x, and archive F from the true Pareto front, P, versus iteration for DTLZ1. The dotted line shows median over 20runsofdistanceofxfromP;dashedlinesshowmaximumandminimum(overthe20runs)distances ateachiteration. Thethicklineshowsthemedian (over20runs)ofthemediandistanceofarchivememberstoP.Right:Archivegrowthversusiteration. Thicklineshowsmedian(over20runs)archivesize anddashedlines showmaximumandminimum. TABLEII VII. EXPERIMENTS ANNEALINGSCHEDULES Weillustratetheperformanceofthisannealeronsomewell- known test functions from the literature, namely the DTLZ Problem Run Timeto test functions of Deb et al. [29], [33], and compare them to length Tk =10−5 theperformanceofthewellestablishedNSGA-IIevolutionary DTLZ1 5000 3000 DTLZ2 1000 500 algorithm[11](usingthePISAreferenceimplementation[34]) DTLZ3 15000 10000 and Nam & Park’s multi-objective simulated annealer [8] DTLZ4 5000 3000 which we discuss in section I. The benefitof using the DTLZ DTLZ5 1000 500 DTLZ6 5000 3000 test functionsis thatthe true Paretofront,P,is known,so we DTLZ7 9000 6000 candiscoverhowcloseourestimatedarchiveF istoP,aswell as compare results from each algorithm. Note that we rectify a couple of minor typographical errors in the description of DTLZ5 and DTLZ6 here, as the formulae published in according to Tk = βkT0, where β is chosen so that Tk is [29], [33] do not yield the Pareto fronts described.1 For 10−5 after approximately two thirds of the evaluations are completeness we give the problem definitions in Table I; in completed; run lengths and the exact number of evaluations all the experiments we use D =3 objectives. before Tk is 10−5 are given in Table II. For MOSA, the In the workreportedhereall epochsL are ofequallength parameter perturbations are controlled using the scheme de- k for the annealers, L = 100 and we adjust the temperature scribed in section VI-B. The perturbations for Nam & Park’s k annealer are performed using a scheme similar to that for 1In equation (25) of [29] only θ1 should be multiplied by π/2 when MOSA but without the automatic rescaling feature novel to calculating f1,...,fM. In equation (27) the calculation of g(xM) is MOSA;thescalingsare fixedat0.1(determinedfroma small inconsistent with the results provided, meaning all f3 values in the figure in[33]arehalved. empiricalstudy,althoughtheresultsareonlymildlydependent

Description:
Abstract— Simulated annealing is a provably convergent opti- miser for single-objective problems. Previously proposed multi- objective extensions have mostly taken the form of a single- objective simulated annealer optimising a composite function of the objectives. We propose a multi-objective
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.