ebook img

OPTIMIZATION BY SIMULATED ANNEALING: AN EXPERIMENTAL PDF

28 Pages·2015·1.11 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview OPTIMIZATION BY SIMULATED ANNEALING: AN EXPERIMENTAL

OPTIMIZATIOBNY SIMULATEDA NNEALINGA: N EXPERIMENTAL EVALUATIONP; ART 1, GRAPH PARTITIONING DAVIDS . JOHNSON A T&T Bell Laboratories, Murray Hill, New Jersey CECILIAR . ARAGON University of California, Berkeley, California LYLE A. McGEOCH Amherst College, Amherst, Massachusetts CATHERINES CHEVON Johns Hopkins University, Baltimore, Maryland (ReceivedF ebruary1 988;r evisionr eceivedJ anuary1 989;a cceptedF ebruary1 989) In this and two companionp apers,w e reporto n an extendede mpiricals tudy of the simulateda nnealinga pproacht o combinatorialo ptimizationp roposed by S. Kirkpatricke t al. That study investigatedh ow best to adapt simulated annealingt o particularp roblemsa nd comparedi ts performanceto that of more traditionala lgorithmsT. his paper( Part I) discusses annealing and our parameterizedg eneric implementationo f it, describesh ow we adapted this generic algorithmt o the graphp artitioningp roblem,a nd reportsh ow well it comparedt o standarda lgorithmsli ke the Kernighan- Lin algorithm.( For sparser andom graphs,i t tended to outperformK ernighan-Lina s the numbero f verticesb ecome large,e ven when its much greaterr unningt ime was taken into account.I t did not performn earlys o well, however,o n graphsg eneratedw ith a built-ing eometrics tructure.W) e also discussh ow we went abouto ptimizingo ur implementation, and describet he effectso f changingt he variousa nnealingp arameterso r varyingt he basica nnealinga lgorithmi tself. A ew approach to the approximate solution of new problems (even in the absence of deep insight difficult combinatorial optimization problems into the problems themselves) and, because of its recently has been proposed by Kirkpatrick,G elatt and apparent ability to avoid poor local optima, it offers Vecchi (1983), and independently by Cerny (1985). hope of obtaining significantly better results. This simulateda nnealinga pproach is based on ideas These observations, together with the intellectual from statistical mechanics and motivated by an anal- appeal of the underlying physical analogy, have ogy to the behavior of physical systems in the presence inspired articlesi n the popular scientific press (Science of a heat bath. The nonphysicist, however, can view 82, 1982 and Physics Today 1982) as well as attempts it simply as an enhanced version of the familiar tech- to apply the approach to a variety of problems, in nique of local optimizationo r iterativei mprovement, areas as diverse as VLSI design (Jepsen and in which an initial solution is repeatedly improved by Gelatt 1983, Kirkpatrick, Gelatt and Vecchi 1983, making small local alterationsu ntil no such alteration Vecchi and Kirkpatrick 1983, Rowan and Hennessy yields a better solution. Simulated annealing random- 1985), pattern recognition (Geman and Geman 1984, izes this procedure in a way that allows for occasional Hinton, Sejnowski and Ackley 1984) and code gen- uphillm oves( changes that worsen the solution), in an eration (El Gamal et al. 1987), often with substantial attempt to reduce the probability of becoming stuck success. (See van Laarhoven and Aarts 1987 and in a poor but locally optimal solution. As with local Collins, Eglese and Golden 1988 for more up-to-date search, simulated annealing can be adapted readily to and extensive bibliographieso f applications.)M any of Subject classifications: Networks/graphs, heuristics: algorithms for graph partitioning. Simulation, applications: optimization by simulated annealing. Operations Research 0030-364X/89/3706-0865 $01.25 Vol. 37, No. 6, November-December 1989 865 ? 1989 Operations Research Society of America This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions 866 / JOHNSON ET AL. the practical applications of annealing, however, have for sparse random graphs it tended to outperform been in complicated problem domains, where pre- Kernighan-Lin as the number of vertices became vious algorithms either did not exist or perfom-led large. For a class of random graphs with built-in quite poorly. In this paper and its two companions, geometric structure,h owever, Kernighan-Linw on the we investigate the performanceo f simulated annealing comparisons by a substantialm argin. Thus, simulated in more competitive arenas, in the hope of obtaining annealing's success can best be described as mixed. a better view of the ultimate value and limitations of Section 4 describes the experiments by which we the approach. optimized the annealing parametersu sed to generate The arena for this paper is the problem of partition- the results reportedi n Section 3. Section 5 investigates ing the vertices of a graph into two equal size sets to the effectiveness of various modifications and alter- minimize the number of edges with endpoints in natives to the basic annealing algorithm. Section 6 both sets. This application was first proposed by discusses some of the other algorithms that have been Kirkpatrick, Gelatt and Vecchi, but was not exten- proposed for graph partitioning, and considers how sively studied there. (Subsequently, Kirkpatrick 1984 these might factor into our comparisons. We conclude went into the problem in more detail, but still did not in Section 7 with a summary of our observationsa bout deal adequately with the competition.) the value of simulated annealing for the graph parti- Our paper is organized as follows. In Section 1, we tioning problem, and with a list of lessons learned that introduce the graph partitioning problem and use it may well be applicable to implementations of simu- to illustrate the simulated annealing approach. We lated annealing for other combinatorial optimization also sketch the physical analogy on which annealing problems. is based, and discuss some of the reasons for optimism In the two companion papers to follow, we will (and for skepticism) concerning it. Section 2 presents report on our attempts to apply these lessons to three the details of our implementation of simulated anneal- other well studied problems: Graph Coloring and ing, describing a parameterized, generic annealing Number Partitioning (Johnson et al 19 90a), and the algorithm that calls problem-specifics ubroutines, and Traveling Salesman Problem (Johnson et al. 19 90b). hence, can be used in a variety of problem domains. Sections 3 through 6 present the results of our 1. SIMULATEDA NNEALING:T HE BASIC experiments with simulated annealing on the graph CONCEPTS partitioning problem. Comparisons between anneal- 1.1. Local Optimization ing and its rivals are made difficult by the fact that the performance of annealing depends on the partic- To understand simulated annealing, one must first ular annealing schedule chosen and on other, more understand local optimization. A combinatorial opti- problem-specific parameters. Methodological ques- mization problem can be specified by identifying a set tions also arise because annealing and its main com- of solutions together with a cost function that assigns petitors are randomized algorithms (and, hence, can a numerical value to each solution. An optimal solu- give a variety of answers for the same instance) and tion is a solution with the minimum possible cost because they have running times that differ by factors (there may be more than one such solution). Given as large as 1,000 on our test instances. Thus, if com- an arbitrarys olution to such a problem, local opti- parisons are to be convincing and fair, they must be mization attempts to improve on that solution by a based on large numbers of independent runs of the series of incremental, local changes. To define a local algorithms, and we cannot simply compare the aver- optimization algorithm, one first specifies a method age cutsizes found. (In the time it takes to perform for perturbings olutions to obtain different ones. The one run of the slower algorithm, one could perform set of solutions that can be obtained in one such step many runs of the faster one and take the best solution from a given solution A is called the neighborhoodo f found.) A. The algorithmt hen performst he simple loop shown Section 3 describes the problem-specific details of in Figure 1, with the specific methods for choosing S our implementation of annealing for graph partition- and S' left as implementation details. ing. It then introduces two general types of test graphs, Although S need not be an optimal solution when and summarizes the results of our comparisons the loop is finally exited, it will be locally optimal in between annealing, local optimization, and an algo- that none of its neighbors has a lower cost. The hope rithm due to Kernighan-Lin( 1970) that has been the is that locally optimal will be good enough. long-reigning champion for this problem. Annealing To illustratet hese concepts, let us consider the graph almost always outperformed local optimization, and partitioning problem that is to be the topic of Section This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions GraphP artitioningb y SimulatedA nnealing / 867 Simulated annealing is an approach that attempts 1. Get an initials olution S. to avoid entrapment in poor local optima by allowing 2. While there is an untested neighbor of S do the following. an occasional uphill move. This is done under the 2.1 Let S' be an untested neighbor of S. influence of a random number generatora nd a control 2.2 If cost (S') < cost (S), set S = S'. parameterc alled the temperature.A s typically imple- 3. Return S. mented, the simulated annealing approach involves a pair of nested loops and two additional parameters,a Figure 1. Local optimization. cooling ratio r, 0 < r < 1, and an integer temperature length L (see Figure 3). In Step 3 of the algorithm, the 3. In this problem, we are given a graph G = (V, E), term frozen refers to a state in which no further where V is a set of vertices (with I V I even) and E is a improvement in cost(S) seems likely. set of pairs of vertices or edges. The solutions are The heart of this procedure is the loop at Step 3.1. partitions of V into equal sized sets. The cost of a Note that e-A/T will be a number in the interval (0, 1) partition is its cutsize, that is, the number of edges in when A and T are positive, and rightfully can be E that have endpoints in both halves of the partition. interpreteda s a probability that depends on A and T. We will have more to say about this problem in The probability that an uphill move of size A\w ill be Section 3, but for now it is easy to specify a natural accepted diminishes as the temperatured eclines, and, local optimization algorithm for it. Simply take the for a fixed temperature T, small uphill moves have neighbors of a partition II =I V1U V2I to be all those higherp robabilitieso f acceptancet han largeo nes. This partitions obtainable from II by exchanging one ele- particular method of operation is motivated by a ment of V, with one element of V2. physical analogy, best describedi n terms of the physics For two reasons, graph partitioning is typical of the of crystal growth. We shall discuss this analogy in the problems to which one might wish to apply local next section. optimization. First, it is easy to find solutions, perturb 1.3. A Physical Analogy With Reservations them into other solutions, and evaluate the costs of such solutions. Thus, the individual steps of the iter- To grow a crystal, one starts by heating the raw ative improvement loop are inexpensive. Second, like materials to a molten state. The temperature of this most interesting combinatorial optimization prob- crystal melt is then reduced until the crystal structure lems, graph partitioning is NP-complete (Garey, is frozen in. If the cooling is done very quickly (say, Johnson and Stockmeyer 1976, Garey and Johnson by dropping the external temperaturei mmediately to 1979). Thus, finding an optimal solution is presum- absolute zero), bad things happen. In particular,w ide- ably much more difficult than finding some solution, spread irregularitiesa re locked into the crystal struc- and one may be willing to settle for a solution that is ture and the trapped energy level is much higher than merely good enough. in a perfectly structuredc rystal. This rapid quenching Unfortunately, there is a third way in which graph process can be viewed as analogous to local optimi- partitioning is typical: the solutions found by ldcal zation. The states of the physical system correspond optimization normally are not good enough. One can to the solutions of a combinatorial optimization prob- be locally optimal with respect to the given neighbor- lem; the energy of a state correspondst o the cost of a hood structure and still be unacceptably distant from the globally optimal solution value. For example, Fig- ure 2 shows a locally optimal partition with cutsize 4 for a graph that has an optimal cutsize of 0. It is clear that this small example can be generalizedt o arbitrar- ily bad ones. 1.2. Simulated Annealing It is within this context that the simulated annealing approach was developed by Kirkpatrick, Gelatt and Vecchi. The difficulty with local optimization is that it has no way to back out of unattractivel ocal optima. Figure 2. Bad but locally optimal partition with We never move to a new solution unless the direction respect to pairwise interchange. (The dark is downhill, that is, to a better value of the cost and light vertices form the two halves of the function. partition.) This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions 868 / JOHNSON ET AL. The simulated annealing approach was first devel- 1. Get an initials olution S. oped by physicists, who used it with success on the 2. Get an initialt emperature T > 0. 3. While not yet frozen do the following. Ising spin glass problem (Kirkpatrick, Gelatt and 3.1 Perform the following loop L times. Vecchi), a combinatorial optimization problem where 3.1.1 Pick a random neighbor S' of S. the solutions actually are states (in an idealized model 3.1.2 Let A = cost (S')- cost (S). of a physical system), and the cost function is the 3.1.3 If A ? 0 (downhill move), amount of (magnetic) energy in a state. In such an Set S = S'. 3.1.4 If A > 0 (uphill move), application, it was natural to associate such physical Set S = S' with probabilitye -1T. notions as specific heat and phase transitions with the 3.2 Set T = rT (reduce temperature). simulated annealing process, thus, furthere laborating 4. Return S. the analogy with physical annealing. In proposing that the approach be applied to more traditional combi- Figure 3. Simulated annealing. natorial optimization problems, Kirkpatrick, Gelatt and Vecchi and other authors (e.g., Bonomi and solution, and the minimum energy or ground state Lutton 1984, 1986, White 1984) have continued corresponds to an optimal solution (see Figure 4). to speak of the operation of the algorithm in these When the external temperature is absolute zero, no physical terms. state transition can go to a state of higher energy. Many researchers,h owever, including the authors Thus, as in local optimization, uphill moves are pro- of the current paper, are skeptical about the relevance hibited and the consequences may be unfortunate. of the details of the analogy to the actual performance When crystals are grown in practice, the danger of of simulated annealing algorithms in practice. As a bad local optima is avoided because the temperature consequence of our doubts, we have chosen to view is lowered in a much more gradual way, by a process the parameterized algorithm described in Figure 3 that Kirkpatrick, Gelatt and Vecchi call "careful simply as a procedure to be optimized and tested, free annealing." In this process, the temperatured escends from any underlying assumptions about what the slowly through a series of levels, each held long enough parametersm ean. (We have not, however, gone so far for the crystal melt to reach equilibrium at that tem- as to abandon such standard terms as temperature.) perature.A s long as the temperaturei s nonzero, uphill Suggestions for optimizing the performance of simu- moves remain possible. By keeping the temperature lated annealing that are based on the analogy have from getting too far ahead of the current equilibrium been tested, but only on an equal footing with other energy level, we can hope to avoid local optima until promising ideas. we are relatively close to the ground state. Simulated annealing is the algorithmic counterpart 1.4. Mathematical Results, With Reservations to this physical annealing process, using the well known Metropolis algorithm as its inner loop. The In addition to the supportt hat the simulated annealing Metropolis algorithm (Metropolis et al. 1953) was approachg ains from the physical analogy upon which developed in the late 1940's for use in Monte Carlo it is based, there are more rigorous mathematical simulations of such situations as the behavior of gases justifications for the approach, as seen, for instance, in the presence of an external heat bath at a fixed in Geman and Geman (1984), Anily and Federgruen temperature (here the energies of the individual gas (1985), Gelfand and Mitter (1985), Gidas (1985), molecules are presumed to jump randomly from level Lundy and Mees (1986) and Mitra, Romeo and to level in line with the computed probabilities). The Sangiovanni-Vincentelli (1986). These formalize the name simulated annealing thus refers to the use of physical notion of equilibrium mathematically as the this simulation technique in conjunction with an equilibrium distributiono f a Markov chain, and show annealing schedule of declining temperatures. that there are cooling schedules that yield limiting equilibrium distributions, over the space of all solu- tions, in which, essentially, all the probability is con- PHYSICAL SYSTEM OPTIMIZATIONP ROBLEM centrated on the optimal solutions. State Feasible Solution Unfortunately, these mathematical results provide Energy Cost little hope that the limiting distributions can be Ground State Optimal Solution reached quickly. The one paper that has explicitly Rapid Quenching Local Search Careful Annealing Simulated Annealing estimated such convergence times (Sasaki and Hajek 1988) concludes that they are exponential even for a Figure 4. The analogy. very simple problem. Thus, these results do not seem This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions GraphP artitioningb y SimulatedA nnealing / 869 to provide much direct practicalg uidance for the real- paper and its companions has been to subject simu- world situation in which one must stop far short of lated annealing to rigorous competitive testing in the limiting distribution, settling for what are hoped domains where sophisticated alternatives already to be near-optimal,r athert han optimal solutions. The exist, to obtain a more complete view of its robustness mathematical results do, however, provide intuitive and strength. support to the suggestion that slower cooling rates (and, hence, longer running times) may lead to better solutions, a suggestion that we shall examine in some 2. FILLINGI N THE DETAILS detail. The first problem faced by someone preparingt o use or test simulated annealing is that the procedure is 1.5. Claims and Questions more an approach than a specific algorithm. Even if Although simulated annealing has already proved its we abide by the basic outline sketched in Figure 3, we economic value in practical domains, such as those still must make a variety of choices for the values of mentioned in the Introduction, one may still ask if it the parameters and the meanings of the undefined is truly as good a general approach as suggested by its terms. The choices fall into two classes: those that are first proponents. Like local optimization, it is widely problem-specific and those that are generic to the applicable, even to problems one does not understand annealing process (see Figure 5). very well. Moreover, annealing apparently yields bet- We include the definitions of solution and cost in ter solutions than local optimization, so more of these the list of choices even though they are presumably applications should prove fruitful. However, there are specified in the optimization problem we are trying to certain areas of potential difficulties for the approach. solve. Improved performance often can be obtained First is the question of running time. Many by modifying these definitions: the graph partitioning researchers have observed that simulated annealing problem covered in this paper offers one example, as needs large amounts of running time to perform well, does the graph coloring problem that will be covered and this may push it out of the range of feasible in Part II. Typically, the solution space is enlarged approachesf or some applications. Second is the ques- and penalty terms are added to the cost to make the tion of adaptability. There are many problems for nonfeasible solutions less attractive. (In these cases, which local optimization is an especially poor heuris- we use the term feasible solution to characterize tic, and even if one is preparedt o devote largea mounts those solutions that are legal solutions to the original of running time to simulated annealing, it is not clear problem.) that the improvement will be enough to yield good Given all these choices, we face a dilemma in eval- results. Underlying both these potential drawbacksi s uating simulated annealing. Although experiments are the fundamental question of competition. capable of demonstrating that the approach performs Local search is not the only way to approach com- well, it is impossible for them to prove that it performs binatorial optimization problems. Indeed, for some poorly. Defenders of simulated annealing can always problems it is hopelessly outclassed by a more con- say that we made the wrong implementation choices. structive technique one might call successive augmen- In such a case, the best we can hope is that our tation. In this approach, an initially empty structure experiments are sufficiently extensive to make the is successively augmented until it becomes a solution. existence of good parameterc hoices seem unlikely, at This, in particular, is the way that many of the effi- ciently solvable optimization problems, such as the Minimum Spanning Tree Problem and the Assign- PROBLEM-SPECIFIC ment Problem, are solved. Successive augmentation 1. Whati s a solution? is also the design principle for many common heuris- 2. Whata re the neighborso f a solution? tics that find near-optimal solutions. 3. Whati s the cost of a solution? Furthermore, even when local optimization is the 4. How do we determinea n initials olution? method of choice, there are often other ways to GENERIC 1. How do we determinea n initialt emperature? improve on it besides simulated annealing, either by 2. How do we determinet he cooling ratior ? sophisticated backtracking techniques, or simply by 3. How do we determinet he temperaturele ngthL ? running the local optimization algorithm many times 4. How do we know when we are frozen? from different starting points and taking the best solution found. Figure 5. Choices to be made in implementing sim- The intent of the experiments to be reported in this ulated annealing. This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions 870 / JOHNSON ET AL. least in the absence of firm experimental evidence that ITERNUM such choices exist. The number of annealing runs to be performed with this To provide a uniform framework for our experi- set of parameters. ments, we divide our implementations into two parts. INITPROB The first part is a generic simulated annealing pro- Used in determining an initial temperature for the current set of runs. Based on an abbreviated trial annealing run, a gram, common to all our implementations. The sec- temperature is found at which the fraction of accepted ond part consists of subroutines called by the generic moves is approximately INITPROB,a nd this is used as the program which are implemented separately for each starting temperature (if the parameter STARTTEMPis set, problem domain. These subroutines,a nd the standard this is taken as the starting temperature, and the trial run names we have established for them, are summarized is omitted). TEMPFACTOR in Figure 6. The subroutines share common data This is a descriptive name for the cooling ratio r of structuresa nd variablest hat are unseen by the generic Figure 3. part of the implementation. It is easy to adapt our SIZEFACTOR annealing code to new problems, given that only the We set the temperature length L to be N*SIZEFACTOR, problem-specifics ubroutines need be changed. where N is the expected neighborhood size. We hope to be able to handle a range of instance sizes with a fixed The generic part of our algorithm is heavily para- value for SIZEFACTOR;t emperature length will remain meterized, to allow for experiments with a variety of proportional to the number of neighbors no matter what factorst hat relate to the annealing process itself. These the instance size. parametersa re describedi n Figure 7. Because of space MINPERCENT limitations, we have not included the complete generic This is used in testing whether the annealing run is frozen (and hence, should be terminated). A counter is maintained code, but its functioning is fully determined by the that is incremented by one each time a temperature is information in Figures 3, 6, and 7, with the exception completed for which the percentage of accepted moves is of our method for obtaining a starting temperature, MINPERCENTor less, and is reset to O each time a solution given a value for INITPROB, which we shall discuss is found that is better than the previous champion. If the in Section 3.4.2. counter ever reaches 5, we declare the process to be frozen. Although the generic algorithm served as the basis for most of our experiments, we also performed lim- Figure 7. Generic parametersa nd their uses. ited tests on variantso f the basic scheme. For example, we investigated the effects of changing the way tem- peraturesa re reduced, of allowing the number of trials per temperaturet o vary from one temperaturet o the next, and even of replacing the e-/T of the basic loop by a different function. The results of these experi- READ-INSTANCE() ments will be reported in Section 6. Reads instance and sets up appropriated ata structures; returnst he expected neighborhoods ize N for a solution (to be used in determiningth e initiatl emperaturea nd the 3. GRAPH PARTITIONING numberL of trialsp er temperature). INITIAL-SOLUTION() The graph partitioning problem described in Section Constructs an initials olution S0 and returns cost (SO), 2 has been the subject of much researcho ver the years, setting the locally-storedv ariableS equal to So and the because of its applications to circuit design and locally-storedv ariablec * to some trivialu pper bound on the optimalf easible solutionc ost. because, in its simplicity, it appeals to researchersa s PROPOSE-CHANGE() a test bed for algorithmic ideas. It was proved NP- Chooses a randomn eighborS ' of the currents olutionS complete by Garey, Johnson and Stockmeyer (1976), and returnst he differencec ost (S') - cost (S), saving S' but even before that researchers had become con- for possible use by the next subroutine. vinced of its intractability, and hence, concentrated CHANGE-SOLUTION() Replaces S by S' in local memory,u pdatingd ata struc- on heuristics, that is, algorithms for finding good but tures as appropriateI. f S' is a feasible solutiona nd cost not necessarily optimal solutions. (S') is better than c*, then sets c* = cost (S') and sets For the last decade and a half, the recognizedb ench- the locallys tored champions olutionS * to S'. mark among heuristics has been the algorithm of FINALSOLUTION() Kernighan and Lin (1970), commonly called the Modifiest he currents olutionS to obtaina feasibles olution S". (S" = S if S is alreadyf easible). If cost (S") S c*, Kernighan-Lin algorithm. This algorithm is a very outputs S"; otherwiseo utputsS *. sophisticated improvement on the basic local search procedure described in Section 2.1, involving Figure 6. Problem-specific subroutines. an iterated backtracking procedure that typically This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions GraphP artitioningb y SimulatedA nnealing / 871 finds significantly better partitions. (For more solutions, it penalizes them according to the square of details, see Section 7.) Moreover, if implemented the imbalance. Consequently, at low temperaturest he using ideas from Fiduccia and Mattheyses (1982), the solutions tend to be almost perfectly balanced. This Kernighan-Lin algorithm runs very quickly in prac- penalty function approach is common to implemen- tice (Dunlop and Kernighan 1985). Thus, it repre- tations of simulated annealing, and is often effective, sents a potent competitor for simulated annealing, perhaps because the extra solutions that are allowed and one that was ignored by Kirkpatrick, Gelatt provide new escape routes out of local optima. In the and Vecchi when they first proposed using simulated case of graph partitioning, there is an extra benefit. annealing for graph partitioning. (This omission was Although this scheme allows more solutions than the partially rectified in Kirkpatrick (1984), where original one, it has smaller neighborhoods (n neigh- limited experiments with an inefficient implementa- bors versus n2/4). Our experiments on this and other tion of Kernighan-Lina re reported.) problems indicate that, under normal cooling rates This section is organized as follows. In 3.1, we such as r = 0.95, temperature lengths that are signifi- discuss the problem-specific details of our implemen- cantly smaller than the neighborhood size tend to give tation of simulated annealing for graph partitioning poor results. Thus, a smaller neighborhood size may and in 3.2 we describe the types of instances on which well allow a shorterr unning time, a definite advantage our experiments were performed. In 3.3, we present if all other factors are equal. the results of our comparisons of local optimization, Two final problem-specific details are our method the Kernighan-Lin algorithm, and simulated anneal- for choosing an initial solution and our method for ing. Our annealing implementation generally outper- turning a nonfeasible final solution into a feasible one. forms the local optimization scheme on which it is Initial solutions are obtained by generating a random based, even if relative running times are taken into partition (for each vertex we flip an unbiased coin to account. The comparison with Kernighan-Lini s more determine whether it should go in V1 or V2). If the problematic, and depends on the type of graph tested. final solution remains unbalanced, we use a greedy heuristic to put it into balance. The heuristic repeats 3.1. Problem-Specific Details the following operation until the two sets of the par- tition are the same size: Find a vertex in the largers et Although the neighborhood structure for graph parti- tioning described in Section 2.1 has the advantage of that can be moved to the opposite set with the least increase in the cutsize, and move it. We output the simplicity, it turns out that better performancec an be obtained through indirection. We shall follow best feasible solution found, be it this possibly modi- Kirkpatrick,G elatt and Vecchi in adopting the follow- fied final solution or some earlier feasible solution ing new definitions of solution, neighbor, and cost. encountered along the way. This completes the description of our simulated annealing algorithm, Recall that in the graph partitioning problem we are given a graph G = (V, E) and are asked to find modulo the specification of the individual parameters, that partition V = V1 U V2 of V into equal sized sets which we shall provide shortly. that minimizes the number of edges that have end- points in different sets. For our annealing scheme, a 3.2. The Test Beds solution will be any partition V = VI U V2 of the Our explorations of the algorithm, its parameters,a nd vertex set (not just a partition into equal sized sets). its competitors will take place within two general Two partitionsw ill be neighborsi f one can be obtained classes of randomly generated graphs. The first type from the other by moving a single vertex from one of of graph is the standard random graph, defined in its sets to the other (rather than by exchanging two terms of two parameters, n and p. The parameter n vertices, as in Section 2.1). To be specific, if (V1, V2) specifies the number of vertices in the graph; the is a partition and v E V1, then (VI - uv},V 2 U {v}) parameterp , 0 < p < 1, specifies the probability that and (Vl, V2) are neighbors. The cost of a partition any given pair of vertices constitutes an edge. (We (VI, V2)i s defined to be make the decision independently for each pair.) Note that the expected average vertex degree in the random c(VI, V2)=I flU, v} E E: u E VI & v E V2} I graph is p(n - 1). We shall usually choose p so Gn, that this expectation is small, say less than 20, as most +a( VI I - 2 1)2 interesting applications involve graphs with a low where IX I is the number of elements in set X and a averaged egree, and because such graphsa re better for is a parameter called the imbalancef actor. Note that distinguishing the performance of different heuristics although this scheme allows infeasible partitions to be than more dense ones. (For some theoretical results This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions 872 / JOHNSON ET AL. about the expected minimum cutsizes for random compared this particular implementation (hereafter graphs, see Bui 1983.) referred to as Annealing with a capital A) to the Our second class of instances is based on a non- Kernighan-Lina lgorithm (hereafterr eferredt o as the standardt ype of random graph, one that may be closer K-L algorithm), and a local optimization algorithm to real applications than the standard one, in that the (referredt o as Local Opt) based on the same neigh- graphso f this new type will have by definition inherent borhood structure as our annealing algorithm, with structure and clustering. An additional advantage is the same rebalancing heuristic used for patching up that they lend themselves to two-dimensional depic- locally optimal solutions that were out-of-balance. tion, although they tend to be highly nonplanar. They (Experiments showed that this local optimization again have two parameters, this time denoted by n approach yielded distinctly better average cutsizes and d. The random geometricg raph U,,d has n vertices than the one based on the pairwise interchange neigh- and is generateda s follows. First, pick 2n independent borhood discussed in Section 2.1, without a substan- numbers uniformly from the interval (0, 1), and view tial increase in running time.) All computations were these as the coordinates of n points in the unit square. performed on VAX 11-750 computers with floating These points represent the vertices; we place an edge point accelerators and 3 or 4 megabytes of main between two vertices if and only if their (Euclidean) memory (enough memory so that our programsc ould distance is d or less. (See Figure 8 for an example.) run without delays due to paging), running under the Note that for points not too close to the boundary the UNIX operating system (Version 8). (VAX is a trade- expected average degree will be approximately nwrd2. mark of the Digital Equipment Corporation; UNIX Although neither of these classes is likely to arise in is a trademarko f AT&T Bell Laboratories.)T he pro- a typical application, they provide the basis for repeat- grams were written in the C programmingl anguage. able experiments, and, it is hoped, constitute a broad The evaluation of our experiments is complicated enough spectrum to yield insights into the general by the fact that we are dealing with randomizeda lgo- performance of the algorithms. rithms, that is, algorithms that do not always yield the same answer on the same input. (Although only the 3.3. Experimental Results simulated annealing algorithm calls its random num- As a result of the extensive testing reportedi n Section ber generator duringi ts operation, all the algorithms 5, we settled on the following values for the five are implemented to start from an initial random par- parameters in our annealing implementation: a = tition.) Moreover, results can differ substantiallyf rom 0.05, INITPROB = 0.4, TEMPFACTOR = 0.95, run to run, making comparisons between algorithms SIZEFACTOR = 16, and MINPERCENT= 2. We less straightforward. Consider Figure 9, in which histograms of the cuts found in 1,000 runs each of Annealing, Local Opt, and K-L are presented. The instance in question was a random graph with n = 500 and p = 0.01. This particularg raph was used as a benchmark in many of our experiments, and we shall refer to it as G500in the future. (It has 1,196 edges for an average degree of 4.784, slightly less than the expected figure of 4.99.) The histogramsf or Annealing and Local Opt both can be displayed on the same axis because the worst cut found in 1,000 runs of Annealing was substantially better (by a standardd eviation or so) than the bestc ut found during 1,000 runs of Local Opt. This disparity more than balances the differences in running time: Even though the average running time for Local Opt was only a second compared to roughly 6 minutesf or Annealing, one could not expect to improve on Annealing simply by spending an equivalent time doing multiple runs of Local Opt, as some critics suggested might be the case. Indeed, the best cut Figure 8. A geometric graph with n = 500 and found in 3.6 million runs of Local Opt (which took n7rd2 10. roughly the same 600 hours as did our 1,000 runs of This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions GraphP artitioningb y SimulatedA nnealing / 873 m >> k runs, and then compute the expected best of a random sample of k of these particular m runs, chosen without replacement. (This can be done by arranging the m results in order from best to worst and then looping through them, noting that, for 1 < j < m - k + 1, the probability that the jth best 51n00o- 0- 22l\ 24 - S 26 28 K 3 0 value in the overall sample is the best in a subsample of size k is k/(m - j + 1) times the probability that none of the earlier values was the best.) The reliability of such an estimate on the best of k runs, of course, 200 220 240 260 280 300 320 decreases rapidly as k approaches m, and we will SIMULATEDA NNEALING LOCALO PTIMIZATION usually cite the relevant values of m and k so that readersw ho wish to assess the confidence intervals for 150- our results can do so. We have not done so here as we are not interested in the precise values obtained from 100 any particular experiment, but rather in trends that show up across groups of related experiments. Table I shows our estimates for the expected best of 50- k runs of Annealing, k runs of K-L, and 10 0k runs of K-L for various values of k, based on the 1,000 runs of Annealing charted in Figure 9 plus a sample of 10,000 runs of K-L. Annealing clearly dominates 200 220 240 260 280 300 320 K-L if running time is not taken into account, and KERNIGHAN-LIN still wins when running time is taken into account, Figure 9. Histograms of solution values found for although the margin of victory is much less impressive graph G500 during 1,000 runs each of (but note that the margin increases as k increases). Annealing, Local Opt.and Kernighan-Lin. The best cut ever found for this graph was one of size (The X-axis corresponds to cutsize and the 206, seen once in the 1,000 Annealing runs. Y-axis to the number of times each cutsize To put these results in perspective, we performed was encountered in the sample.) similar, though less extensive, experiments with ran- dom graphst hat were generatedu sing differentc hoices Annealing) was 232, compared to 225 for the worst for n and p. We considered all possible combinations of the annealing runs. One can, thus, conclude that of a value of n from 124, 250, 500, 1,0001 with a this simulated annealing implementation is intrinsi- value of np (approximately the expected average cally more powerful than the local optimization heu- degree) from {2.5, 5, 10, 20[. We only experimented ristic on which it is based, even when running time is with one graph of each type, and, as noted above, the taken into account. overallp attern of resultsi s much more significantt han Somewhat less conclusive is the relative perform- any individual entry. Individual variability among ance of Annealing and the sophisticated K-L algo- graphs generated with the same parameters can be rithm. Here the histogramsw ould overlap if they were substantial:t he graph with n = 500 and p = 0.01 used placed on the same axis, although the median and in these experiments was significantly denser than other order statistics for Annealing all improve on the correspondings tatistics for K-L. However, once again, Table I Annealing is by far the slower of the two algorithms, Comparisono f Annealinga nd Kernighan-Lino n this time by a factor of roughly 100 (K-L had an G50 average running time of 3.7 seconds on G500).T hus, Anneal K-L K-L ideally we should compare the best of 100 runs of K- k (Best of k) (Best of k) (Best of 100k) L versus one run of Annealing, or the best of 100k 1 213.32 232.29 214.33 runs versus the best of k. 2 211.66 227.92 213.19 Fortunately, there is a more efficient way to obtain 5 210.27 223.30 212.03 10 209.53 220.49 211.38 an estimate of the expected best of k runs than simply 25 208.76 217.51 210.81 to repeatedly perform sets of k runs and compute 50 208.20 215.75 210.50 the average of the bests. We perform some number 100 207.59 214.33 210.00 This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions 874 / JOHNSON ET AL. G500w, hich was generated using the same parameters. Table III It had an average degree of 4.892 versus 4.784 for AverageA lgorithmicR esults for 16 Random G500,a nd the best cut found for it was of size 219 Graphs( PercentA bove Best Cut Ever Found) versus 206. Thus, one cannot- expect to be able to Expected Average Degree replicate our experiments exactly by independently lvi 2.5 5.0 10.0 20.0 Algorithm generating graphs using the same parameters. The 124 87.8 24.1 9.5 5.6 Local Opt cutsizes found, the extent of the lead of one algorithm 18.7 6.5 3.1 1.9 K-L over another, and even their rank order, may vary 4.2 1.9 0.6 0.2 Annealing from graph to graph. We expect, however, the same 250 101.4 26.5 11.0 5.5 Local Opt general trends to be observable. (For the record, the 21.9 8.6 4.3 1.9 K-L 10.2 1.8 0.8 0.4 Annealing actual average degrees for the 16 new graphs are: 2.403, 5.129, 10.000,20.500 for the 124-vertexg raphs; 500 102.3 32.9 12.5 5.8 Local Opt 23.4 11.5 4.4 2.4 K-L 2.648, 4.896, 10.264, 19.368 for the 250-vertex graphs; 10.0 2.2 0.9 0.5 Annealing 2.500, 4.892, 9.420, 20.480 for the 500-vertex graphs; 1,000 106.8 31.2 12.5 6.3 Local Opt and 2.544, 4.992, 10.128, 20.214 for the 1,000-vertex 22.5 10.8 4.8 2.7 K-L graphs.) 7.4 2.0 0.7 0.4 Annealing The results of our experiments are summarized in Tables II-V. We performed 20 runs of Annealing for each graph, as well as 2,000 runs each of Local Opt A comment about the running times in Table IV is and K-L. Table II gives the best cuts ever found for in order. As might be expected, the running times for each of the 16 graphs,w hich may or not be the optimal Kernighan-Lina nd Local Opt increase if the number cuts. Table III reports the estimated means for all the of vertices increases or the density (number of edges) algorithms, expressed as a percentage above the best increases. The behavior of Annealing is somewhat cut found. Note that Annealing is a clear winner in anomalous, however. For a fixed number of vertices, these comparisons, which do not take running time the running time does not increase monotonically into account. with density, but instead goes through an initial de- Once again, however, Annealing dominates the cline as the average degree increases from 2.5 to 5. other algorithmsi n amount of time required,a s shown This can be explained by a more detailed look at the in Table IV. The times listed do not include the time way Annealing spends its time. The amount of time needed to read in the graph and create the initial data per temperaturei ncreases monotonically with density. structures,a s these are the same for all the algorithms, The number of temperature reductions needed, how- independently of the number of runs performed, and ever, declines as density increases, that is, freezing were, in any case, much smaller than the time for a sets in earlier for the denser graphs. The interaction single run. The times for Annealing also do not between these two phenomena accounts for the include the time used in performing a trial run to nonmonotonicity in total running time. determine an initial temperature.T his is substantially Table V gives results better equalized for running less than that required for a full run; moreover, we time. Instead of using a single run of Annealing as our expect that in practice an appropriates tartingt emper- standard for comparison, we use the procedure that ature would be known in advance-from experience runs Annealing 5 times and takes the best result. As with similar instances-or could be determined ana- can be seen, this yields significantlyb etter results, and lytically as a simple function of the numbers of vertices is the recommended way to use annealing in practice, and edges in the graph, a line of research we leave to assuming enough running time is available. For each the interested reader. of the other algorithms, the number of runs corre- sponding to 5 Anneals was obtained separately for each graph, based on the running times reported in Table II Table IV. Observet hat once again Annealing's advan- Best Cuts Found for 16 Random Graphs tage is substantially reduced when running times are Expected Average Degree taken into account. Indeed, it is actually beaten by Kernighan-Lin on most of the 124- and 250-vertex lvi 2.5 5.0 10.0 20.0 graphs,o n the sparsest5 00-vertex graph, and by Local 124 13 63 178 449 250 29 114 357 828 Opt on the densest 124-vertexg raph. Annealing does, 500 52 219 628 1,744 however, appear to be pulling away as the number of 1,000 102 451 1,367 3,389 vertices increases. It is the overall winner for the This content downloaded from 128.208.219.145 on Thu, 04 Jun 2015 21:20:57 UTC All use subject to JSTOR Terms and Conditions

Description:
OPTIMIZATION BY SIMULATED ANNEALING: AN EXPERIMENTAL EVALUATION; PART 1, GRAPH PARTITIONING DAVID S. JOHNSON A T&T Bell Laboratories, Murray Hill, New Jersey
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.