ARTICLES tends to dominate tm graphs as the size and/o but was roundly beaten I structure.) OPTIMIZATION BY SIMULATED ANNEALING: AN EXPERIMENTAL In this paper, we c context of two addition EVALUATION; PART II, GRAPH COLORING AND natorial optimization NUMBER PARTITIONING number partitioning, T causethey have been stl DAVID S. JOHNSON traditionally been appn AT&TBellLaboratories, MurrayHill, NewJersey the algorithmic template is based, CECILIA R. ARAGON The graph coloring I UniversityofCalifornia, Berkeley, California tions in areas such as ! see Leighton (1979), 0 LYLE A. MCGEOCH Werra (1985), We are asked to find the minim AmherstCollege, Amherst, Massachusetts can be partitioned into , of which contains botl CATHERINE SCHEVON Here, the apparent neE JohnsHopkins University, Baltimore, Maryland past may not have been (ReceivedFebruary 1989; revision receivedJuh~ 1990; acceptedSeptember 1990) definition of the cost extending the notion of This is the second in a series of three papers that empirically examine the competitiveness ofsimulated annealing in certain up with a variety of p well-studieddomains ofcombinatorialoptimization. Simulatedannealing isarandomizedtechniqueproposed byS. Kirkpatrick, mization that might yie C. D. Gelatt and M. P. Vecchi for improving local optimization algorithms. Here we report on experiments at adapting of attempting to minim simulated annealing to graph coloring and number partitioning, two problems for which local optimization had not previously annealing schemes bas been thought suitable. For graph coloring, we report on three simulated annealing schemes, all of which can dominate I) a penalty-function 2 traditional techniques for certain types ofgraphs, at least when large amounts ofcomputing time are available. For number partitioning, simulatedannealing is notcompetitivewith thedifferencingalgorithmofN. Karmarkarand R. M. Karp, excepton current authors, 2) a relatively small instances. Moreover, ifrunning time is taken into account, natural annealing schemes cannot even outperform interchanges and was multiple random runs of the local optimization algorithms on which they are based, in sharp contrast to the observed Shapiro (1986), and ~ performance ofannealing on other problems. orthogonal approach dl (1987). None of these create good colorings amounts oftime availal with (and often to dO! Simulated annealing is a new approach to the approxi In Part I (Johnson et al. 1989), we describe the approaches. (Not one t mate solution ofdifficult combinational optimization simulated annealing approach and its motivation, and dominates the other tw problems. It was originally proposed by Kirkpatrick, report on extensive experiments with it in the context of The second problerr Gelatt and Vecchi (1983) and Cerny (1985), who re the graph partitioning problem (given a graph G == was chosen less for it: ported promising results based on sketchy experiments. (V, E), find a partition of the vertices into two equal challenges it presents Since then there has been an immense outpouring of sized sets VI and V2, which minimizes the number of proach. In this problen papers on the topic, as documented in the extensive edges with endpoints in both sets). We were concerned numbers a"a ,···,a bibliographies ofCollins, Eglese and Golden (1988) and with two main issues: 1) how the various choices made 2 to partition them into t Van Laarhoven and Aarts (1987). The question of how in adapting simulated annealing to a particular problem well annealing stacks up against its more traditional affect its performance, and 2) how well an optimized competition has remained unclear, however, for a vari annealing implementation for graph partitioning com ety of important applications. The series of papers, petes against the best ofthe more traditional algorithms of which this is the second, attempts to rectify this for the problem. (For graph partitoning, the answer to is minimized, The ch2 situation. the second question was mixed: simulated annealing natural "neighborhood neighboring solutions ( Subjectclassifications: Mathematics, combinatorics: number partitioning heuristics. Networks/graphs, heuristics: graph coloring heuristics. Simula or two elements, have tion, applications: optimization by simulatedannealing. rain, in which neighl OperationsResearch 0030-364X/91/3903-0000$01.25 Vol. 39, No.3, May-June 1991 37R © 1991 OperationsResearchSocietyofAmerica Graph Coloring bySimulatedAnnealing / 379 to dominate traditional techniques on random quality. Thus, traditional local optimization algorithms as the size and/or density ofthe graphs increases, are not competitive with other techniques for this prob butwas roundly beaten on graphs with built-in geometric lem, in particular the "differencing" algorithm of structure.) Karmarkar and Karp (1982). Consequently, it seems ERIMENTAL In this paper, we consider the same issues in the unlikely that simulated annealing, which in essence is a NO context of two additional well-studied, NP-hard combi method for improving local optimization, can offer natorial optimization problems: graph coloring and enough ofan improvement to bridge the gap. Ourexper number partitioning. These problems were chosen be iments verify this intuition. Moreover, they show that causethey have been studied extensively, but neither had for this problem even multiple random-start local traditionally been approached using local optimization, optimization outperforms simulated annealing, a the algorithmic template upon which simulated annealing phenomenon we have not observed in any of the is based. other annealing implementations we have studied The graph coloring problem has widespread applica (even the mediocre ones). tions in areas such as scheduling and timetabling, e.g., Although some familiarity with simulated annealing see Leighton (1979), Opsut and Roberts (1981), and de will be helpful in reading this paper, our intention is that Werra (1985). We are given a graph G = (V, E), and it be self-contained. In particular, although we shall asked to find the minimum k such that the vertices of G frequently allude to Part I for background material, the can be partitioned into k colorclasses VI' ... ,Vk, none reader should be able to understand the results we pre of which contains both endpoints of any edge in E. sent here without reference to that paper. The remainder Here, the apparent neglect of local optimization in the of this paper is organized as follows. In Section 1, we past may not have been totallyjustified. By changing the briefly outline the generic annealing algorithm that is the definition of the cost of a solution, and possibly by basis for our implementations, as developed in Part I of extending the notion ofwhat a solution is, one can come this paper. Sections 2 and 3 are devoted to graph color ed annealing in certain up with a variety of plausible proposals for local opti ing and number partitioning, respectively. Section 4 osedbyS. Kirkpatrick, mization that might yield good colorings as a side effect concludes with a brief summary and a preview of the xperiments at adapting ofattempting to minimize the new cost. We investigate third and final paper in this series, which will cover our tion had not previously ,f which can dominate annealing schemes based on three of these proposals: experiments in applying simulated annealing to the infa available. For number 1) a penalty-function approach that originated with the mous traveling salesman problem. R. M. Karp, excepton current authors, 2) a variant that uses Kempe chain All running times quoted in this paper are for an :annoteven outperform interchanges and was devised by Morgenstern and individual processor ofa Sequent Balance™ 21000 mul ntrast to the observed Shapiro (1986), and 3) a more recent and somewhat ticomputer, running under the Dynix™ operating system orthogonal approach due to Chams, Hertz and de Werra (Balance and Dynix are trademarks ofSequent Computer (1987). None of these versions can be counted on to Systems, Inc.). Comparable times would be obtained on create good colorings quickly, but if one has large a VAX™ 750 without a floating point accelerator run amounts oftime available, they appear to be competitive ning under Unix™ (VAX is a trademark of the Digital with (and often to dominate) alternative CPU-intensive Equipment Corporation; Unix is a trademark of AT&T 1989), we describe t approaches. (Not one of the three annealing approaches Bell Laboratories). These are slow machines by modern and its motivation, a dominates the other two across the board.) standards; speedups by factors of 10 or greater are s with it in the context The second problem we study, number partitioning, possible with currently available workstations. This m (given a graph G, was chosen less for its applications than for the severe should be kept in mind when evaluating some of the : vertices into two equlf challenges it presents to the simulated annealing ap larger running times reported, and we shall have more to minimizes the number proach. In this problem, one is given a sequence of real say about it in the Conclusion. ;ets). We were concerne the various choices ma. numbers at, a2, .. . , an in the interval [0,1], and asked to partition them into two sets A I and A2 such that g to a particular proble· 1. THE GENERIC ANNEALING ALGORITHM ) how well an optimiz Both local optimization and simulated annealing require graph partitioning com that the problem to which they are applied be describable lore traditional algorithm as follows: Foreach instance I ofthe problem, there is a Jartitoning, the answer t is minimized. The challenge of this problem is that the set F of solutions, each solution S having a cost c(S). 'led: simulated annealin' natural "neighborhood structures" for it, those in which The goal is to find a solution of minimum cost. (Note neighboring solutions differ as to the location ofonly one that both problems mentioned in the Introduction have raph coloring heuristics. Simul or two elements, have exceedingly "mountainous" ter this form.) rain, in which neighboring solutions differ widely in In order to adapt either ofthe two approaches to such J030-364X/91/3903-oo00$01.2. ionsResearchSocietyofAmeric, 380 / JOHNSON ET AL. space" may have been ei a problem, one must additionally define an auxiliary All our annealing implementations start with the just the feasible solutions neighborhood graph on the space of solutions for a parameterized generic annealing algorithm summarized thus, our algorithm is can given instance. This is a directed graph whose vertices in Figure I. This generic procedure relies on several solution found, rather thar are the solutions, with the neighbors of a solution S problem-specific subroutines. They are READ_ The only substantive diJ being those solutions S'for which (S, S') is an arc in the INSTANCE(), INITIAL_SOLUTION(), summarized in Figure 1 neighborhood graph. NEXT_CHANGE(), CHANGE_SOLN() and Part I is the inclusion her A local optimization algorithm uses this structure to FINAL_SOLN( ). In addition, the procedure is para spent at high temperatures find a solution as follows: Starting with an initial solu meterized by the variables INITPROB, SIZEFAC the moves are accepted). tion S generated by other means, it repeatedly attempts TOR, CUTOFF, TEMPFACTOR, FREEZE_LIM graph partitioning, and c to find a better solution by moving to a neighbor with and MINPERCENT. (See Part Ifor observations about ments for the problems st lower cost, until it reaches a solution none of whose the best values for these parameters and the interactions temperatures does not api neighbors have a lower cost. Such a solution is called between them.) Note that the generic algorithm never quality ofthe final solutiO! locally optimal. Simulated annealing is motivated by the deals with the solutions themselves, only their costs. The simply to start at lower desire to avoid getting trapped in poor local optima, and current solution, its proposed neighbor, the best solution can degrade, however, if hence, occasionally allows "uphill moves" to solutions found so far, and the cost ofthe latter are kept as static ture too low. Cutoffs all« ofhigher cost, doing this under the guidance ofacontrol variables in the problem-specific part ofthe code. As in leaving a margin of safel parameter called the temperature. Part I, we allow for the possibility that the "solution With the addition of CI closely mirrors the am Kirkpatrick's original cod In each implementatio 1. CallREAD_INSTANCE()toreadinput,computeanupperboundc*onthe describe the relevant sub! optimalsolutionvalue,andreturntheaverageneighborhoodsizeN. chosen for the parameter 2. CallINITIAL_SOLUTIONOtogenerateaninitialsolutionSandreturnc=cost(S). Step 3 is implemented, be 3. ChooseaninitialtemperatureT >0sothatinwhatfollows thechanges/trialsratio used for graph partitioni startsoutapproximatelyequaltoINITPROB. methods. 4. Setjreezecount= O. 5. Whilejreezecount<FREEZE_LIM(Le., whilenotyet"frozen")dothefollowing: ~2. GRAPH COLORING = = 5.1 Setchanges trials O. f~The graph coloring prabl, Whiletrials <SIZEFACTOR'Nand changes <CUTOFF'N, dothefollowing: ~'aprime candidate for heu 'lilian, and indeed none a 5.1.1 Settrials =trials +1. ave been ofthis type. R 5.1.2 CallNEXT_CHANGEOtogeneratearandomneighborS'ofS '>roblem we are given a g andreturnc'=cost(S'). .Ind a partition of V int' 5.1.3 Let~=c'- c. 5.1.4 If~$ 0(downhillmove), lasses C1,Cz"'" Ck, Canbe in the samecolor( Setchanges=changes+1andc =c'. .em, i.e., if E contains CallCHANGE_SOLNOtosetS=S'and. ifS'isfeasible ,ssible number of calc andcost(S')<c*,tosets*=S'andc*=cost(S'). ,hromatic number of C 5.1.5 If~>0(uphill move), oloring has widespread: Choosearandomnumberrin[0,1]. ith scheduling (in situ2 Ifr $ e-MT (Le., withprobabilitye-IJ.lT), ;odelthe events being sc Setchanges=changes+1andc=c'. o/ents represented by ed CallCHANGE_SOLNO. '79, Opsut and Roberts Because graph colorinJ 5.2 SetT = TEMPFACTOR .T(reducetemperature). .lcient optimization alg Ifc* waschangedduring5.1, setjreezecount= O. s guaranteed to fi Ifchanges/trials <MINPERCENT, setjreezecount=jreezecount+1. arey andJohnson 1979 .t of developing heuri 6. CallFINAL_SOLNOtooutputS*. bmal colorings qui, ,mplexity-theoretic ob Figure 1. Thegeneric simulated annealing algorithm. Graph Coloring bySimulatedAnnealing / 381 " may have been expanded to include more than (1976) show that for any r<2, it is NP-hard to con entations start wi 19 algorithm summ· e feasible solutions to the original problem, and structcolorings using no morethan rX(G) colors. Fortu lcedure relies on s ~',our algorithm is careful to output the best feasible nately, NP-hardness is a worst case measure, and does :'on found, rather than simply the best solution. not rule out the possibility ofheuristics that work well in ;. They are R. \.L_SOLUTWi eonly substantive difference between the algorithm practice. There have thus been many attempts to devise arized in Figure I and the generic algorithm of such heuristics, e.g., see Welsh and Powell (1967), <\NGE_SOLN( ) is the inclusion here of cutoffs to limit the time Matula, Marble and Isaacson (1972), Johnson (1974), , the procedure is' at high temperatures (where, say, 50% or more of Grimmet and McDiarmid (1975), Brelaz (1979), NITPROB, SIZE aves are accepted). As we observe in Part I for Leighton (1979), and Johri and Matula (1982). Until CTOR, FREEZE!. partitioning, and confirm in preliminary experi recently, the literature has concentrated almost exclu 1 Ifor observations' for the problems studied here, time spent at high sively on heuristics that use a technique we might call ieters and the interac ratures does not appear to contribute much to the successive augmentation, as opposed to local optimiza generic algorithm 'tyofthe final solution. One way to limit this time is tion. In this approach, a partial coloring is extended, ves, only theircosts.:: Iy to start at lower temperatures. Solution quality vertex by vertex, until all vertices have been colored, at eighbor, the best solu egrade, however, if we make the starting tempera which point the coloring is output without any attempt to Ie latter are kept as s too low. Cutoffs allow us to save time while still improve it by perturbation. In the next section, we IC part ofthe code. ng a margin of safety in the starting temperature. describe several such algorithms because they illustrate ibility that the ' the addition of cutoffs, our generic algorithm annealing's competition, and they provide the basic in ly mirrors the annealing structure implicit in sights that can lead us to simulated annealing implemen kpatrick's original code. tations. each implementation that we discuss, we shall 2.1. Successive Augmentation Heuristics ribe the relevant subroutines and specify the values en for the parameters. We shall also discuss how Perhaps the simplest example ofa successive augmenta )Sf(5). 3is implemented, be it by the "trial run" approach tion heuristic is the "sequential" coloring algorithm ratio for graph partitioning in Part I, or more ad hoc (denoted in what follows by SEQ). Assume that the ods. vertices are labeled VI' ... ,Vno We color the vertices in order. Vertex vI is assigned to color class C1, and wing: thereafter, vertex Vi is assigned to the lowest indexed color class that contains no vertices adjacent to Vi (i.e., graph coloring problem does not seem at first to be no vertices u such that {u, Vi}EE). This algorithm Jllowing: me candidate for heuristics based on local optimiza performs rather poorly in the worst case; 3-colorable , and indeed none of the standard heuristics for it graphs may end up with Q(n) colors (Johnson). For '5 been ofthis type. Recall that in the graph coloring random graphs with edge probability p = 0.5, however, lem we are given agraph G= (V, E), and asked to it is expected (asymptotically as n = IV Igets large) to apartition of V into a minimum number of color use no more than 2· x(G), i.e., twice the optimal num es C1,C2, •. .,Ck, where no two vertices u and v berofcolors (Grimmet and McDiarmid). No polynomial beinthe samecolorclass ifthere is an edge between time heuristic has been proved to have better average , i.e., if E contains the edge {u, v}. The minimum case behavior. (The best worst case bound proved for a ible number of color classes for G is called the polynomial time heuristic is only a slight improvement omatic number of G and denoted by x(G). Graph over the worst case bound for SEQ: Berger and Rompel ring has widespread applications, many having to do (1990) improve on constructions of Johnson (1974) and with scheduling (in situations where the vertices of G Wigderson (1983) to construct an algorithm that will mOdel the events being scheduled, with conflicts between never use more than O(n(loglog nIlog n)3) times the evrnts represented by edges) (de Werra 1985, Leighton optimal number ofcolors.) W79, Opsut and Roberts 1981). Experimentally, however, SEQ is outperformed on Because graph coloring is NP-hard, it is unlikely that average by a variety of other successive augmentation efficient optimization algorithms for it exist (i.e., algo algorithms, among the best of which are the DSATUR rithms guaranteed to find optimal colorings quickly) algorithm of Brelaz and the Recursive Largest First I. (Garey andJohnson 1979). The practical question is thus (RLF) algorithm of Leighton. The former dynamically that of developing heuristic algorithms that find near chooses the vertex to color next, picking one that is optimal colorings quickly. Even here, there are adjacent to the largest number ofdistinctly colored ver Complexity-theoretic obstacles. Garey and Johnson tices. The latter colors the vertices one color class at a 382 / JOHNSON ET AL. time, in the following "greedy" fashion. Let C be the 60 Problem-Specific Det next color class to be constructed, V' be the set of neighborhood structure, as-yet-uncolored vertices, and U be an initially empty NU 50 1000-VERTEXRANDOMGRAPH tioning implementation set of uncolored vertices that cannot legally be placed MB solutions and penalty fun E 40 in C. R partition of V int 0 F 30 Cl,CZ, ... ,Ck, I~k~ 1. Choose a vertex VOEV' that has the maximum num 0C color classes or not. Tw ber of edges to other vertices in V'. Place Vo in C R0L 20 one can be transformed and move all uEV' that are adjacent to Vo from V' N1 from one color class to G to U. S 10 n n neighbor, we will rand 2. While V' remains nonempty, do the following: o J 'I J r class COLD' a vertex v Choose a vertex vE V' that has a maximum num 10I0 10I5 11I0 11I5 12I0 12I5 13I0 13I5 1~i ~k +I, where k ber of edges to vertices in U. Add v to C and RLF(10.7Min) DSATUR(2.1Min) SEQ(l.lMin) classes. The neighbor is move all UEV' that are adjacent to v from V' Figure 2. Histogram ofthe colorings found by running class Ci. If i = k +1 tl to U. new, previously empty ( each of RLF, DSATUR and SEQ for 100 Let GR be the residual graph induced by the vertices different starting permutations ofthe vertices we try again. Note that left uncolored after C is formed (i.e., the vertices in U of a typical G1,000.0.s random graph. (The of v toward vertices in ; when V' has finally been emptied). The goal of this average times per run on aSequent processor presumably desirable bl procedure is to make C large while assuring that GR has are given in parentheses.) such classes. as many edges eliminated from it as possible, with the The key to making a additional constraint that since Vo has to be in some borhood structure is th< color class it might as well be in this one. substantial cost in running time. Its average of 107.9 here is where we adapt To get a feel for the relative effectiveness and effi colors is also betterthan the results reported inJohri and which constructs its coil ciency ofthese three algorithms, let us first see how they Matula for other successive augmentation algorithms, tine for generating lar do on what has become a standard test case for graph suchas DSATURWith Interchange (111.4) and Smallest function has two compO! coloring heuristics, the 1,000-vertex random graph. In Last With Interchange (115.0). classes, the second fav the notation ofPart I, this is G 1.000.OS' the 1,000-vertex None ofthese algorithms, however, uses close to the (Cl,... ,Ck) be a soluti graph obtained by letting a pair {u, v} of vertices be an optimal number of colors for G,,000,o.s' which is esti ofedges from E both 01 edge with probability p = 0.5, independently for each mated to be about 85 by Johri and Matula. (This is only the set of badedges in pair. Although unlikely to have much relevance to prac a heuristic estimate. All that can currently be said with k tical applications of graph coloring, such a test bed has rigor is that x(G"ooo,os) ~ 80 with very high probabil cost(11) = - L ICil' the pedagogical advantage that results for it seem to be ity, as shown by Bollobas and Thomason.) It appears i=1 stable (behavior on one graph of this type is a good likely that, if we want to approach 85 colors, we will An important observ. predictor for behavior on any other) and that different need much larger running times than used by the typical that all its local minim: heuristics yield decidedly different results for it. Papers successive augmentation algorithms. Moreover, given To seethis, suppose thai that have used this graph as a prime example include the narrow variance in results of such algorithms, as endpoint ofone ofthe 1 JohriandMatula(1982), BollobasandThomason (1985), typified by the histograms in Figure 2, the approach of that moving v from C Morgenstern and Shapiro (1986), Chams, Hertz and de performing multiple runs of anyone seems unlikely to +' Ck reduces the cost Werra (1987). (We shall subsequently consider a selec yield significantly bettercolorings even ifalargeamount second component of tI tion of other types of graphs, but the well-studied oftime is available. Thus, the way is open for computa increasing the first by a G 1.000.0.Sgraph provides aconvenient setting in which to tion-intensive approaches such as simulated annealing. introduce our ideas.) 2.2. Three Simulated Annealing Implementations Figure 2 presents a histogrammatic comparison ofthe three algorithms on a typical G1.000,0.s random graph. Despite the lack oftraditional neighborhood search algo Observe that this cost f Although none of the three algorithms is, as defined, a rithms for graph coloring, the problem has proved a the number of k of c randomized algorithm, each depends, for tie-breaking if surprisingly fertile area for simulated annealing imple minimize this as a sid nothing else, on the initial permutation of the vertices. mentations. We will describe and compare three serious function. The use of s By varying that permutation, one can get different re candidates. counted for more than I sults, and the figure plots histograms obtained for the annealing, from the g results of each for 100 different initial permutations. 2.2.1. The Penalty Function Approach tioned before to pre Note that each algorithm produces colorings within a We begin with the historically first ofthe three, the one compaction (e.g., see tight range, that the ranges do not overlap, and that RLF with which we began our studies in 1983. This approach 1983, Vecchi and Kirkr is significantly better than the other two, albeit at a was motivated in part by the success ofRLF. Golden 1988). Noteals( Graph Coloring bySimulatedAnnealing / 383 Problem-Specific Details. Consider the following classes the cost function is biased toward colorings that neighborhood structure, one that, as in the graph parti are unbalanced, rather than ones where all classes are :TEXRANDOMGRAPH tioning implementation of Part I, involves infeasible approximately the same size. Consequently, an optimal solutions and penalty functions. A solution will be any solution with respect to this cost function need not use partition of V into nonempty disjoint sets the minimum possible number of colors, although in C"C2,··· ,Ck, 1~ k ~ 1V I, whether the C; are legal practice, this does not seem to be a major drawback. color classes or not. Two solutions will be neighbors if Moreover, the bias may be profitable in certain applica one can be transformed to the other by moving a vertex tions. For instance, colorings for which ~ IC;12 is from one color class to another. To generate a random maximized are precisely what is needed in a scheme for neighbor, we will randomly pick a (nonempty) color generating error-correcting codes due to Brouwer et al. class COLD' a vertex VECOLD' and then an integer i, (1990), and our annealing software has been useful in 125 I~O 1I35'"! I~i ~k +1, where k is the current number of color this application, as reported in that paper. (Aarts and vlin) SEQ(1.1Min) classes. The neighbor is obtained by moving v to color Korst (1989) describe an alternative cost function with )rings found by running'; class C;. If i = k + 1 this means that v is moved to a respect to which solutions ofminimum cost do have the LUR and SEQ for 100 new, previously empty class. If v is already in class C; minimum possible number of colors, but their function we try again. Note that this procedure biases our choice has other drawbacks.) nutations ofthe vertices'. of v toward vertices in smaller color classes, but this is To complete the specification of the problem-specific ,5 random graph. (Thd, presumably desirable because it is our goal to empty details in our penalty function implementation, we must on aSequent processor\' l such classes. say how we generate initial solutions. One possibility ses.) The key to making annealing work using this neigh would be to start with all the vertices in a single class borhood structure is the cost function we choose, and CI; the other extreme would be to start each vertex in its ~. Its average of 107.9' here is where we adapt the general philosophy of RLF, own unique one-element class. On the basis of limited Its reported in Johri and which constructs its colorings with the aid of a subrou experiments, an intermediate approach seems reason Igmentation algorithms,~ tine for generating large independent sets. Our cost able, in which one assigns the vertices randomly to Ige (111.4) and Smallest function has two components, the first favors large color CHROM_EST classes, where CHROM_EST is a classes, the second favors independent sets. Let II = rough estimate of the chromatic number. The neighbor Never, uses close to the' (CI,... ,Ck) be a solution, and E;, 1~i ~ k be the set hood size returned is then CHROM_EST' IV I, a :11,000,0.5' which is esti-:i ofedges from E both ofwhose endpoints are in C;, i.e., good estimate ofthe number ofneighbors a solution will ld Matula, (This is only the set of badedges in C;. We then have have toward the end of the annealing schedule. We did 1 currently be said with. not follow this approach precisely, however. For our k k ",ith very high probabilc cost(II) = - L C; 2+ L 2 C; E;I· G\,000,0.5 graph, we set CHROM_EST= 90, a reason I 1 1 I . I Thomason.) It appears ;=\ ;=\ able estimate, but then for simplicity we left it at tl1is mch 85 colors, we will value in our experiments with other graphs, even though An important observation about this cost function is than used by the typical. in some cases 90 was a substantial over or underesti that all its local minima correspond to legal colorings. hms, Moreover, given·; mate. Given tl1at we were varying SIZEFACTOR any To seethis, suppose that E; is nonempty, and let v be an of such algorithms, as; way, errors in neighborhood size were not deemed to be endpoint of one of the bad edges contained in E;. Note gure 2, the approach of significant, and fixing CHROM_EST left us with one that moving v from C; to the previously empty class lone seems unlikely to' less parameter to worry about. Ck+I reduces the cost function because we reduce the seven ifalarge amount'; second component of the cost by at least 2IC;I while Nevertheless, the effective neighborhood size (the ay is open for computa-" number of neighbors that a near-optimal solution can increasing the first by at most ssimulated annealing. have) can be substantially bigger than that for graph ing Implementations IC;I2 - (( IC;I- 1)2+ 12) = 2 IC;I- 2. pvaerrttiitcieosn)i.ngHienreP,arthteI (hwighheerer itthweacshsroimmpaltyicthneunmubmebr,erthoef ighborhood search algo- ' Observe that this cost function does not explicitly count bigger the effective neighborhood size gets. Assuming, problem has proved a the number of k of color classes in II; we hope to as our experiments with graph partitioning suggest, that ulated annealing imple minimize this as a side-effect of minimizing the cost the number oftrials we should perform at each tempera d compare three serious function. The use of such indirect techniques has ac ture must be a sizeable multiple of this neighborhood counted for more than one practical success claimed for size, we can see that we are in for much larger running annealing, from the graph partitioning problem men times than we encountered with graph partitioning: for ~pproach tioned before to problems of circuit layout and G I,000,0.5 the running times might blow up by afactor of rst ofthe three, the one compaction (e.g., see Kirkpatrick, Gelatt and Vecchi 90 or more! in 1983. This approach 1983, Vecchi and Kirkpatrick 1983, Collins, Eglese and As with graph partitioning, however, the time for cess ofRLF, Golden 1988). Note also that for agiven number ofcolor proposing and accepting moves is manageable. If we 384 / JOHNSON ET AL. store the graph in adjacency matrix form and the color Part I. (Limited experiments with higher starting tem CHROM_EST= 100 (slJ classes as doubly-linked lists, the time to proposea move peratures yielded longer running times but no better ard value, although this h is proportional to the sizes of the two color classes solutions.) To further reduce running time, we used picture), TEMPTFACT, involved, and the time to accept a proposed move is cutoffwith CUTOFF= 0.10. TOR = 2. So that more' constant. (Our data structures are optimized for dense For our termination condition, we set MINPER can be seen, we used an il graphs with relatively small color classes, as these are CENT= 2%, allowing for the likelihood that a certain than the value of 10 used common in applications and are the main ones we study small number ofzero-cost moves will always be possible temperature 10 was rea in this paper. For sparser graphs with larger color and hence accepted, and set FREEZE_LIM = 5. Trials)/10,000 reached 91 classes, it may be more appropriate to represent the Finally, rather than perform explicit exponentiation each picture, cutoffs were turm graph in adjacency list form and maintain an array that time we need the value e-t1/T, we use the table-lookup manually once it was clea specifies the color of each vertex. The average time for method described in Part I for quickly obtaining a close The top display shows I proposing a move would then be proportional to the approximation. (This method is also used in our other sets in the current partiti average vertex degree.) two annealing implementations.) Running times were (the first data point), th( then varied by changing the values of TEMPFACTOR jumped from the initial I Penalty Function Local Optimization. Before we turn and SIZEFACTOR. and from then on it increa to the implementation details in the generic part of the peak of 233 before decli algorithm, let us briefly examinehow well this neighbor The Dynamics of Penalty Function Annealing. Fig Interestingly, this behavi( hood scheme performs in the context of pure local ure 3 presents "time exposures" of an annealing run movementofthe "curren! optimization. In our implementa~ion of local optimiza under this scheme on our G1.000,0.5 graph with display, which is consiste tion based on this scheme, we' limit ourselves to a this lack ofcorrelation lie maximum of 200 color classes, thus giving us at most 240 r------------:;----------r-------, "random neighbor." Re 200IV I possible moves from any given partition. We ~ 220 nonempty color class and M start with a random assignment of the vertices to 200 ~ 200 class to move. This intro color classes, and a random permutation ofthe 200IV I Ro 180 bers of small classes. Sin possible moves. We then examine each move in turn, F 160 size of the chosen class t 2C performing the move only if it results in a net decrease 140 temperatures, small classe o in cost. Once all 200IV I moves have been considered, ~ 120 they are filled. Indeed, we re-permute them and try again, repeating this proce 100 temperature at which 99~ dure until for some permutation of the moves, none is 25K (here we start at roughl) accepted. Then we know we have reached a locally 20K would have been fluctual 5 optimal solution and stop. 15K As the temperature drop~ R For our standard GI,OOO.0.5 random graph, we per ~ 10K that penalizes "badedges formed 100 runs of this local optimization algorithm. ~ 5K ~up the number of colors The results, although better than what was obtainable in ofC 0 \edges for the cOmponenl -5K practice by pure sequential coloring, were unimpressive: PenaltyFunctionAnnealing ~wards big color classes t -10K the average time per run was 37.3 minutes (slower than -15K l~Colors back down. RLF), but the median solution used 117 colors (worse 80 "~y. The appearance of the thanbothRLF andthe much faster DSATURalgorithm). EP 'by a vertical line throu R No solutions were found using fewer than 115 colors. C 60 "Current cost (middle ( E N "fc Fortunately, this neighborhood structure is better for T '!n0notonically, there is A 40 simulated annealing than for local optimization. C illccurring slightly to the I C tE did not see in the time 20 Generic Details of Penalty Function Annealing. Al E 'artitioning. Bumps of tl D though all our annealing implementations follow the ,fpenalty function anne: 500 1000 1500 2000 generic outline ofFigure 1, certain parameters and rou (NUMBEROFTRIALS)/IO,OOO slope for the curve, tines therein must be specified before the description of ange in acceptance rat, any given implementation is complete. For penalty func Figure 3. Three views ofthe evolution ofan annealing ttom display.) Anneal tion annealing, we obtained our starting temperature by run for a G1.000.0.5 graph under the penalty ,ight suggest that such trial and error, discovering that a single initial tempera function annealing scheme. (The time at n" (Kirkpatrick, Gela ture usually sufficed for a given class of graphs. In the which the first legal coloring was encoun ereis no goodexplanati caseof 0n.0.5 graphs, an initial temperature of10tended tered is marked by a vertical line in all three ey can be exploited. to yield an initial acceptance ratio between 0.3 and 0.4, displays. Temperature was reduced by a fac The total running tin which seemed adequate based on the experiments of tor of0.95 every 20 data points.) 'bout 11 hours. This w( Graph Coloring bySimulatedAnnealing / 385 \lith higher starting OM_EST= 100 (slightly higher than our stand cantly had we used cutoffs and the lower standard initial ing times but no alue, although this has no significant effect on the temperature of 10, but it is already less than the 17.9 e running time, we; re), TEMPTFACTOR = 0.95 and SIZEFAC hours it took to perform 100 runs of RLF. (Recall that == 2. So that more of the total annealing process RLF never used fewer than 105 colors, 3 more than we :on, we set MIN', eseen, we used an initial temperature of96, rather needed here). By further increasing the running time (via likelihood that a c' the value of 10 used in the later experiments. The increased values for TEMPFACTOR and SIZEFAC ~s will always be p rature 10 was reached when the Number of TOR), still better colorings are obtainable by this ~t FREEZE_LIM' s)/lO,OOO reached 918. Also, for the sake ofa full approach. As we shall see in Section 2.4, it is possible Jlicit exponentiation' re, cutoffs were turned offand the run was stopped with this approach to get colorings using as few as 91 we use the table-Ioo ally once it was clear that convergence had set in. colors, ifone is willing to spend 182 hours. quickly obtaining a c: etop display shows the evolution ofthe number of , also used in our o' in the current partition. Note that by trial 10,000 2.2.2. The Kempe Chain Approach ;.) Running times w first data point), the number of sets has already Preliminary reports of the penalty function implementa ues of TEMPFACTj. ed from the initial 100 to something close to 140, tion and the results for it inspired Morgenstern and from then on it increases more or less smoothly to a Shapiro to proposed the following alternative, which of 233 before declining to a final value of 102. retains the cost function but makes a majorchange in the nction Annealing. E estingly, this behavior does not correlate with the neighborhood structure. :s" of an annealing , ement ofthe "current cost" presented in the middle Gl,{lOO,0.5 graph lay, which is consistently declining. The reason for Problem-Specific Details. Solutions are now restricted 'slack ofcorrelation lies in our method for choosing a to be partitions C1, ... ,Ck that are legal colorings, i.e., ,,~ndom neighbor." Recall that we pick a random, are such that no edge has both endpoints in the same 'bonempty color class and then a random member ofthat class. (Note that this means that all the sets E; of bad class to move. This introduces a bias toward the mem edges are empty, and so the cost function simplifies to ~rs of small classes. Since a move always reduces the just - I:~=1IC;I2.) In order to ensure that moves pre size ofthe chosen class by one, this means that at high serve the legality of the coloring, Morgenstern and lemperatures, small classes will tend to empty faster than Shapiro go to a much more complex sort of move, one ~ey are filled. Indeed, had we started our run at a involving Kempe chains. t~~perature at which 99%of the moves were accepted Suppose that C and D are disjoint independent sets in (liyre we start at roughly 75%), the number of colors a graph G. A Kempe chain for C and D is any '~~uld have been fluctuating between 30 and 40 or so. connected component in the subgraph of G induced by ~§ the temperature drops, the part of the cost function CUD. Let X Ll Y denote the symmetric difference 'thatpenalizes "bad edges" begins to take effect, driving (X- Y) U(Y- X) between two sets X and Y. The qpthe number ofcolors until there are few enough bad key observation is that if H is a Kempe chain for lldges for the component of the cost function that re disjoint independent sets C and D, then CLl Hand 'wards big color classes to begin driving the number of D LlH are themselves disjoint independent sets whose colors back down. union is CUD. This suggests the following move gen .The appearance of the first legal coloring is marked eration procedure: Randomly choose a nonempty color ,by a vertical line through the display. Although the class C and a vertex vEC, as in the penalty function current cost (middle display) declines more-or-less approach. Then randomly choose a nonempty colorclass monotonically, there is a definite bump in the curve D other than C, and let H be the Kempe chain for C occurring slightly to the left ofthat line, a bump that we and D that contains v. Repeat the above procedure until did not see in the time exposures of Part I for graph one obtains C, v, D and H such that H* CUD (i.e., partitioning. Bumps of this sort regularly occur in runs such that H is not "full"), in which case the next ofpenalty function annealing. Unlike the other changes partition isobtained by replacing C by CLl Hand D by 1500 IALS)/JO.OOO in slope for the curve, they do not reflect a similar DLlH in the current one. (Using a full Kempe chain in change in acceptance rate. (The latter is depicted in the this operation simply changes the names of the two volution ofan annealing bottom display.) Annealers with physics backgrounds colors, a meaningless change, which is why we ignore graph under the penalty might suggest that such bumps indicate "phase transi such moves.) scheme. (The time at tion" (Kirkpatrick, Gelatt and Vecchi). Unfortunately, This procedure appears to be substantially more ex I coloring was encoun there is no goodexplanation ofwhy they ariseorwhether pensivethanthemovegenerationprocedureinthepenalty vertical line in all three they can be exploited. function approach, and it is. It also makes substantially e was reduced by a fac The total running time for the time exposure was bigger changes in the solutions, however, and so may be data points.) about 11 hours. This would have been reduced signifi- worth the extra effort. Moreover, it is not exorbitantly 386 / JOHNSON ET AL. expensive, at least for dense graphs. For such graphs the 350 4results from the fact th: color classes tend to be small, and so the following IN!t 300 ently with acceptance per techniquecanfind theKempechain H relativelyquickly. IBi: Whereas the initial teml 250 Whenever a new vertex u is added to H (including the ~ 200 acceptance rate under the first vertex LJ), we scan the members ofthe other color 6 a99% rate. Thus, much ( 150 class that are not yet in H and add to H each one that is Lo attoo high atemperature adjacent to u. Furthermore, we use two auxiliary tables RS 100 temperature T= 4, whid to help us whenever possible avoid the wasted time of 50 acceptance ratio of 70% constructing full Kempe chains that must be abandoned. -2K ~------------------ function approach, we < One stores for each pair C, D the time at which they number ofcolors as we C -4K were last discovered to have a full Kempe chain; the RU Note also that the curve other stores for each C the time at which it was last N~ -6K "current cost" are simil: T modified. When C and D are first chosen, we check to C -8K tion approach the corres sseinecewhtheethlearstwteimheavtheeyseewneraefmuloldKifeiemdp,eancdhaiinf sfoo,ratbhaenm T0S-10K KempeChainArmealing ttihaellyc.osTthdereecltihneesn,uhmebreertI don their further consideration immediately. -12K valuesandthen undergo < To complete the specification of the problem-specific 100 chain implementation, Ii1 p details of the Kempe chain approach, we must say how E tion approach, is biase< instance "size" is determined and how initial solutions ECR 80 small color classes for N are generated. We perform random sequential colorings T 60 move is roughly as like A for both purposes. When an instance is read, we perform CC 40 chosen class as to decrea arandomsequentialcoloring, and return thesize K IV I, PE to empty at high tempera where K is the numberofcolors used in the coloring. A TDE 20 ber is high rather than 10 second random sequential coloring is performed each A third observation a 500 1000 1500 2000 time a new initial solution is requested. (NUMBEROFTRIALS)!I,000 Figure 4 is that the soh Figure 4. Three views ofthe evolution ofan annealing percentage of accepted I Kempe Chain Local Optimization. As with the penalty penalty function run of function approach, the Kempe chain approach can serve run for a GUJOO,O.5 graph under the Kempe annealing implementatior chain annealing scheme. (The temperature as the basis for a local optimization algorithm. Here, solution value is not sel was reduced by afactor of0.95 every 20data because we are restricted to legal colorings, we limit the points.) quite low, In the run 01 total allowable number of colors to 150, yielding coloring does not appear 150IV I possible moves. Initial solutions are generated to about 0.5%, For the 1 by random sequential colorings as just specified. Other acceptance rate still hm wise, the details are the same as for penalty function function approach). For comparison purposes, we also number of colors is fin local optimization, as outlined previously. The results set the neighborhood size to the same 100,000 value attributable to the topog were also similar (and similarly mediocre). The average used in the previous run, and used the same starting particular the structure 0 running time over 100 runs was 33.3 minutes (slightly temperature T= 96. Afirst observation is thatthe curves colorings. Such a colori better than for the penalty function approach but still are significantly more irregular than those for penalty Kempe changes that impr much slower than RLF), and the median number of function annealing, For the most part, this may be it the same) than it will colors was again 117. attributable to the reduced value ofSIZEFACTOR, since ally change color withou this means that each data point represents 1,000 rather moves under the penalty The Dynamics of Kempe Chain Annealing. Before than 10,000 trials, and so successive data points may be Moreover, in explanal describing the generic parameters governing thestart and correlated more closely. There may, however, be a cursion in the acceptanc( finish of a run, let us compare the operation of Kempe different reason for the extreme excursion in the accept likely to have far more chain annealing to the penalty function approach. Figure ance rate curve. Such excursions occur in the tails of This inhomogeneity oftl 4 presents "time exposures" for Kempe chain anneal other Kempe chain runs, but at random places (unlike ac(;ep,tarlce rate at conver ing, analogous to those in Figure 3 for the penalty the regularly occurring smooth bump in the middle of run, going as high as function approach. All parameters except SIZEFACTOR the "current cost" curve for penalty function annealing the next, This makes it ( were given the same values as in the penalty function in Figure 3), Here the excursion seems attributable to the gence parameters ofour run; because of the greater expense of the moves here, topography of the Kempe chain solution space, as we vative in the experim< we reduced SIZEFACTOR from 200 to 20 (and the run shall hypothesize in more detail below. MINPERCENT= 15% still took 18 hours, as opposed to the 11 for the penalty A second difference between the runs in Figures 3and up to 10 before termina Graph Coloring bySimulatedAnnealing / 387 results from the fact that temperature correlates differ temperatures by manual trial and error, observing as ~$t1y with acceptance percentage under the two regimes, with penalty function annealing that the same initial , ereas the initial temperature T= 96 yields a 75% temperature seems to work well across entire classes of ;~~~eptance rate under the earlier approach, here it yields graphs. For random graphs with p = 0.5, we use an ~99% rate. Thus, muchofthe time in Figure4is wasted initial temperature of T= 5, which generally yields an kttoo high atemperature. Ifwe instead choose astarting initial acceptance rate between 50% and 80%. As with femperature T= 4, which yields approximately the same penalty function annealing, we use CUTOFF= 0.10 in ~cceptance ratio of 70% as did T= 96 for the penalty our main experiments.) function approach, we converge to roughly the same A final observation about the run in Figure 4 (and Il~mber of colors as we do here, but in only six hours. presumably the most important) is that the numh"f of Note also that the curves for "number of colors" and colors to which the run converges is 94, as oppC·" to ·"current cost" are similar, whereas in the penalty func- 101 for the penalty function approach. The running time ·lion approach the corresponding curves differ substan is somewhat longer, 17.9 hours versus 11, but in 17.9 rBallY. There the numberofcolors initially increase while hours the fewest number ofcolors we have been able to 'the cost declines, here they both jump quickly to high obtain with the penalty function approach, even using ·valuesand then undergo correlateddeclines. (OurKempe cutoffs, is only 98. In the 182 hours it takes penalty chain implementation, like the one for the penalty func function annealing to find a 9l-coloring, Kempe chain tion approach, is biased toward choosing vertices in annealing can find one using only 89 colors, and, as we small color classes for recoloring. Here, however, a shall see in Section 2.4, it can do even better withjust a move is roughly as likely to increase the size of the bit more time. Thus, it appears that the extra complexity chosen class as to decrease it. Thus, classes do not tend ofthe neighborhood structure for Kempe chain annealing toempty at high temperatures, and the equilibrium num can more than pay for itself, and we shall confirmthis in beris high rather than low.) the more extensive experiments that follow. A third observation about the Kempe chain run of J RIALS)/!,000 Figure 4 is that the solution cost converges while the 2.2.3. The Fixed-K Approach ;:volution ofan annealiIi percentage of accepted moves is still rather high. The Our final annealing implementation is derived from a graph under the Kern' penalty function run of Figure 3 is typical of most paper by Chams, Hertz and de Werra, and solves a heme. (The temperatur annealing implementations we have seen in that the best slightly different problem. Instead ofattempting to mini ~torof0.95 every 20da solution value is not seen until the acceptance rate is mize the number ofcolors used in a legal coloring, this quite low. In the run of Figure 3, the first legal 102 approach attempts to minimize the number ofmonochro coloring does not appear until the acceptance rate drops matic edges in a not-necessarily-legal coloring with a to about 0.5%. For the Kempe chain run, however, the fixed number ofcolor classes. acceptance rate still hovers around 7% when its best lfison purposes, we als: \lumber of colors is first encountered. This is largely Problem-Specific Details. Given a graph G = (V, E) the same 100,000 valu attributable to the topography of the solution space, in and anumberofcolors K, the solutions areall partitions I used the same startin particular the structure ofthe neighborhoods of "good" of V into K sets (empty sets are allowed), and the cost ervation is that the curve colorings. Such a coloring is likely to have far more ofa solution is simply the total number ofedges that do lr than those for penalt Kempe changes that improve its "cost" (orat least leave nothaveendpoints indifferentclasses(the "badedges"). most part, this may b, it the same) than it will have vertices that can individu A partition IT2 is a neighbor ofapartition ITI ifthe two :ofSIZEFACTOR, sine, ally change color without negative effect (the analogous partitions differ only as to the location ofa single vertex t represents 1,000 rathe moves under the penalty function approach). v, and v is an endpoint ofa bad edge in ITI. ~ssive data points may b' Moreover, in explanation of the abovementioned ex Note that here the neighbor relation is not symmetric; re may; however, be cursion in the acceptance rate, some good colorings are in particular, alegal coloringhas no neighbors because it ~ excursion in the accept likely to have far more "good" neighbors than others. has no bad edges. This is of course no problem, for if Jns occur in the tails 0 This inhomogeneity ofthe solution space means that the everthe annealing process finds alegal coloring, there is at random places (unlik acceptance rate at convergence can vary wildly from run no point inproceeding any further. Limitedexperimenta 1 bump in the middle 0 torun, going as high as 15%one time and as low as 5% tion indicates that this neighborhood structure is much enalty function annealing the next. This makes it difficult to fine-tune the conver more effective than the less-restrictive one in which 1seems attributable tothe: gence parameters of our implementation. To be conser v need not be an endpoint of a bad edge. (The less in solution space, as we, vative in the experiments to be reported, we set restrictive neighborhood was essential in our penalty Ibelow. MINPERCENT= 15% and allow jreezecount to go function adaptation, since the goal was to reduce the the runs in Figures 3and up to 10 before terminating. (We again set our initial number of color classes, which might entail emptying
Description: