Optimization by Simulated Annealing: An Experimental Evaluation; Part II, Graph Coloring and Number Partitioning Author(s): David S. Johnson, Cecilia R. Aragon, Lyle A. McGeoch, Catherine Schevon Source: Operations Research, Vol. 39, No. 3 (May - Jun., 1991), pp. 378-406 Published by: INFORMS Stable URL: http://www.jstor.org/stable/171393 Accessed: 11/03/2010 16:17 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=informs. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Operations Research. http://www.jstor.org OPTIMIZATIONB Y SIMULATEDA NNEALING:A N EXPERIMENTAL EVALUATION;P ART 11,G RAPH COLORINGA ND NUMBERP ARTITIONING DAVIDS . JOHNSON A T&T Bell Laboratories, Murray Hill, New Jersey CECILIAR . ARAGON University of California, Berkeley, California LYLEA . MCGEOCH Amherst College, Amherst, Massachusetts CATHERINES CHEVON Johns Hopkins University, Baltimore, Maryland (Received February 1989; revision received June 1990; accepted September 1990) This is the second in a series of three papers that empirically examine the competitiveness of simulated annealing in certain well-studied domains of combinatorialo ptimization. Simulateda nnealing is a randomizedt echniquep roposed by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi for improving local optimization algorithms. Here we report on experiments at adapting simulated annealing to graph coloring and number partitioning, two problems for which local optimizationh ad not previously been thought suitable. For graph coloring, we report on three simulated annealing schemes, all of which can dominate traditional techniques for certain types of graphs, at least when large amounts of computing time are available. For number partitioning,s imulated annealing is not competitive with the differencing algorithmo f N. Karmarkara nd R. M. Karp, except on relatively small instances. Moreover, if running time is taken into account, naturala nnealing schemes cannot even outperform multiple random runs of the local optimization algorithms on which they are based, in sharp contrast to the observed performance of annealing on other problems. Simulated annealingi s a new approacht o the approxi- In Part I (Johnson et al. 1989), we describe the mate solution of difficult combinational optimization simulated annealing approach and its motivation, and problems. It was originally proposed by Kirkpatrick, report on extensive experiments with it in the context of Gelatt and Vecchi (1983) and Cerny (1985), who re- the graph partitioning problem (given a graph G= ported promising results based on sketchy experiments. (V, E), find a partition of the vertices into two equal- Since then there has been an immense outpouring of sized sets VI and V2, which minimizes the number of papers on the topic, as documented in the extensive edges with endpoints in both sets). We were concerned bibliographies of Collins, Eglese and Golden (1988) and with two main issues: 1) how the various choices made Van Laarhoven and Aarts (1987). The question of how in adapting simulated annealing to a particular problem well annealing stacks up against its more traditional affect its performance, and 2) how well an optimized competition has remained unclear, however, for a vari- annealing implementation for graph partitioning com- ety of important applications. The series of papers, petes against the best of the more traditional algorithms of which this is the second, attempts to rectify this for the problem. (For graph partitoning, the answer to situation. the second question was mixed: simulated annealing Subject classifications: Mathematics, combinatorics: number partitioning heuristics. Networks/graphs, heuristics: graph coloring heuristics. Simula- tion, applications: optimization by simulated annealing. Operations Research s030-364X/9 /3903-0000 $01.25 Vol. 39, No. 3, May-June 1991 378 ? 1991 Operations Research Society of America Graph Coloring by SimulatedAnnealing / 379 tends to dominate traditional techniques on random quality. Thus, traditional local optimization algorithms graphs as the size and/or density of the graphs increases, are not competitive with other techniques for this prob- but was roundly beaten on graphs with built-in geometric lem, in particular the "differencing" algorithm of structure.) Karmarkar and Karp (1982). Consequently, it seems In this paper, we consider the same issues in the unlikely that simulated annealing, which in essence is a context of two additional well-studied, NP-hard combi- method for improving local optimization, can offer natorial optimization problems: graph coloring and enough of an improvement to bridge the gap. Our exper- number partitioning. These problems were chosen be- iments verify this intuition. Moreover, they show that cause they have been studied extensively, but neither had for this problem even multiple random-start local traditionally been approached using local optimization, optimization outperforms simulated annealing, a the algorithmic template upon which simulated annealing phenomenon we have not observed in any of the is based. other annealing implementations we have studied The graph coloring problem has widespread applica- (even the mediocre ones). tions in areas such as scheduling and timetabling, e.g., Although some familiarity with simulated annealing see Leighton (1979), Opsut and Roberts (1981), and de will be helpful in reading this paper, our intention is that Werra (1985). We are given a graph G = (V, E), and it be self-contained. In particular, although we shall asked to find the minimum k such that the vertices of G frequently allude to Part I for background material, the can be partitioned into k color classes VI, . . ., Vk, none reader should be able to understand the results we pre- of which contains both endpoints of any edge in E. sent here without reference to that paper. The remainder Here, the apparent neglect of local optimization in the of this paper is organized as follows. In Section 1, we past may not have been totally justified. By changing the briefly outline the generic annealing algorithm that is the definition of the cost of a solution, and possibly by basis for our implementations, as developed in Part I of extending the notion of what a solution is, one can come this paper. Sections 2 and 3 are devoted to graph color- up with a variety of plausible proposals for local opti- ing and number partitioning, respectively. Section 4 mization that might yield good colorings as a side effect concludes with a brief summary and a preview of the of attempting to minimize the new cost. We investigate third and final paper in this series, which will cover our annealing schemes based on three of these proposals: experiments in applying simulated annealing to the infa- 1) a penalty-function approach that originated with the mous traveling salesman problem. current authors, 2) a variant that uses Kempe chain All running times quoted in this paper are for an interchanges and was devised by Morgenstern and individual processor of a Sequent BalanceTM2 1000 mul- Shapiro (1986), and 3) a more recent and somewhat ticomputer, running under the DynixTM operating system orthogonal approach due to Chams, Hertz and de Werra (Balance and Dynix are trademarks of Sequent Computer (1987). None of these versions can be counted on to Systems, Inc.). Comparable times would be obtained on create good colorings quickly, but if one has large a VAXTM 750 without a floating point accelerator run- amounts of time available, they appear to be competitive ning under UnixTM (VAX is a trademark of the Digital with (and often to dominate) alternative CPU-intensive Equipment Corporation; Unix is a trademark of AT&T approaches. (Not one of the three annealing approaches Bell Laboratories). These are slow machines by modern dominates the other two across the board.) standards; speedups by factors of 10 or greater are The second problem we study, number partitioning, possible with currently available workstations. This was chosen less for its applications than for the severe should be kept in mind when evaluating some of the challenges it presents to the simulated annealing ap- larger running times reported, and we shall have more to proach. In this problem, one is given a sequence of real say about it in the Conclusion. numbers a,, a2, . .., an in the interval [0, 1], and asked to partition them into two sets AI and A2 such that 1. THE GENERICA NNEALINGA LGORITHM Both local optimization and simulated annealing require E ai - E ai that the problem to which they are applied be describable aiA 1 aiEA2 as follows: For each instance I of the problem, there is a is minimized. The challenge of this problem is that the set F of solutions, each solution S having a cost c(S). natural "neighborhood structures" for it, those in which The goal is to find a solution of minimum cost. (Note neighboring solutions differ as to the location of only one that both problems mentioned in the Introduction have or two elements, have exceedingly "mountainous" ter- this form.) rain, in which neighboring solutions differ widely in In order to adapt either of the two approaches to such 380 / JOHNSON ET AL. a problem, one must additionally define an auxiliary All our annealing implementations start with the neighborhood graph on the space of solutions for a parameterized generic annealing algorithm summarized given instance. This is a directed graph whose vertices in Figure 1. This generic procedure relies on several are the solutions, with the neighbors of a solution S problem-specific subroutines. They are READ_ being those solutions S' for which (S, S') is an arc in the INSTANCE(), INITIAL_SOLUTION( ), neighborhood graph. NEXT_CHANGE( ), CHANGE_SOLN( ) and A local optimization algorithm uses this structure to FINAL_SOLN( ). In addition, the procedure is para- find a solution as follows: Starting with an initial solu- meterized by the variables INITPROB, SIZEFA C- tion S generated by other means, it repeatedly attempts TOR, CUTOFF, TEMPFA CTOR, FREEZE_LIM to find a better solution by moving to a neighbor with and MINPERCENT. (See Part I for observations about lower cost, until it reaches a solution none of whose the best values for these parameters and the interactions neighbors have a lower cost. Such a solution is called between them.) Note that the generic algorithm never locally optimal. Simulated annealing is motivated by the deals with the solutions themselves, only their costs. The desire to avoid getting trapped in poor local optima, and current solution, its proposed neighbor, the best solution hence, occasionally allows "uphill moves" to solutions found so far, and the cost of the latter are kept as static of higher cost, doing this under the guidance of a control variables in the problem-specific part of the code. As in parameter called the temperature. Part I, we allow for the possibility that the "solution 1. Call READ_INSTANCE() to read input, compute an upper bound c* on the optimal solution value, and returnt he average neighborhood size N. 2. Call INITIAL SOLUTION() to generate an initial solution S and returnc = cost (S). 3. Choose an initial temperatureT > O so that in what follows the changesl trials ratio starts out approximately equal to INITPROB. 4. Set freezecount = 0. 5. While freezecount <FREEZE_LIM (i.e., while not yet "frozen") do the following: 5.1 Set changes = trials = 0. While trials < SIZEFACTORN and changes < CUTOFF-N, do the following: 5.1.1 Set trials = trials + 1. 5.1.2 Call NEXT_CHANGEo to generate a random neighbor S' of S and return c' = cost (S'). 5.1.3 Let A = c'- c. 5.1.4 If A ? 0 (downhill move), Set changes = changes + 1 and c = c'. Call CHANGE SOLN() to set S =S' and, if S' is feasible and cost (S') < c*, to set S* = S' and c* = cost (S'). 5.1.5 If A > 0 (uphill move), Choose a random number r in [0,1]. If r ? e/AT (i.e., with probability e-A/T), Set changes = changes + 1 and c = c'. Call CHANGE_SOLNo. 5.2 Set T = TEMPFACTOR T (reduce temperature). If c* was changed during 5. 1, set freezecount = 0. If changesl trials < MINPERCENT,s et freezecount = freezecount + 1. 6. Call FINAL SOLN() to output P*. Figure 1. The generic simulated annealing algorithm. Graph Coloring by SimulatedA nnealing / 381 space" may have been expanded to include more than (1976) show that for any r < 2, it is NP-hard to con- just the feasible solutions to the original problem, and struct colorings using no more than rX(G) colors. Fortu- thus, our algorithm is careful to output the best feasible nately, NP-hardness is a worst case measure, and does solution found, rather than simply the best solution. not rule out the possibility of heuristics that work well in The only substantive difference between the algorithm practice. There have thus been many attempts to devise summarized in Figure 1 and the generic algorithm of such heuristics, e.g., see Welsh and Powell (1967), Part I is the inclusion here of cutoffs to limit the time Matula, Marble and Isaacson (1972), Johnson (1974), spent at high temperatures (where, say, 50% or more of Grimmet and McDiarmid (1975), Brelaz (1979), the moves are accepted). As we observe in Part I for Leighton (1979), and Johri and Matula (1982). Until graph partitioning, and confirm in preliminary experi- recently, the literature has concentrated almost exclu- ments for the problems studied here, time spent at high sively on heuristics that use a technique we might call temperatures does not appear to contribute much to the successive augmentation, as opposed to local optimiza- quality of the final solution. One way to limit this time is tion. In this approach, a partial coloring is extended, simply to start at lower temperatures. Solution quality vertex by vertex, until all vertices have been colored, at can degrade, however, if we make the starting tempera- which point the coloring is output without any attempt to ture too low. Cutoffs allow us to save time while still improve it by perturbation. In the next section, we leaving a margin of safety in the starting temperature. describe several such algorithms because they illustrate With the addition of cutoffs, our generic algorithm annealing's competition, and they provide the basic in- closely mirrors the annealing structure implicit in sights that can lead us to simulated annealing implemen- Kirkpatrick's original code. tations. In each implementation that we discuss, we shall 2.1. Successive Augmentation Heuristics describe the relevant subroutines and specify the values chosen for the parameters. We shall also discuss how Perhaps the simplest example of a successive augmenta- Step 3 is implemented, be it by the "trial run" approach tion heuristic is the "sequential" coloring algorithm used for graph partitioning in Part I, or more ad hoc (denoted in what follows by SEQ). Assume that the methods. vertices are labeled v1, . . ., vn. We color the vertices in order. Vertex vl is assigned to color class Cl, and thereafter, vertex vi is assigned to the lowest indexed 2. GRAPHC OLORING color class that contains no vertices adjacent to vi (i.e., The graph coloring problem does not seem at first to be no vertices u such that {u , vi} eE). This algorithm a prime candidate for heuristics based on local optimiza- performs rather poorly in the worst case; 3-colorable tion, and indeed none of the standard heuristics for it graphs may end up with Q(n) colors (Johnson). For have been of this type. Recall that in the graph coloring random graphs with edge probability p = 0.5, however, problem we are given a graph G = (V, E), and asked to it is expected (asymptotically as n = I V I gets large) to find a partition of V into a minimum number of color use no more than 2 x(G), i.e., twice the optimal num- classes C1, C2,... , Ck, where no two vertices u and v ber of colors (Grimmet and McDiarmid). No polynomial can be in the same color class if there is an edge between time heuristic has been proved to have better average them, i.e., if E contains the edge { u, v}. The minimum case behavior. (The best worst case bound proved for a possible number of color classes for G is called the polynomial time heuristic is only a slight improvement chromatic number of G and denoted by x(G). Graph over the worst case bound for SEQ: Berger and Rompel coloring has widespread applications, many having to do (1990) improve on constructions of Johnson (1974) and with scheduling (in situations where the vertices of G Wigderson (1983) to construct an algorithm that will model the events being scheduled, with conflicts between never use more than O(n(log log n/log n)3) times the events represented by edges) (de Werra 1985, Leighton optimal number of colors.) 1979, Opsut and Roberts 1981). Experimentally, however, SEQ is outperformed on Because graph coloring is NP-hard, it is unlikely that average by a variety of other successive augmentation efficient optimization algorithms for it exist (i.e., algo- algorithms, among the best of which are the DSATUR rithms guaranteed to find optimal colorings quickly) algorithm of Brelaz and the Recursive Largest First (Garey and Johnson 1979). The practical question is thus (RLF) algorithm of Leighton. The former dynamically that of developing heuristic algorithms that find near- chooses the vertex to color next, picking one that is optimal colorings quickly. Even here, there are adjacent to the largest number of distinctly colored ver- complexity-theoretic obstacles. Garey and Johnson tices. The latter colors the vertices one color class at a 382 / JOHNSONE T AL. time, in the following "greedy" fashion. Let C be the 60 - next color class to be constructed, V' be the set of as-yet-uncolored vertices, and U be an initially empty NU 50 - 1000-VERTEXR ANDOMG RAPH M set of uncolored vertices that cannot legally be placed BE 40- R in C. 0 130 - 1. Choose a vertex voc V' that has the maximum num- C 0 L ber of edges to other vertices in V'. Place vo in C RO 20 and move all use V' that are adjacent to vo from V' N G to U. S 10 2. While V' remains nonempty, do the following: 0- Choose a vertex v e V' that has a maximum num- 100 105 110 115 120 125 130 135 ber of edges to vertices in U. Add v to C and RLF (10.7 Min) DSATUR(2.1M in) SEQ (1.1 Min) move all u e V' that are adjacent to v from V' Figure 2. Histogram of the colorings found by running to U. each of RLF, DSATUR and SEQ for 100 Let GR be the residual graph induced by the vertices different starting permutations of the vertices left uncolored after C is formed (i.e., the vertices in U of a typical random graph. (The G1o,0.5 when V' has finally been emptied). The goal of this average times per run on a Sequent processor procedure is to make C large while assuring that GR has are given in parentheses.) as many edges eliminated from it as possible, with the additional constraint that since vo has to be in some color class it might as well be in this one. substantial cost in running time. Its average of 107.9 To get a feel for the relative effectiveness and effi- colors is also better than the results reported in Johri and ciency of these three algorithms, let us first see how they Matula for other successive augmentation algorithms, do on what has become a standard test case for graph such as DSATUR With Interchange (111.4) and Smallest coloring heuristics, the 1,000-vertex random graph. In Last With Interchange (1 15.0). the notation of Part I, this is G, 00 5, the 1,000-vertex None of these algorithms, however, uses close to the graph obtained by letting a pair { u, v} of vertices be an optimal number of colors for G1 which is esti- ,OW0.5, edge with probability p = 0.5, independently for each mated to be about 85 by Johri and Matula. (This is only pair. Although unlikely to have much relevance to prac- a heuristic estimate. All that can currently be said with tical applications of graph coloring, such a test bed has rigor is that X(GI W0.5) ? 80 with very high probabil- the pedagogical advantage that results for it seem to be ity, as shown by Bolloba's and Thomason.) It appears stable (behavior on one graph of this type is a good likely that, if we want to approach 85 colors, we will predictor for behavior on any other) and that different need much larger running times than used by the typical heuristics yield decidedly different results for it. Papers successive augmentation algorithms. Moreover, given that have used this graph as a prime example include the narrow variance in results of such algorithms, as Johri and Matula (1982), Bolloba's and Thomason (1985), typified by the histograms in Figure 2, the approach of Morgenstern and Shapiro (1986), Chams, Hertz and de performing multiple runs of any one seems unlikely to Werra (1987). (We shall subsequently consider a selec- yield significantly better colorings even if a large amount tion of other types of graphs, but the well-studied of time is available. Thus, the way is open for computa- G1 0.5 graph provides a convenient setting in which to tion-intensive approaches such as simulated annealing. introduce our ideas.) 2.2. Three Simulated Annealing Implementations Figure 2 presents a histogrammatic comparison of the three algorithms on a typical G1 0w 0.5 random graph. Despite the lack of traditional neighborhood search algo- Although none of the three algorithms is, as defined, a rithms for graph coloring, the problem has proved a randomized algorithm, each depends, for tie-breaking if surprisingly fertile area for simulated annealing imple- nothing else, on the initial permutation of the vertices. mentations. We will describe and compare three serious By varying that permutation, one can get different re- candidates. sults, and the figure plots histograms obtained for the 2.2.1. The Penalty Function Approach results of each for 100 different initial permutations. Note that each algorithm produces colorings within a We begin with the historically first of the three, the one tight range, that the ranges do not overlap, and that RLF with which we began our studies in 1983. This approach is significantly better than the other two, albeit at a was motivated in part by the success of RLF. Graph Coloring by SimulatedA nnealing / 383 Problem-Specific Details. Consider the following classes the cost function is biased toward colorings that neighborhood structure, one that, as in the graph parti- are unbalanced, rather than ones where all classes are tioning implementation of Part I, involves infeasible approximately the same size. Consequently, an optimal solutions and penalty functions. A solution will be any solution with respect to this cost function need not use partition of V into nonempty disjoint sets the minimum possible number of colors, although in Cl, C2,. - - Ck, 1 < k < I V 1, whether the Ci are legal practice, this does not seem to be a major drawback. color classes or not. Two solutions will be neighbors if Moreover, the bias may be profitable in certain applica- one can be transformed to the other by moving a vertex tions. For instance, colorings for which E I Ci 12 iS from one color class to another. To generate a random maximized are precisely what is needed in a scheme for neighbor, we will randomly pick a (nonempty) color generating error-correcting codes due to Brouwer et al. class COLD, a vertex VeCOLD, and then an integer i, (1990), and our annealing software has been useful in 1 < i < k + 1, where k is the current number of color this application, as reported in that paper. (Aarts and classes. The neighbor is obtained by moving v to color Korst (1989) describe an alternative cost function with class Ci. If i = k + 1 this means that v is moved to a respect to which solutions of minimum cost do have the new, previously empty class. If v is already in class C1 minimum possible number of colors, but their function we try again. Note that this procedure biases our choice has other drawbacks.) of v toward vertices in smaller color classes, but this is To complete the specification of the problem-specific presumably desirable because it is our goal to empty details in our penalty function implementation, we must such classes. say how we generate initial solutions. One possibility The key to making annealing work using this neigh- would be to start with all the vertices in a single class borhood structure is the cost function we choose, and Cl; the other extreme would be to start each vertex in its here is where we adapt the general philosophy of RLF, own unique one-element class. On the basis of limited which constructs its colorings with the aid of a subrou- experiments, an intermediate approach seems reason- tine for generating large independent sets. Our cost able, in which one assigns the vertices randomly to function has two components, the first favors large color CHROM_EST classes, where CHROM_EST is a classes, the second favors independent sets. Let H = rough estimate of the chromatic number. The neighbor- (Cl ... . Ck) be a solution, and Ei, 1 < i < k be the set hood size returned is then CHROM_EST- I V 1, a of edges from E both of whose endpoints are in Ci, i.e., good estimate of the number of neighbors a solution will the set of bad edges in Ci. We then have have toward the end of the annealing schedule. We did not follow this approach precisely, however. For our k k cost(H)= - I3 Cil2+ >2 Cil * IE il. G1,000, 0.5 graph, we set CHROM_EST= 90, a reason- i=l i~=l able estimate, but then for simplicity we left it at this value in our experiments with other graphs, even though An important observation about this cost function is in some cases 90 was a substantial over or underesti- that all its local minima correspond to legal colorings. mate. Given that we were varying SIZEFACTOR any- To see this, suppose that Ei is nonempty, and let v be an way, errors in neighborhood size were not deemed to be endpoint of one of the bad edges contained in Ei. Note significant, and fixing CHROM_EST left us with one that moving v from Ci to the previously empty class less parameter to worry about. Ck+1 reduces the cost function because we reduce the Nevertheless, the effective neighborhood size (the second component of the cost by at least 2 l Ci I while number of neighbors that a near-optimal solution can increasing the first by at most have) can be substantially bigger than that for graph ICi 2- ((ICCi - 1)2+ 12) =2jCij -2. partitioning in Part I (where it was simply the number of vertices). Here, the higher the chromatic number, the Observe that this cost function does not explicitly count bigger the effective neighborhood size gets. Assuming, the number of k of color classes in HI; we hope to as our experiments with graph partitioning suggest, that minimize this as a side-effect of minimizing the cost the number of trials we should perform at each tempera- function. The use of such indirect techniques has ac- ture must be a sizeable multiple of this neighborhood counted for more than one practical success claimed for size, we can see that we are in for much larger running annealing, from the graph partitioning problem men- times than we encountered with graph partitioning: for tioned before to problems of circuit layout and GI 05 the running times might blow up by a factor of compaction (e.g., see Kirkpatrick, Gelatt and Vecchi 90 or more! 1983, Vecchi and Kirkpatrick 1983, Collins, Eglese and As with graph partitioning, however, the time for Golden 1988). Note also that for a given number of color proposing and accepting moves is manageable. If we 384 / JOHNSONE T AL. store the graph in adjacency matrix form and the color Part I. (Limited experiments with higher starting tem- classes as doubly-linked lists, the time to propose a move peratures yielded longer running times but no better is proportional to the sizes of the two color classes solutions.) To further reduce running time, we used involved, and the time to accept a proposed move is cutoff with CUTOFF= 0.10. constant. (Our data structures are optimized for dense For our termination condition, we set MINPER- graphs with relatively small color classes, as these are CENT= 2%, allowing for the likelihood that a certain common in applications and are the main ones we study small number of zero-cost moves will always be possible in this paper. For sparser graphs with larger color and hence accepted, and set FREEZE_LIM = 5. classes, it may be more appropriate to represent the Finally, rather than perform explicit exponentiation each graph in adjacency list form and maintain an array that time we need the value e - A / T, we use the table-lookup specifies the color of each vertex. The average time for method described in Part I for quickly obtaining a close proposing a move would then be proportional to the approximation. (This method is also used in our other average vertex degree.) two annealing implementations.) Running times were then varied by changing the values of TEMPFACTOR Penalty Function Local Optimization. Before we turn and SIZEFA CTOR. to the implementation details in the generic part of the The Dynamics of Penalty Function Annealing. Fig- algorithm, let us briefly examine how well this neighbor- hood scheme performs in the context of pure local ure 3 presents "time exposures" of an annealing run optimization. In our implementation of local optimiza- under this scheme on our G1,000 0.5 graph with tion based on this scheme, we limit ourselves to a maximum of 200 color classes, thus giving us at most 240 200 l V I possible moves from any given partition. We NU 220 M start with a random assignment of the vertices to 200 color classes, and a random permutation of the 200 l V I 0180 possible moves. We then examine each move in turn, F 160 C performing the move only if it results in a net decrease 0 140 in cost. Once all 200 l V I moves have been considered, R0 120 S we re-permute them and try again, repeating this proce- dure until for some permutation of the moves, none is 25K accepted. Then we know we have reached a locally 20K optimal solution and stop. UR l5K For our standard G1,00.5 random graph, we per- RNE 105KK - formed 100 runs of this local optimization algorithm. C 0 The results, although better than what was obtainable in 0S -5K practice by pure sequential coloring, were unimpressive: K PenaltyF unctionA nnealing the average time per run was 37.3 minutes (slower than -15K RLF), but the median solution used 117 colors (worse 80 P than both RLF and the much faster DSATUR algorithm). E R No solutions were found using fewer than 115 colors. CE 60 N Fortunately, this neighborhood structure is better for T A 40 simulated annealing than for local optimization. C C E TP' 20 Generic Details of Penalty Function Annealing. Al- E D though all our annealing implementations follow the 0 500 1000 150 2~000 generic outline of Figure 1, certain parameters and rou- (NUMBERO P TRIALS)/10,000 tines therein must be specified before the description of any given implementation is complete. For penalty func- Figure 3. Three views of the evolution of an annealing tion annealing, we obtained our starting temperature by run for a G graph under the penalty 1000,0. trial and error, discovering that a single initial tempera- function annealing scheme. (The time at ture usually sufficed for a given class of graphs. In the which the first legal coloring was encoun- case of Gn 0.5 graphs, an initial temperature of 10 tended tered is marked by a vertical line in all three to yield an initial acceptance ratio between 0.3 and 0.4, displays. Temperature was reduced by a fac- which seemed adequate based on the experiments of tor of 0.95 every 20 data points.) Graph Coloring by SimulatedA nnealing / 385 CHROM_EST= 100 (slightly higher than our stand- cantly had we used cutoffs and the lower standard initial ard value, although this has no significant effect on the temperature of 10, but it is already less than the 17.9 picture), TEMPTFACTOR = 0.95 and SIZEFAC- hours it took to perform 100 runs of RLF. (Recall that TOR = 2. So that more of the total annealing process RLF never used fewer than 105 colors, 3 more than we can be seen, we used an initial temperature of 96, rather needed here). By further increasing the running time (via than the value of 10 used in the later experiments. The increased values for TEMPFACTOR and SIZEFAC- temperature 10 was reached when the Number of TOR), still better colorings are obtainable by this Trials)/10,000 reached 918. Also, for the sake of a full approach. As we shall see in Section 2.4, it is possible picture, cutoffs were turned off and the run was stopped with this approach to get colorings using as few as 91 manually once it was clear that convergence had set in. colors, if one is willing to spend 182 hours. The top display shows the evolution of the number of sets in the current partition. Note that by trial 10,000 2.2.2. The Kempe Chain Approach (the first data point), the number of sets has already Preliminary reports of the penalty function implementa- jumped from the initial 100 to something close to 140, tion and the results for it inspired Morgenstern and and from then on it increases more or less smoothly to a Shapiro to proposed the following alternative, which peak of 233 before declining to a final value of 102. retains the cost function but makes a major change in the Interestingly, this behavior does not correlate with the neighborhood structure. movement of the "current cost" presented in the middle display, which is consistently declining. The reason for Problem-Specific Details. Solutions are now restricted this lack of correlation lies in our method for choosing a to be partitions Cl, ... . Ck that are legal colorings, i.e., "random neighbor." Recall that we pick a random, are such that no edge has both endpoints in the same nonempty color class and then a random member of that class. (Note that this means that all the sets Ei of bad class to move. This introduces a bias toward the mem- edges are empty, and so the cost function simplifies to bers of small classes. Since a move always reduces the just _-Sk= I I Ci 1 2.) In order to ensure that moves pre- size of the chosen class by one, this means that at high serve the legality of the coloring, Morgenstern and temperatures, small classes will tend to empty faster than Shapiro go to a much more complex sort of move, one they are filled. Indeed, had we started our run at a involving Kempe chains. temperature at which 99% of the moves were accepted Suppose that C and D are disjoint independent sets in (here we start at roughly 75%), the number of colors a graph G. A Kempe chain for C and D is any would have been fluctuating between 30 and 40 or so. connected component in the subgraph of G induced by As the temperature drops, the part of the cost function C U D. Let XiA Y denote the symmetric difference that penalizes "bad edges" begins to take effect, driving (X- Y)U(Y-X) between two sets X and Y. The up the number of colors until there are few enough bad key observation is that if H is a Kempe chain for edges for the component of the cost function that re- disjoint independent sets C and D, then C A H and wards big color classes to begin driving the number of DAH are themselves disjoint independent sets whose colors back down. union is C U D. This suggests the following move gen- The appearance of the first legal coloring is marked eration procedure: Randomly choose a nonempty color by a vertical line through the display. Although the class C and a vertex v e C, as in the penalty function current cost (middle display) declines more-or-less approach. Then randomly choose a nonempty color class monotonically, there is a definite bump in the curve D other than C, and let H be the Kempe chain for C occurring slightly to the left of that line, a bump that we and D that contains v. Repeat the above procedure until did not see in the time exposures of Part I for graph one obtains C, v, D and H such that H # C U D (i.e., partitioning. Bumps of this sort regularly occur in runs such that H is not "full"), in which case the next of penalty function annealing. Unlike the other changes partition is obtained by replacing C by C A H and D by in slope for the curve, they do not reflect a similar D A H in the current one. (Using a full Kempe chain in change in acceptance rate. (The latter is depicted in the this operation simply changes the names of the two bottom display.) Annealers with physics backgrounds colors, a meaningless change, which is why we ignore might suggest that such bumps indicate "phase transi- such moves.) tion" (Kirkpatrick, Gelatt and Vecchi). Unfortunately, This procedure appears to be substantially more ex- there is no good explanation of why they arise or whether pensive than the move generation procedure in the penalty they can be exploited. function approach, and it is. It also makes substantially The total running time for the time exposure was bigger changes in the solutions, however, and so may be about 11 hours. This would have been reduced signifi- worth the extra effort. Moreover, it is not exorbitantly 386 / JOHNSON ET AL. expensive, at least for dense graphs. For such graphs the 350 N color classes tend to be small, and so the following U 300 M B technique can find the Kempe chain H relatively quickly. RE 8 250 Whenever a new vertex u is added to H (including the F 200 first vertex v), we scan the members of the other color C o 150 - class that are not yet in H and add to H each one that is L 0 adjacent to u. Furthermore, we use two auxiliary tables RS 100 to help us whenever possible avoid the wasted time of 50 constructing full Kempe chains that must be abandoned. -2K One stores for each pair C, D the time at which they A C -4K were last discovered to have a full Kempe chain; the U other stores for each C the time at which it was last N -EK , modified. When C and D are first chosen, we check to TTs K --* ep hi neln -8K see whether we have seen a full Kempe chain for them S T -10K Kempe ChainA nnealing since the last time they were modified, and if so, aban- don their further consideration immediately. - 12K To complete the specification of the problem-specific 100 p details of the Kempe chain approach, we must say how RE 80 C instance "size" is determined and how initial solutions E N are generated. We perform random sequential colorings T 60 A for both purposes. When an instance is read, we perform CC 40 - E a random sequential coloring, and return the size K I V l, p T 20 - where K is the number of colors used in the coloring. A second random sequential coloring is performed each 0 0 500 1000 1500 2000 time a new initial solution is requested. (NUMBER OF TRIALS)/1,000 Figure 4. Three views of the evolution of an annealing Kempe Chain Local Optimization. As with the penalty function approach, the Kempe chain approach can serve run for a G1 00, 0.5 graph under the Kempe chain annealing scheme. (The temperature as the basis for a local optimization algorithm. Here, was reduced by a factor of 0.95 every 20 data because we are restricted to legal colorings, we limit the points.) total allowable number of colors to 150, yielding 150 l V I possible moves. Initial solutions are generated by random sequential colorings as just specified. Other- wise, the details are the same as for penalty function function approach). For comparison purposes, we also local optimization, as outlined previously. The results set the neighborhood size to the same 100,000 value were also similar (and similarly mediocre). The average used in the previous run, and used the same starting running time over 100 runs was 33.3 minutes (slightly temperature T = 96. A first observation is that the curves better than for the penalty function approach but still are significantly more irregular than those for penalty much slower than RLF), and the median number of function annealing. For the most part, this may be colors was again 117. attributable to the reduced value of SIZEFACTOR, since this means that each data point represents 1,000 rather The Dynamics of Kempe Chain Annealing. Before than 10,000 trials, and so successive data points may be describing the generic parameters governing the start and correlated more closely. There may, however, be a finish of a run, let us compare the operation of Kempe different reason for the extreme excursion in the accept- chain annealing to the penalty function approach. Figure ance rate curve. Such excursions occur in the tails of 4 presents "time exposures" for Kempe chain anneal- other Kempe chain runs, but at random places (unlike ing, analogous to those in Figure 3 for the penalty the regularly occurring smooth bump in the middle of function approach. All parameters except SIZEFACTOR the "current cost" curve for penalty function annealing were given the same values as in the penalty function in Figure 3). Here the excursion seems attributable to the run; because of the greater expense of the moves here, topography of the Kempe chain solution space, as we we reduced SIZEFACTOR from 200 to 20 (and the run shall hypothesize in more detail below. still took 18 hours, as opposed to the 11 for the penalty A second difference between the runs in Figures 3 and
Description: