Table Of ContentOptimization by Simulated Annealing: An Experimental Evaluation; Part II, Graph Coloring
and Number Partitioning
Author(s): David S. Johnson, Cecilia R. Aragon, Lyle A. McGeoch, Catherine Schevon
Source: Operations Research, Vol. 39, No. 3 (May - Jun., 1991), pp. 378-406
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/171393
Accessed: 11/03/2010 16:17
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=informs.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Operations Research.
http://www.jstor.org
OPTIMIZATIONB Y SIMULATEDA NNEALING:A N EXPERIMENTAL
EVALUATION;P ART 11,G RAPH COLORINGA ND
NUMBERP ARTITIONING
DAVIDS . JOHNSON
A T&T Bell Laboratories, Murray Hill, New Jersey
CECILIAR . ARAGON
University of California, Berkeley, California
LYLEA . MCGEOCH
Amherst College, Amherst, Massachusetts
CATHERINES CHEVON
Johns Hopkins University, Baltimore, Maryland
(Received February 1989; revision received June 1990; accepted September 1990)
This is the second in a series of three papers that empirically examine the competitiveness of simulated annealing in certain
well-studied domains of combinatorialo ptimization. Simulateda nnealing is a randomizedt echniquep roposed by S. Kirkpatrick,
C. D. Gelatt and M. P. Vecchi for improving local optimization algorithms. Here we report on experiments at adapting
simulated annealing to graph coloring and number partitioning, two problems for which local optimizationh ad not previously
been thought suitable. For graph coloring, we report on three simulated annealing schemes, all of which can dominate
traditional techniques for certain types of graphs, at least when large amounts of computing time are available. For number
partitioning,s imulated annealing is not competitive with the differencing algorithmo f N. Karmarkara nd R. M. Karp, except on
relatively small instances. Moreover, if running time is taken into account, naturala nnealing schemes cannot even outperform
multiple random runs of the local optimization algorithms on which they are based, in sharp contrast to the observed
performance of annealing on other problems.
Simulated annealingi s a new approacht o the approxi- In Part I (Johnson et al. 1989), we describe the
mate solution of difficult combinational optimization simulated annealing approach and its motivation, and
problems. It was originally proposed by Kirkpatrick, report on extensive experiments with it in the context of
Gelatt and Vecchi (1983) and Cerny (1985), who re- the graph partitioning problem (given a graph G=
ported promising results based on sketchy experiments. (V, E), find a partition of the vertices into two equal-
Since then there has been an immense outpouring of sized sets VI and V2, which minimizes the number of
papers on the topic, as documented in the extensive edges with endpoints in both sets). We were concerned
bibliographies of Collins, Eglese and Golden (1988) and with two main issues: 1) how the various choices made
Van Laarhoven and Aarts (1987). The question of how in adapting simulated annealing to a particular problem
well annealing stacks up against its more traditional affect its performance, and 2) how well an optimized
competition has remained unclear, however, for a vari- annealing implementation for graph partitioning com-
ety of important applications. The series of papers, petes against the best of the more traditional algorithms
of which this is the second, attempts to rectify this for the problem. (For graph partitoning, the answer to
situation. the second question was mixed: simulated annealing
Subject classifications: Mathematics, combinatorics: number partitioning heuristics. Networks/graphs, heuristics: graph coloring heuristics. Simula-
tion, applications: optimization by simulated annealing.
Operations Research s030-364X/9 /3903-0000 $01.25
Vol. 39, No. 3, May-June 1991 378 ? 1991 Operations Research Society of America
Graph Coloring by SimulatedAnnealing / 379
tends to dominate traditional techniques on random quality. Thus, traditional local optimization algorithms
graphs as the size and/or density of the graphs increases, are not competitive with other techniques for this prob-
but was roundly beaten on graphs with built-in geometric lem, in particular the "differencing" algorithm of
structure.) Karmarkar and Karp (1982). Consequently, it seems
In this paper, we consider the same issues in the unlikely that simulated annealing, which in essence is a
context of two additional well-studied, NP-hard combi- method for improving local optimization, can offer
natorial optimization problems: graph coloring and enough of an improvement to bridge the gap. Our exper-
number partitioning. These problems were chosen be- iments verify this intuition. Moreover, they show that
cause they have been studied extensively, but neither had for this problem even multiple random-start local
traditionally been approached using local optimization, optimization outperforms simulated annealing, a
the algorithmic template upon which simulated annealing phenomenon we have not observed in any of the
is based. other annealing implementations we have studied
The graph coloring problem has widespread applica- (even the mediocre ones).
tions in areas such as scheduling and timetabling, e.g., Although some familiarity with simulated annealing
see Leighton (1979), Opsut and Roberts (1981), and de will be helpful in reading this paper, our intention is that
Werra (1985). We are given a graph G = (V, E), and it be self-contained. In particular, although we shall
asked to find the minimum k such that the vertices of G frequently allude to Part I for background material, the
can be partitioned into k color classes VI, . . ., Vk, none reader should be able to understand the results we pre-
of which contains both endpoints of any edge in E. sent here without reference to that paper. The remainder
Here, the apparent neglect of local optimization in the of this paper is organized as follows. In Section 1, we
past may not have been totally justified. By changing the briefly outline the generic annealing algorithm that is the
definition of the cost of a solution, and possibly by basis for our implementations, as developed in Part I of
extending the notion of what a solution is, one can come this paper. Sections 2 and 3 are devoted to graph color-
up with a variety of plausible proposals for local opti- ing and number partitioning, respectively. Section 4
mization that might yield good colorings as a side effect concludes with a brief summary and a preview of the
of attempting to minimize the new cost. We investigate third and final paper in this series, which will cover our
annealing schemes based on three of these proposals: experiments in applying simulated annealing to the infa-
1) a penalty-function approach that originated with the mous traveling salesman problem.
current authors, 2) a variant that uses Kempe chain All running times quoted in this paper are for an
interchanges and was devised by Morgenstern and individual processor of a Sequent BalanceTM2 1000 mul-
Shapiro (1986), and 3) a more recent and somewhat ticomputer, running under the DynixTM operating system
orthogonal approach due to Chams, Hertz and de Werra (Balance and Dynix are trademarks of Sequent Computer
(1987). None of these versions can be counted on to Systems, Inc.). Comparable times would be obtained on
create good colorings quickly, but if one has large a VAXTM 750 without a floating point accelerator run-
amounts of time available, they appear to be competitive ning under UnixTM (VAX is a trademark of the Digital
with (and often to dominate) alternative CPU-intensive Equipment Corporation; Unix is a trademark of AT&T
approaches. (Not one of the three annealing approaches Bell Laboratories). These are slow machines by modern
dominates the other two across the board.) standards; speedups by factors of 10 or greater are
The second problem we study, number partitioning, possible with currently available workstations. This
was chosen less for its applications than for the severe should be kept in mind when evaluating some of the
challenges it presents to the simulated annealing ap- larger running times reported, and we shall have more to
proach. In this problem, one is given a sequence of real say about it in the Conclusion.
numbers a,, a2, . .., an in the interval [0, 1], and asked
to partition them into two sets AI and A2 such that 1. THE GENERICA NNEALINGA LGORITHM
Both local optimization and simulated annealing require
E ai - E ai
that the problem to which they are applied be describable
aiA 1 aiEA2
as follows: For each instance I of the problem, there is a
is minimized. The challenge of this problem is that the set F of solutions, each solution S having a cost c(S).
natural "neighborhood structures" for it, those in which The goal is to find a solution of minimum cost. (Note
neighboring solutions differ as to the location of only one that both problems mentioned in the Introduction have
or two elements, have exceedingly "mountainous" ter- this form.)
rain, in which neighboring solutions differ widely in In order to adapt either of the two approaches to such
380 / JOHNSON ET AL.
a problem, one must additionally define an auxiliary All our annealing implementations start with the
neighborhood graph on the space of solutions for a parameterized generic annealing algorithm summarized
given instance. This is a directed graph whose vertices in Figure 1. This generic procedure relies on several
are the solutions, with the neighbors of a solution S problem-specific subroutines. They are READ_
being those solutions S' for which (S, S') is an arc in the INSTANCE(), INITIAL_SOLUTION( ),
neighborhood graph. NEXT_CHANGE( ), CHANGE_SOLN( ) and
A local optimization algorithm uses this structure to FINAL_SOLN( ). In addition, the procedure is para-
find a solution as follows: Starting with an initial solu- meterized by the variables INITPROB, SIZEFA C-
tion S generated by other means, it repeatedly attempts TOR, CUTOFF, TEMPFA CTOR, FREEZE_LIM
to find a better solution by moving to a neighbor with and MINPERCENT. (See Part I for observations about
lower cost, until it reaches a solution none of whose the best values for these parameters and the interactions
neighbors have a lower cost. Such a solution is called between them.) Note that the generic algorithm never
locally optimal. Simulated annealing is motivated by the deals with the solutions themselves, only their costs. The
desire to avoid getting trapped in poor local optima, and current solution, its proposed neighbor, the best solution
hence, occasionally allows "uphill moves" to solutions found so far, and the cost of the latter are kept as static
of higher cost, doing this under the guidance of a control variables in the problem-specific part of the code. As in
parameter called the temperature. Part I, we allow for the possibility that the "solution
1. Call READ_INSTANCE() to read input, compute an upper bound c* on the
optimal solution value, and returnt he average neighborhood size N.
2. Call INITIAL SOLUTION() to generate an initial solution S and returnc = cost (S).
3. Choose an initial temperatureT > O so that in what follows the changesl trials ratio
starts out approximately equal to INITPROB.
4. Set freezecount = 0.
5. While freezecount <FREEZE_LIM (i.e., while not yet "frozen") do the following:
5.1 Set changes = trials = 0.
While trials < SIZEFACTORN and changes < CUTOFF-N, do the following:
5.1.1 Set trials = trials + 1.
5.1.2 Call NEXT_CHANGEo to generate a random neighbor S' of S
and return c' = cost (S').
5.1.3 Let A = c'- c.
5.1.4 If A ? 0 (downhill move),
Set changes = changes + 1 and c = c'.
Call CHANGE SOLN() to set S =S' and, if S' is feasible
and cost (S') < c*, to set S* = S' and c* = cost (S').
5.1.5 If A > 0 (uphill move),
Choose a random number r in [0,1].
If r ? e/AT (i.e., with probability e-A/T),
Set changes = changes + 1 and c = c'.
Call CHANGE_SOLNo.
5.2 Set T = TEMPFACTOR T (reduce temperature).
If c* was changed during 5. 1, set freezecount = 0.
If changesl trials < MINPERCENT,s et freezecount = freezecount + 1.
6. Call FINAL SOLN() to output P*.
Figure 1. The generic simulated annealing algorithm.
Graph Coloring by SimulatedA nnealing / 381
space" may have been expanded to include more than (1976) show that for any r < 2, it is NP-hard to con-
just the feasible solutions to the original problem, and struct colorings using no more than rX(G) colors. Fortu-
thus, our algorithm is careful to output the best feasible nately, NP-hardness is a worst case measure, and does
solution found, rather than simply the best solution. not rule out the possibility of heuristics that work well in
The only substantive difference between the algorithm practice. There have thus been many attempts to devise
summarized in Figure 1 and the generic algorithm of such heuristics, e.g., see Welsh and Powell (1967),
Part I is the inclusion here of cutoffs to limit the time Matula, Marble and Isaacson (1972), Johnson (1974),
spent at high temperatures (where, say, 50% or more of Grimmet and McDiarmid (1975), Brelaz (1979),
the moves are accepted). As we observe in Part I for Leighton (1979), and Johri and Matula (1982). Until
graph partitioning, and confirm in preliminary experi- recently, the literature has concentrated almost exclu-
ments for the problems studied here, time spent at high sively on heuristics that use a technique we might call
temperatures does not appear to contribute much to the successive augmentation, as opposed to local optimiza-
quality of the final solution. One way to limit this time is tion. In this approach, a partial coloring is extended,
simply to start at lower temperatures. Solution quality vertex by vertex, until all vertices have been colored, at
can degrade, however, if we make the starting tempera- which point the coloring is output without any attempt to
ture too low. Cutoffs allow us to save time while still improve it by perturbation. In the next section, we
leaving a margin of safety in the starting temperature. describe several such algorithms because they illustrate
With the addition of cutoffs, our generic algorithm annealing's competition, and they provide the basic in-
closely mirrors the annealing structure implicit in sights that can lead us to simulated annealing implemen-
Kirkpatrick's original code. tations.
In each implementation that we discuss, we shall
2.1. Successive Augmentation Heuristics
describe the relevant subroutines and specify the values
chosen for the parameters. We shall also discuss how Perhaps the simplest example of a successive augmenta-
Step 3 is implemented, be it by the "trial run" approach tion heuristic is the "sequential" coloring algorithm
used for graph partitioning in Part I, or more ad hoc (denoted in what follows by SEQ). Assume that the
methods. vertices are labeled v1, . . ., vn. We color the vertices in
order. Vertex vl is assigned to color class Cl, and
thereafter, vertex vi is assigned to the lowest indexed
2. GRAPHC OLORING
color class that contains no vertices adjacent to vi (i.e.,
The graph coloring problem does not seem at first to be no vertices u such that {u , vi} eE). This algorithm
a prime candidate for heuristics based on local optimiza- performs rather poorly in the worst case; 3-colorable
tion, and indeed none of the standard heuristics for it graphs may end up with Q(n) colors (Johnson). For
have been of this type. Recall that in the graph coloring random graphs with edge probability p = 0.5, however,
problem we are given a graph G = (V, E), and asked to it is expected (asymptotically as n = I V I gets large) to
find a partition of V into a minimum number of color use no more than 2 x(G), i.e., twice the optimal num-
classes C1, C2,... , Ck, where no two vertices u and v ber of colors (Grimmet and McDiarmid). No polynomial
can be in the same color class if there is an edge between time heuristic has been proved to have better average
them, i.e., if E contains the edge { u, v}. The minimum case behavior. (The best worst case bound proved for a
possible number of color classes for G is called the polynomial time heuristic is only a slight improvement
chromatic number of G and denoted by x(G). Graph over the worst case bound for SEQ: Berger and Rompel
coloring has widespread applications, many having to do (1990) improve on constructions of Johnson (1974) and
with scheduling (in situations where the vertices of G Wigderson (1983) to construct an algorithm that will
model the events being scheduled, with conflicts between never use more than O(n(log log n/log n)3) times the
events represented by edges) (de Werra 1985, Leighton optimal number of colors.)
1979, Opsut and Roberts 1981). Experimentally, however, SEQ is outperformed on
Because graph coloring is NP-hard, it is unlikely that average by a variety of other successive augmentation
efficient optimization algorithms for it exist (i.e., algo- algorithms, among the best of which are the DSATUR
rithms guaranteed to find optimal colorings quickly) algorithm of Brelaz and the Recursive Largest First
(Garey and Johnson 1979). The practical question is thus (RLF) algorithm of Leighton. The former dynamically
that of developing heuristic algorithms that find near- chooses the vertex to color next, picking one that is
optimal colorings quickly. Even here, there are adjacent to the largest number of distinctly colored ver-
complexity-theoretic obstacles. Garey and Johnson tices. The latter colors the vertices one color class at a
382 / JOHNSONE T AL.
time, in the following "greedy" fashion. Let C be the 60 -
next color class to be constructed, V' be the set of
as-yet-uncolored vertices, and U be an initially empty NU 50 - 1000-VERTEXR ANDOMG RAPH
M
set of uncolored vertices that cannot legally be placed BE 40-
R
in C.
0
130 -
1. Choose a vertex voc V' that has the maximum num- C
0
L
ber of edges to other vertices in V'. Place vo in C RO 20
and move all use V' that are adjacent to vo from V' N
G
to U. S 10
2. While V' remains nonempty, do the following:
0-
Choose a vertex v e V' that has a maximum num-
100 105 110 115 120 125 130 135
ber of edges to vertices in U. Add v to C and RLF (10.7 Min) DSATUR(2.1M in) SEQ (1.1 Min)
move all u e V' that are adjacent to v from V'
Figure 2. Histogram of the colorings found by running
to U.
each of RLF, DSATUR and SEQ for 100
Let GR be the residual graph induced by the vertices different starting permutations of the vertices
left uncolored after C is formed (i.e., the vertices in U of a typical random graph. (The
G1o,0.5
when V' has finally been emptied). The goal of this average times per run on a Sequent processor
procedure is to make C large while assuring that GR has are given in parentheses.)
as many edges eliminated from it as possible, with the
additional constraint that since vo has to be in some
color class it might as well be in this one. substantial cost in running time. Its average of 107.9
To get a feel for the relative effectiveness and effi- colors is also better than the results reported in Johri and
ciency of these three algorithms, let us first see how they Matula for other successive augmentation algorithms,
do on what has become a standard test case for graph such as DSATUR With Interchange (111.4) and Smallest
coloring heuristics, the 1,000-vertex random graph. In Last With Interchange (1 15.0).
the notation of Part I, this is G, 00 5, the 1,000-vertex None of these algorithms, however, uses close to the
graph obtained by letting a pair { u, v} of vertices be an optimal number of colors for G1 which is esti-
,OW0.5,
edge with probability p = 0.5, independently for each mated to be about 85 by Johri and Matula. (This is only
pair. Although unlikely to have much relevance to prac- a heuristic estimate. All that can currently be said with
tical applications of graph coloring, such a test bed has rigor is that X(GI W0.5) ? 80 with very high probabil-
the pedagogical advantage that results for it seem to be ity, as shown by Bolloba's and Thomason.) It appears
stable (behavior on one graph of this type is a good likely that, if we want to approach 85 colors, we will
predictor for behavior on any other) and that different need much larger running times than used by the typical
heuristics yield decidedly different results for it. Papers successive augmentation algorithms. Moreover, given
that have used this graph as a prime example include the narrow variance in results of such algorithms, as
Johri and Matula (1982), Bolloba's and Thomason (1985), typified by the histograms in Figure 2, the approach of
Morgenstern and Shapiro (1986), Chams, Hertz and de performing multiple runs of any one seems unlikely to
Werra (1987). (We shall subsequently consider a selec- yield significantly better colorings even if a large amount
tion of other types of graphs, but the well-studied of time is available. Thus, the way is open for computa-
G1 0.5 graph provides a convenient setting in which to tion-intensive approaches such as simulated annealing.
introduce our ideas.)
2.2. Three Simulated Annealing Implementations
Figure 2 presents a histogrammatic comparison of the
three algorithms on a typical G1 0w 0.5 random graph. Despite the lack of traditional neighborhood search algo-
Although none of the three algorithms is, as defined, a rithms for graph coloring, the problem has proved a
randomized algorithm, each depends, for tie-breaking if surprisingly fertile area for simulated annealing imple-
nothing else, on the initial permutation of the vertices. mentations. We will describe and compare three serious
By varying that permutation, one can get different re- candidates.
sults, and the figure plots histograms obtained for the
2.2.1. The Penalty Function Approach
results of each for 100 different initial permutations.
Note that each algorithm produces colorings within a We begin with the historically first of the three, the one
tight range, that the ranges do not overlap, and that RLF with which we began our studies in 1983. This approach
is significantly better than the other two, albeit at a was motivated in part by the success of RLF.
Graph Coloring by SimulatedA nnealing / 383
Problem-Specific Details. Consider the following classes the cost function is biased toward colorings that
neighborhood structure, one that, as in the graph parti- are unbalanced, rather than ones where all classes are
tioning implementation of Part I, involves infeasible approximately the same size. Consequently, an optimal
solutions and penalty functions. A solution will be any solution with respect to this cost function need not use
partition of V into nonempty disjoint sets the minimum possible number of colors, although in
Cl, C2,. - - Ck, 1 < k < I V 1, whether the Ci are legal practice, this does not seem to be a major drawback.
color classes or not. Two solutions will be neighbors if Moreover, the bias may be profitable in certain applica-
one can be transformed to the other by moving a vertex tions. For instance, colorings for which E I Ci 12 iS
from one color class to another. To generate a random maximized are precisely what is needed in a scheme for
neighbor, we will randomly pick a (nonempty) color generating error-correcting codes due to Brouwer et al.
class COLD, a vertex VeCOLD, and then an integer i, (1990), and our annealing software has been useful in
1 < i < k + 1, where k is the current number of color this application, as reported in that paper. (Aarts and
classes. The neighbor is obtained by moving v to color Korst (1989) describe an alternative cost function with
class Ci. If i = k + 1 this means that v is moved to a respect to which solutions of minimum cost do have the
new, previously empty class. If v is already in class C1 minimum possible number of colors, but their function
we try again. Note that this procedure biases our choice has other drawbacks.)
of v toward vertices in smaller color classes, but this is To complete the specification of the problem-specific
presumably desirable because it is our goal to empty details in our penalty function implementation, we must
such classes. say how we generate initial solutions. One possibility
The key to making annealing work using this neigh- would be to start with all the vertices in a single class
borhood structure is the cost function we choose, and Cl; the other extreme would be to start each vertex in its
here is where we adapt the general philosophy of RLF, own unique one-element class. On the basis of limited
which constructs its colorings with the aid of a subrou- experiments, an intermediate approach seems reason-
tine for generating large independent sets. Our cost able, in which one assigns the vertices randomly to
function has two components, the first favors large color CHROM_EST classes, where CHROM_EST is a
classes, the second favors independent sets. Let H = rough estimate of the chromatic number. The neighbor-
(Cl ... . Ck) be a solution, and Ei, 1 < i < k be the set hood size returned is then CHROM_EST- I V 1, a
of edges from E both of whose endpoints are in Ci, i.e., good estimate of the number of neighbors a solution will
the set of bad edges in Ci. We then have have toward the end of the annealing schedule. We did
not follow this approach precisely, however. For our
k k
cost(H)= - I3 Cil2+ >2 Cil * IE il. G1,000, 0.5 graph, we set CHROM_EST= 90, a reason-
i=l i~=l able estimate, but then for simplicity we left it at this
value in our experiments with other graphs, even though
An important observation about this cost function is
in some cases 90 was a substantial over or underesti-
that all its local minima correspond to legal colorings.
mate. Given that we were varying SIZEFACTOR any-
To see this, suppose that Ei is nonempty, and let v be an
way, errors in neighborhood size were not deemed to be
endpoint of one of the bad edges contained in Ei. Note
significant, and fixing CHROM_EST left us with one
that moving v from Ci to the previously empty class
less parameter to worry about.
Ck+1 reduces the cost function because we reduce the
Nevertheless, the effective neighborhood size (the
second component of the cost by at least 2 l Ci I while
number of neighbors that a near-optimal solution can
increasing the first by at most
have) can be substantially bigger than that for graph
ICi 2- ((ICCi - 1)2+ 12) =2jCij -2. partitioning in Part I (where it was simply the number of
vertices). Here, the higher the chromatic number, the
Observe that this cost function does not explicitly count bigger the effective neighborhood size gets. Assuming,
the number of k of color classes in HI; we hope to as our experiments with graph partitioning suggest, that
minimize this as a side-effect of minimizing the cost the number of trials we should perform at each tempera-
function. The use of such indirect techniques has ac- ture must be a sizeable multiple of this neighborhood
counted for more than one practical success claimed for size, we can see that we are in for much larger running
annealing, from the graph partitioning problem men- times than we encountered with graph partitioning: for
tioned before to problems of circuit layout and GI 05 the running times might blow up by a factor of
compaction (e.g., see Kirkpatrick, Gelatt and Vecchi 90 or more!
1983, Vecchi and Kirkpatrick 1983, Collins, Eglese and As with graph partitioning, however, the time for
Golden 1988). Note also that for a given number of color proposing and accepting moves is manageable. If we
384 / JOHNSONE T AL.
store the graph in adjacency matrix form and the color Part I. (Limited experiments with higher starting tem-
classes as doubly-linked lists, the time to propose a move peratures yielded longer running times but no better
is proportional to the sizes of the two color classes solutions.) To further reduce running time, we used
involved, and the time to accept a proposed move is cutoff with CUTOFF= 0.10.
constant. (Our data structures are optimized for dense For our termination condition, we set MINPER-
graphs with relatively small color classes, as these are CENT= 2%, allowing for the likelihood that a certain
common in applications and are the main ones we study small number of zero-cost moves will always be possible
in this paper. For sparser graphs with larger color and hence accepted, and set FREEZE_LIM = 5.
classes, it may be more appropriate to represent the Finally, rather than perform explicit exponentiation each
graph in adjacency list form and maintain an array that time we need the value e - A / T, we use the table-lookup
specifies the color of each vertex. The average time for method described in Part I for quickly obtaining a close
proposing a move would then be proportional to the approximation. (This method is also used in our other
average vertex degree.) two annealing implementations.) Running times were
then varied by changing the values of TEMPFACTOR
Penalty Function Local Optimization. Before we turn and SIZEFA CTOR.
to the implementation details in the generic part of the
The Dynamics of Penalty Function Annealing. Fig-
algorithm, let us briefly examine how well this neighbor-
hood scheme performs in the context of pure local ure 3 presents "time exposures" of an annealing run
optimization. In our implementation of local optimiza- under this scheme on our G1,000 0.5 graph with
tion based on this scheme, we limit ourselves to a
maximum of 200 color classes, thus giving us at most 240
200 l V I possible moves from any given partition. We NU 220
M
start with a random assignment of the vertices to 200
color classes, and a random permutation of the 200 l V I 0180
possible moves. We then examine each move in turn, F 160
C
performing the move only if it results in a net decrease 0 140
in cost. Once all 200 l V I moves have been considered, R0 120
S
we re-permute them and try again, repeating this proce-
dure until for some permutation of the moves, none is 25K
accepted. Then we know we have reached a locally 20K
optimal solution and stop. UR l5K
For our standard G1,00.5 random graph, we per- RNE 105KK -
formed 100 runs of this local optimization algorithm.
C 0
The results, although better than what was obtainable in 0S -5K
practice by pure sequential coloring, were unimpressive: K PenaltyF unctionA nnealing
the average time per run was 37.3 minutes (slower than -15K
RLF), but the median solution used 117 colors (worse 80
P
than both RLF and the much faster DSATUR algorithm). E
R
No solutions were found using fewer than 115 colors. CE 60
N
Fortunately, this neighborhood structure is better for T
A 40
simulated annealing than for local optimization. C
C
E
TP' 20
Generic Details of Penalty Function Annealing. Al- E
D
though all our annealing implementations follow the
0 500 1000 150 2~000
generic outline of Figure 1, certain parameters and rou- (NUMBERO P TRIALS)/10,000
tines therein must be specified before the description of
any given implementation is complete. For penalty func- Figure 3. Three views of the evolution of an annealing
tion annealing, we obtained our starting temperature by run for a G graph under the penalty
1000,0.
trial and error, discovering that a single initial tempera- function annealing scheme. (The time at
ture usually sufficed for a given class of graphs. In the which the first legal coloring was encoun-
case of Gn 0.5 graphs, an initial temperature of 10 tended tered is marked by a vertical line in all three
to yield an initial acceptance ratio between 0.3 and 0.4, displays. Temperature was reduced by a fac-
which seemed adequate based on the experiments of tor of 0.95 every 20 data points.)
Graph Coloring by SimulatedA nnealing / 385
CHROM_EST= 100 (slightly higher than our stand- cantly had we used cutoffs and the lower standard initial
ard value, although this has no significant effect on the temperature of 10, but it is already less than the 17.9
picture), TEMPTFACTOR = 0.95 and SIZEFAC- hours it took to perform 100 runs of RLF. (Recall that
TOR = 2. So that more of the total annealing process RLF never used fewer than 105 colors, 3 more than we
can be seen, we used an initial temperature of 96, rather needed here). By further increasing the running time (via
than the value of 10 used in the later experiments. The increased values for TEMPFACTOR and SIZEFAC-
temperature 10 was reached when the Number of TOR), still better colorings are obtainable by this
Trials)/10,000 reached 918. Also, for the sake of a full approach. As we shall see in Section 2.4, it is possible
picture, cutoffs were turned off and the run was stopped with this approach to get colorings using as few as 91
manually once it was clear that convergence had set in. colors, if one is willing to spend 182 hours.
The top display shows the evolution of the number of
sets in the current partition. Note that by trial 10,000 2.2.2. The Kempe Chain Approach
(the first data point), the number of sets has already Preliminary reports of the penalty function implementa-
jumped from the initial 100 to something close to 140, tion and the results for it inspired Morgenstern and
and from then on it increases more or less smoothly to a Shapiro to proposed the following alternative, which
peak of 233 before declining to a final value of 102. retains the cost function but makes a major change in the
Interestingly, this behavior does not correlate with the neighborhood structure.
movement of the "current cost" presented in the middle
display, which is consistently declining. The reason for Problem-Specific Details. Solutions are now restricted
this lack of correlation lies in our method for choosing a to be partitions Cl, ... . Ck that are legal colorings, i.e.,
"random neighbor." Recall that we pick a random, are such that no edge has both endpoints in the same
nonempty color class and then a random member of that class. (Note that this means that all the sets Ei of bad
class to move. This introduces a bias toward the mem- edges are empty, and so the cost function simplifies to
bers of small classes. Since a move always reduces the just _-Sk= I I Ci 1 2.) In order to ensure that moves pre-
size of the chosen class by one, this means that at high serve the legality of the coloring, Morgenstern and
temperatures, small classes will tend to empty faster than Shapiro go to a much more complex sort of move, one
they are filled. Indeed, had we started our run at a involving Kempe chains.
temperature at which 99% of the moves were accepted Suppose that C and D are disjoint independent sets in
(here we start at roughly 75%), the number of colors a graph G. A Kempe chain for C and D is any
would have been fluctuating between 30 and 40 or so. connected component in the subgraph of G induced by
As the temperature drops, the part of the cost function C U D. Let XiA Y denote the symmetric difference
that penalizes "bad edges" begins to take effect, driving (X- Y)U(Y-X) between two sets X and Y. The
up the number of colors until there are few enough bad key observation is that if H is a Kempe chain for
edges for the component of the cost function that re- disjoint independent sets C and D, then C A H and
wards big color classes to begin driving the number of DAH are themselves disjoint independent sets whose
colors back down. union is C U D. This suggests the following move gen-
The appearance of the first legal coloring is marked eration procedure: Randomly choose a nonempty color
by a vertical line through the display. Although the class C and a vertex v e C, as in the penalty function
current cost (middle display) declines more-or-less approach. Then randomly choose a nonempty color class
monotonically, there is a definite bump in the curve D other than C, and let H be the Kempe chain for C
occurring slightly to the left of that line, a bump that we and D that contains v. Repeat the above procedure until
did not see in the time exposures of Part I for graph one obtains C, v, D and H such that H # C U D (i.e.,
partitioning. Bumps of this sort regularly occur in runs such that H is not "full"), in which case the next
of penalty function annealing. Unlike the other changes partition is obtained by replacing C by C A H and D by
in slope for the curve, they do not reflect a similar D A H in the current one. (Using a full Kempe chain in
change in acceptance rate. (The latter is depicted in the this operation simply changes the names of the two
bottom display.) Annealers with physics backgrounds colors, a meaningless change, which is why we ignore
might suggest that such bumps indicate "phase transi- such moves.)
tion" (Kirkpatrick, Gelatt and Vecchi). Unfortunately, This procedure appears to be substantially more ex-
there is no good explanation of why they arise or whether pensive than the move generation procedure in the penalty
they can be exploited. function approach, and it is. It also makes substantially
The total running time for the time exposure was bigger changes in the solutions, however, and so may be
about 11 hours. This would have been reduced signifi- worth the extra effort. Moreover, it is not exorbitantly
386 / JOHNSON ET AL.
expensive, at least for dense graphs. For such graphs the 350
N
color classes tend to be small, and so the following U 300
M
B
technique can find the Kempe chain H relatively quickly. RE 8 250
Whenever a new vertex u is added to H (including the
F 200
first vertex v), we scan the members of the other color
C
o 150 -
class that are not yet in H and add to H each one that is L
0
adjacent to u. Furthermore, we use two auxiliary tables RS 100
to help us whenever possible avoid the wasted time of 50
constructing full Kempe chains that must be abandoned. -2K
One stores for each pair C, D the time at which they A
C -4K
were last discovered to have a full Kempe chain; the U
other stores for each C the time at which it was last N -EK ,
modified. When C and D are first chosen, we check to TTs K --* ep hi neln
-8K
see whether we have seen a full Kempe chain for them S
T -10K Kempe ChainA nnealing
since the last time they were modified, and if so, aban-
don their further consideration immediately. - 12K
To complete the specification of the problem-specific 100
p
details of the Kempe chain approach, we must say how RE 80
C
instance "size" is determined and how initial solutions E
N
are generated. We perform random sequential colorings T 60
A
for both purposes. When an instance is read, we perform CC 40 -
E
a random sequential coloring, and return the size K I V l, p
T 20 -
where K is the number of colors used in the coloring. A
second random sequential coloring is performed each 0
0 500 1000 1500 2000
time a new initial solution is requested. (NUMBER OF TRIALS)/1,000
Figure 4. Three views of the evolution of an annealing
Kempe Chain Local Optimization. As with the penalty
function approach, the Kempe chain approach can serve run for a G1 00, 0.5 graph under the Kempe
chain annealing scheme. (The temperature
as the basis for a local optimization algorithm. Here,
was reduced by a factor of 0.95 every 20 data
because we are restricted to legal colorings, we limit the
points.)
total allowable number of colors to 150, yielding
150 l V I possible moves. Initial solutions are generated
by random sequential colorings as just specified. Other-
wise, the details are the same as for penalty function function approach). For comparison purposes, we also
local optimization, as outlined previously. The results set the neighborhood size to the same 100,000 value
were also similar (and similarly mediocre). The average used in the previous run, and used the same starting
running time over 100 runs was 33.3 minutes (slightly temperature T = 96. A first observation is that the curves
better than for the penalty function approach but still are significantly more irregular than those for penalty
much slower than RLF), and the median number of function annealing. For the most part, this may be
colors was again 117. attributable to the reduced value of SIZEFACTOR, since
this means that each data point represents 1,000 rather
The Dynamics of Kempe Chain Annealing. Before than 10,000 trials, and so successive data points may be
describing the generic parameters governing the start and correlated more closely. There may, however, be a
finish of a run, let us compare the operation of Kempe different reason for the extreme excursion in the accept-
chain annealing to the penalty function approach. Figure ance rate curve. Such excursions occur in the tails of
4 presents "time exposures" for Kempe chain anneal- other Kempe chain runs, but at random places (unlike
ing, analogous to those in Figure 3 for the penalty the regularly occurring smooth bump in the middle of
function approach. All parameters except SIZEFACTOR the "current cost" curve for penalty function annealing
were given the same values as in the penalty function in Figure 3). Here the excursion seems attributable to the
run; because of the greater expense of the moves here, topography of the Kempe chain solution space, as we
we reduced SIZEFACTOR from 200 to 20 (and the run shall hypothesize in more detail below.
still took 18 hours, as opposed to the 11 for the penalty A second difference between the runs in Figures 3 and
Description:simulated annealing to graph coloring and number partitioning, two problems for which local optimization had not previously ported promising results based on sketchy experiments. Annealers with physics backgrounds.