Hybrid Evolutionary and Annealing Algorithms for Nonlinear Discrete Constrained Optimization¹

Benjamin W. Wah and Yixin Chen
Department of Electrical and Computer Engineering
and the Coordinated Science Laboratory
University of Illinois, Urbana-Champaign
Urbana, IL 61801, USA
E-mail: {wah, chen}@manip.crhc.uiuc.edu
URL: http://manip.crhc.uiuc.edu

Abstract

This paper presents a procedural framework that unifies various mechanisms for finding discrete-neighborhood saddle points in solving discrete constrained optimization problems (DCOPs). Our approach is based on the necessary and sufficient condition on local optimality in discrete space, which shows the one-to-one correspondence between the discrete-space constrained local minima of a problem and the saddle points of the corresponding Lagrangian function. To look for such saddle points, we study various mechanisms for performing descents of the Lagrangian function in the original-variable subspace and ascents in the Lagrange-multiplier subspace. Our results show that CSAEA, a combined constrained simulated annealing and evolutionary algorithm, performs well when using mutations and crossovers to generate trial points and accepting them based on the Metropolis probability. We apply iterative deepening to determine the optimal number of generations in CSAEA and show that its performance is robust with respect to changes in population size. To test the performance of the procedures developed, we apply them to solve some continuous and mixed-integer nonlinear programming (NLP) benchmarks and show that they obtain better results than those of existing algorithms.

Key words: Evolutionary algorithms, saddle points, iterative deepening, nonlinear discrete constrained optimization, simulated annealing.

1 Introduction

Many engineering applications can be formulated as discrete constrained optimization problems (DCOPs).
¹International Journal of Computational Intelligence and Applications, Imperial College Press, Volume 3, Number 4, pp. 331-355, 2003.

Table 1: List of acronyms used in this paper.

CLM_c      continuous-neighborhood constrained local minimum
CGM_c      continuous-neighborhood constrained global minimum
CEA        constrained evolutionary algorithm
CEA_ID     CEA with iterative deepening
CSA        constrained simulated annealing
CSA_ID     CSA with iterative deepening
CSAEA      combined constrained SA and EA algorithm
CSAEA_ID   CSAEA with iterative deepening
CLM_d      discrete-neighborhood constrained local minimum
CGM_d      discrete-neighborhood constrained global minimum
DLM        discrete Lagrangian method
N_d        discrete neighborhood
SP_d       discrete-neighborhood saddle point
DCOP       discrete constrained optimization problem
EA         evolutionary algorithm
MINLP      mixed-integer nonlinear programming problem
NLP        nonlinear programming problem
SA         simulated annealing
SSA        stochastic search algorithm

Examples include production planning, computer-integrated manufacturing, chemical control processing, and structure optimization.

In this paper we review the necessary and sufficient condition for local optimality in discrete constrained optimization, present procedures for solving these problems, and apply these procedures to solve some nonlinear programming problems (NLPs). The procedures presented are especially useful when an NLP is formulated with non-differentiable functions and discrete or mixed-integer variables. Table 1 summarizes the acronyms used.

A DCOP can be formulated as follows:

    min_x f(x)    subject to g(x) ≤ 0 and h(x) = 0,        (1)

where x = (x_1, ..., x_n)^T ∈ D^n is a vector of bounded discrete variables. Here, f(x) is a lower-bounded objective function, g(x) = (g_1(x), ..., g_k(x))^T is a vector of k inequality constraint functions, and h(x) = (h_1(x), ..., h_m(x))^T is a vector of m equality constraint functions. Functions f(x), g(x), and h(x) are not necessarily differentiable and can be linear or nonlinear, continuous or discrete, and analytic or procedural. Without loss of generality, we consider only minimization problems.

Solutions to (1) cannot be characterized in ways similar to those of problems with differentiable functions and continuous variables. In the latter class of problems, solutions are defined with respect to neighborhoods of open spheres whose radius approaches zero asymptotically. Such a concept does not exist in problems with discrete variables. To characterize the solutions sought in discrete space, we define the following concepts on neighborhoods and constrained solutions in discrete space X ⊆ D^n.

Definition 1. N_d(x), the discrete neighborhood [1] of x ∈ X, is a finite user-defined set of points {x' ∈ X} such that x' ∈ N_d(x) ⟺ x ∈ N_d(x'), and such that it is possible to reach every x'' ∈ X from any x in one or more steps through neighboring points.

Definition 2. Point x ∈ X is called a discrete-neighborhood constrained local minimum (CLM_d) if it satisfies two conditions: a) x is a feasible point that satisfies the constraints g(x) ≤ 0 and h(x) = 0; and b) f(x) ≤ f(x') for all feasible x' ∈ N_d(x). In particular, x is a CLM_d when x is feasible and all its neighboring points in N_d(x) are infeasible.

Definition 3. Point x ∈ X is called a constrained global minimum in discrete neighborhood (CGM_d) iff a) x is feasible; and b) for every feasible point x' ∈ X, f(x') ≥ f(x). It is obvious that a CGM_d is also a CLM_d. The set of all CGM_d is X_opt.

There are corresponding definitions of constrained local minima (CLM_c) and constrained global minima (CGM_c) in continuous space, although they are not presented formally here.

We have shown earlier [17] that the discrete-neighborhood extended saddle-point condition (Section 2.1) is necessary and sufficient for a point to be a CLM_d. We have also extended simulated annealing (SA) [16] and greedy search [17] to look for such saddle points SP_d (Section 2.2).
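Definitions 1 and 2 can be made concrete in a few lines of code. The sketch below uses a hypothetical two-variable DCOP (not one of the paper's benchmarks) with a ±1-per-coordinate neighborhood; all names and values are illustrative.

```python
from itertools import product

# Toy DCOP (hypothetical, for illustration): minimize f(x) = x1^2 + x2^2
# subject to g(x) = 4 - x1 - x2 <= 0, with x1, x2 in {0, ..., 5}.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: 4 - x[0] - x[1]
X = list(product(range(6), repeat=2))      # the discrete space X
X_set = set(X)

def N_d(x):
    """Discrete neighborhood (Definition 1): change one coordinate by +/-1.
    The relation is symmetric, and every point of X is reachable from any
    other in finitely many such steps."""
    nbrs = []
    for i in (0, 1):
        for d in (-1, 1):
            y = list(x)
            y[i] += d
            if tuple(y) in X_set:
                nbrs.append(tuple(y))
    return nbrs

feasible = lambda x: g(x) <= 0

def is_CLM_d(x):
    """Definition 2: x is feasible and no feasible neighbor improves on f."""
    return feasible(x) and all(f(x) <= f(y) for y in N_d(x) if feasible(y))

print([x for x in X if is_CLM_d(x)])
# → [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```

The run reports five CLM_d, of which only (2, 2) is a CGM_d, illustrating that a discrete feasible region can contain many constrained local minima.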
At the same time, new problem-dependent heuristics have been developed in the evolutionary-algorithm (EA) community to handle nonlinear constraints [12] (Section 2.3). Up to now, there is no clear understanding of how the various search mechanisms in SA, EA, and other methods can be unified effectively for solving a wide range of optimization problems.

Our primary goal in this paper is to show that discrete-neighborhood saddle points can be found effectively by integrating the various search mechanisms in SA and EA. We present a framework that employs these mechanisms to probe a search space, where a probe is a neighboring point examined, independent of whether it is accepted or not. We illustrate our approach on CSAEA, a combined constrained simulated annealing and evolutionary algorithm, which uses mutations and crossovers to generate trial points and accepts the points generated according to the Metropolis probability. Finally, we use iterative deepening to determine the optimal number of generations in CSAEA and show that its performance is robust with respect to changes in population size. Note that our approach is general and works with other operators for probing a search space.

CSAEA is a stochastic search algorithm (SSA) that generates probes in a random fashion in order to find solutions. Among the various convergence conditions of SSAs to a solution of desired quality, the strongest is asymptotic convergence with probability one. In this case, the search stops at a desired solution in the last probe with probability one as the number of probes approaches infinity. This concept is of theoretical interest because an algorithm with asymptotic convergence does not necessarily find better solutions with higher probabilities when terminated in finite time. A weaker convergence condition is the following reachability condition:

Definition 4.
The reachability probability, P_R(N), of an SSA after making N probes is the probability that the SSA will find an incumbent solution of desired quality in any of its N probes.

Let p_j be the probability that the SSA finds a solution of desired quality in its jth probe, and let all probes be independent (a simplifying assumption). The reachability probability after making N probes is:

    P_R(N) = 1 − ∏_{j=1}^{N} (1 − p_j),   where N ≥ 0.        (2)

As an example, Figure 1a plots P_R(N_g P) when CSAEA (see Section 3.2) was run under various numbers of generations N_g and fixed population size P = 3 (where N = N_g P). The graph shows that P_R(N_g P) approaches one as N_g P is increased.

Although it is hard to estimate P_R(N) when a test problem is solved by an SSA, we can always improve the chance of finding a solution by running the same SSA multiple times from random starting points. Given P_R(N) for one run of the SSA, and assuming that all runs are independent, the expected total number of probes to find a solution of desired quality is:

    ∑_{j=1}^{∞} P_R(N) (1 − P_R(N))^{j−1} (N × j) = N / P_R(N).        (3)

Figure 1b plots (3) based on P_R(N_g P) in Figure 1a. In general, there exists an N_opt that minimizes (3) because P_R(0) = 0, lim_{N→∞} P_R(N) = 1, N/P_R(N) is bounded below by zero, and N/P_R(N) → ∞ as N → ∞. The curve in Figure 1b illustrates this behavior.

Based on the existence of N_opt, we present in Section 3.3 some search strategies that minimize (3) in finding a solution of desired quality. Finally, Section 4 compares the performance of our proposed algorithms.

2 Previous Work

In this section, we summarize the theory and algorithms of Lagrange multipliers in discrete space and some related work in EA.
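The behavior plotted in Figure 1 can be reproduced qualitatively from (2) and (3) alone. In the sketch below, the per-probe success probabilities p_j are hypothetical (chosen to grow linearly with j, mimicking an annealing search whose later probes are more likely to succeed); no claim is made about any particular benchmark.

```python
# Numerical illustration of (2) and (3). The per-probe probabilities
# p_j = 1e-4 * j below are hypothetical, not taken from the paper.
def reachability(p):
    """P_R(N) = 1 - prod_{j=1}^{N} (1 - p_j) for N = 1..len(p); Eq. (2)."""
    P, miss = [], 1.0
    for pj in p:
        miss *= 1.0 - pj
        P.append(1.0 - miss)
    return P

p = [1e-4 * j for j in range(1, 501)]
P_R = reachability(p)

# Expected total probes over independent restarts of length N; Eq. (3).
cost = [N / P_R[N - 1] for N in range(1, len(P_R) + 1)]
N_opt = min(range(1, len(cost) + 1), key=lambda N: cost[N - 1])
print(N_opt, round(cost[N_opt - 1], 1))
```

Because P_R(N) saturates at one while the per-restart cost grows linearly in N, the expected total work N/P_R(N) first falls and then rises, giving an interior minimum at N_opt, as in Figure 1b.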
[Figure 1: An example of applying CSAEA with P = 3 to solve a discretized version of G1 [12] (N_opt P ≈ 2000). a) P_R(N_g P) approaches one as N_g P is increased; b) N_g P / P_R(N_g P) has an absolute minimum at N_opt P.]

2.1 Extended Saddle-Point Condition for Discrete Constrained Optimization

Define an equality-constrained DCOP as follows [17, 18]:

    min_x f(x)    subject to h(x) = 0,        (4)

where x is a vector of bounded discrete variables. A transformed augmented Lagrangian function of (4) is defined as [17]:

    L_d(x, λ) = f(x) + λ^T |h(x)| + (1/2) ||h(x)||²,        (5)

where λ = (λ_1, ..., λ_m)^T is a vector of Lagrange multipliers, |h(x)| = (|h_1(x)|, ..., |h_m(x)|)^T, and ||h(x)||² = ∑_{i=1}^{m} h_i²(x).

Here, we have used |h(x)| to transform h(x) into a non-negative function. Note that, although the transformed constraints are not differentiable at h(x) = 0, this is not an issue in our theory because we do not require their differentiability.

By applying a similar transformation, an inequality constraint g_j(x) ≤ 0 can be transformed into the equivalent equality constraint max(g_j(x), 0) = 0. Hence, we consider only equality constraints from now on.

We define a discrete-neighborhood saddle point SP_d at (x*, λ*) by the following property:

    L_d(x*, λ) ≤ L_d(x*, λ*) ≤ L_d(x, λ*)        (6)

for all x ∈ N_d(x*) and all λ ∈ R^m. Note that saddle points defined in (6) are with respect to discrete neighborhoods and are different from those in continuous space, although both satisfy the same inequalities in (6).

The concept of SP_d is very important in discrete searches because, starting from it, we can derive the necessary and sufficient condition for CLM_d that governs the correctness of all such search procedures. This is stated formally in the following theorem [17]:

Theorem 1.
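Equation (5) and the max(g, 0) transformation are straightforward to transcribe. The following sketch evaluates L_d for a hypothetical toy problem; the functions and values are purely illustrative.

```python
# Direct transcription of (5), with the inequality g1(x) <= 0 first
# rewritten as the equality max(g1(x), 0) = 0, as described in the text.
# The toy problem and all numbers are hypothetical.
def L_d(x, lam, f, hs):
    """Transformed augmented Lagrangian: f + lam^T |h| + 0.5 * ||h||^2."""
    v = [abs(hi(x)) for hi in hs]                  # |h(x)|, elementwise
    return (f(x) + sum(l * vi for l, vi in zip(lam, v))
            + 0.5 * sum(vi * vi for vi in v))

f = lambda x: x[0] ** 2 + x[1] ** 2
h1 = lambda x: x[0] + x[1] - 4                     # equality h1(x) = 0
g1 = lambda x: 1 - x[0]                            # inequality g1(x) <= 0 ...
h2 = lambda x: max(g1(x), 0)                       # ... as max(g1(x), 0) = 0

# At x = (0, 2): f = 4, |h1| = 2, h2 = max(1, 0) = 1, so with lam = (3, 3):
# L_d = 4 + 3*2 + 3*1 + 0.5*(4 + 1) = 15.5
print(L_d((0, 2), (3.0, 3.0), f, [h1, h2]))        # → 15.5
```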
Necessary and sufficient extended saddle-point condition (ESPC) on CLM_d [17]. A point in the discrete search space of (4) is a CLM_d iff there exists a finite λ* such that (6) is satisfied for any extended Lagrange multiplier λ** > λ*.²

²For two vectors v and w with the same number of elements, v ≤ w means that each element of v is not greater than the corresponding element of w, and v < w means that each element of v is less than the corresponding element of w. When compared to a vector, 0 stands for the null vector.

2.2 Iterative Procedures for finding SP_d

Theorem 1 proves that finding (x*, λ*) satisfying (6) entails the search of some λ** > λ* and a local minimum of L_d(x, λ**) with respect to x for this λ**. This requirement is different from that of the conventional theory of Lagrange multipliers in continuous space, which requires the search of unique Lagrange multipliers that satisfy a system of nonlinear equations. Since there is no closed-form solution to a system of nonlinear equations, solving it requires an iterative search in the joint (x, λ) space. In contrast, finding the local minimum of L_d(x, λ**) with respect to x for λ** ≥ λ* can be carried out in a decoupled fashion, with the search of λ** separate from the search of x*. A possible implementation is an iterative process with an inner loop that performs descents of L_d(x, λ), while keeping λ fixed, until either a local minimum of L_d has been found or a predetermined number of probes have been made. The outer loop then increases the λ's on the violated constraints. The following are two example algorithms designed using this approach.

The first algorithm, the discrete Lagrangian method (DLM) [18], is an iterative local search that uses (6) as the stopping condition. It updates the x variables in an inner loop in order to perform descents of L_d in the x subspace, while occasionally updating the λ variables of unsatisfied constraints in an outer loop in order to perform ascents in the λ subspace. Note that this simple algorithm may get stuck in an infeasible region when the objective value is too small or when the Lagrange multipliers and/or constraint violations are too large. In this case, increasing the Lagrange multipliers will further deepen the infeasible region, making it impossible for a local-descent algorithm in the inner loop to escape from this region. One way to address this issue is to scale back λ periodically and to "lower" the barrier in the Lagrangian function in order for local descents in the inner loop to escape from the infeasible region.

The second algorithm is the constrained simulated annealing (CSA) [16] algorithm shown in Figure 2a. It looks for SP_d by probabilistic descents of the Lagrangian function in the x subspace in the inner loop and by probabilistic ascents in the λ subspace in the outer loop, with an acceptance probability governed by the Metropolis probability.

a) CSA, called with cooling schedule N_α and cooling rate α:

1. procedure CSA(α, N_α)
2.   set initial x ← (x_1, ..., x_n, λ_1, ..., λ_k)^T with random x, λ ← 0;
3.   while stopping condition (6) is not satisfied do
4.     generate x' ∈ N_d(x) using G(x, x');
5.     accept x' with probability A_T(x, x');
6.     reduce temperature by T ← αT;
7.   end while
8. end procedure

b) CSA_ID: CSA with iterative deepening:

1. procedure CSA_ID
2.   set initial cooling rate α ← α_0 and N_α ← N_{α_0};
3.   set K ← number of CSA runs at fixed α;
4.   repeat
5.     for i ← 1 to K do call CSA(α, N_α); end for;
6.     increase cooling schedule N_α ← ρ N_α; α ← α^{1/ρ};
7.   until a feasible solution has been found and no better solution is found in two successive increases of N_α;
8. end procedure

Figure 2: Constrained simulated annealing algorithm (CSA) and its iterative-deepening extension.

The following theorem shows that the algorithm converges to a saddle point asymptotically with probability one [16].

Theorem 2. Asymptotic convergence of CSA [16]. Suppose x' ∈ N_d(x) is generated from x using generation probability G(x, x'), where G(x, x') > 0 and ∑_{x'∈N_d(x)} G(x, x') = 1. Further, assume a logarithmic cooling schedule on T and that x' is accepted with Metropolis probability A_T(x, x'):

    A_T(x, x') = { exp(−(L_d(x') − L_d(x))⁺ / T)   if x' = (x', λ),
                 { exp(−(L_d(x) − L_d(x'))⁺ / T)   if x' = (x, λ'),        (7)

where (a)⁺ = a if a > 0, and (a)⁺ = 0 otherwise, for all a ∈ R. Then the Markov chain modeling CSA converges asymptotically to a CGM_d with probability one.

Theorem 2 extends a similar theorem in SA that proves the asymptotic convergence of SA to a global minimum when solving an unconstrained optimization problem. By looking for SP_d in Lagrangian space, Theorem 2 proves the asymptotic convergence of CSA to a CGM_d when solving a constrained optimization problem.

Theorem 2 is of theoretical interest because its convergence is only achieved as time approaches infinity. In practice, when CSA is run using a finite cooling schedule N_α, it finds a CGM_d with reachability probability P_R(N_α) < 1. To increase its success probability, CSA with N_α can be run multiple times from random starting points. Assuming independent runs, a CGM_d can be found in the finite average time defined by (3). The minimum of these average times is achieved when CSA is run using a cooling schedule of N_opt. (Figure 1b illustrates the existence of N_opt for CSAEA.) However, N_opt is problem-dependent and cannot be determined easily a priori.

To find N_opt at run time without using problem-dependent information, we have proposed to use iterative deepening [7] in order to determine N_opt iteratively. Figure 2b shows the pseudo code of CSA_ID (CSA with iterative deepening). The algorithm first runs CSA using a short duration. If the current run fails to find a CGM_d, it doubles the duration CSA is allowed and repeats the run [14]. This step is repeated until a CGM_d is found. Since the total overhead in iterative deepening is dominated by that of the last run, CSA_ID has a completion time of the same order of magnitude as that of the last run, which succeeds in finding a CGM_d.
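The interplay of CSA's two kinds of probes can be sketched compactly. The code below is a simplified, illustrative rendition of Figure 2a on a hypothetical toy DCOP: it uses geometric rather than logarithmic cooling (common in practice, though Theorem 2 assumes a logarithmic schedule), a fixed 80/20 split between x- and λ-probes, and only upward λ-probes, so λ-moves are always accepted under the second case of (7).

```python
import math
import random

random.seed(0)

# Hypothetical toy DCOP: minimize f(x) = x1^2 + x2^2 subject to
# h(x) = x1 + x2 - 4 = 0, with x1, x2 in {0..5}.
f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: x[0] + x[1] - 4
L = lambda x, lam: f(x) + lam * abs(h(x)) + 0.5 * h(x) ** 2    # Eq. (5)

def csa(T=20.0, alpha=0.95, steps=2000):
    x = (random.randint(0, 5), random.randint(0, 5))
    lam = 0.0
    for _ in range(steps):
        if random.random() < 0.8:              # probe the x subspace
            i, d = random.randrange(2), random.choice((-1, 1))
            y = list(x)
            y[i] = min(5, max(0, y[i] + d))
            y = tuple(y)
            dL = L(y, lam) - L(x, lam)
            # First case of (7): accept descents in x outright, and
            # ascents with probability exp(-dL / T).
            if dL <= 0 or random.random() < math.exp(-dL / T):
                x = y
        elif h(x) != 0:                        # probe the lambda subspace
            lam2 = lam + random.uniform(0.0, 1.0)
            dL = L(x, lam2) - L(x, lam)
            # Second case of (7): ascents in lambda are always accepted
            # (here dL >= 0 because lambda only grows on violations).
            if dL >= 0 or random.random() < math.exp(dL / T):
                lam = lam2
        T *= alpha                             # T <- alpha * T
    return x, lam

x, lam = csa()
print(x, lam)
```

With the multipliers growing on violated constraints while the temperature drops, the final x settles on a feasible point of the toy problem in a typical run.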
Figure 3 illustrates a case in which the total time incurred by CSA_ID is of the same order as the expected overhead at N_opt.

Since P_R(N_opt) < 1 for one run of CSA at N_opt, it is possible that CSA_ID fails to find a solution when run with a duration close to N_opt. After doubling the cooling schedule, its duration in the next run will overshoot beyond N_opt. To reduce the chance of overshooting and to increase the success probability before the duration reaches N_opt, we have proposed to run CSA multiple times from random starting points at each duration in CSA_ID. For instance, CSA can be run K = 3 times at each duration before the duration is doubled (Figure 2b). Our results show that this strategy generally requires twice the average completion time of multiple runs of CSA that use a duration of N_opt before finding a CGM_d [14].

[Figure 3: An application of iterative deepening in CSA_ID. Runs of duration t, 2t, 4t, and 8t fail, and the run of duration 16t succeeds; the total time for iterative deepening is 31t, versus an optimal time of 12t at N_opt.]

2.3 EA for Constrained Optimization

The evolutionary algorithm (EA) is a general optimization algorithm that incorporates aspects of natural selection or survival of the fittest. It maintains a population of alternative candidates (usually generated randomly at initialization) and probes the search space using genetic operators, such as crossovers and mutations, in order to find better candidates. The original EA was developed for solving unconstrained problems, using a single fitness function to rank candidates. Recently, many variants of EA have been developed for solving constrained NLPs. Most of these methods are based on penalty formulations that use EA to minimize an unconstrained penalty function. Similar to CSA, these methods do not require the differentiability or continuity of functions.

One approach is the static-penalty formulation with fixed penalties [2]:

    F_s(x, γ) = f(x) + ∑_{i=1}^{m} γ_i |h_i(x)|^ρ,   where ρ > 0.        (8)

Penalty vector γ = (γ_1, ..., γ_m)^T is fixed and chosen to be so large that:

    F_s(x*, γ) < F_s(x, γ)   for all x ∈ X − X_opt and x* ∈ X_opt.        (9)

Based on (9), an unconstrained global minimum of (8) over x is a CGM_d of (4); hence, it suffices to minimize (8) in solving (4). Since both f(x) and |h_i(x)| are lower-bounded and x takes finite discrete values, γ always exists and is finite, thereby ensuring the correctness of the approach. Note that other forms of penalty formulations have also been studied in the literature.

A major issue of static-penalty methods is the difficulty of selecting a suitable γ. If γ is much larger than necessary, then the terrain will be too rugged to be searched effectively. If it is too small, then (9) does not hold, and feasible solutions cannot be found by minimizing F_s(x, γ).

Dynamic-penalty methods [6], on the other hand, address the difficulties of static-penalty methods by increasing penalties gradually in the following fitness function:

    F_d(x) = f(x) + (C t)^α ∑_{i=1}^{m} |h_i(x)|^β,        (10)

where t is the generation number, and C, α, and β are constants. In contrast to static-penalty methods, (C t)^α, the penalty on infeasible points, is always increased during evolution.

Dynamic-penalty methods do not always guarantee convergence to CLM_d or CGM_d. For example, consider a problem with two constraints h_1(x) = 0 and h_2(x) = 0. Assume that a search is stuck at an infeasible point x' and that, for all x ∈ N_d(x'), 0 < |h_1(x')| < |h_1(x)|, |h_2(x')| > |h_2(x)| > 0, and |h_1(x')|^β + |h_2(x')|^β < |h_1(x)|^β + |h_2(x)|^β. Then the search can never escape from x', no matter how large (C t)^α grows. One way to ensure the convergence of dynamic-penalty methods is to use a different penalty for each constraint, as in the Lagrangian formulation (5). In the previous example, the search can escape from x' after assigning a much larger penalty to h_2(x') than to h_1(x').
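The stuck-at-x' example can be checked numerically. All numbers below are hypothetical and merely satisfy the stated inequalities with β = 1; the functions are illustrative, not taken from any benchmark.

```python
# Hypothetical instance of the stuck-at-x' example (beta = 1):
# at x', |h1| = 1 and |h2| = 3; at its only neighbor x, |h1| = 2 and
# |h2| = 2.5. The summed violation is smaller at x' (4 < 4.5), so a
# single dynamic penalty (C*t)^alpha always prefers x'.
f = lambda name: 0.0                      # equal objective values, for clarity
viol = {"x_prime": (1.0, 3.0), "x_nbr": (2.0, 2.5)}

def F_dyn(name, t, C=0.5, alpha=2.0, beta=1.0):
    """Dynamic-penalty fitness, Eq. (10), with one shared penalty weight."""
    h1, h2 = viol[name]
    return f(name) + (C * t) ** alpha * (h1 ** beta + h2 ** beta)

stuck = all(F_dyn("x_prime", t) < F_dyn("x_nbr", t) for t in range(1, 10000))
print(stuck)                              # → True: no t makes the move attractive

# Per-constraint penalties (Lagrangian style, as in Eq. (5) without the
# quadratic term) can favor the neighbor by penalizing h2 more heavily:
def F_lag(name, lam1, lam2):
    h1, h2 = viol[name]
    return f(name) + lam1 * h1 + lam2 * h2

print(F_lag("x_prime", 1.0, 10.0) > F_lag("x_nbr", 1.0, 10.0))   # → True
```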
There are other variants of penalty methods, such as annealing penalties, adaptive penalties [12], and self-adapting weights [4]. In addition, problem-dependent genetic operators for handling constraints have been studied. These include methods based on preserving feasibility with specialized operators, methods searching along boundaries of feasible regions, methods based on decoders, repair of infeasible solutions, co-evolutionary methods, and strategic oscillation. However, most of these methods require domain-specific knowledge or problem-dependent genetic operators, and they have difficulties in finding feasible regions or in maintaining feasibility for nonlinear constraints.

In short, the success of penalty methods relies on choosing suitable penalties. Such penalties are difficult to pick a priori in static-penalty methods and do not always work in dynamic-penalty methods when a single penalty is used. Multiple penalties, one for each constraint as in Lagrangian methods, are more flexible in guiding a search out of infeasible local minima.

3 A General Framework for finding SP_d

In this section, we propose a general procedural framework that unifies simulated annealing (SA), evolutionary algorithms (EA), and greedy searches in looking for discrete-neighborhood saddle points. Such a framework is important because it combines the best features of various mechanisms in order to arrive at better search algorithms.

Figure 4 depicts a unified, problem-independent procedural framework that looks for SP_d among a list of maintained candidates. It consists of two loops: the x loop, which performs descents of L_d(x, λ) in the x subspace, and the λ loop, which performs ascents of L_d(x, λ) in the λ subspace when there are unsatisfied constraints for any candidate in the list. The procedure stops when no new probes can be generated in either subspace to improve L_d(x, λ). By choosing different options or by tuning the parameters