Table Of Content

Submodular Maximization by Simulated Annealing Shayan Oveis Gharan∗ Jan Vondra´k† Abstract Background. Submodular functions have been Weconsidertheproblemofmaximizinganonnegative(pos- studied for a long time in the context of combinato- sibly non-monotone) submodular set function with or with- rial optimization. Lovász in his seminal paper [25] dis- outconstraints. Feigeetal. [9]showeda2/5-approximation cussed various properties of submodular functions and for the unconstrained problem and also proved that no ap- noted that they exhibit certain properties reminiscent proximation better than 1/2 is possible in the value oracle of convex functions - namely the fact that a naturally model. Constant-factorapproximationhasbeenalsoknown defined extension of a submodular function to a contin- forsubmodularmaximizationsubjecttoamatroidindepen- uous function (the ”Lovász extension”) is convex. This denceconstraint(afactorof0.309[33])andforsubmodular pointofviewexplainswhysubmodularfunctionscanbe maximizationsubjecttoamatroidbaseconstraint,provided minimized efficiently [16, 11, 28]. that the fractional base packing number ν is bounded away On the other hand, submodular functions also ex- from 1 (a 1/4-approximation assuming that ν ≥2 [33]). hibit properties closer to concavity, for example a func- Inthispaper,weproposeanewalgorithmforsubmod- ular maximization which is based on the idea of simulated tion f(S) = φ(|S|) is submodular if and only if φ is annealing. We prove that this algorithm achieves improved concave. However, the problem of maximizing a sub- approximation for two problems: a 0.41-approximation modular function captures problems such as Max Cut for unconstrained submodular maximization, and a 0.325- [13] and Max k-cover [7] which are NP-hard. Hence, approximation for submodular maximization subject to a we cannot expect to maximize a submodular function matroid independence constraint. exactly; still, the structure of a submodular functions On the hardness side, we show that in the value ora- (in particular, the “concave aspect” of submodularity) cle model it is impossible to achieve a 0.478-approximation makes it possible to achieve non-trivial results for max- forsubmodularmaximizationsubjecttoamatroidindepen- imization problems. Instead of the Lovász extension, denceconstraint,ora0.394-approximationsubjecttoama- the construct which turns out to be useful for maxi- troid base constraint in matroids with two disjoint bases. mization problems is the multilinear extension, intro- Even for the special case of cardinality constraint, we prove duced in [4]. This extension has been used to design it is impossible to achieve a 0.491-approximation. (Previ- an optimal (1−1/e)-approximation for the problem of ously it was conceivable that a 1/2-approximation exists maximizing a monotone submodular function subject for these problems.) It is still an open question whether a to a matroid independence constraint [32, 5], improv- 1/2-approximationispossibleforunconstrainedsubmodular ingthegreedy1/2-approximationofFisher,Nemhauser maximization. and Wolsey [10]. In contrast to the Lovász extension, themultilinearextensioncapturestheconcaveaswellas 1 Introduction convexaspectsofsubmodularity. Anumberofimproved A function f : 2X → R is called submodular if for any results followed for maximizing monotone submodular S,T ⊆ X, f(S ∪ T) + f(S ∩ T) ≤ f(S) + f(T). In functions subject to various constraints [21, 22, 23, 6]. this paper, we consider the problem of maximizing a This paper is concerned with submodular func- nonnegative submodular function. This means, given a tions which are not necessarily monotone. We only submodular function f : 2X → R , find a set S ⊆ X assume that the function is nonnegative.1 The prob- + (possibly under some constraints) maximizing f(S). lem of maximizing a nonnegative submodular function We assume a value oracle access to the submodular hasbeenstudiedintheoperationsresearchcommunity, function; i.e., foragivensetS, thealgorithmcanquery withmanyheuristicsolutionsproposed: data-correcting an oracle to find its value f(S). search methods [14, 15, 20], accelatered greedy algo- ∗Stanford University, Stanford, CA; shayan@stanford.edu; thisworkwasdonepartlywhiletheauthorwasatIBMAlmaden 1Forsubmodularfunctionswithoutanyrestrictions, verifying ResearchCenter,SanJose,CA. whetherthemaximumofthefunctionisgreaterthanzeroornot †IBM Almaden Research Center, San Jose, CA; requiresexponentiallymanyqueries. Thus,thereisnonon-trivial jvondrak@us.ibm.com multiplicativeapproximationforthisproblem. Problem Prior approximation New approximation New hardness Prior hardness max{f(S):S ⊆X} 0.4 0.41 − 0.5 max{f(S):|S|≤k} 0.309 0.325 0.491 0.5 max{f(S):|S|=k} 0.25 − 0.491 0.5 max{f(S):S ∈I} 0.309 0.325 0.478 0.5 max{f(S):S ∈B}∗ 0.25 − 0.394 0.5 Figure 1: Summary of results: f(S) is nonnegative submodular, I denotes independent sets in a matroid, and B bases in a matroid. ∗: in this line (matroid base constraint) we assume the case where the matroid contains two disjoint bases. The hardness results hold in the value oracle model. rithms [27], and polyhedral algorithms [24]. The first out constraints, improving upon the previously known algorithmswithprovableperformaceguaranteesforthis 0.4-approximation [9]. (Although our initial hope was problemweregivenbyFeige,MirrokniandVondrák[9]. that this algorithmmightachieve a 1/2-approximation, They presented several algorithms achieving constant- we found an example where it achieves only a factor of factorapproximation,thebestapproximationfactorbe- 17/35 (cid:39) 0.486; see subsection 3.1.) We also prove that ing2/5(byarandomizedlocalsearchalgorithm). They a similar algorithm achieves a 0.325-approximation for also proved that a better than 1/2 approximation for themaximizationofanonnegativesubmodularfunction submodular maximization would require exponentially subject to a matroid independence constraint (improv- many queries in the value oracle model. This is true ing the previously known factor of 0.309 [33]). evenforsymmetricsubmodularfunctions,inwhichcase On the hardness side, we show the following results a 1/2-approximation is easy to achieve [9]. in the value oracle model: For submodular maximiza- Recently, approximation algorithms have been de- tionunderamatroidbaseconstraint,itisimpossibleto signed for nonnegative submodular maximization sub- achieve a 0.394-approximation even in the special case jecttovariousconstraints[22,23,33,17]. (Submodular whenthematroidcontainstwodisjointbases. Formax- minimizationsubjecttoadditionalconstraintshasbeen imizinganonnegativesubmodularfunctionsubjecttoa also studied [30, 12, 18].) The results most relevant matroid independence constraint, we prove it is impos- to this work are that a nonnegative submodular func- sible to achieve a 0.478-approximation. For the special tions can be maximized subject to a matroid indepen- case of a cardinality constraint (max{f(S):|S|≤k} or dence constraint within a factor of 0.309, while a bet- max{f(S) : |S| = k}), we prove a hardness threshold terthan1/2-approximationisimpossible[33],andthere of 0.491. We remark that only a hardness of (1/2+(cid:15))- is 1(1− 1 −o(1))-approximation subject to a matroid approximation was known for all these problems prior 2 ν base constraint for matroids of fractional base packing to this work. For matroids of fractional base packing number at least ν ∈ [1,2], while a better than (1− 1)- numberν =k/(k−1),k ∈Z, weshowthatsubmodular ν approximation in this setting is impossible [33]. For maximizationsubjecttoamatroidbaseconstraintdoes explicitly represented instances of unconstrained sub- notadmita(1−e−1/k+(cid:15))-approximationforany(cid:15)>0, modularmaximization,Austrin[1]recentlyprovedthat improvingthepreviouslyknownthresholdof1/k+(cid:15)[33]. assumingtheUniqueGamesConjecture,theproblemis These results rely on the notion of a symmetry gap and NP-hard to approximate within a factor of 0.695. the hardness construction of [33]. Ourresults. Inthispaper,weproposeanewalgo- rithm for submodular maximization, using the concept of simulated annealing. The main idea is to perform Organization. The rest of the paper is organized a local search under a certain amount of random noise as follows. In Section 2, we discuss the notions of mul- whichgraduallydecreasestozero. Thishelpsavoidbad tilinearrelaxationandsimulatedannealing, whichform local optima at the beginning, and provides gradually the basis of our algorithms. In Section 3, we describe more and more refined local search towards the end. and analyze our 0.41-approximation for unconstrained Algorithms of this type have been widely employed for submodularmaximization. InSection4,wedescribeour large-scale optimization problems, but they are notori- 0.325-approximationforsubmodularmaximizationsub- ously difficult to analyze. jecttoamatroidindependenceconstraint. InSection5, We prove that a simulated annealing algorithm we present our hardness results based on the notion of achieves at least a 0.41-approximation for the maxi- symmetry gap. We defer some technical lemmas to the mizationofanynonnegativesubmodularfunctionwith- appendix. 2 Preliminaries system more freedom to explore the search space, es- Our algorithm combines the following two concepts. cape from bad local optima, and converge faster to a The first one is multilinear relaxation, which has re- better solution. We pursue a similar strategy here. cently proved to be very useful for optimization prob- We should remark that our algorithm is somewhat lems involving submodular functions (see [4, 32, 5, 21, different from a direct interpretation of simulated an- 22, 33]). The second concept is simulated annealing, nealing. In simulated annealing, the system would which has been used successfully by practitioners deal- typically evolve as a random walk, with sensitivity to ing with difficult optimization problems. Simulated an- the objective function depending on the current tem- nealing provides good results in many practical scenar- perature. Here, we adopt a simplistic interpretation ios, but typically eludes rigorous analysis (with several of temperature as follows. Given a set A ⊂ X and exceptions in the literature: see e.g. [2] for general con- t ∈ [0,1], we define a random set Rt(A) by starting vergence results, [26, 19] for applications to volumes from A and adding/removing each element indepen- estimation and optimization over convex bodies, and dentlywithprobabilityt. Insteadoftheobjectivefunc- [31, 3] for applications to counting problems). tion evaluated on A, we consider the expectation over R (A). This corresponds to the noise operator used in t Multilinearrelaxation. Considerasubmodularfunc- the analysis of boolean functions, which was implicitly tion f : 2X → R+. We define a continuous function alsousedinthe2/5-approximationalgorithmof[9]. Ob- F :[0,1]X →R+ as follows: For x∈[0,1]X, let R⊆X serve that E[f(Rt(A))]=F((1−t)1A+t1A), where F bearandomsetwhichcontainseachelementiindepen- is the multilinear extension of f. The new idea here is dently with probability xi. Then we define thattheparametertplaysarolesimilartotemperature -e.g.,t=1/2meansthatR (A)isuniformlyrandomre- (cid:88) (cid:89) (cid:89) t F(x):=E[f(R)]= f(S) x (1−x ). gardless of A (”infinite temperature” in physics), while i j S⊆X i∈S j∈/S t=0meansthattherearenofluctuationspresentatall (”absolute zero”). This is the unique multilinear polynomial in x1,...,xn We use this interpretation to design an algorithm which coincides with f(S) on the points x ∈ {0,1}X inspiredbysimulatedannealing: Startingfromt=1/2, (we identify such points with subsets S ⊆ X in a we perform local search on A in order to maximize natural way). Instead of the discrete optimization E[f(R (A))]. Note that for t = 1/2 this function does t problem max{f(S) : S ∈ F} where F ⊆ 2X is not depend on A at all, and hence any solution is a the family of feasible sets, we consider a continuous local optimum. Then we start gradually decreasing t, optimization problem max{F(x) : x ∈ P(F)} where whilesimultaneouslyrunningalocalsearchwithrespect P(F) = conv({1S : S ∈ F}) is the polytope associated to E[f(Rt(A))]. Eventually, we reach t = 0 where the withF. Itisknowndueto[4,5,33]thatanyfractional algorithm degenerates to a traditional local search and solution x ∈ P(F) where F are either all subsets, returns an (approximate) local optimum. or independent sets in a matroid, or matroid bases, We emphasize that we maintain the solution gener- can be rounded to an integral solution S ∈ F such ated by previous stages of the algorithm, as opposed to that f(S) ≥ F(x). Our algorithm can be seen as a runningaseparatelocalsearchforeachvalueoft. This new way of approximately solving the relaxed problem is also used in the analysis, whose main point is to esti- max{F(x):x∈P(F)}. mate how the solution improves as a function of t. It is not a coincidence that the approximation provided by Simulated annealing. The idea of simulated an- our algorithm is a (slight) improvement over previous nealing comes from physical processes such as gradual algorithms. Our algorithm can be viewed as a dynamic cooling of molten metals, whose goal is to achieve the process which at each fixed temperature t corresponds state of lowest possible energy. The process starts at a toacertainvariantofthealgorithmfrom[9]. Weprove hightemperatureandgraduallycoolsdowntoa”frozen thattheperformanceofthesimulatedannealingprocess state”. The main idea behind gradual cooling is that isdescribedbyadifferentialequation,whoseinitialcon- while it is natural for a physical system to seek a state ditioncanberelatedtotheperformanceofapreviously of minimum energy, this is true only in a local sense - known algorithm. Hence the fact that an improvement the system does not have any knowledge of the global can be achieved follows from the fact that the differen- structure of the search space. Thus a low-temperature tial equation yields a positive drift at the initial point. systemwouldsimplyfindalocaloptimumandgetstuck Theexactquantitativeimprovementdependsontheso- there, which might be suboptimal. Starting the process lutionofthedifferentialequation,whichwealsopresent at a high temperature means that there is more ran- in this work. domness in the behavior of the system. This gives the Notation. Inthispaper,wedenotevectorsconsistently Theorem 3.1. For any submodular function f :2X → in boldface: for example x,y∈[0,1]n. The coordinates R , Algorithm 1 returns with high probability a solution + of x are denoted by x ,...,x . Subscripts next to a ofvalueatleast0.41·OPT whereOPT =max f(S). 1 n S⊆X boldfacesymbol,suchasx ,x ,denotedifferentvectors. 0 1 InTheorem3.2wealsoshowthatAlgorithm1doesnot In particular, we use the notation x (A) to denote a p achieve any factor better than 17/35 (cid:39) 0.486. First, vector with coordinates x =p for i∈A and x =1−p i i let us give an overview of our approach and compare it for i∈/ A. In addition, we use the following notation to to the analysis of the 2/5-approximation in [9]. The denote the value of certain fractional solutions: algorithm of [9] can be viewed in our framework as C C follows: for a fixed value of p, it performs local search A p p(cid:48) :=F(p1A∩C+p(cid:48)1A\C+q1B∩C+q(cid:48)1B\C). over points of the form xp(A), with respect to element B q q(cid:48) swaps in A, and returns a locally optimal solution. Using the conditions of local optimality, F(x (A)) can p For example, if p = p(cid:48) and q = q(cid:48) =1−p, the diagram be compared to the global optimum. Here, we observe would represent F(xp(A)). Typically, A will be our thefollowingadditionalpropertyofalocaloptimum. If current solution, and C an optimal solution. Later we x (A)isalocaloptimumwithrespecttoelementswaps p omit the symbols A,B,C,C from the diagram. inA,thenslightlyincreasingpcannotdecreasethevalue of F(x (A)). During the local search stage, the value p 3 Unconstrained Submodular Maximization cannot decrease either, so in fact the value of F(x (A)) p Let us describe our algorithm for unconstrained sub- is non-decreasing throughout the algorithm. Moreover, modular maximization. We use a parameter p ∈ [1,1], we can derive bounds on ∂ F(x (A)) depending on 2 ∂p p which is related to the “temperature” discussed above the value of the current solution. Consequently, unless by p=1−t. We also use a fixed discretization param- the current solution is already valuable enough, we eter δ =1/n3. can conclude that an improvement can be achieved by increasingp. Thisleadstoadifferentialequationwhose Algorithm 1SimulatedAnnealingAlgorithmForSub- solution implies Theorem 3.1. modular Maximization Weproceedslowlyandfirstprovethebasicfactthat Input: A submodular function f :2X →R . ifxp(A)isalocaloptimumforafixedp, wecannotlose + by increasing p slightly. This is intuitive, because the Output: A subset A ⊆ X satisfying f(A) ≥ 0.41 · gradient ∇F at x (A) must be pointing away from the max{f(S):S ⊆X}. p center of the cube [0,1]X, or else we could gain by a 1: Define xp(A)=p1A+(1−p)1A. local step. 2: A←∅. 3: for p←1/2; p≤1; p←p+δ do Lemma 3.1. Let p∈[1,1] and suppose x (A) is a local 4: while there exists i ∈ X such that optimum in the sense2that F(x (A∆{i})p) ≤ F(x (A)) p p F(x (A∆{i}))>F(x (A)) do p p for all i. Then 5: A←A∆{i} • ∂F ≥0 if i∈A, and ∂F ≤0 if i∈/ A, 6: end while ∂xi ∂xi 7: end for • ∂ F(x (A))=(cid:80) ∂F −(cid:80) ∂F ≥0. 8: return the best solution among all sets A and A ∂p p i∈A ∂xi i∈/A ∂xi encountered by the algorithm. Proof. We assume that flipping the membership of elementiinAcanonlydecreasethevalueofF(x (A)). p The effect of this local step on x (A) is that the value We remark that this algorithm would not run in p of the i-th coordinate changes from p to 1−p or vice polynomialtime,duetothecomplexityoffindingalocal versa (depending on whether i is in A or not). Since optimum in Step 4-6. This can be fixed by standard F is linear when only one coordinate is being changed, techniques (as in [9, 22, 23, 33]), by stopping when the this implies ∂F ≥ 0 if i ∈ A, and ∂F ≤ 0 if i ∈/ A. By conditionsoflocaloptimalityaresatisfiedwithsufficient ∂xi ∂xi the chain rule, we have accuracy. We also assume that we can evaluate the multilinear extension F, which can be done within a ∂F(xp(A)) =(cid:88)n ∂F d(xp(A))i. certaindesiredaccuracybyrandomsampling. Sincethe ∂p ∂x dp i analysis of the algorithm is already quite technical, we i=1 ignoretheseissuesinthisextendedabstractandassume Since (x (A)) = p if i ∈ A and 1−p otherwise, we p i instead that a true local optimum is found in Step 4-6. get ∂F(xp(A)) = (cid:80) ∂F −(cid:80) ∂F ≥ 0 using the conditio∂nps above. i∈A ∂xi i∈/A ∂xi (cid:3) In the next lemma, we prove a stronger bound on using the definition of x (A) and the fact that ∂F ≥0 itnhepdroevriivnagtiTvehe∂∂oprFem(xp3(.A1.))Twhhiischcawnilblebecoomurbimneadinwtoitohl for iN∈exAt,\wCeaunsde L∂∂xeFim≤mpa0Afo.r1it∈o Best∩imCa.te G(x ∂(Axi)) as p the analysis of [9] to achieve a certain improvement. follows. To simplify notation, we denote x (A) simply p Forinstance,[9]impliesthatifAisalocaloptimumfor byx. Ifwestartfromxandincreasethecoordinatesin p=2/3, we have either f(A)≥ 52OPT, or F(xp(A))≥ A∩C by (1−p) and those in B∩C by p, Lemma A.1 2OPT. Suppose we start our analysis from the point says the value of F will change by 5 p=2/3. (The algorithm does not need to be modified, 1 p p p since at p = 2/3 it finds a local optimum in any case, − 1 1-p 1-p 1-p and this is sufficient for the analysis.) We have either f(A) > 2OPT or F(x (A)) > 2OPT, or else by = F(x+(1−p)1 +p1 )−F(x) 5 p 5 A∩C B∩C Lemma3.2, ∂∂pF(xp(A))isaconstant fraction ofOPT: ≤(1−p) (cid:88) ∂F (cid:12)(cid:12) +p (cid:88) ∂F (cid:12)(cid:12) . (3.1) 1 ∂ (cid:18) 4 1 2(cid:19) 1 ∂xi(cid:12)x ∂xi(cid:12)x i∈A∩C i∈B∩C · F(x (A))≥OPT 1− − × = OPT. 3 ∂p p 5 3 5 15 Similarly, if we decrease the coordinates in A\C by p Therefore, in some δ-sized interval, the value of and those in B\C by 1−p, the value will change by F(x (A)) will increase at a slope proportional to OPT. p p 0 p p ThustheapproximationfactorofAlgorithm1isstrictly − 1-p 0 1-p 1-p greater than 2/5. We remark that we use a different =F(x−p1 −(1−p)1 )−F(x) starting point to achieve the factor of 0.41. A\C B\C The key lemma in our analysis states the following. (cid:88) ∂F (cid:12) (cid:88) ∂F (cid:12) ≤ −(1−p) (cid:12) −p (cid:12) . (3.2) Lemma 3.2. Let OPT = maxS⊆Xf(S), p ∈ [12,1] and i∈B\C ∂xi(cid:12)x i∈A\C ∂xi(cid:12)x suppose x (A) is a local optimum in the sense that p Adding inequalities(3.1), (3.2) and noting the expres- F(x (A∆{i}))≤F(x (A)) for all i. Then p p sion for G(x) above, we obtain: ∂ (1−p)· F(x (A))≥OPT−2F(x (A))−(2p−1)f(A). 1 p p 0 p p ∂p p p + −2 ≤G(x). (3.3) 1 1-p 1-p 0 1-p 1-p Proof. Let C denote an optimal solution, i.e. f(C) = It remains to relate the LHS of equation (3.3) to OPT. Let A denote a local optimum with respect to the value of OPT. We use the ”threshold lemma” F(x (A)), and B =A its complement. In our notation p (see Lemma A.3, and the accompanying example with using diagrams, equation (A.1)): p p F(x (A))=F(p1 +(1−p)1 )= p 0 1 0 1 0 p A B 1-p 1-p ≥(1−p) +(2p−1) 1-p 0 1 0 0 0 The top row is the current solution A, the bottom row 0 0 is its complement B, and the left-hand column is the +(1−p) 0 0 optimum C. We proceed in two steps. Define 1 0 (cid:88) ∂F (cid:88) ∂F ≥(1−p)OPT +(2p−1) , G(x)=(1 −x)·∇F(x)= (1−x ) − x 0 0 C i ∂x i∂x i i i∈C i∈/C 1 p 1 1 1 1 ≥(1−p) +(2p−1) to denote the derivative of F when moving from x 1 1-p 1 1 1 0 towards the actual optimum 1 . By Lemma 3.1, we C 1 0 have +(1−p) 1 0 (cid:32) (cid:33) (1−p)∂F(xp(A)) = (1−p) (cid:88) ∂F −(cid:88) ∂F 1 1 ∂p ∂x ∂x ≥(2p−1) +(1−p)OPT. i∈A i i∈B i 1 0   (cid:88) ∂F (cid:88) ∂F Combining these inequalities with (3.3), we get ≥ (1−p) −  ∂x ∂x i i p p i∈A∩C i∈B\C G(x)≥2(1−p)OPT −2   1-p 1-p (cid:88) ∂F (cid:88) ∂F (cid:20) (cid:21) −p ∂x − ∂x =G(xp(A)) +(2p−1) 1 1 + 1 0 . i∈A\C i i∈B∩C i 1 0 0 0 p p Lemma 3.1, to obtain Recall that F(x) = . Finally, we add 1-p 1-p 0 0 Φ(p+δ)≥F(xp+δ(A)) (2p−1)f(A) = (2p−1) to this inequality, so 1 1 ≥F(x (A))−2δ2n2OPT p thatwecanusesubmodularitytotakeadvantageofthe δ last two terms: + (OPT −2F(x (A))−(2p−1)f(A)). 1−p p p p G(x)+(2p−1)f(A)≥2(1−p)OPT −2 Finally,weusef(A)≤βandF(x (A))=Φ(p)toderive 1-p 1-p p the statement of the lemma. (cid:3) (cid:20) (cid:21) 1 1 1 0 0 0 +(2p−1) + + 1 0 0 0 1 1 Bytakingδ →0,thestatementofLemma3.3leads ≥2(1−p)OPT −2F(x (A))+(2p−1)OPT naturally to the following differential equation: p =OPT −2F(x (A)). p (1−p)Φ(cid:48)(p)≥OPT −2Φ(p)−(2p−1)β. (3.4) (cid:3) We assume here that δ is so small that the difference between the solution of this differential inequality and We have proved that unless the current solution is the actual behavior of our algorithm is negligible. (We already very valuable, there is a certain improvement couldreplaceOPT by(1−(cid:15))OPT,carryouttheanalysis that can be achieved by increasing p. The next lemma and then let (cid:15) → 0; however, we shall spare the reader transforms this statement into an inequality describing of this annoyance.) the evolution of the simulated-annealing algorithm. Our next step is to solve this differential equation, Lemma 3.3. Let A(p) denote the local optimum found given certain initial conditions. Without loss of gener- ality, we assume that OPT =1. by the simulated annealing algorithm at temperature t=1−p, and let Φ(p)=F(x (A(p))) denote its value. p Lemma 3.4. Assume that OPT = 1. Let Φ(p) denote Assume also that for all p, we have f(A(p))≤β. Then the value of the solution at temperature t = 1 − p. Assume that Φ(p ) = v for some p ∈ (1,1), and 1−p(Φ(p+δ)−Φ(p)) ≥ (1−2δn2)OPT −2Φ(p) 0 0 0 2 δ f(A(p))≤β for all p. Then for any p∈(p0,1), −(2p−1)β. 1 Φ(p)≥ (1−β)+2β(1−p) 2 Proof. Herewecombinethepositivedriftobtainedfrom (1−p)2 (cid:18)1 (cid:19) decreasing the temperature (described by Lemma 3.2) − (1−β)+2β(1−p )−v . and from local search (which is certainly nonnegative). (1−p0)2 2 0 0 Consider the local optimum A obtained at temperature Proof. We rewrite Equation (3.4) using the following t = 1 − p. Its value is Φ(p) = F(x (A)). By p trick: decreasing temperature by δ, we obtain a solution x (A),whosevaluecanbeestimatedinthefirstorder p+δ d by the derivative at p (see Lemma A.2 for a precise (1−p)3 ((1−p)−2Φ(p))=(1−p)3(2(1−p)−3Φ(p) dp argument): +(1−p)−2Φ(cid:48)(p)) F(x (A))≥ F(x (A))+δ∂F(xp(A)) =2Φ(p)+(1−p)Φ(cid:48)(p). p+δ p ∂p (cid:12) ∂2F (cid:12) Therefore, Lemma 3.3 states that −δ2n2sup(cid:12) (cid:12). (cid:12)∂x ∂x (cid:12) i j d (1−p)3 (p−2Φ(p)) ≥ OPT −(2p−1)β dp This is followed by another local-search stage, in which we obtain a new local optimum A(cid:48). In this stage, the = 1−β+2β(1−p) value of the objective function cannot decrease, so we have Φ(p+δ) = F(x (A(cid:48))) ≥ F(x (A)). We have which is equivalent to p+δ p+δ sup| ∂2F |≤max |f(S+i+j)−f(S+i)−f(S+ j)+f∂(xSi∂)x|j≤2OPTS.,iW,jealsoestimate ∂∂pF(xp(A))using ddp((1−p)−2Φ(p))≥ (11−−pβ)3 + (1−2βp)2. Foranyp∈(p ,1),thefundamentaltheoremofcalculus sign - however, it does not imply that the algorithm 0 implies that actually achieves a 61/150-approximation, because we have used β =2/5 as our target value. (Also, note that (1−p)−2Φ(p)−(1−p0)−2Φ(p0) 61/150 < 0.41, so this is not the way we achieve our (cid:90) p(cid:18) 1−β 2β (cid:19) main result.) ≥ + dτ (1−τ)3 (1−τ)2 In order to get an approximation guarantee better p0 than 2/5, we need to revisit the analysis of [9] and (cid:20) 1−β 2β (cid:21)p = + compute the approximation factor of a local optimum 2(1−τ)2 1−τ p0 as a function of the temperature t = 1 − p and the 1−β 2β 1−β 2β complementary solution f(A)=β. = + − − . 2(1−p)2 1−p 2(1−p )2 1−p 0 0 Lemma 3.5. Assume OPT = 1. Let q ∈ [1, 1√ ], 3 1+ 2 Multiplying by (1−p)2, we obtain p = 1−q and let A be a local optimum with respect to F(x (A)). Let β =f(A). Then 1 p Φ(p)≥ (1−β)+2β(1−p) 2 1 (1−p)2 (cid:18) 1 (cid:19) F(xp(A))≥ 2(1−q2)−q(1−2q)β. + Φ(p )− (1−β)−2β(1−p ) . (1−p )2 0 2 0 0 Proof. A is a local optimum with respect to the objec- (cid:3) tive function F(xp(A)). We denote xp(A) simply by x. Let C be a global optimum and B = A. As we argued Inordertousethislemma,recallthattheparameter in the proof of Lemma 3.2, we have β is an upper bound on the values of f(A) throughout p p p 0 the algorithm. This means that we can choose β to ≥ be our ”target value”: if f(A) achieves value more q q q q than β at some point, we are done. If f(A) is always and also upper-bounded by β, we can use Lemma 3.4, hopefully p p p p concluding that for some p we must have Φ(p)≥β. ≥ q q 1 q In addition, we need to choose a suitable initial condition. As a first attempt, we can try to plug in We apply Lemma A.4 which states that F(x) ≥ p =1/2andv =1/4asastartingpoint(theuniformly E[f((T (x)∩C)∪(T (x)\C))], where λ ,λ are 0 0 >λ1 >λ2 1 2 random 1/4-approximation provided by [9]). We would independentanduniformlyrandomin[0,1]. Thisyields obtain thefollowing(afterdroppingsometermswhicharenon- negative): 1 Φ(p)≥ (1−β)+2β(1−p)−(1+2β)(1−p)2. 2 p p p p 1 0 1 0 ≥ ≥pq +p(p−q) However, this is not good enough. For example, if q q 1 q 1 0 0 0 we choose β = 2/5 as our target value, we obtain 1 0 1 0 Φ(p)≥ 3 +4(1−p)−9(1−p)2. Itcanbeverifiedthat +q2 1 1 +(p−q)q 0 1 10 5 5 this function stays strictly below 2/5 for all p ∈ [1,1]. 2 (3.5) Sothisdoesnotevenmatchtheperformanceofthe2/5- p p p 0 1 0 1 1 approximation of [9]. ≥ ≥pq +p(p−q) q q q q 1 0 1 0 As a second attempt, we can use the 2/5- approximationitselfasastartingpoint. Theanalysisof 0 0 0 1 +q2 +(p−q)q [9]impliesthatifAisalocaloptimumforp0 =2/3, we 1 0 1 0 haveeitherf(A)≥2/5,orF(x (A))≥2/5. Thismeans p (3.6) that we can use the starting point p = 2/3,v = 2/5 0 0 with a target value of β = 2/5 (effectively ignoring the The first term in each bound is pq · OPT. However, behaviorofthealgorithmforp<2/3). Lemma3.4gives to make use of the remaining terms, we must add some terms on both sides. The terms we add are 3 4 3 Φ(p)≥ + (1−p)− (1−p)2. 1(−p3+p2q+2pq2)f(A)+1(p3+p2q−2pq2−2q3)f(B); 10 5 2 2 2 it can be verified that both coefficients are nonnegative Themaximumofthisfunctionisattainedatp =11/15 for q ∈ [1, 1√ ]. Also, the coefficients are chosen so 0 3 1+ 2 which gives Φ(p ) ≥ 61/150 > 2/5. This is a good that they sum up to p2q−q3 = q(p2−q2) = q(p−q), 0 thecoefficientinfrontofthelasttermineachequation. where the simplification came about by using the ele- Using submodularity, we get mentaryrelationsp(p−q)=p(p−q)(p+q)=p(p2−q2) and q2 =q2(p+q). Submodularity implies 1 1 1 1 0 (−p3+p2q+2pq2) +(p−q)q 2 0 0 0 1 1 0 0 0 1 0 + ≥ =OPT 0 0 1 0 1 0 1 0 0 + (p3+p2q−2pq2−2q3) 2 1 1 and (cid:20) (cid:21) 1 1 1 1 0 1 1 1 0 1 0 = (−p3+p2q+2pq2) + (3.7) + ≥ =OPT, 2 0 0 0 1 1 0 1 1 1 0 (cid:20) (cid:21) + 1(p3+p2q−2pq2−2q3) 0 0 + 1 0 so we get, replacing the respective diagrams by F(x), 2 1 1 0 1 f(A) and f(B), ≥1(−p3+p2q+2pq2) 1 0 2 F(x)+(−p3+p2q+2pq2)f(A)+(p3+p2q−2pq2−2q3)f(B) 2 0 0 ≥ (2pq+p2)OPT =(1−q2)OPT 1 1 0 + 2(p3+p2q−2pq2−2q3) 1 1 . (3.8) again using (p + q)2 = 1. Finally, we assume that f(A)≤β and f(B)≤β, which means Similarly, we get 2F(x) ≥ (1−q2)OPT −(2p2q−2q3)β 1(−p3+p2q+2pq2) 1 1 +(p−q)q 0 1 = (1−q2)OPT −2q(p−q)β 2 0 0 1 0 = (1−q2)OPT −2q(1−2q)β. 1 0 0 + (p3+p2q−2pq2−2q3) (cid:3) 2 1 1 1 (cid:20) 1 1 0 1 (cid:21) Now we can finally prove Theorem 3.1. Consider = (−p3+p2q+2pq2) + Lemma 3.4. Starting from Φ(p ) = v , we obtain the 2 0 0 1 0 0 0 following bound for any p∈(p ,1): (cid:20) (cid:21) 0 1 0 0 0 1 + (p3+p2q−2pq2−2q3) + 1 2 1 1 1 0 Φ(p)≥ (1−β)+2β(1−p) 2 ≥1(−p3+p2q+2pq2) 1 1 (1−p)2 (cid:18)1 (cid:19) 2 1 0 − (1−p )2 2(1−β)+2β(1−p0)−v0 . 0 1 0 0 + (p3+p2q−2pq2−2q3) . (3.9) By optimizing this quadratic function, we obtain that 2 1 0 the maximum is attained at p = β(1−p0)2 1 (1−β)/2+2β(1−p0)−v0 Putting equations (3.5), (3.6) (3.8) and (3.9) all to- and the corresponding bound is gether, we get 1−β β2(1−p )2 Φ(p )≥ + 0 . p p 1 1 1 2 1(1−β)+2β(1−p )−v 2 +(−p3+p2q+2pq2) 2 0 0 q q 0 0 Lemma3.5impliesthatalocaloptimumattemperature +(p3+p2q−2pq2−2q3) 0 0 q =1−p0 ∈[13,1+1√2] has value v0 ≥ 12(1−q2)−q(1− 1 1 2q)β = p − 1p2 −(1−p )(2p −1)β. Therefore, we 0 2 0 0 0 1 0 obtain ≥2pq 1 0 Φ(p )≥ 1−β + β2(1−p0)2 . 1 (cid:20) 1 1 1 0 (cid:21) 1 2 12(1−β)+2β(1−p0)−p0+12p20+(1−p0)(2p0−1)β +(p2−pq+ (−p3+p2q+2pq2)) + √ 2 1 0 0 0 We choose p0 = √2 and solve for a value of β such 1+ 2 +(q2+ 1(p3+p2q−2pq2−2q3))(cid:20) 1 0 + 0 0 (cid:21) that Φ(p1) ≥ β. This value can be found as a solution 2 1 1 1 0 of a quadratic equation and is equal to 1 0 1 (cid:18) √ √ (cid:113) √ (cid:19) =2pq β = 37+22 2+(30 2+14) −5 2+10 . 1 0 401 (cid:20) (cid:21) 1 1 1 1 0 1 0 0 0 It can be verified that β > 0.41. This completes the + p2 + + + . 2 1 0 0 0 1 1 1 0 proof of Theorem 3.1. that even sampling from A,A (or from B,B) with B probabilities p,q does not give value more than 17. 2 4 3 It remains to show that the set A is in fact a local 4 optimum for all p ∈ [1,3]. We just need to show 2 4 8 that all the elements in A have a non-negative partial 1 derivative and the elements in A have a non-positive 3 1 6 7 partial derivative. Let p∈[1,3] and q =1−p, then: 2 4 ∂F =−12q+4p≤0 ∂F =4p−4(1−q)=0 11 3 1 ∂∂xF0 =−3q+p≤0 ∂∂xF1 =11p−5q(1−1)=0 4 ∂x2 ∂x3 ∂F =15p−q−15p+q =0 ∂F =−p+3q ≥0 ∂x4 ∂x5 ∂F =−4q+4q =0 ∂F =−4p+12q ≥0 4 5 4 ∂x6 ∂x7 Therefore, A is a local optimum for p ∈ [1,3]. 2 4 Similarly, it can be shown that B is a local optimum 12 4 for p∈[3,1] which concludes the proof. (cid:3) 4 0 1 A 4 Matroid Independence Constraint Let M=(X,I) be a matroid. We design an algorithm Figure2: Hardinstanceoftheunconstrainedsubmodu- for the case of submodular maximization subject to a lar maximization problem, where Algorithm 1 may get matroid independence constraint, max{f(S) : S ∈ I}, value no more than 17. The bold vertices {4,5,6,7} as follows. The algorithm uses fractional local search represent the optimum set with value OPT =35. to solve the optimization problem max{F(x) : x ∈ P (M)}, where P (M) = P(M)∩[0,t]X is a matroid t t 3.1 Upper bound on the performance of the polytope intersected with a box. This technique, which simulated annealing algorithm. In this section we has been used already in [33], is combined with a show that the simulated annealing algorithm 1 for simulated annealing procedure, where the parameter t unconstrained submodular maximization does not give is gradually being increased from 0 to 1. (The analogy a 1-approximation even on instances of the directed with simulated annealing is less explicit here; in some 2 maximum cut problem. We provide a directed graph sense the system exhibits the most randomness in the G (found by an LP-solver) and a set of local optimums middle of the process, when t = 1/2.) Finally, the for all values of p ∈ [1,1], such the value of f on each fractional solution is rounded using pipage rounding 2 of them or their complement is at most 0.486 of OPT. [4, 33]; we omit this stage from the description of the algorithm. Theorem 3.2. There exists an instance of the uncon- The main difficulty in designing the algorithm is strained submodular maximization problem, such that how to handle the temperature-increasing step. Con- the approximation factor of Algorithm 1 is 17/35 < trary to the unconstrained problem, we cannot just in- 0.486. crement all variables which were previously saturated at x = t, because this might violate the matroid con- i Proof. Let f be the cut function of the directed graph straint. Instead, we find a subset of variables that can G in Figure 2. We show that the set A = {1,3,5,7} beincreased,byreductiontoabipartitematchingprob- is a local optimum for all p ∈ [1,3] and the set lem. We need the following definitions. 2 4 B = {2,4,6,8} is a local optimum for all p ∈ [3,1]. 4 Definition 4.1. Let 0 be an extra element not occur- Moreover, since we have F(x (A)) = F(x (B) = 3/4 3/4 ring in the ground set X, and define formally ∂F =0. a1n6.n2e5a,linitgiaslgpooristshibmle1,ththate sient aArisunchoosfenthaendsimreumlaatiends For x = N1 (cid:80)n(cid:96)=11I(cid:96) and i ∈/ I(cid:96), we define∂bx(cid:96)0(i) = as a local optimum fort p = 1/2 to p = 3/4. Then argmin ∂F . j∈I(cid:96)∪{0}:I(cid:96)−j+i∈I∂xj the local optimum changes to B and remains until the end of the algorithm. If the algorithm follows this path In other words, b (i) is the least valuable element (cid:96) then its approximation ratio is 17/35. This is because which can be exchanged for i in the independent set I . (cid:96) the value of the optimum set f({4,5,6,7}) = 35, Note that such an element must exist due to matroid while max{f(A),f(B),f(A),f(B)} = 17. We remark axioms. We also consider b (i)=0 as an option in case (cid:96) I +i itself is independent. In the following, 0 can be Algorithm 2 Simulated Annealing Algorithm for a (cid:96) thoughtofasaspecial“empty”element,andthepartial Matroid Independence Constraint derivative ∂F isconsideredidenticallyequaltozero. By Input: A submodular function f : 2X → R and a ∂x0 + definition, we get the following statement. matroid M=(X,I). Lemma 4.1. For b (i) defined as above, we have ∂F − Output: A solution x ∈ P(M) such that F(x) ≥ (cid:96) (cid:16) (cid:17) ∂xi 0.325·max{f(S):S ∈I}. ∂x∂bF(cid:96)(i) =maxj∈I(cid:96)∪{0}:I(cid:96)−j+i∈I ∂∂xFi − ∂∂xFj . 1: Let x←0, N ←n4 and δ ←1/N. The following definition is important for the de- 2: Define Pt(M)=P(M)∩[0,t]X. scription of our algorithm. 3: Maintainarepresentationofx= N1 (cid:80)N(cid:96)=11I(cid:96) where I ∈I. Definition 4.2. For x = 1 (cid:80)n 1 , let A = {i : (cid:96) N (cid:96)=1 I(cid:96) 4: for t←0; t≤1; t←t+δ do x = t}. We define a bipartite “fractional exchange i 5: while there is v ∈ {±ei,ei−ej : i,j ∈ X} such graph” G on A ∪ [N] as follows: We have an edge x that x+δv∈P (M) and F(x+δv)>F(x) do t (i,(cid:96))∈E, whenever i∈/ I . We define its weight as (cid:96) 6: x:=x+δv {Local search} ∂F ∂F (cid:18)∂F ∂F (cid:19) 7: end while w = − =max − . i(cid:96) ∂xi ∂xb(cid:96)(i) j∈I(cid:96)∪{0}:I(cid:96)−j+i∈I ∂xi ∂xj 8: f{oir:exach≤ofλ}thdeon{+Co1mppolsesimbleensteatrsyTs≤oλl(uxt)io=n i We remark that the vertices of the bipartite ex- check} change graph are not elements of X on both sides, but 9: Find a local optimum B ⊆ T≤λ(x),B ∈ I elements on one side and independent sets on the other trying to maximize f(B). side. 10: Remember1B forthelargestf(B)asapossible Algorithm 2 is our complete algorithm for matroid candidate for the output of the algorithm. independence constraints. As a subroutine in Step 9, 11: end for we use the discrete local search algorithm of [22]. The 12: Form the fractional exchange graph (see Defini- returned solution is a point in P(M); finally, we obtain tion 4.2) and find a max-weight matching M. anintegersolutionusingthepipageroundingtechnique 13: Replace I(cid:96) by I(cid:96)−b(cid:96)(i)+i for each edge (i,(cid:96)) ∈ [5, 33]. We omit this from the description of the M, and update the point x = 1 (cid:80)N 1 . algorithm. N (cid:96)=1 I(cid:96) {Temperature relaxation: each coordinate in- Theorem 4.1. For any submodular function f :2X → creases by at most δ = 1/N and hence x ∈ R and matroid M=(X,I), Algorithm 2 returns with P (M).} + t+δ high probability a solution of value at least 0.325·OPT 14: end for where OPT =maxS∈If(S). 15: return the best encountered solution x∈P(M). Let us point out some differences between the analysisofthisalgorithmandtheoneforunconstrained maximization(Algorithm1). Thebasicideaisthesame: We proceed in two steps, again using as an inter- we obtain certain conditions for partial derivatives at mediate bound the notion of derivative of F on the line the point of a local optimum. These conditions help us towards the optimum: G(x) = (1 −x)·∇F(x). The eithertoconcludethatthelocaloptimumalreadyhasa C plan is to relate the actual gain of the algorithm in the goodvalue,ortoprovethatbyrelaxingthetemperature “Temperature relaxation” phase (Steps 12-13) to G(x), parameter we gain a certain improvement. We will and then to argue that G(x) can be compared to the provethefollowinglemmawhichisanalogoustoLemma RHS of (4.10). The second part relies on the submod- 3.3. ularity of the objective function and is quite similar to Lemma 4.2. Let x(t) denote the local optimum found the second part of Lemma 3.2 (although slightly more by Algorithm 2 at temperature t < 1−1/n right after involved). the “Local search” phase, and let Φ(t)=F(x(t)) denote The heart of the proof is to show that by relax- the value of this local optimum. Also assume that ing the temperature we gain an improvement at least the solution found in “Complementary solution check” δ G(x). As the algorithm suggests, the improvement 1−t phase of the algorithm (Steps 8-10) is always at most β. in this step is related to the weight of the matching ob- Then the function Φ(t) satisfies tained in Step 12 of the algorithm. Thus the main goal 1−t is to prove that there exists a matching of weight at (Φ(t+δ)−Φ(t))≥(1−2δn3)OPT −2Φ(t)−2βt. least 1 G(x). We prove this by a combinatorial argu- δ 1−t (4.10) mentusingthelocaloptimalityofthecurrentfractional

Description:

Submodular Maximization by Simulated Annealing. Shayan Oveis Gharan∗. Jan Vondrák†. Abstract. We consider the problem of maximizing a nonnegative