ProjectedPrimal-DualGradientFlowofAugmented LagrangianwithApplicationtoDistributedMaximizationof (cid:63) theAlgebraicConnectivityofaNetwork HanZhanga,JieqiangWeib,PengYic,XiaomingHua, 7 1 aDepartmentofMathematics,KTHRoyalInstituteofTechnology,SE-10044,Stockholm,Sweden 0 2 bDepartmentofAutomaticControl,KTHRoyalInstituteofTechnology,SE-10044,Stockholm,Sweden n cDepartmentofElectricalandComputerEngineering,UniversityofToronto,Canada a J 5 2 Abstract ] C In this paper, a projected primal-dual gradient flow of augmented Lagrangian is presented to solve convex optimization O problems that are not necessarily strictly convex. The optimization variables are restricted by a simple convex set as well as equality constraints. Instead of choosing to project the negative gradient flow of the objective function onto the tangent . h coneoftheentirefeasiblesetoftheconvexoptimizationproblem,wechoosetousetheprojectedprimal-dualgradientflowof t augmentedLagrangiansothatthedynamicsonlyprojectsthenegativegradientflowoftheprimalvariablesontothetangent a coneofthesimpleconvexset.Hencethecomplexityofcalculatingthetangentconeisdecreased.Weshowthattheprojected m dynamical system converges to one of the optimal solutions. We further explain the connection and difference between the [ continuous-time algorithm and proximal gradient method. Moreover, the problem of distributedly maximizing the algebraic connectivity of an undirected graph by optimizing the edge weights (communication gains for each channel) of each nodes is 1 considered. The original semi-definite programming (SDP) problem is relaxed into a nonlinear programming (NP) problem v that will be solved by the aforementioned projected dynamical system. Numerical examples show the convergence of the 5 aforementionedalgorithmtooneoftheoptimalsolutions. 7 4 7 Keywords: ProjectedDynamicalSystems,semi-definiteprogramming,distributedoptimization 0 . 1 0 1 Introduction many powerful mathematical tools to analyze the con- 7 vergenceofthealgorithms.Hencestudyingcontinuous- 1 : Recently,duetothefactofrisinginterestindistributed time algorithms provides us with more insights about v the theory behind some iterative algorithms as well as i optimization, there has been a number of advances in X the field of optimization that involves using continuous moreideasofdesigningthem. r time dynamical systems to solve smooth convex prob- a lems, such as (Yi et al., 2016), (Gharesifard & Cort´es, When solving a convex minimization problem with 2014),(Linetal.,2016),etc,aswellasnonsmoothconvex strong duality, it is well-known that the optimal so- problems(Qiuetal.,2016).Thoughitseemstobequite lution is the saddle point of the Lagrangian. Hence differentfromtheclassicaliterativeoptimizationmeth- beingdifferentfromthedynamicsderivedfromKarush- ods, it has been found that classical methods such as Kuhn-Tucker (KKT) conditions in (Yang et al.) and steepestdescentandproximalpointmethodarenothing (Liu et al., 2016), it is natural to consider the gradi- butforwardEulerandbackwardEulerdiscretizationof ent flow of Lagrangians and the primal variable follows thesamegradientflowdynamics(Parikh&Boyd,2014). the negative gradient flow while the dual variable fol- On the other hand, continuous-time algorithms enable lows the gradient flow. Gradient flow of Lagrangians is first studied by Arrow et al. (1959), Kose (1956) and (cid:63) ThisworkissupportedbyChinaScholarshipCouncil. has been revisited by Feijer & Paganini (2010). (Feijer Emailaddresses: [email protected](HanZhang), & Paganini, 2010) studies the case of strictly convex [email protected](JieqiangWei),[email protected] problemsandprovidesmethodologiestotransformnon- (PengYi),[email protected](XiaomingHu). strictly convex problems to strictly convex problems to PreprintsubmittedtoAutomatica 27January2017 fit the framework. The convergence is shown by em- projected primal-dual gradient flow of augmented La- ploying the invariance principle for hybrid automata. grangian.Moreprecisely,thedynamicsonlyprojectsthe (Cherukurietal.,2016)studiesthesamestrictlyconvex negative gradient flow of the primal variable onto the problemfromtheangleofprojecteddynamicalsystems tangent cone of the simple convex set. Hence the com- andisabletoshowtheconvergencebyaLasalle-likein- plexityofcalculatingthetangentconeisdecreased.We variant principle for Carath´eodory solutions. Instead of showthattheprojecteddynamicalsystemconvergesto consideringdiscontinuousdynamics,Durr&Ebenbauer oneoftheoptimalsolutions.Wefindthatafterbeingdis- (2011) propose a smooth vector field for seeking the cretizedbyforwardEulerdiscretization,thecontinuous- saddle points of strictly convex problems. Wang & Elia time algorithm actually has a strong connection with (2011) consider a strictly convex problem with equality iterative proximal gradient method but with iteration constraintsandwithinequalityconstraintsrespectively. also on the dual variables. Second, the problem of dis- Gradient flow is also used therein, however, it is worth tributedly maximizing the algebraic connectivity of an noticingthattheirproblemisstillstrictlyconvex.When undirected graph by adjusting the edge weights (com- they consider the problem with inequality constraints, munication gain for each channel) of each nodes is con- logarithmic barrier function is used. Though consid- sidered.Itisworthnoticingthattheproblemmotivates ering nonsmooth problems, Zeng et al. (2016) use the from optimizing a parameter of a graph and the com- projectedgradientflowofaugmentedLagrangianwhose munication network of the distributed optimization al- equalityconstraintisthevariableconsensusconstraint, gorithm is actually the physical network itself; one can and can be viewed as a special case of our problem. not “design” the communication network according to Instead of using the continuous-time gradient flow, an thestructureoftheproblemorthealgorithm.Wesolve iterative distributed augmented Lagrangian method is theoriginalproblem,whichisanSDP,byrelaxingitinto developedin(Chatzipanagiotisetal.,2015). anNPproblem.TheNPproblemisnotstrictlyconvex, henceweadapttheprojectedgradientflowmethodpro- Since first introduced by Fiedler (1973), the algebraic posedinthisworktosolvetheaforementionedNPprob- connectivity of a graph, namely, the second smallest lem.Numericalexamplesshowthattheaforementioned eigenvalue of the Laplacian matrix has been playing a algorithmconvergestooneoftheoptimalsolutions. very important role in the networked systems. Alge- braicconnectivityisboundedbothaboveandbelowby Therestofthispaperisorganizedasfollows.InSection Cheeger’sconstantandhenceisameasureoftherobust- 2, some preliminaries and notations are presented. In nessofagraph(Mesbahi&Egerstedt,2010).Italsoin- Section3,weintroducetheproblemformulationaswell dicates the convergence rate of the variables consensus astheprojectedgradientflowofaugmentedLagrangian. inmulti-agentsystems(Olfati-Saberetal.,2007).Hence Theconvergenceanalysisoftheprojectedgradientflow maximizing the algebraic connectivity of a graph is an ispresentedinSection4andtheconnectionandthedif- importantandfundamentalproblemandhasbeencon- ferencebetweentheaforementionedcontinuous-timeal- sidered intensively in the past. Ghosh & Boyd (2006) gorithmanditerativeproximalgradientmethodarealso consider the problem of optimizing the way of adding a described. In Section 5, we apply the projected dynam- fixed number of edges to the graph to maximizing the ics to distributedly solve the maximization problem of algebraic connectivity. Since the problem is NP-hard, a the algebraic connectivity of a network. Numerical ex- greedy heuristic is used to solve the problem. (G¨oring amplesarepresented.ThepaperisconcludedinSection etal.,2008)considertheproblemofmaximizingtheal- 6. gebraicconnectivitybyadjustingtheedgeweight.They explain connections between the problem and the em- 2 PreliminariesandNotations bedding of nodes in n dimensional space. From the an- gle of applications, the edge weights of a graph can be treatedasthegainsofcommunicationchannelsbetween In this section, we introduce the notations we will use theagentsandwewillconsiderasimilarprobleminthis in the rest of the paper. Some basic knowledge about paperandsolveitdistributedly. projecteddynamicalsystems,convexanalysisaswellas SDPwillalsoberevisited. The main contribution of this paper is two-fold. First, we construct a projected primal-dual gradient flow of We denote 1 = 11T as an N dimensional all-one ma- augmented Lagrangian that solves convex optimization trix, where 1 is an N dimensional all-one vector. The problemswhicharenotnecessarilystrictlyconvex.The elementlocatedontheithrowandandjthcolumnofa optimization variables are restricted by a simple con- matrixAisdenotedas[A] .MatrixA −A ispositive ij 1 2 vex set as well as equality constraints. Using the gra- semi-definite, then it will be denoted as A (cid:23) A . We 1 2 dient flow of augmented Lagrangian enables us to han- use(cid:107)·(cid:107)todenote2-normofvectors.|S|denotesthecar- dle nonstrictly convex problems. In addition, instead dinalityofsetS.Andanynotationwiththesuperscript of projecting the negative gradient flow of the objec- ∗denotestheoptimalsolutiontothecorrespondingop- tivefunctionontothetangentconeoftheentirefeasible timization problem. tr(·) denotes the trace of a matrix. set of the convex optimization problem, we employ the (cid:104)·,·(cid:105) denotes the inner-product in euclidean space and 2 2 (cid:104)A ,A (cid:105) =tr(A A )denotestheinner-productinSn, TakingthegradientoftheLagrangianfunction,together 1 2 M 1 2 whichistheHilbertspaceofn×nsymmetricmatrix. withtheconstraintandthecomplementarityslackcon- dition,wegetitsKKTconditions Assume K ⊂ Rn is a closed and convex set, the pro- m jection of a point x to the set K is defined as P (x) = (cid:88) K (cid:104)A ,Φ(cid:105) =−c , x A (cid:22)S, argmin (cid:107)x−y(cid:107). For x ∈ K, v ∈ Rn, the projection i M i i i y∈K ofthevectorv atxwithrespecttoK isdefinedas(see i=1 m (Nagurney&Zhang,2012),(Brogliatoetal.,2006)) (cid:88) Φ(cid:23)0, (cid:104) x A −S,Φ(cid:105) =0. i i M i=1 P (x+δv)−x Π (x,v)= lim K =P (v), (1) K δ→0 δ TK(x) 3 Problem Formulation and Projected Gradi- where T (x) denotes the tangent cone of K at x. The entFlow K interior, the boundary and the closure of K is denoted asK◦,∂K andcl(K)respectively. In this section, we consider the following optimization problemdefinedonRn: InwardnormalsofK atxisdefinedas minimize f(x) x∈K (4) (cid:8) (cid:9) n(x)= γ|(cid:107)γ(cid:107)=1,(cid:104)γ,x−y(cid:105)2 ≤0,∀y ∈K , subjectto Ax−b=0, andΠ (x,v)fulfillsthefollowinglemma: where f : Rn (cid:55)→ R and A ∈ Rm×n. K is a simple K convex set, namely, it is generated by simple inequality Lemma1(Nagurney&Zhang(2012)) If x ∈ K◦, constraints such as interval constraints and norm con- straints. We assume calculating the projection on the then Π (x,v) = v; if x ∈ ∂K, then Π (x,v) = K K v+β(x)n∗(x),wheren∗(x)=argmax (cid:104)v,−n(cid:105)and tangentconeofKiscomputationallycheap.f(x)iscon- n∈n(x) vex function but not necessarily strictly convex. It is β(x)=max{0,(cid:104)v,−n∗(x)(cid:105)}. also assumed that the gradient of f(x) is locally Lips- chitzcontinuousandtheSlater’sconditionholdsfor(4). Let F be a vector field such that F : K (cid:55)→ Rn, the Hencestrongdualityholdsfor(4). projecteddynamicalsystemisgivenintheform The Lagrangian L : K ×Rm (cid:55)→ R for the problem (4) x˙ =Π (x,F(x)). (2) isgivenby K Note (1), the right hand side of (2) can be discontin- L(x,v)=f(x)+vT(Ax−b), (5) uous on the ∂K. Hence given an initial value x ∈ K, thesystem(2)doesnotnecessarilyhaveaclassic0alsolu- wherev ∈Rm istheLagrangianmultiplierofconstraint tion. However, if F(x) is Lipschitz continuous, then (2) Ax − b = 0. Since strong duality holds for (4), then has a unique Carath´eodory solution that continuously (x∗,v∗) is a saddle point of L(x,v) if and only if x∗ dependsontheinitialvalue(Nagurney&Zhang,2012). is an optimal solution to (4) and v∗ is optimal solution to its dual problem. The augmented Lagrangian L : A K×Rm (cid:55)→Rfor(4)isgivenby NowweintroducesomebasicknowledgeaboutSDP.An SDPprobleminstandardformcanbeexpressedas ρ L (x,v)=f(x)+vT(Ax−b)+ (Ax−b)T(Ax−b), (6) A 2 m (cid:88) maximize cixi where ρ > 0 is the damping parameter that suppresses i=1 theoscillationofx.Withoutlossofgenerality,wechoose (cid:88)m ρ=1here. subjectedto x A (cid:22)S, i i i=1 We propose to find the saddle point of (6) via the gra- dientflowdynamicsrestrictedonthesetK,i.e., where A ,S ∈ SN. Introducing the Lagrangian multi- i plierΦ(cid:23)0,theLagrangianfunctionof(3)canbewrit- x˙ =Π (x,−∇f(x)−ATv−AT(Ax−b)) K tenas ∂L (x,v) =Π (x,− A ), (7a) K ∂x m m L(x,Φ)=−(cid:88)cixi+tr(Φ((cid:88)xiAi−S)) (3) v˙ =Ax−b= ∂LA(x,v). (7b) ∂v i=1 i=1 3 Note that it is assumed that ∇f(x) is locally Lipschitz map such that L V(x) ≤ 0, where L V(x) = ∂Vf(x) f f ∂x continuous,thereforethereisauniqueCarath´eodoryso- denotes the Lie derivative along the vector field f(x). lutionforthedynamics,hencethesystemiswell-defined. Then any solution of x˙ = f(x) starting at S converges tothelargestinvariantsetincl({x∈S|L V(x)=0}). f Remark2 Indeed, one can choose to project the nega- Theorem6 Given an initial value (x(0),v(0)), where tivegradientflowoftheobjectivefunctionontotheentire x(0)∈K,theprojecteddynamicalsystem (7)asymptot- feasibleset.However,thisactuallyincreasesthecompu- ically converges to some (x(cid:48),v(cid:48)), where x(cid:48) is one of the tational complexity since it adds more constraints while optimalsolutionsof (4). computing the projection. Recall that we assume that calculating the projection to T (x) is computationally K cheap,therefore,primal-dualgradientflowisused. (cid:50) PROOF. Suppose (x∗,v∗) is a saddle point of the La- grangian (5), namely, x∗ is the optimal solution of (4) andv∗ istheoptimalsolutionofitsdualproblem.Con- 4 ConvergenceAnalysis structthefollowingLyapunovfunction In this section, we will analyse the convergence for (7) 1 and start with the analysis of the equilibrium point of d(t)= ((cid:107)x−x∗(cid:107)2+(cid:107)v−v∗(cid:107)2). (9) 2 (7). Notethatd(t)iscontinuouslydifferentiableanddenote Proposition3 (x∗,v∗)istheoptimalsolutionto (4)if therighthandsideofthedynamics(7)asvectorfieldF. andonlyifitisanequilibriumof (7). By the definition of saddle points, it holds for (x∗,v∗) thatL(x∗,v)≤L(x∗,v∗)≤L(x,v∗).TheLiederiva- tiveofd(t)alongthevectorfieldF isgivenby PROOF. Since strong duality holds for (4), the opti- mality conditions become necessary and sufficient con- L d(x,v) ditions. The optimality condition for (4) is given by F (Ruszczyn´ski,2006) =(x−x∗)TΠK(x,−∇f(x)−ATv−AT(Ax−b)) +(v−v∗)T(Ax−b) −∇f(x∗)−ATv∗ ∈N (x∗), Ax∗−b=0, K ByLemma1,itholdsthat whichisequivalentto Π (x,−∇f(x)−ATv−AT(Ax−b))= −∇f(x∗)−ATv∗+AT(Ax∗−b)∈N (x∗), K K −∇f(x)−ATv−AT(Ax−b)+β(x)n∗(x), whereN (x∗)denotesthenormalconeofK atx∗.This impliesΠK (x,−∇f(x)−ATv−AT(Ax−b))=0,which where β(x) ≥ 0 and (cid:104)n∗(x),x−y(cid:105) ≤ 0, ∀y ∈ K. Since isequivaleKnttobeinganequilibriumpointof (7). (cid:50) Ax∗−b=0,henceitfollowsthat L d(x,v) F Before we show the asymptotic stability of the equilib- =(x−x∗)T(−∇f(x)−ATv−AT(Ax−b) riums, we first introduce the definition of omega-limit +β(x)n∗(x))+(v−v∗)T(Ax−b) set and a Lasalle-like invariance principle originated from (Bacciotti & Ceragioli, 2006) and simplified in ≤(x−x∗)T(−∇f(x)−ATv−AT(Ax−b)) (Cherukurietal.,2015). +(v−v∗)T(Ax−b) ∂L(x,v) Definition4 Assumeγisasolutiontox˙ =f(x),where =−(x−x∗)T −(Ax−b)T(Ax−b) f :Rn (cid:55)→Rn.Theomega-limitsetisdefinedas ∂x ∂L(x,v) +(v−v∗)T . Ω(γ)={y ∈Rn|∃{t }∞ ⊂[0,∞), lim t =∞, ∂v k k=1 k k→∞ (8) lim γ(t )=y} Since L(x,v) is convex with respect to x and concave k k→∞ withrespecttov,itfollowsfromthefirstorderproperty ofconvexandconcavefunction(Boyd&Vandenberghe, Lemma5 (LasalleinvarianceprincipleforCarath´eodory 2004)that solutions) Let S ∈ Rn be compact and invariant. As- sume for each x ∈S, there exists a unique solution for 0 L d(x,v) x˙ = f(x) starting at x and its omega-limit set is in- F 0 variant.LetV :Rn (cid:55)→Rbeacontinuouslydifferentiable ≤L(x∗,v)−L(x,v∗)−(Ax−b)T(Ax−b). 4 Since L(x∗,v) ≤ L(x,v∗), we have L d(x,v) ≤ ConsidertheindicatorfunctiononK F −(Ax − b)T(Ax − b) ≤ 0. Since d(t) is differen- tiable almost everywhere with respect to t and (cid:26)0, ifx∈K, ddtd(t) = LFd(x,v), d(t) is non-increasing with respect IK(x)= +∞, otherwise, totime.Notethatd(t)isradiallyunbounded,hencethe set Sˆ= {(x,v) ∈ Rn×Rm|d(x,v) ≤ d(x(0),v(0))} is a whose proximal mapping is prox (x) = P (x) = compactinvariantsetforthesystem(7).Recallthatitis argmin (cid:107)u−x(cid:107)2. tIK K u∈K assumedthatf(x)haslocallyLipschitzcontinuousgra- dient,hencetheuniquenessoftheCarath´eodorysolution Denote ∂L (xk,vk)/∂x = yk, by the definition of the A to (7) can be guaranteed. The invariance of the omega- projection (1),wecandiscretizetheprojecteddynamical limit set can be proved by using the same methodology system (7)usingforwardEulerdiscretization asLemma4.1in(Khalil&Grizzle,2002)whichisbased on the continuity and the uniqueness of the solution. (xk+1−xk)/∆t=(P (cid:0)xk−∆tyk)(cid:1)−xk)/∆t, K NowbyLemma5,thetrajectoryof (7)convergestothe largestinvariantsetincl({(x,v)∈Sˆ|LFd(x,v)=0}). andthisgives (x(cid:48)(t),v(cid:48)(t))∈{x∈Sˆ|LFd(x,v)=0}meansv˙(cid:48) =Ax(cid:48)− xk+1 =PK(xk−∆tyk)=prox∆tIK(xk−∆tyk), b=0and (11a) vk+1 =vk+∆t(Axk−b). (11b) L(x∗,v)=L(x,v∗)⇔f(x∗)=f(x(cid:48))+v∗T(Ax(cid:48)−b) ⇔f(x∗)=f(x(cid:48)). Note that if one choose to project on the tangent cone (cid:84) of the entire feasible set Ω = K {x|Ax = b}, then the Sincex∗ isanoptimalsolutionto(4),x(cid:48) isalsoanopti- continuous-time dynamics will be x˙ = Π (x,−∇f(x)). Ω malsolutionto(4).Sincethesetofoptimalsolutionsis OnecandiscretizethedynamicsusingforwardEuleras closedandv(cid:48)isaconstantin{(x,v)∈Sˆ|L d(x,v)=0}, {(x,v)∈Sˆ|L d(x,v)=0}isalsoclosed.F xk+1 =PΩ(xk−∆t∇f(xk)) F =prox (xk−∆t∇f(xk)), ∆tIΩ Denote M as the largest invariant set in {(x,v) ∈ Sˆ|L d(x,v) = 0}. What remains to show is that there whichisexactlyproximalgradientmethod.Duetothefact F that the gradient flow only projects on K and the intro- isnolimitcyclewithinM,namely,ifatrajectorystarts duction of the dual variable v, the iterative method (11) in M, it will remain constant for all times. Assume (xˆ(0),v¯)∈M⊂{(x,v)∈Sˆ|L d(x,v)=0}, where v¯is isdifferentfromclassicalproximalgradientmethod.The F designof∆tsothat (11)convergestooneoftheoptimal a constant, then xˆ(0) is also an optimal solution to (4). solutions of (4) will be studied in the future. Moreover, Hence there exist some vˆ such that (xˆ(0),vˆ) is another apart from forward Euler discretization, one can choose saddlepointof (5).ThentheLyapunovfunctioncanbe other methods such as Heun Method and Runge-Kutta constructedsimilarlyas Methods(Nagurney&Zhang,2012),whichwillresultin differentiterativeoptimizationmethods. (cid:50) 1 dˆ(t)= ((cid:107)x−xˆ(0)(cid:107)2+(cid:107)v−vˆ(cid:107)2) (10) 2 Example8 Considerthefollowingsimpleproblem: We have shown that for any arbitrarily chosen saddle minimize x+3y point(x∗,v∗),d(t)isnon-increasingglobally;suchstate- x,y∈R ment also holds for dˆ(t). Furthermore, the trajectory subjectto x≥2, starts at (xˆ(0),v¯) can not be time-variant, namely, xˆ(t) x=y, remains constant for all t > 0. Since dˆ(0) = 1(cid:107)v¯−vˆ(cid:107)2, 2 it contradicts the fact that dˆ(t) is non-increasing if x(t) and apply the projected system dynamics (7) to it. The istime-variant. evolutionofx(t),y(t)andv(t)isillustratedinFig.1.The v(0)andy(0)arearbitrarilychosenandx(0)isarbitrarily Fromtheanalysisabove,thetheoremfollows. (cid:50) chosen among the real numbers which is greater than 2. (cid:50) Remark7 Wenowdescribetheconnectionbetweenthe continuous-timealgorithm (7)andtheproximalgradient Example9 Nowweconsideramorecomplicatedprob- method(Beck&Teboulle,2010).Asisknown,theproxi- lemin(Chen,2005): maloperatorofaconvexfunctionh:Rn (cid:55)→Risgivenby minimize cTx 1 x∈R5+ prox (x)=argmin{h(u)+ (cid:107)u−x(cid:107)2}. th u 2t subjectto Ax=b, 5 vertex i and j are connected; w ∈ W, (i,j) ∈ E de- 6 ij notes the edge weight set. To abbreviate the notation, the edges are labelled with numbers. For instance, an n 4 edge with label l is denoted as l ∈ E. The set of edges o uti (communication channel)(cid:83)adjacent to node i is denoted ol 2 as E(i). Note that E = i∈VE(i). We only consider Ev undirected graphs, and hence the weighted Laplacian e x(t) matrixL issymmetricandcanbeexpressedas u y(t) w al 0 v(t) V (cid:80) w ifi=jand(i,l)∈E l il -2 [Lw]ij = −wij ifi(cid:54)=jand(i,j)∈E 0 4 8 12 16 20 0 otherwise t(s) (cid:88) = w E , k k Fig.1.Theevolutionofx(t),y(t)andv(t)forExample8. k∈E where where k is the label of the edges, 0 ≤ w ∈ W, ∀k ∈ E k −3 1 1 −1 2 are the edge-weights. If node i and j are connected via edgek,then A= 2 0 −1 1 −1, 0 1 2 −1 1 [E ] =[E ] =1, [E ] =[E ] =−1, k ii k jj k ij k ji b = [5,6,3]T and c = [2,1,−1,−3,1]T. The optimal and the other elements of E are zero. If the graph is value for the problem is cTx∗ = −76 and the evolution k connected,thentheeigenvaluesofL satisfies:0=λ < ofcTx(t)isshowninFig.2. (cid:50) w 1 λ ≤ ··· ≤ λ and λ is the algebraic connectivity of 2 N 2 G(V,E,W). 0 (cid:80) Let L = E be the unweighted Laplacian matrix k∈E k -20 ofthegraph.WesupposeLhasonlyonezeroeigenvalue, namely, the unweighted graph G(V,E) is connected. As -40 illustratedinFig.3,eachedgeweight(thegainofcom- x T municationchannel)isthesumoftheweightcontributed c -60 X: 186.5 bythetwoendnodesoftheedge. Y: -76 -80 Moreover,itisalsoassumedthatthereisalimitedcom- munication resource or budget constraint for each node -100 0 50 100 150 200 in the network. Without loss of generality, we assume t(s) (cid:80) w(i) = 1. Note that this differs from the for- k∈E(i) k mulation in (G¨oring et al., 2008), while they assume Fig.2.TheevolutionofobjectivefunctionforExample9. (cid:80) w = 1, if written in the form of our model, k∈E k (cid:80) (cid:80) w(i) =1. Then maximizing the algebraic i∈V k∈E(i) k connectivity of the edge-weighted Laplacian matrix by 5 Distributed Maximization of the Algebraic adjustingtheedge-weightscanbeformulatedasthefol- ConnectivityofaNetwork lowingSDPproblem: In this section, we apply the aforementioned algorithm maximize λ 2 tomaximizethealgebraicconnectivityofanetworkina λ2,µ,{wk(i)} distributed manner. The problem is first formulated as subjectto λ I−µ1(cid:22)(cid:88) (cid:88) w(i)E , anSemi-definiteProgramming(SDP)problem,thenby 2 k k employingthefunction(14),theproblemisrelaxedinto i∈Vk∈E(i) (Pc) anNonlinearProgramming(NP)problem. (cid:88) w(i) =1, k k∈E(i) 5.1 ModelingandProblemEquivalence w(i) ≥0, ∀k ∈E, i∈V. k An edge-weighted undirected graph is denoted as G(V,E,W),whereV ={1,2,··· ,N}denotesthevertex In(P ),thevariableµisusedtoshiftthezeroeigenvalue c set; E denotes the edge set and (i,j) ∈ E, i,j ∈ V if oftheLaplacianmatrixL withitseigenvector1.When w 6 the optimal value is reached, λ∗ would be the smallest (P ): 2 c eigenvalue of (cid:80) (cid:80) w(i)∗E +µ∗1. Since for any positive semi-deifi∈nVite mk∈aEtrixk G,kit holds that ξI (cid:22) G, tr(Φ∗)=1, Φ∗ (cid:23)0, ϕ(i)∗ ≥0, tr(1Φ∗)=0, (12a) k whereξisthesmallesteigenvalueofG,wegettheabove tr(E Φ∗)−v(i)∗+ϕ(i)∗ =0, w(i)∗ ≥0, (12b) constraints.Moreover,sinceλ∗isthesmallesteigenvalue k k k 2 of(cid:80)i∈V(cid:80)k∈Ewk(i)∗Ek+µ∗1,theoptimalvalueof (Pc) ϕ(ki)∗wk(i)∗ =0, ∀k ∈E(i), (12c) can be attained. On the other hand, we can choose λ2 λ∗−(cid:88) (cid:88) w(i)∗tr(E Φ∗)=0, (12d) smallenoughandµlargeenoughtomakethefirstmatrix 2 k k i∈Vk∈E(i) inequalityconstraintstrictlyholds,hencestrongduality holds for (Pc). In order to solve (Pc) distributedly, we λ∗2I−µ∗1−(cid:88) (cid:88) wk(i)∗Ek (cid:22)0, (12e) i∈Vk∈E(i) edge 1 edge 2 (cid:88) w(i)∗ =1, (12f) k k∈E(i) 1 2 3 whiletheKKTconditionsof (P )foralli∈V reads D tr(Φ(i)∗)=1,Φ(i)∗ (cid:23)0,ϕ(i)∗ ≥0,tr(1Φ(i)∗)=0, Fig.3.Edgeweightisthesumofweightcontributionbythe k nodes. (13a) considerthefollowingproblem tr(E Φ(i)∗)−v(i)∗+ϕ(i)∗ =0, w(i)∗ ≥0, (13b) k k k maximize (cid:88)λ(i) ϕ(ki)∗wk(i)∗ =0,∀k ∈E(i), (13c) {µ(i)},{wk(i)},{Z(i)} i∈V 2 λ(2i)∗− (cid:88) wk(i)∗tr(EkΦ(i)∗) subjectto λ(i)I−µ(i)1− (cid:88) w(i)E k∈E(i) 2 k k (cid:88) k∈E(i) + tr[(Z(i)−Z(j))Φ(i)]=0, (13d) (cid:88) + (Z(i)−Z(j))(cid:22)0, j∈N(i) j∈N(i) λ(2i)∗I−µ(i)∗1− (cid:88) wk(i)∗Ek (cid:88) w(i) =1, k∈E(i) k (cid:88) k∈E(i) + (Z(i)∗−Z(j)∗)(cid:22)0, (13e) w(i) ≥0, ∀k ∈E(i), ∀i∈V, j∈N(i) k (PD) (cid:88) w(i)∗ =1, (cid:88) Φ(i)∗−Φ(j)∗ =0, (13f) k where Z(i),∀i ∈ V are symmetric matrices which can k∈E(i) j∈N(i) be written as Z(i) = (cid:80)N(N2+1) z(i)B , where z(i) ∈ R l=1 l l l where Φ, ϕ and v are the Lagrange multipliers cor- is the matrix entry and B is the basis matrix for SN. k l respond to the matrix inequality constraint, inequality Both are labeled by l. To be more precise, if z(i) is the l constraints and the equality constraint of (Pc) respec- entry located on pth row and qth column of Z(i), then tively.Similarly,Φ(i),ϕ(i)andv(i)aretheLagrangemul- [B ] =[B ] =1andotherentriesofB remainzeros. k l pq l qp l tipliersof (P ). D The idea behind the introduction of Z(i) terms is that it will result in a consensus condition (13f) in the KKT Sincestrongdualityholdsfor(P ),thentheKKTcon- D conditions. ditions(13)becomesnecessaryandsufficientconditions for optimality. Hence {λ(i)∗,µ(i)∗,{w(i)∗},Z(i)∗}, i ∈ V 2 k It is worth noticing that the Slater’s conditions holds solves (P ), if and only if there exists Lagrange multi- D alsofor(PD)whichimpliesstrongduality. pliers{Φ(i)∗}and{ϕ(i)∗}suchthat(13)holds. k Proposition10 If {λ(i)∗,µ(i)∗,{w(i)∗},Z(i)∗}, i ∈ V Meanwhile, (13f) implies Φ(i)∗ = Φ(j)∗ for all i,j ∈ V. 2 k solves (P ), then λ∗ = (cid:80) λ(i)∗, µ∗ = (cid:80) µ(i)∗ This means that (12a)-(12b) are equivalent to (13a)- D 2 i∈V 2 i∈V (13b).Further,byadding(13d)and(13e)foreachnode solves (P ). c i∈V,thetermstr[(Z(i)−Z(j))Φ(i)]andZ(i)∗−Z(j)∗will disappear. Denote λ∗ = (cid:80) λ(i)∗, µ∗ = (cid:80) µ(i)∗, 2 i∈V 2 i∈V wewillget(12).Sincestrongdualityholdsfor(P ),then c PROOF. Byfollowingthesameprocedureintroduced KKT conditions is necessary and sufficient conditions inSection2,itisnothardtogettheKKTconditionsof foroptimality.Hencethestatementfollows. (cid:50) 7 5.2 RelaxingSDPintoNP Remark12 Thoughconvex,f (X)isnotastrictlycon- ε vexfunction.Forall0≤α≤1,itholdsthat (P ) can be solved distributedly by using a similar D methodas(Zhang&Hu,2016)whenthegraphisregu- αfε(I)+(1−α)fε(2I)=αεln(Ne1ε)+(1−α)εln(Ne2ε) lar.However,wewouldliketoconsiderarbitrarygraphs, =εlnN +2−α notonlyregulargraphs.Inordertoapplytheprojected gradient flow dynamics to solve (P ), the problem is D 2−α relaxedintoanNPfirst. fε(αI+(1−α)2I)=fε((2−α)I)=εln(Ne ε ) =εlnN +2−α Now we introduce the convex function proposed by Hence αf (I) + (1 − α)f (2I) = f (αI + (1 − α)2I), ε ε ε Nesterov (2007) which can be used to approximate f (X)isnotstrictlyconvex. (cid:50) ε the largest eigenvalue of a symmetric matrix. Given X ∈SN,functionf :SN (cid:55)→Randreads ε 5.3 ProjectedDynamicsandNumericalExamples N f (X)=εlntr(eX/ε)=εln[(cid:88)eλi(X)/ε], (14) Now we apply the projected dynmaical system to ε solve (P ). To abbreviate the notation, denote x = i=1 c (cid:2)x(1)T,··· ,x(N)T(cid:3)T, where x(i) =(cid:2)µ(i),{w(i)},{z(i)}(cid:3)T k l anditsderivativewithrespecttoX reads andv =(cid:2)v(1),··· ,v(N)(cid:3)T. N N (cid:88) (cid:88) Theprojecteddynamicsisgivenby ∇ f (X)=[ eλi(X)/ε]−1[ eλi(X)/εu uT], (15) X ε i i i=1 i=1 ∂L (x,v) µ˙(i) =(cid:104)∇ f (X(i)),1(cid:105) =− A , (16a) X(i) ε M ∂µ(i) where(λ (X),u )areeigen-pairsofX with(cid:107)u (cid:107)=1,∀i. i i i It has been proved by Nesterov (2007) that λmax(X)≤ w˙k(i) =ΠR+(cid:0)wk(i),(cid:104)∇X(i)fε(X(i)),Ek(cid:105)M −v(i) fε(X)≤λmax(X)+εlnN.Hencewhenεissufficiently −( (cid:88) w(i)−1)(cid:1) small,fε(X)≈λmax(X). l l∈E(i) Nnoottheintghebumta−trλi(xi)iinsetqhueallaitrygecsotnesigtreanivnatluieno(fPtDhe)mmaetarnixs =ΠR+(wk(i),−∂L∂Aw((xi),v)), ∀k ∈E(i) (16b) 2 k −µ(i)1 − (cid:80)k∈Ewk(i)Ek + (cid:80)j∈N(i)(Z(i) − Z(j)) (Boyd z˙l(i) =− (cid:88) (cid:104)∇X(i)fε(X(i))−∇X(j)fε(X(j)),Bl(cid:105)M & Vandenberghe, 2004). Hence when choosing ε small, j∈N(i) SDP(PD)canberelaxedintothefollowingNP ∂L (x,v) =− A , (16c) (cid:88) ∂z(i) minimize f (X(i)) l {µ(i)},{wk(i)},{Z(i)} i∈V ε v˙(i) = (cid:88) w(i)−1= ∂LA(x,v), (16d) subjectto (cid:88) w(i) =1 (NP) k∈E(i) k ∂v(i) k k∈E(i) w(i) ≥0, ∀k ∈E(i), whereLA(x,v)=(cid:80)i∈V{fε(X(i))+v(i)((cid:80)k∈E(i)wk(i)− k 1)+1((cid:80) w(i)−1)2}and∇ f (X(i))isgivenby 2 k∈E(i) k X(i) ε whereX(i) =−µ(i)1−(cid:80) w(i)E +(cid:80) (Z(i)− (15). k∈E(i) k k j∈N(i) Z(j)) to abbreviate the notation. It is clear that the Corollary13 The system (16) is well-defined and Slater’s condition also holds for (NP) and hence strong asymptotically stable with x converging to one of the dualityholds.Therefore,KKTconditionsbecomesnec- optimalsolutions. essaryandsufficientconditionsfor(NP)problem. Remark11 (NP) explains the reason of the introduc- tion of µ in (Pc) instead of writing the constraint as PROOF. fε(X) has a Lipschitz(cid:80)continuous gradient λ (I − 11) (cid:22) (cid:80) (cid:80) w(i)E as in (Ghosh & withrespecttoxgiventhatX = ixiAi,whereallAi 2 N i∈V k∈E(i) k k are symmetric matrices (Nesterov, 2007). By Theorem Boyd, 2006). A λ2I term is needed for the relaxation. 2.5 in (Nagurney & Zhang, 2012), for any initial value It is worth noticing that µ∗ does not necessarily equals µ(i)(0) ∈ R, w(i)(0) ∈ R , and v(i)(0) ∈ R, there ex- toλ∗/N in (P ).Infact,any(λ∗,µ∗,(cid:8)w(i)∗(cid:9))suchthat k + 2 c 2 k istsauniqueCarath´eodorysolutionwhichcontinuously µ∗ ≥λ∗2/N isanoptimalsolutionfor (Pc). (cid:50) depends on the initial value. Therefore the system (16) 8 is well-defined and by Theorem 6, the system (16) is 1.2 asymptotically stable with x converging to the optimal X: 18.83 solution. (cid:50) Y: 0.9997 1 Tdyhnoaumghicsit(1se6e)m, sontehahtaswhtoendsoimeuiglaentivnaglutehedepcroomjepcotseid- eulaV thg0.8 www112(((122))) tion on X(i) to compute ∇X(i)f(X(i)) at each time ieW e0.6 w2(3) XY:: 108.5.83 step. However, since the factors eλi(X(i))/ε decrease gd E0.4 very rapidly, the gradient numerically only depends on few largest eigenvalues and correpondant eigenvectors (Nesterov,2007).Moreprecisely,forsufficientlysmallε, 0.2 0 5 10 15 20 ∇ f(X(i)) ≈ 1 (cid:80) u(i)u(i)T, where u(i) is the eigen- t(s) X(i) η j j j j vector corresponds to the largest eigenvalue, η is the Fig.5.Theevolutionoftheedgeweights multiplicity of the largest eigenvalue of X(i). Extreme eigenvalues will converge first in numerical methods such as Lanszos scheme, hence one does not have to do 3 2 the entire eigenvalue decomposition and the numerical complexityisreduced. 4 1 Example14 Werunasimplenumericalexampleusing the graph illustrated in Fig. 3 to show that the variables 5 10 do converge to the optimal solutions. We know that the optimal solution of (P ) is w(1)∗ = w(3)∗ = 1, w(2)∗ = c 1 2 1 6 9 w(2)∗ = 0.5 and λ∗ = 1.5. Fig. 4 and 5 show that the 2 2 7 8 10 Fig.6.GraphTopology 5 X: 18.69 Y: 1.5 )i(62 0 50 V i P 0 X: 17.15 -5 -50 optimal Y: 1.061 value: -100 1.1406 -10 0 5 10 15 20 -150 6(i) t(s) i 2 6$ 2V -200 P2 Fig.4.Theevolutionoftheobjectivefunction -250 objectivefunctionvalueandtheedgeweightsconvergeto 0 5 10 15 20 theoptimalsolution. (cid:50) t(s) Example15 Now we consider a more complicated Fig.7.Theevolutionoftheobjectivefunction graphgeneratedbytennodesasillustratedinFig.6The evolution of the objective function is illustrated in Fig. and difference between the aforementioned continuous- 7. It shows that the objective function converges to the optimalvalueasymptotically. (cid:50) timealgorithmandproximalgradientmethodisfurther explained.Moreover,theproblemofdistributedlymaxi- mizingthealgebraicconnectivityofanundirectedgraph 6 Conclusion byadjustingtheedgeweights(communicationgainsfor each channel) of each nodes is considered. The original In this paper, a projected primal-dual gradient flow of SDP problem is relaxed into an NP problem and then augmentedLagrangianispresentedtosolveconvexopti- the aforementioned projected dynamical system is ap- mizationproblemswhicharenotnecessarilystrictlycon- plied to solve the NP. Numerical examples show that vex.Weshowthattheprojecteddynamicalsystemcon- theprojecteddynamicalsystemsconvergetooneofthe verges to one of the optimal solutions. The connection optimalsolutions. 9 References rodynamic approach to distributed constrained opti- mization,. Arrow, K.-J., Hurwicz, L., Uzawa, H., Chenery, H.-B., Mesbahi, M., & Egerstedt, M. (2010). Graph theoretic Johnson, S.-M., Karlin, S., & Marschak, T. (1959). methodsinmultiagentnetworks.PrincetonUniversity Studiesinlinearandnon-linearprogramming.,. Press. Bacciotti, A., & Ceragioli, F. (2006). Nonpathological Nagurney,A.,&Zhang,D.(2012). Projecteddynamical lyapunov functions and discontinuous carath´eodory systems and variational inequalities with applications systems. Automatica,42,453–458. volume2. SpringerScience&BusinessMedia. Beck,A.,&Teboulle,M.(2010). 1gradient-basedalgo- Nesterov, Y. (2007). Smoothing technique and its ap- rithmswithapplicationstosignalrecoveryproblems. plicationsinsemidefiniteoptimization. Mathematical Convex Optimization in Signal Processing and Com- Programming,110,245–259. munications,. Olfati-Saber, R., Fax, J. A., & Murray, R. M. (2007). Boyd,S.,&Vandenberghe,L.(2004). Convexoptimiza- Consensusandcooperationinnetworkedmulti-agent tion. Cambridgeuniversitypress. systems. ProceedingsoftheIEEE,95,215–233. Brogliato,B.,Daniilidis,A.,Lemar´echal,C.,&Acary,V. Parikh, N., & Boyd, S. (2014). Proximal algorithms. (2006). Ontheequivalencebetweencomplementarity FoundationsandTrendsinOptimization,1,127–239. systems,projectedsystemsanddifferentialinclusions. Qiu, Z., Liu, S., & Xie, L. (2016). Distributed con- Systems&ControlLetters,55,45–51. strained optimal consensus of multi-agent systems. Chatzipanagiotis,N.,Dentcheva,D.,&Zavlanos,M.M. Automatica,68,209–215. (2015). An augmented lagrangian method for dis- Ruszczyn´ski, A. P. (2006). Nonlinear optimization. tributed optimization. Mathematical Programming, PrincetonUniversityPress. 152,405–434. Wang, J., & Elia, N. (2011). A control perspective for Chen, B. (2005). Optimization Theory and Algorithms. centralized and distributed convex optimization. In TsinghuaUniversitypress. 2011 50th IEEE Conference on Decision and Control Cherukuri, A., Mallada, E., & Cort´es, J. (2015). Con- and European Control Conference (pp. 3800–3805). vergenceofcaratheodorysolutionsforprimal-dualdy- IEEE. namicsinconstrainedconcaveoptimization. InSIAM Yang, S., Liu, Q., & Wang, J. (). A multi-agent system conferenceoncontrolanditsapplications. SIAM. with a proportional-integral protocol for distributed Cherukuri,A.,Mallada,E.,&Cort´es,J.(2016).Asymp- constrainedoptimization. IEEETransactionsonAu- toticconvergenceofconstrainedprimal–dualdynam- tomaticControl,. ics. Systems&ControlLetters,87,10–15. Yi, P., Hong, Y., & Liu, F. (2016). Initialization-free Durr, H.-B., & Ebenbauer, C. (2011). A smooth vector distributedalgorithmsforoptimalresourceallocation field for saddle point problems. In CDC-ECE (pp. with feasibility constraints and application to eco- 4654–4660). nomic dispatch of power systems. Automatica, 74, Feijer, D., & Paganini, F. (2010). Stability of primal– 259–269. dual gradient dynamics and applications to network Zeng, X., Yi, P., & Hong, Y. (2016). Distributed optimization. Automatica,46,1974–1981. continuous-timealgorithmforconstrainedconvexop- Fiedler, M. (1973). Algebraic connectivity of graphs. timizations via nonsmooth analysis approach. IEEE Czechoslovakmathematicaljournal,23,298–305. Transactions on Automatic Control, PP. doi:10. Gharesifard, B., & Cort´es, J. (2014). Distributed 1109/TAC.2016.2628807. continuous-time convex optimization on weight- Zhang, H., & Hu, X. (2016). Consensus control for lin- balanceddigraphs. IEEETransactionsonAutomatic earsystems withoptimalenergycost. arXivpreprint Control,59,781–786. arXiv:1612.00316,. Ghosh, A., & Boyd, S. (2006). Growing well-connected graphs. In Proceedings of the 45th IEEE Conference onDecisionandControl (pp.6605–6611). IEEE. G¨oring, F., Helmberg, C., & Wappler, M. (2008). Em- beddedintheshadowoftheseparator. SIAMJournal onOptimization,19,472–501. Khalil, H. K., & Grizzle, J. (2002). Nonlinear systems. (3rded.). PrenticehallNewJersey. Kose, T. (1956). Solutions of saddle value problems by differential equations. Econometrica, Journal of the EconometricSociety,(pp.59–70). Lin, P., Ren, W., & Farrell, J. A. (2016). Distributed continuous-time optimization: Nonuniform gradient gains, finite-time convergence, and convex constraint set. IEEETransactionsonAutomaticControl,. Liu, Q., Yang, S., & Wang, J. (2016). A collective neu- 10