Distributed Random-Fixed Projected Algorithm for Constrained Optimization Over Digraphs

Pei Xie, Keyou You, Shiji Song, Cheng Wu
Department of Automation, Tsinghua University, and Tsinghua National Laboratory for Information Science and Technology, Beijing, 100084, China ([email protected], {youky,shijs,wuc}@tsinghua.edu.cn)

arXiv:1701.05986v1 [cs.DC] 21 Jan 2017

Abstract: This paper is concerned with a constrained optimization problem over a directed graph (digraph) of nodes, in which the cost function is a sum of local objectives, and each node only knows its local objective and constraints. To collaboratively solve the optimization, most of the existing works require the interaction graph to be balanced or "doubly-stochastic", which is quite restrictive and not necessary as shown in this paper. We focus on an epigraph form of the original optimization to resolve the "unbalanced" problem, and design a novel two-step recursive algorithm with a simple structure. Under strongly connected digraphs, we prove that each node asymptotically converges to some common optimal solution. Finally, simulations are performed to illustrate the effectiveness of the proposed algorithms.

Keywords: Distributed algorithms, constrained optimization, unbalanced digraphs, epigraph form, random-fixed projected algorithm

⋆ This work was supported by the National Natural Science Foundation of China (61304038, 41576101), and the Tsinghua University Initiative Scientific Research Program.

1. INTRODUCTION

Over the last decades, a paradigm shift from centralized processing to highly distributed systems has excited interest due to the increasing development in interactions between computers, microprocessors and sensors. In this work, we consider a distributed constrained optimization over a directed graph (digraph) of nodes. Without a central coordination unit, each node is unable to obtain the overall information of the optimization problem. More specifically, we focus on a problem of minimizing a sum of local objectives over general digraphs, where each node only accesses its local objective and constraints. Such problems arise in network congestion problems, where routers individually optimize their flow rates to minimize the latency along their routes in a distributed way. Other applications include non-autonomous power control, distributed estimation, cognitive networks, statistical inference, and machine learning.

The problem of minimizing a sum of convex functions has been extensively studied in recent years, see Nedić and Ozdaglar (2009); Lobel and Ozdaglar (2011); Xi and Khan (2016). In general, the existing distributed algorithms mainly adopt gradient (or sub-gradient) based methods to update the local estimate in each node to minimize its local objective, and a communication protocol is designed to achieve consensus among nodes. When constraints are taken into account, a distributed implementation of the well-known Alternating Direction Method of Multipliers (ADMM) (Mota et al., 2013) has been proposed. It assumes the underlying graph to be undirected or a balanced digraph, which is quite a restrictive assumption on real network topologies. The same issue exists in other works such as Nedić et al. (2010).

For unbalanced graphs, Nedić and Olshevsky (2015) propose an algorithm by combining gradient descent with the push-sum consensus. The so-called push-sum protocol is primarily designed to achieve average consensus on unbalanced graphs (Kempe et al., 2003). Although their algorithm can be applied to time-varying directed graphs, it only handles unconstrained optimization, and the additional computational cost makes it rather complicated.
An intuitive method for unbalanced graphs was recently proposed by Xi and Khan (2016), which augments an additional "surplus" for each node to record the state updates. The main ideas are motivated by Cai and Ishii (2012), which aims at achieving average consensus on digraphs. Unfortunately, the method in Xi and Khan (2016) is unable to handle time-varying digraphs. Although there exist some distributed algorithms for constrained optimization such as Nedić et al. (2010), they mainly focus on abstract convex constraints and need to perform projections, which is computationally demanding if the projected set is irregular. Moreover, the existing algorithms dealing with structural (not abstract) constraints often encounter problems under unbalanced graphs.

To sum up, we notice that almost all the existing algorithms are either only applicable to unconstrained optimization or in need of balanced digraphs. To solve the two issues simultaneously, we introduce an epigraph form of the optimization problem to convert the objective function into a linear form, which addresses the unbalanced case, and design a novel two-step recursive algorithm. In particular, we first solve an unconstrained optimization problem without the decoupled constraints of the epigraph form by using a standard distributed sub-gradient descent and obtain an intermediate state vector in each node. Secondly, the intermediate state vector of each node is moved toward the intersection of its local constraint sets by using the Polyak random algorithm (Nedić, 2011). While a distributed version of the Polyak algorithm is proposed in You and Tempo (2016), in this paper we further introduce an additional "projection" toward a fixed direction to improve the transient performance. This algorithm is termed the distributed random-fixed projected algorithm, and its convergence is rigorously proved as well.

The rest of the paper is organized as follows. In Section II, we formulate the distributed constrained optimization and review the existing works. In Section III, we introduce the epigraph form of the original optimization to attack the unbalanced issue, and design a random-fixed projected algorithm to distributedly solve the reformulated optimization. In Section IV, the convergence of the proposed algorithm is proved. In Section V, some illustrative examples are presented to show the effectiveness of the proposed algorithm. Finally, some concluding remarks are drawn in Section VI.

Notation: For two vectors $a=[a_1,\ldots,a_n]^T$ and $b=[b_1,\ldots,b_n]^T$, the notation $a\preceq b$ means that $a_i\le b_i$ for any $i\in\{1,\ldots,n\}$. Similar notation is used for $\prec$, $\succeq$ and $\succ$. The symbols $\mathbf{1}_n$ and $\mathbf{0}_n$ denote the vectors with all entries equal to one and zero, respectively, and $e_j$ denotes a unit vector whose $j$-th element equals one. For a matrix $A$, we use $\|A\|$ and $\rho(A)$ to represent its norm and spectral radius, respectively. Given a pair of real matrices of proper dimensions, $\otimes$ indicates their Kronecker product. The sub-gradient of a function $f$ with respect to an input vector $x\in\mathbb{R}^m$ is denoted by $\partial f(x)$. Finally, $f(\theta)_+=\max\{0,f(\theta)\}$ denotes the nonnegative part of $f$.
2. PROBLEM FORMULATION AND MOTIVATION

2.1 Distributed constrained optimization

Consider a network of $n$ nodes that distributedly solve the constrained convex optimization

  $\min_{x\in X} F(x)\triangleq\sum_{i=1}^n f_i(x)$,
  s.t. $g_i(x)\preceq 0$, $i=1,2,\ldots,n$,   (1)

where $X\subseteq\mathbb{R}^m$ is a common convex set known by all nodes, while $f_i:\mathbb{R}^m\to\mathbb{R}$ is a convex function only known by node $i$. Moreover, only node $i$ is aware of its local constraints $g_i(x)\preceq 0$, where $g_i(x)=[g_i^1(x),\ldots,g_i^{d_i}(x)]^T\in\mathbb{R}^{d_i}$ is a vector of convex functions.

We introduce a directed graph (digraph) $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ to describe the interactions between nodes, where $\mathcal{V}:=\{1,\ldots,n\}$ denotes the set of nodes, and the set of interaction links is represented by $\mathcal{E}$. A directed edge $(i,j)\in\mathcal{E}$ exists if node $i$ can directly receive information from node $j$. We define $N_i^{\mathrm{in}}=\{j\,|\,(i,j)\in\mathcal{E}\}$ as the collection of in-neighbors of node $i$, i.e., the set of nodes that directly send information to node $i$. The out-neighbors $N_i^{\mathrm{out}}=\{j\,|\,(j,i)\in\mathcal{E}\}$ are defined similarly. Note that each node is included in both its out-neighbors and in-neighbors. Node $i_k$ is said to be connected to node $i_1$ if there exists a sequence of directed edges $(i_1,i_2),\ldots,(i_{k-1},i_k)$ with $(i_{j-1},i_j)\in\mathcal{E}$ for all $j\in\{2,\ldots,k\}$, which is called a directed path from $i_1$ to $i_k$. A digraph $\mathcal{G}$ is said to be strongly connected if each node is connected to every other node via a directed path. If $A=\{a_{ij}\}\in\mathbb{R}^{n\times n}$ satisfies $a_{ij}>0$ if $(i,j)\in\mathcal{E}$ and $a_{ij}=0$ otherwise, we say $A$ is a weighting matrix adapted to graph $\mathcal{G}$. Given a digraph $\mathcal{G}$ and its associated weighting matrix $A$, we say $\mathcal{G}$ is balanced if $\sum_{j\in N_i^{\mathrm{out}}}a_{ji}=\sum_{j\in N_i^{\mathrm{in}}}a_{ij}$ for any $i\in\mathcal{V}$, and unbalanced otherwise.

Moreover, $A$ is row-stochastic if $\sum_{j=1}^n a_{ij}=1$ for any $i\in\mathcal{V}$, and column-stochastic if $\sum_{i=1}^n a_{ij}=1$ for any $j\in\mathcal{V}$. The matrix $A$ is said to be doubly-stochastic if it is both row-stochastic and column-stochastic. We note that double-stochasticity is a restrictive condition for digraphs.

The objective of this paper is to design a distributed recursive algorithm for problem (1) over an unbalanced digraph, under which every node $i$ updates a local vector $x_i^k$ by exchanging limited information with its neighbors at each time so that every $x_i^k$ eventually converges to some common optimal solution.
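To make the graph and weighting-matrix definitions above concrete, the following Python sketch builds a small strongly connected but unbalanced digraph and a row-stochastic weighting matrix adapted to it, then checks the stochasticity and balance conditions. The edge set and the uniform weights are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Directed edges (i, j): node i can directly receive information from node j.
# A 4-node strongly connected ring with one extra link (illustrative choice).
n = 4
edges = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}

# Row-stochastic weighting matrix adapted to the graph: a_ij > 0 iff
# (i, j) is an edge or i == j (self-loops), with each row summing to one.
A = np.zeros((n, n))
for i in range(n):
    in_nbrs = [j for j in range(n) if (i, j) in edges] + [i]
    A[i, in_nbrs] = 1.0 / len(in_nbrs)

row_sums = A.sum(axis=1)      # all ones -> row-stochastic
col_sums = A.sum(axis=0)      # generally not ones -> not column-stochastic
balanced = np.allclose(col_sums, row_sums)   # sum_j a_ji vs. sum_j a_ij per node

print("row-stochastic:", np.allclose(row_sums, 1.0))
print("column-stochastic:", np.allclose(col_sums, 1.0))
print("balanced:", balanced)  # False for this digraph
```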
2.2 Review of the major results and motivation

We first review the standard distributed gradient descent (DGD) algorithm for unconstrained optimization, which however requires doubly-stochastic weighting matrices (Nedić and Ozdaglar, 2009). That is, each node $i$ updates its local estimate of an optimal solution by

  $x_i^{k+1}=\sum_{j=1}^n a_{ij}x_j^k-\zeta^k\nabla f_i(x_i^k)$,   (2)

where $\zeta^k$ is a given step size.

However, the DGD is only able to solve the optimization problem over balanced graphs and is not applicable to unbalanced graphs. To illustrate this point, we define the Perron vector of a weighting matrix $A$ as follows.

Lemma 1. (Horn and Johnson, 2012, Perron Theorem). If $\mathcal{G}$ is a strongly connected digraph and $A$ is the associated weighting matrix, there exists a Perron vector $\pi\in\mathbb{R}^n$ such that

  $\pi^T A=\pi^T$, $\pi^T\mathbf{1}_n=1$, $\pi_i>0$, and   (3)
  $\rho(A-\mathbf{1}_n\pi^T)<1$.   (4)

Multiplying both sides of (2) by $\pi_i$ from (3) and summing over $i$, we obtain that

  $\bar{x}^{k+1}\triangleq\sum_{i=1}^n\pi_i x_i^{k+1}=\sum_{j=1}^n\Big(\sum_{i=1}^n\pi_i a_{ij}\Big)x_j^k-\zeta^k\sum_{i=1}^n\pi_i\nabla f_i(x_i^k)=\bar{x}^k-\zeta^k\sum_{i=1}^n\pi_i\nabla f_i(x_i^k)$.   (5)

If all nodes have already reached consensus, then (5) is written as

  $\bar{x}^{k+1}=\bar{x}^k-\zeta^k\sum_{i=1}^n\pi_i\nabla f_i(\bar{x}^k)$.   (6)

Clearly, (6) is a gradient descent algorithm to minimize the objective function

  $\bar{F}(x)\triangleq\sum_{i=1}^n\pi_i f_i(x)$.   (7)

Thus, each node converges to a minimizer of $\bar{F}(x)$ rather than of $F(x)$ in (1), which is also noted in Xi and Khan (2016). For a generic unbalanced digraph, the weighting matrix is no longer doubly-stochastic, and the Perron vector is not equal to $[\tfrac{1}{n},\ldots,\tfrac{1}{n}]^T$, which obviously implies that $\bar{F}(x)\ne F(x)$. That is, the DGD in (2) is not applicable to the case of unbalanced graphs.
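To see the bias of (2) on an unbalanced digraph numerically, the sketch below computes the Perron vector $\pi$ of a row-stochastic weighting matrix via (3) and compares the minimizer of $F$ with that of $\bar{F}$ in (7). The quadratic local objectives $f_i(x)=(x-b_i)^2$ and the particular matrix $A$ (the 4-node example above) are illustrative assumptions only.

```python
import numpy as np

# Row-stochastic weighting matrix of a strongly connected, unbalanced digraph.
A = np.array([[1/3, 1/3, 1/3, 0.0],
              [0.0, 1/2, 1/2, 0.0],
              [0.0, 0.0, 1/2, 1/2],
              [1/2, 0.0, 0.0, 1/2]])

# Perron vector pi from (3): pi^T A = pi^T, pi^T 1 = 1, pi > 0.
eigval, eigvec = np.linalg.eig(A.T)
pi = np.real(eigvec[:, np.argmin(np.abs(eigval - 1.0))])
pi = pi / pi.sum()

# Illustrative scalar local objectives f_i(x) = (x - b_i)^2.
b = np.array([1.0, 2.0, 3.0, 4.0])
x_star_F = b.mean()      # minimizer of F(x)     = sum_i f_i(x)
x_star_Fbar = pi @ b     # minimizer of F_bar(x) = sum_i pi_i f_i(x)

print("pi =", np.round(pi, 3))
print("minimizer of F:", x_star_F, " minimizer of F_bar:", x_star_Fbar)
```

Since $\pi\ne\tfrac{1}{n}\mathbf{1}_n$ here, the DGD limit differs from the true minimizer of $F$, which is precisely the effect the epigraph reformulation of Section 3 removes.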
If each node $i$ is able to access its associated element $\pi_i$ of the Perron vector, it follows from (6) that a natural way to modify the DGD in (2) is

  $x_i^{k+1}=\sum_{j=1}^n a_{ij}x_j^k-\frac{\zeta^k}{\pi_i}\nabla f_i(x_i^k)$,

which is recently exploited in Morral (2016) by designing an additional distributed algorithm to locally estimate $\pi_i$. However, it is not directly applicable to time-varying graphs as there does not exist such a constant Perron vector. In fact, this shortcoming has also been explicitly pointed out in Morral (2016). Another idea to resolve the unbalanced problem is to augment the original row-stochastic matrix into a doubly-stochastic matrix. This novel approach is originally proposed by Cai and Ishii (2012) for average consensus problems over unbalanced graphs. Their key is to augment an additional variable for each agent, called "surplus", whose function is to locally record the individual state updates. In Xi and Khan (2016), the "surplus-based" idea is adopted to solve the distributed optimization problem over fixed unbalanced graphs. Although it is extended to time-varying graphs in Cai and Ishii (2014), that work only focuses on the average consensus problem. Again, it is unclear how to use the "surplus-based" idea to solve the distributed optimization problem over time-varying unbalanced graphs. This problem has been resolved in Nedić and Olshevsky (2015) by adopting the so-called push-sum consensus protocol, the goal of which is to achieve average consensus over unbalanced graphs. Unfortunately, their algorithms appear to be overly complicated and involve nonlinear iterations. More importantly, they are restricted to unconstrained optimization, and their rationale is not as clear as that of the DGD.

In this work, we solve the unbalanced problem from a different perspective, which can easily address the constrained optimization over time-varying digraphs.^1

^1 The result on time-varying digraphs is to be included in the journal version of this work.

3. DISTRIBUTED ALGORITHMS FOR CONSTRAINED OPTIMIZATION

As explained, it is perhaps not effective to attack the unbalanced problem via the Perron vector. To overcome this limitation, we study the epigraph form of the optimization (1), and obtain the same linear objective function for every node. This eliminates the effect of the different elements of the Perron vector on the limiting point of (6). Then we utilize the DGD in (2) to resolve the epigraph form and obtain an intermediate state vector. The feasibility of the local estimate in each node is asymptotically ensured by further driving this vector toward the constraint set. That is, we update the intermediate vector toward the negative sub-gradient direction of a local constraint function. This idea is in fact proposed in our recent work (You and Tempo, 2016), which generalizes the Polyak random algorithm to its distributed version. The convergence of the algorithm is proved in the next section.

3.1 Epigraph form of the constrained optimization

Our main idea does not focus on $\pi_i$ but on $f_i$ in (6). Specifically, if we transform all the local objectives $f_i(x)$ into the same form $f_0(x)$, then (6) reduces to $\bar{x}^{k+1}=\bar{x}^k-\zeta^k\nabla f_0(\bar{x}^k)$, which implies that there is no difference between the cases of balanced and unbalanced digraphs. This is achieved by concentrating on the epigraph form of the optimization (1).

Given a function $f:\mathbb{R}^m\to\mathbb{R}$, the epigraph of $f$ is defined as

  $\mathrm{epi}\,f=\{(x,t)\,|\,x\in\mathrm{dom}\,f,\ f(x)\le t\}$,

which is a subset of $\mathbb{R}^{m+1}$. It follows from Boyd and Vandenberghe (2004) that the epigraph of $f$ is a convex set if and only if $f$ is convex, and minimizing $f$ is equivalent to searching for the minimal auxiliary variable $t$ within the epigraph. In this way, we transform the problem of minimizing a convex objective into minimizing a linear function within a convex set. In the case of multiple functions, the epigraph can be defined similarly by introducing multiple auxiliary variables.

Combining the above ideas, we consider the epigraph form of (1) by using an auxiliary vector $t\in\mathbb{R}^n$. Then, it is clear that problem (1) can be reformulated as

  $\min_{(x,t)\in\Theta}\ \mathbf{1}_n^T t/n$,
  s.t. $f_i(x)-e_i^T t\le 0$,
     $g_i(x)\preceq 0$, $i=1,2,\ldots,n$,   (8)

where $\Theta=X\times\mathbb{R}^n$ is the Cartesian product of $X$ and $\mathbb{R}^n$.

Remark 1. In view of the epigraph form, we have the following comments.

(a) Denote $y=[x^T,t^T]^T$ and $f_0(y)=c_0^Ty/n$, where $c_0=[\mathbf{0}_m^T,\mathbf{1}_n^T]^T$. Thus, the objective in (8) becomes the sum of the local objective $f_0$, which is the same for all nodes. In view of (6), the correctness of the DGD can be guaranteed even for unbalanced graphs.

(b) The local objective $f_i(x)$ in (1) is handled via an additional constraint in (8) such that $\tilde{f}_i(y)=f_i(x)-e_i^Tt\le 0$, where $\tilde{f}_i(y)$ is a convex function as well. To evaluate $\tilde{f}_i(y)$, node $i$ needs to select the $i$-th element of the vector $t$, where $i$ is the identifier of node $i$. As a result, the epigraph form requires each node to know its identifier, which is also needed in Mai and Abed (2016, Assumption 2).
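The reformulation in Remark 1 is purely mechanical, as the following sketch illustrates: it stacks $y=[x;t]$, builds the common linear cost $c_0^Ty/n$, and wraps each local objective into the convex constraint $\tilde{f}_i(y)=f_i(x)-e_i^Tt\le 0$. The function name and interfaces are assumptions chosen here for illustration.

```python
import numpy as np

def epigraph_form(f_list, g_list, m):
    """Build the data of the epigraph form (8) from problem (1).

    f_list[i](x) -> float : local objective f_i, x in R^m
    g_list[i](x) -> array : local constraint vector g_i(x) (<= 0 entrywise)
    Returns the shared linear cost c_0/n and the new local constraints in y = (x, t).
    """
    n = len(f_list)
    c0 = np.concatenate([np.zeros(m), np.ones(n)])   # c_0 = [0_m; 1_n]

    def split(y):                                    # y = [x; t]
        return y[:m], y[m:]

    def make_ftilde(i):                              # f~_i(y) = f_i(x) - t_i
        return lambda y: f_list[i](split(y)[0]) - split(y)[1][i]

    def make_g(i):                                   # g_i acts on x only
        return lambda y: g_list[i](split(y)[0])

    f_tilde = [make_ftilde(i) for i in range(n)]
    g_new = [make_g(i) for i in range(n)]
    return c0 / n, f_tilde, g_new
```

Every node now shares the same linear objective, so by Remark 1(a) the consensus step no longer biases the limit point toward a Perron-weighted objective.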
3.2 Distributed Random-Fixed Projected Algorithm

Since the objective function of (8) is linear, there does not exist an optimal point for the unconstrained optimization obtained by dropping the local constraints in (8). That is, problem (8) is meaningful only when the constraints are taken into consideration. Since the local constraints in (8) are given in the form of convex functions, we shall fully exploit their structure, which is different from the constrained version of the DGD in Lobel and Ozdaglar (2011) that uses the projection operator. Clearly, the projection is easy to perform only if the projected set has a relatively simple structure, e.g., an interval or a half-space. From this perspective, our algorithm requires much less computational load per iteration. To this end, we adopt the Polyak random algorithm (Nedić, 2011), which however only addresses the centralized version, to solve the distributed constrained optimization.

To solve (8) recursively, every node $j$ maintains local estimates $x_j^k\in\mathbb{R}^m$ and $t_j^k\in\mathbb{R}^n$ at each iteration $k$. Firstly, we solve an unconstrained optimization problem which removes the constraints in problem (8) by using the standard distributed sub-gradient descent algorithm and obtain intermediate state vectors $p_j^k$ and $y_j^k$, which correspond to $t_j^k\in\mathbb{R}^n$ and $x_j^k\in\mathbb{R}^m$, respectively, i.e.,

  $p_j^k=\sum_{i=1}^n a_{ji}t_i^k-\zeta^k\mathbf{1}_n/n$,   (9)
  $y_j^k=\sum_{i=1}^n a_{ji}x_i^k$,   (10)

where $\zeta^k$ is the step size satisfying the persistently exciting condition

  $\zeta^k>0$, $\sum_{k=0}^\infty\zeta^k=\infty$, $\sum_{k=0}^\infty(\zeta^k)^2<\infty$.   (11)

Then, we adopt Polyak's idea to address the constraints of (8) and drive the intermediate state vectors toward the feasible set. To facilitate the presentation, we introduce the following notations

  $X_j^l=\{x\in\mathbb{R}^m\,|\,g_j^l(x)\le 0\}$, $l\in\{1,\ldots,\tau_j\}$,
  $X_j^0\times T_j=\{(x,t)\,|\,f_j(x)-e_j^Tt\le 0,\ x\in X\}$.   (12)

To be specific, we update $y_j^k$ toward a randomly selected set $X_j^{\omega_j^k}$ by using Polyak's projection idea, i.e.,

  $z_j^k=y_j^k-\beta\frac{g_j^{\omega_j^k}(y_j^k)_+}{\|u_j^k\|^2}u_j^k$,   (13)

where $\beta\in(0,2)$ is a constant parameter, $\omega_j^k$ is a random variable taking values in the integer set $\{1,\ldots,\tau_j\}$, and the vector $u_j^k\in\partial g_j^{\omega_j^k}(y_j^k)_+$ if $g_j^{\omega_j^k}(y_j^k)_+>0$, while $u_j^k=u_j$ for some $u_j\ne 0$ if $g_j^{\omega_j^k}(y_j^k)_+=0$. In fact, $u_j^k$ is a decreasing direction of $g_j^{\omega_j^k}(\cdot)_+$, which leads to $d(z_j^k,X_j^{\omega_j^k})\le d(y_j^k,X_j^{\omega_j^k})$ for sufficiently small $\beta$. If $\omega_j^k$ is appropriately selected, it is expected in the average sense that $d(z_j^k,\cap_{l=1}^{\tau_j}X_j^l)\le d(y_j^k,\cap_{l=1}^{\tau_j}X_j^l)$.
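For intuition about the random projection step (13), the sketch below performs one Polyak step when the local constraints are halfspaces $g_j^l(x)=a_l^Tx-b_l\le 0$ (an illustrative choice; (13) applies to any convex $g_j^l$). For a violated halfspace and $\beta=1$, the step coincides with the exact Euclidean projection onto that halfspace.

```python
import numpy as np

rng = np.random.default_rng(0)

def polyak_step(y, constraints, beta=1.0):
    """One random Polyak step (13) for halfspace constraints a_l @ x - b_l <= 0."""
    a, b = constraints[rng.integers(len(constraints))]  # draw omega_j uniformly
    viol = max(a @ y - b, 0.0)                          # g_j^omega(y)_+
    u = a        # subgradient when violated; any nonzero vector works otherwise
    return y - beta * viol / (u @ u) * u

# Two halfspaces whose intersection is the local feasible set (toy numbers).
cons = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 0.5)]
y = np.array([3.0, 2.0])
print(polyak_step(y, cons, beta=1.0))  # lands exactly on the selected halfspace
```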
It is noted that the auxiliary vector $t_j^k$ is not updated during the above process. We use the same idea to handle the newly introduced constraint $X_j^0\times T_j$ such that

  $x_j^{k+1}=\Pi_X\Big(z_j^k-\beta\frac{(f_j(z_j^k)-e_j^Tp_j^k)_+}{1+\|v_j^k\|^2}v_j^k\Big)$,   (14)
  $t_j^{k+1}=p_j^k+\beta\frac{(f_j(z_j^k)-e_j^Tp_j^k)_+}{1+\|v_j^k\|^2}e_j$,   (15)

where the vector $v_j^k$ is a sub-gradient of $f_j$ evaluated at $z_j^k$. Similarly, we have that

  $d((x_j^{k+1},t_j^{k+1}),X_j^0\times T_j)\le d((z_j^k,t_j^k),X_j^0\times T_j)$.   (16)

Once all the nodes reach an agreement, the state vector $(x_j^k,t_j^k)$ in each node asymptotically converges to a feasible point. Overall, we use Algorithm 1 to formalize the above discussion. Note that Nedić and Ozdaglar (2009) requires the double stochasticity of $A$, which is unnecessary here.

Algorithm 1 Distributed random-fixed projected algorithm (D-RFP)
1: Initialization: For each node $j\in\mathcal{V}$, set $x_j=0$, $t_j=0$.
2: Repeat
3:  Set $k=1$.
4:  Local information exchange: Each node $j\in\mathcal{V}$ broadcasts $x_j$ and $t_j$ to its out-neighbors.
5:  Local variables update: Each node $j\in\mathcal{V}$ receives the state vectors $x_i$ and $t_i$ from its in-neighbors $i\in N_j^{\mathrm{in}}$ and updates its local vectors as follows:
  • $y_j=\sum_{i\in N_j^{\mathrm{in}}}a_{ji}x_i$, $p_j=\sum_{i\in N_j^{\mathrm{in}}}a_{ji}t_i-\zeta^k\mathbf{1}_n/n$, where the stepsize $\zeta^k$ is given in (11).
  • Draw a random variable $\omega_j$ from $\{1,\ldots,\tau_j\}$, and obtain $z_j=y_j-\beta\frac{g_j^{\omega_j}(y_j)_+}{\|u_j\|^2}u_j$, where $u_j$ is defined in (13).
  • Set $x_j\leftarrow\Pi_X\big(z_j-\beta\frac{(f_j(z_j)-e_j^Tp_j)_+}{1+\|v_j\|^2}v_j\big)$, where $v_j$ is defined in (14), and $t_j\leftarrow p_j+\beta\frac{(f_j(z_j)-e_j^Tp_j)_+}{1+\|v_j\|^2}e_j$.
6:  Set $k=k+1$.
7: Until a predefined stopping rule is satisfied.

Remark 2. Algorithm 1 is motivated by the centralized Polyak random algorithm (Nedić, 2011), which was very recently extended to a distributed version in You and Tempo (2016). The main difference from You and Tempo (2016) is that we do not use randomized projection on all the constraints. For instance, $X_j^0\times T_j$ is always considered in each iteration. If we treated the constraints $g_j(x)\preceq 0$ and $f_j(x)-e_j^Tt\le 0$ equally, then once the selected constraint is an element of $g_j(x)$, the vector $t$ is not updated since $t$ is independent of $g_j(x)$. This would slow down the convergence speed and introduce undesired transient behavior. Thus, Algorithm 1 adds a fixed projection to ensure that both $x$ and $t$ are updated at each iteration.

Remark 3. We observe that Algorithm 1 is also motivated by the alternating projection algorithm, which searches the intersection of several constraint sets by employing alternating projections, see e.g. Escalante and Raydan (2011) and references therein. The key idea of that algorithm is that the state vector asymptotically gets closer to the intersection by repeatedly projecting onto differently selected constraint sets. In light of this, the "projection" in our algorithm can also be performed any number of times at each iteration, either randomly or fixedly, to achieve feasibility. In fact, we can also design other rules for selecting the projected constraint. For example, we may choose the constraint set that is most distant from the intermediate vector, where the "distance" from a vector $x$ to a constraint set $f(x)\le 0$ is measured as $f(x)_+/\|\partial f(x)\|$.
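The sketch below collects the updates (9), (10) and (13)–(15) into one synchronous sweep of Algorithm 1. It is a minimal illustration rather than the authors' implementation: the callables f, f_sub, g, g_sub and proj_X, the step size $\zeta^k=1/(k+1)$, and the stacking of all nodes into arrays are assumptions made here for concreteness.

```python
import numpy as np

def drfp_sweep(x, t, k, A, f, f_sub, g, g_sub, proj_X, beta=1.0, rng=None):
    """One synchronous sweep of Algorithm 1 (D-RFP), a sketch.

    x: (n, m) local estimates, t: (n, n) auxiliary vectors, A: row-stochastic.
    f[j](x), f_sub[j](x): local objective f_j and one of its subgradients.
    g[j](x) -> array, g_sub[j](x, l): constraint vector g_j and a subgradient
    of its l-th component.  proj_X: Euclidean projection onto the common set X.
    """
    rng = rng or np.random.default_rng()
    n, m = x.shape
    zeta = 1.0 / (k + 1)                       # step size satisfying (11)
    x_new, t_new = np.empty_like(x), np.empty_like(t)
    for j in range(n):
        # (9)-(10): consensus step with the row-stochastic weights a_ji.
        y = A[j] @ x
        p = A[j] @ t - zeta * np.ones(n) / n
        # (13): Polyak step toward a randomly selected local constraint set.
        gy = g[j](y)
        omega = rng.integers(gy.size)
        viol = max(gy[omega], 0.0)
        if viol > 0:
            u = g_sub[j](y, omega)
            z = y - beta * viol / (u @ u) * u
        else:
            z = y
        # (14)-(15): fixed step toward the constraint f_j(x) - e_j^T t <= 0.
        v = f_sub[j](z)
        gap = max(f[j](z) - p[j], 0.0)
        e_j = np.zeros(n); e_j[j] = 1.0
        x_new[j] = proj_X(z - beta * gap / (1.0 + v @ v) * v)
        t_new[j] = p + beta * gap / (1.0 + v @ v) * e_j
    return x_new, t_new
```

Iterating this sweep with a row-stochastic $A$ over a strongly connected digraph is all Algorithm 1 requires; no doubly-stochastic weights or Perron-vector estimates are needed.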
4. CONVERGENCE ANALYSIS

To prove the convergence of Algorithm 1, we consider a general form of (8) to simplify notations:

  $\min_{\theta\in\Theta}\ c^T\theta$,
  s.t. $f_j(\theta)\le 0$,
     $g_j(\theta)\preceq 0$, $j=1,2,\ldots,n$,   (17)

where $\theta=(x,t)\in\mathbb{R}^d$ in (8) and $d=m+n$. Moreover, $f_j:\mathbb{R}^d\to\mathbb{R}$ is a convex function and $g_j:\mathbb{R}^d\to\mathbb{R}^{\tau_j}$ is a vector of convex functions.

The objective is a global linear function, and each node maintains local constraints only known by itself. In the optimization problem (17), the inequality $f_j(\theta)\le 0$ is regarded as a crucial constraint that needs to be satisfied with priority, while the constraints in $g_j(\theta)\preceq 0$ can be temporarily relaxed until being selected. Then the D-RFP algorithm for (17) is given as

  $p_j^k=\sum_{i=1}^n a_{ji}\theta_i^k-\zeta^k c$,   (18a)
  $q_j^k=p_j^k-\beta\frac{g_j^{\omega_j^k}(p_j^k)_+}{\|u_j^k\|^2}u_j^k$,   (18b)
  $\theta_j^{k+1}=\Pi_\Theta\Big(q_j^k-\beta\frac{f_j(p_j^k)_+}{\|v_j^k\|^2}v_j^k\Big)$,   (18c)

where $u_j^k\in\partial g_j^{\omega_j^k}(p_j^k)_+$ if $g_j^{\omega_j^k}(p_j^k)_+>0$ and $u_j^k=u_j$ for some $u_j\ne 0$ if $g_j^{\omega_j^k}(p_j^k)_+=0$, and the vector $v_j^k$ is defined similarly with respect to $f_j$.

It is easy to verify that Algorithm 1 is just a special case of the algorithm given in (18). Therefore, we only need to prove the convergence of (18). To this end, we introduce the following notations

  $\Theta_j=\{\theta\in\Theta\,|\,f_j(\theta)\le 0,\ g_j(\theta)\preceq 0\}$,
  $\Theta_0=\Theta_1\cap\cdots\cap\Theta_n$,
  $\Theta^*=\{\theta\in\Theta_0\,|\,c^T\theta\le c^T\theta',\ \forall\theta'\in\Theta_0\}$.   (19)

Before proceeding, several assumptions are needed.

Assumption 1. The optimization problem in (17) is feasible and has a nonempty set of optimal solutions, i.e., $\Theta_0\ne\emptyset$ and $\Theta^*\ne\emptyset$.

Assumption 2. (Strong connectivity). The graph $\mathcal{G}$ is strongly connected.

Assumption 1 is a trivial condition that ensures the solvability of the problem. As the constraints in (17) are only known to node $j$, the strong connectivity of $\mathcal{G}$ is also necessary. Otherwise, we may encounter a situation where a node $i$ can never be accessed by some other node $j$, so the information from node $i$ cannot reach node $j$. Then, it is impossible for node $j$ to find a solution to (17) since the information on the constraints maintained by node $i$ is always missing to node $j$. To ensure the convergence of the proposed algorithm, we also need the following assumptions.

Assumption 3. (Randomization and sub-gradient boundedness). Suppose the following holds.
(a) $\{\omega_j^k\}$ is an i.i.d. sequence that is uniformly distributed over $\{1,\ldots,\tau_j\}$ for any $j\in\mathcal{V}$, and is also independent over the index $j$.
(b) The sub-gradients $u_j^k$ and $v_j^k$ are uniformly bounded over the set $\Theta$, i.e., there exists a scalar $D>0$ such that $\max\{\|u_j^k\|,\|v_j^k\|\}\le D$, $\forall j\in\mathcal{V}$, $\forall k>0$.

Obviously, the designer can freely choose any distribution for drawing the samples $\omega_j^k$. Hence Assumption 3(a) is easy to satisfy. Assumption 3(b) is also common for optimization problems, see e.g. Nedić and Ozdaglar (2009, Assumption 7), and is not hard to satisfy.

Now we are ready to present the convergence result on the distributed random-fixed projected algorithm.

Theorem 1. (Almost sure convergence). Under Assumptions 1-3, the sequence $\{\theta_j^k\}$ in (18) almost surely converges to some common point in the set $\Theta^*$ of optimal solutions to (17).

The proof of Theorem 1 is roughly divided into three parts. The first part demonstrates the asymptotic feasibility of the state vector $\theta_j^k$, see Lemma 3. The second part illustrates the optimality by showing that the distance of $\theta_j^k$ to any optimal point $\theta^*$ is "stochastically" decreasing. Finally, the last part establishes a sufficient condition to ensure asymptotic consensus in Lemma 5, under which the sequence $\{\theta_j^k\}$ converges to the same value for all $j\in\mathcal{V}$. By using the above results, we show that $\{\theta_j^k\}$ converges to some common point in $\Theta^*$ almost surely.

Lemma 2. (Iterative projection). Let $\{h_k\}:\mathbb{R}^m\to\mathbb{R}$ be a sequence of convex functions and $\{\Omega_k\subseteq\mathbb{R}^m\}$ be a sequence of closed convex sets. Define $\{y_k\}\subseteq\mathbb{R}^m$ by

  $y_{k+1}=\Pi_{\Omega_k}\Big(y_k-\beta\frac{h_k(y_k)_+}{\|d_k\|^2}d_k\Big)$,

where $0<\beta<2$, $d_k\in\partial h_k(y_k)_+$ if $h_k(y_k)_+>0$ and $d_k=d$ for any $d\ne 0$ otherwise. For any $z\in(\Omega_0\cap\cdots\cap\Omega_{k-1})\cap\{y\,|\,h_j(y)\le 0,\ j=0,\ldots,k-1\}$, it holds that

  $\|y_k-z\|^2\le\|y_0-z\|^2-\beta(2-\beta)\frac{\|h_0(y_0)_+\|^2}{\|d_0\|^2}$.

Proof. By Nedić (2011, Lemma 1) and the definition of $\{y_k\}$, it holds for $j\le k-1$ that $\|y_{j+1}-z\|^2\le\|y_j-z\|^2-\beta(2-\beta)\frac{\|h_j(y_j)_+\|^2}{\|d_j\|^2}$. Together with the fact that $0<\beta<2$, we have $\|y_{j+1}-z\|^2\le\|y_j-z\|^2$. Then, $\|y_k-z\|^2\le\|y_1-z\|^2\le\|y_0-z\|^2-\beta(2-\beta)\frac{\|h_0(y_0)_+\|^2}{\|d_0\|^2}$.
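Lemma 2 states that the Polyak step followed by a projection never increases the distance to any point lying in all the sets involved. The short numeric check below, with illustrative random halfspaces $H_k=\{y\,|\,a_k^Ty\le b_k\}$ playing the roles of both $h_k$ and $\Omega_k$ and $z=0$ contained in every set, shows that $\|y_k-z\|$ is non-increasing.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.5                                   # any beta in (0, 2)
z = np.zeros(2)                              # a point contained in every set below
y = np.array([4.0, -3.0])
dists = [np.linalg.norm(y - z)]

for k in range(20):
    a = rng.normal(size=2)
    b = abs(rng.normal()) + 0.1              # b_k > 0 guarantees a_k @ z <= b_k
    viol = max(a @ y - b, 0.0)               # h_k(y_k)_+
    y = y - beta * viol / (a @ a) * a        # Polyak step with d_k = a
    if a @ y > b:                            # projection onto Omega_k = {a @ y <= b}
        y = y - (a @ y - b) / (a @ a) * a
    dists.append(np.linalg.norm(y - z))

print(np.round(dists, 3))                    # non-increasing, as Lemma 2 predicts
```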
Lemma 3. (Feasibility). Define $\lambda_j^k$ and $\mu_j^k$ as

  $\lambda_j^k=\sum_{i=1}^n a_{ji}\theta_i^k$, and $\mu_j^k=\Pi_{\Theta_0}(\lambda_j^k)$,   (20)

where $\Theta_0$ is defined in (19). If $\lim_{k\to\infty}\|\lambda_j^k-\mu_j^k\|=0$ for any $j\in\mathcal{V}$, then $\lim_{k\to\infty}\|\mu_j^k-\theta_j^{k+1}\|=0$.

Proof. In Lemma 2, let $y_0=p_j^k$, where $p_j^k$ is given in (18a), $h_0(y)=g_j^{\omega_j^k}(y)$, $h_1(y)=f_j(y)$, $\Omega_0=\mathbb{R}^m$ and $\Omega_1=\Theta$. Then it is clear that $y_2=\theta_j^{k+1}$. Since $\mu_j^k\in\Theta_0\subseteq(\Omega_0\cap\Omega_1)$, both $h_0(\mu_j^k)\le 0$ and $h_1(\mu_j^k)\le 0$ are satisfied. By Lemma 2, it holds that

  $\|\theta_j^{k+1}-\mu_j^k\|^2\le\|p_j^k-\mu_j^k\|^2-\beta(2-\beta)\frac{g_j^{\omega_j^k}(p_j^k)_+^2}{\|u_j^k\|^2}$.

Noticing that $\|p_j^k-\mu_j^k\|\le\|p_j^k-\lambda_j^k\|+\|\lambda_j^k-\mu_j^k\|=\zeta^k\|c\|+\|\lambda_j^k-\mu_j^k\|$, we have $\|\theta_j^{k+1}-\mu_j^k\|\le\zeta^k\|c\|+\|\lambda_j^k-\mu_j^k\|$. By taking limits on both sides, we obtain $\lim_{k\to\infty}\|\theta_j^{k+1}-\mu_j^k\|=0$.

The second part is a stochastically "decreasing" result, whose proof is similar to that of You and Tempo (2016, Lemma 4) and is omitted here.

Lemma 4. (Stochastically decreasing). Let $\mathcal{F}_k$ be the $\sigma$-field generated by the random variables $\{\omega_j^k,\ j\in\mathcal{V}\}$ up to time $k$. Under Assumptions 1 and 3, it holds almost surely for all $j\in\mathcal{V}$ and sufficiently large $k$ that

  $\mathbb{E}\big[\|\theta_j^{k+1}-\theta^*\|^2\,\big|\,\mathcal{F}_k\big]\le(1+A_1(\zeta^k)^2)\|\lambda_j^k-\theta^*\|^2-2\zeta^kc^T(\mu_j^k-\theta^*)-A_2\|\lambda_j^k-\mu_j^k\|^2+A_3(\zeta^k)^2$,   (21)

where $\lambda_j^k$, $\mu_j^k$ are given in (20), $\theta^*\in\Theta^*$, and $A_1,A_2,A_3$ are positive constants.

Finally, we can show that the consensus value is a weighted average of the state vectors of the nodes, where the weighting vector is the Perron vector of $A$.

Lemma 5. (You and Tempo, 2016). Consider the sequence

  $\theta_j^{k+1}=\sum_{i=1}^n a_{ji}\theta_i^k+\epsilon_j^k$, $\forall j\in\mathcal{V}$.   (22)

Suppose that $\mathcal{G}$ is strongly connected and let $\bar{\theta}^k=\sum_{i=1}^n\pi_i\theta_i^k$, where $\pi_i$ is an element of $\pi$ given by (3). If $\lim_{k\to\infty}\|\epsilon_j^k\|=0$, it holds that

  $\lim_{k\to\infty}\|\theta_j^k-\bar{\theta}^k\|=0$, $\forall j\in\mathcal{V}$.   (23)

The proof also relies crucially on the well-known super-martingale convergence theorem, which is due to Robbins and Siegmund (1985), see also Bertsekas (2015, Proposition A.4.5). This result is restated here for completeness.

Theorem 2. (Super-martingale Convergence). Let $\{v_k\}$, $\{u_k\}$, $\{a_k\}$ and $\{b_k\}$ be sequences of nonnegative random variables such that

  $\mathbb{E}[v_{k+1}\,|\,\mathcal{F}_k]\le(1+a_k)v_k-u_k+b_k$,   (24)

where $\mathcal{F}_k$ denotes the collection $v_0,\ldots,v_k$, $u_0,\ldots,u_k$, $a_0,\ldots,a_k$, $b_0,\ldots,b_k$. Let $\sum_{k=0}^\infty a_k<\infty$ and $\sum_{k=0}^\infty b_k<\infty$ almost surely. Then, we have $\lim_{k\to\infty}v_k=v$ for a random variable $v\ge 0$ and $\sum_{k=0}^\infty u_k<\infty$ almost surely.

Now, we can summarize the previous discussions.

Proposition 1. (Convergent results). Let $\bar{\lambda}^k=\sum_{j=1}^n\pi_j\lambda_j^k$ and $\bar{\mu}^k=\sum_{j=1}^n\pi_j\mu_j^k$, where $\lambda_j^k$, $\mu_j^k$ are defined in (20) and $\pi$ is given in (3). Then, for any $\theta^*\in\Theta^*$, the following statements hold almost surely.
(a) $\{\sum_{j=1}^n\pi_j\|\theta_j^k-\theta^*\|^2\}$ converges.
(b) $\liminf_{k\to\infty}c^T\bar{\mu}^k=c^T\theta^*$.
(c) $\lim_{k\to\infty}\|\mu_j^k-\lambda_j^k\|=0$.
(d) $\lim_{k\to\infty}\|\mu_j^k-\theta_j^{k+1}\|=\lim_{k\to\infty}\|\lambda_j^k-\theta_j^{k+1}\|=0$.
(e) $\lim_{k\to\infty}\|\bar{\mu}^k-\bar{\theta}^{k+1}\|=\lim_{k\to\infty}\|\bar{\lambda}^k-\bar{\theta}^{k+1}\|=0$.

Proof. By the convexity of $\|\cdot\|^2$ and the row stochasticity of $A$, i.e., $\sum_{i=1}^n a_{ji}=1$, it follows that

  $\|\lambda_j^k-\theta^*\|^2\le\sum_{i=1}^n a_{ji}\|\theta_i^k-\theta^*\|^2$.

Jointly with (21), we obtain that for sufficiently large $k$,

  $\mathbb{E}\big[\|\theta_j^{k+1}-\theta^*\|^2\,\big|\,\mathcal{F}_k\big]\le(1+A_1(\zeta^k)^2)\sum_{i=1}^n a_{ji}\|\theta_i^k-\theta^*\|^2-2\zeta^kc^T(\mu_j^k-\theta^*)-A_2\|\lambda_j^k-\mu_j^k\|^2+A_3(\zeta^k)^2$.   (25)

Multiplying both sides of (25) by $\pi_j$ and summing over $j$, together with (3) and the definition of $\bar{\mu}^k$, we obtain

  $\mathbb{E}\Big[\sum_{j=1}^n\pi_j\|\theta_j^{k+1}-\theta^*\|^2\,\Big|\,\mathcal{F}_k\Big]\le(1+A_1(\zeta^k)^2)\sum_{j=1}^n\sum_{i=1}^n\pi_ja_{ji}\|\theta_i^k-\theta^*\|^2-2\zeta^kc^T(\bar{\mu}^k-\theta^*)-A_2\sum_{j=1}^n\pi_j\|\lambda_j^k-\mu_j^k\|^2+A_3(\zeta^k)^2$
  $=(1+A_1(\zeta^k)^2)\sum_{j=1}^n\pi_j\|\theta_j^k-\theta^*\|^2-2\zeta^kc^T(\bar{\mu}^k-\theta^*)-A_2\sum_{j=1}^n\pi_j\|\lambda_j^k-\mu_j^k\|^2+A_3(\zeta^k)^2$.

It follows from (11) that $A_1(\zeta^k)^2\ge 0$, $A_3(\zeta^k)^2\ge 0$, $\sum_{k=0}^\infty A_1(\zeta^k)^2<\infty$ and $\sum_{k=0}^\infty A_3(\zeta^k)^2<\infty$. Noticing the convexity of $\Theta_0$ and $\mu_j^k\in\Theta_0$, it is clear that $\bar{\mu}^k\in\Theta_0$. In view of the fact that $\theta^*$ is an optimal solution in $\Theta_0$, it holds that $c^T\bar{\mu}^k-c^T\theta^*\ge 0$. Thus, all the conditions in Theorem 2 are satisfied. It holds almost surely that $\{\sum_{j=1}^n\pi_j\|\theta_j^k-\theta^*\|^2\}$ is convergent for any $\theta^*\in\Theta^*$, hence (a) is proved. Moreover, it follows from Theorem 2 that

  $\sum_{k=0}^\infty\zeta^kc^T(\bar{\mu}^k-\theta^*)<\infty$   (26)

and

  $\sum_{k=0}^\infty\sum_{j=1}^n\pi_j\|\lambda_j^k-\mu_j^k\|^2<\infty$.   (27)

It is clear that (26) directly implies (b) under the condition $c^T\bar{\mu}^k-c^T\theta^*\ge 0$. Together with the fact that $\pi_i>0$ from Lemma 1, it follows from (27) that $\lim_{k\to\infty}\|\lambda_j^k-\mu_j^k\|^2=0$, thus (c) is proved. Combining the result of (c) with Lemma 3, it is clear that (d) holds as well. As for (e), it is a direct consequence of (d) by the triangle inequality, i.e., $\|\bar{\lambda}^k-\bar{\theta}^{k+1}\|\le\sum_{j=1}^n\pi_j\|\lambda_j^k-\theta_j^{k+1}\|$.

Combining the above results, we are now in a position to formally prove Theorem 1.

Proof of Theorem 1. Noticing that $\lambda_j^k=\sum_{i=1}^n a_{ji}\theta_i^k$, it follows from Proposition 1(d) that $\lim_{k\to\infty}\|\theta_j^{k+1}-\sum_{i=1}^n a_{ji}\theta_i^k\|=0$. Then it holds almost surely from Lemma 5 that $\lim_{k\to\infty}\|\theta_j^k-\bar{\theta}^k\|=0$. Jointly with the fact in Proposition 1(a) that $\{\sum_{j=1}^n\pi_j\|\theta_j^k-\theta^*\|^2\}$ converges for any $\theta^*\in\Theta^*$, we obtain that $\{\|\bar{\theta}^k-\theta^*\|\}$ converges almost surely for any $\theta^*\in\Theta^*$. Then it follows from Proposition 1(e) that $\{\|\bar{\mu}^k-\theta^*\|\}$ converges as well. By Proposition 1(b), there exists a subsequence of $\{\bar{\mu}^k\}$ that converges almost surely to some point in the optimal set $\Theta^*$, which is denoted as $\theta^*_{\mathrm{opt}}$. Due to the convergence of $\{\|\bar{\mu}^k-\theta^*_{\mathrm{opt}}\|\}$, it follows that $\lim_{k\to\infty}\bar{\mu}^k=\theta^*_{\mathrm{opt}}$. Finally, we note that $\|\theta_j^{k+1}-\theta^*_{\mathrm{opt}}\|\le\|\theta_j^{k+1}-\bar{\theta}^{k+1}\|+\|\bar{\theta}^{k+1}-\bar{\mu}^k\|+\|\bar{\mu}^k-\theta^*_{\mathrm{opt}}\|$, which converges almost surely to zero as $k\to\infty$. Therefore, there exists $\theta^*_{\mathrm{opt}}\in\Theta^*$ such that $\lim_{k\to\infty}\theta_j^k=\theta^*_{\mathrm{opt}}$ for all $j\in\mathcal{V}$ with probability one. Thus, Theorem 1 is proved.
5. ILLUSTRATIVE EXAMPLES

We consider the facility location problem, which is one of the classical problems in operations research. Traditional facility location is a centralized problem, while in this paper we propose a distributed formulation of the problem:

  $\min_{x\in\mathbb{R}^2}\ \sum_{i=1}^n w_i\|x-q_i\|$
  s.t. $\|x-p_i^{(1)}\|\le l_i^{(1)}$,
     $\|x-p_i^{(2)}\|\le l_i^{(2)}$, $i=1,\ldots,n$.   (28)

The local constraints in (28) represent local resource limitations in each node, and the objective function describes the total cost incurred when the facility is settled.
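In the notation of Section 4, each node $i$ in (28) holds $f_i(x)=w_i\|x-q_i\|$ and the two ball constraints $g_i^{(l)}(x)=\|x-p_i^{(l)}\|-l_i^{(l)}\le 0$. The sketch below packages this local data in the callable form consumed by the D-RFP sweep given earlier; the anchor points, weights and radii are placeholders, not the values behind the paper's figures.

```python
import numpy as np

def make_node_data(w_i, q_i, p1, l1, p2, l2):
    """Local data of node i in the facility location problem (28)."""
    def f(x):                        # f_i(x) = w_i * ||x - q_i||
        return w_i * np.linalg.norm(x - q_i)

    def f_sub(x):                    # a subgradient of f_i (0 is valid at x = q_i)
        r = x - q_i
        nrm = np.linalg.norm(r)
        return w_i * r / nrm if nrm > 0 else np.zeros_like(x)

    centers, radii = [p1, p2], [l1, l2]

    def g(x):                        # g_i^(l)(x) = ||x - p_i^(l)|| - l_i^(l) <= 0
        return np.array([np.linalg.norm(x - c) - r for c, r in zip(centers, radii)])

    def g_sub(x, l):                 # a subgradient of the l-th ball constraint
        r = x - centers[l]
        nrm = np.linalg.norm(r)
        if nrm == 0:                 # at the center any unit vector is a subgradient
            return np.eye(len(x))[0]
        return r / nrm

    return f, f_sub, g, g_sub

# Placeholder data for one node (not the values used in the paper's experiments).
f1, f1_sub, g1, g1_sub = make_node_data(
    w_i=1.0, q_i=np.array([2.0, 1.0]),
    p1=np.array([0.0, 0.0]), l1=3.0,
    p2=np.array([1.0, 1.0]), l2=2.5)
```

With $X=\mathbb{R}^2$, the projection $\Pi_X$ is the identity, so only these simple distance evaluations and subgradients are needed per iteration, which is exactly the light per-iteration load argued for in Section 3.2.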
We first compare several algorithms over the strongly connected directed graph (self-loops omitted) in Fig. 1, whose associated weighting matrix is only row-stochastic. In this experiment, the three algorithms are performed under the same step sizes, e.g., $\beta=1$, $\zeta^k=\frac{1}{k}$. Comparisons of the D-RFP, the distributed Polyak randomization (You and Tempo, 2016) and the constrained extension of the DGD (Nedić et al., 2010) are shown in Fig. 2. We can clearly observe that the minimal cost calculated by the constrained DGD is greater than that of the other two algorithms, which implies that the constrained DGD does not converge to an optimal solution. This is consistent with the observation in Section 2.2. The result in Fig. 2 also indicates that the D-RFP converges faster, and with less fluctuation, than the distributed version of the Polyak randomization algorithm. This is mainly due to the use of an additional fixed "projection" in the D-RFP.

[Fig. 1. A strongly connected but unbalanced digraph.]
[Fig. 2. Comparisons of constrained DGD, distributed Polyak randomization, and D-RFP (cost versus iteration k).]

We also apply the D-RFP to time-varying unbalanced digraphs. For time-varying graphs, a common assumption is that the graph sequence $\{\mathcal{G}(t)\}$ is uniformly strongly connected (Nedić and Olshevsky, 2015), i.e., there exists a constant $L$ such that $\mathcal{G}(t)\cup\cdots\cup\mathcal{G}(t+L)$ is strongly connected for any $t>0$. In this experiment, the time-varying graphs are given in Fig. 3, where neither graph is strongly connected but their joint graph is. We assume that $\mathcal{G}(t)$ is the left graph at odd times and the right one otherwise. The simulation result in Fig. 4 confirms that our algorithm is also applicable to time-varying digraphs. The proof will be included in the journal version.

[Fig. 3. Two jointly strongly connected digraphs, each of which is individually not strongly connected.]
[Fig. 4. The D-RFP over time-varying digraphs and its comparison to the case of a fixed digraph (cost versus iteration k).]

6. CONCLUSIONS

In this work, we developed a random-fixed projected algorithm to collaboratively solve distributed constrained optimization over unbalanced digraphs. The proposed algorithm has a simple structure. The simulations indicate that the proposed algorithm is also applicable to time-varying digraphs, for which the rigorous proof will be given in the journal version. A drawback of our algorithm is that the number of augmented variables depends on the scale of the topology, which remains an open question. Future work will focus on reducing the number of augmented variables and accelerating the convergence speed.

REFERENCES

Bertsekas, D.P. (2015). Convex Optimization Algorithms. Athena Scientific, Belmont.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Cai, K. and Ishii, H. (2012). Average consensus on general strongly connected digraphs. Automatica, 48(11), 2750–2761.
Cai, K. and Ishii, H. (2014). Average consensus on arbitrary strongly connected digraphs with time-varying topologies. IEEE Transactions on Automatic Control, 59(4), 1066–1071.
Escalante, R. and Raydan, M. (2011). Alternating Projection Methods, volume 8. SIAM.
Horn, R.A. and Johnson, C.R. (2012). Matrix Analysis. Cambridge University Press.
Kempe, D., Dobra, A., and Gehrke, J. (2003). Gossip-based computation of aggregate information. In 44th Annual IEEE Symposium on Foundations of Computer Science, 482–491.
Lobel, I. and Ozdaglar, A. (2011). Distributed subgradient methods for convex optimization over random networks. IEEE Transactions on Automatic Control, 56(6), 1291–1306.
Mai, V.S. and Abed, E.H. (2016). Distributed optimization over weighted directed graphs using row stochastic matrix. In American Control Conference, 7165–7170.
Morral, G. (2016). Distributed estimation of the left Perron eigenvector of non-column stochastic protocols for distributed stochastic approximation. In American Control Conference, 3352–3357.
Mota, J.F., Xavier, J.M., Aguiar, P.M., and Püschel, M. (2013). D-ADMM: A communication-efficient distributed algorithm for separable optimization. IEEE Transactions on Signal Processing, 61(10), 2718–2723.
Nedić, A. (2011). Random algorithms for convex minimization problems. Mathematical Programming, 129(2), 225–253.
Nedić, A. and Olshevsky, A. (2015). Distributed optimization over time-varying directed graphs. IEEE Transactions on Automatic Control, 60(3), 601–615.
Nedić, A. and Ozdaglar, A. (2009). Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1), 48–61.
Nedić, A., Ozdaglar, A., and Parrilo, P.A. (2010). Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4), 922–938.
Robbins, H. and Siegmund, D. (1985). A convergence theorem for non negative almost supermartingales and some applications. In Herbert Robbins Selected Papers, 111–135. Springer.
Xi, C. and Khan, U.A. (2016). Distributed subgradient projection algorithm over directed graphs. IEEE Transactions on Automatic Control, in press.
You, K. and Tempo, R. (2016). Networked parallel algorithms for robust convex optimization via the scenario approach. arXiv preprint arXiv:1607.05507.
