An Arrow-Hurwicz-Uzawa Type Flow as Least Squares Solver for Network Linear Equations

Yang Liu, Christian Lageman, Brian D. O. Anderson, and Guodong Shi∗

∗Y. Liu, B. D. O. Anderson, and G. Shi are with the Research School of Engineering, The Australian National University, ACT 0200, Canberra, Australia. C. Lageman is with the Institute of Mathematics, University of Würzburg, Würzburg 97070, Germany. Email: [email protected], [email protected], [email protected], [email protected]

Abstract

We study the approach to obtaining least squares solutions to systems of linear algebraic equations over networks by using distributed algorithms. Each node has access to one of the linear equations and holds a dynamic state. The aim for the node states is to reach a consensus as a least squares solution of the linear equations by exchanging their states with neighbors over an underlying interaction graph. A continuous-time distributed least squares solver over networks is developed in the form of the famous Arrow-Hurwicz-Uzawa flow. A necessary and sufficient condition is established on the graph Laplacian for the continuous-time distributed algorithm to give the least squares solution in the limit, with an exponentially fast convergence rate. The feasibility of different fundamental graphs is discussed, including the path graph, star graph, etc. Moreover, a discrete-time distributed algorithm is developed by Euler's method, converging exponentially to the least squares solution at the node states with suitable step size and graph conditions. The exponential convergence rate for both the continuous-time and discrete-time algorithms under the established conditions is confirmed by numerical examples. Finally, we investigate the performance of the proposed flow under switching networks, and surprisingly, switching networks at high switching frequencies can lead to approximate least squares solvers even if all graphs in the switching signal fail to do so in the absence of structure switching.

Keywords: Distributed Algorithms; Dynamical Systems; Linear Equations; Least Squares; Switching Networks.

1 Introduction

Linear algebraic equations are foundational for various computation tasks, and systems of linear algebraic equations also arise from various practical engineering problems [11, 17, 30, 32, 9, 5, 8]. In recent years much interest has developed in finding out how to solve linear equations using multiple processing units or over a network. Major efforts have been made in the development of parallel or distributed algorithms as linear-equation solvers. On the one hand, one aims at faster algorithms, in view of the intuition that multiple processing units working in parallel under smart arrangements might provide significant improvements in computation efficiency. On the other hand, various distributed systems induce structural constraints, e.g., one node holds one equation and it cannot or does not want to share the exact equation with other nodes, a constraint which excludes the feasibility of classical centralized algorithms.

Parallel algorithms for linear equations have been developed in the spirit of high-performance computing, e.g., the Jacobi method [23, 37], the successive over-relaxation method [38], the Kaczmarz method [16] and the randomized iterative method [14]. In these algorithms, the state of each node can give an entry of the solution to a linear equation after a suitably long running time, via successive information exchange with other nodes and parallel independent computing.
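For concreteness, the Jacobi iteration mentioned above can be sketched in a few lines. The following is our own minimal illustration (assuming numpy; the matrix and iteration count are arbitrary), in which each entry of the state plays the role of one node and is updated in parallel from the other entries' previous values.

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Jacobi iteration: 'node' i updates x[i] in parallel using only
    the i-th equation and the other nodes' previous states."""
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)              # diagonal entries a_ii
    R = A - np.diag(D)          # off-diagonal part
    for _ in range(iters):
        x = (b - R @ x) / D     # x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii
    return x

# A strictly diagonally dominant system, for which Jacobi is known to converge.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))             # close to np.linalg.solve(A, b)
```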
There are two restrictions in these parallel algorithms. The first is that each node is often implicitly required to have access to all the other nodes, i.e., the network graph is effectively complete [23, 37, 16, 14]. The second is that the class of admissible linear equations is restricted in many parallel algorithms; for example, in the Jacobi method, a sufficient condition for convergence is that the linear equations be strictly or irreducibly diagonally dominant.

Meanwhile, discrete- and continuous-time algorithms for linear equations known to have a unique solution have also been established from the point of view of distributed control and optimization. A variety of distributed algorithms have been presented, among which discrete-time algorithms are given by [25, 26, 20, 21, 22] and continuous-time algorithms are presented in [3, 33]. In these network distributed algorithms, in contrast to the developments in parallel computing, each node state asymptotically converges to the solution of the linear equation. Such developments on network linear equations are closely related to, and sometimes even special cases of, the study of distributed optimization methods for nonlinear models [34, 27, 28, 13, 15], due to the natural connection between solving equations and optimizing objective functions.

Most of the existing work on parallel and distributed algorithms assumes that the linear equations have exact solutions [23, 37, 38, 16, 14, 25, 26, 20, 21, 22, 3], or can only produce least squares solutions in an approximate sense or for limited graph structures [33]; very few results have been obtained on exact distributed least squares solvers for network linear equations. In this paper, distributed continuous- and discrete-time algorithms that can compute the least squares solution to a linear equation over a network are presented. The particular contributions of our work are summarized as follows.

• By recognizing the least squares problem for a linear equation as a constrained optimization problem over a network, a continuous-time flow is presented in the form of the classical Arrow-Hurwicz-Uzawa flow [4], for which we establish necessary and sufficient conditions on the graph structures that drive the network flow to agree on the least squares solution.

• By Euler's method, a discrete-time algorithm is presented, and its convergence properties are also specified and proved.

• Comprehensive discussions of the residual vector, an alternative state expansion method, and switching networks are provided. Surprisingly, it can be shown that under a sufficiently fast switching signal, the proposed flow for switching networks becomes an approximate least squares solver, under which node states are driven at least to a small neighborhood around the least squares solution, even if all graphs in the switching signal fail to do so in the absence of structure switching.

The paper begins by formulating the network linear equations in Section 2, in addition to explaining how the Arrow-Hurwicz-Uzawa flow can be used to derive a continuous-time network flow. In Section 3, a necessary and sufficient condition for the continuous-time flow to converge to the least squares solution is established, along with its proof and a couple of numerical examples. In Section 4, a discrete-time algorithm is obtained by Euler's method, necessary and sufficient conditions for its convergence are established, and a few numerical examples are also given.
In Section 5, we further discuss distributed computation of the residual vector and alternative methods of obtaining the least squares solution, and present a few numerical examples studying the convergence properties of the continuous-time algorithm over switching networks. Finally, a few concluding remarks are given in Section 6.

2 Problem Definition

2.1 Linear Equation

Consider the following linear algebraic equation with unknown $y \in \mathbb{R}^m$:

$$z = Hy \tag{1}$$

where $z \in \mathbb{R}^N$ and $H \in \mathbb{R}^{N \times m}$ are known. Denote the column space of a matrix $M$ by $\mathrm{colsp}\{M\}$. If $z \in \mathrm{colsp}\{H\}$, then the equation (1) always has (one or many) exact solutions. If $z \notin \mathrm{colsp}\{H\}$, the least squares solution is defined by the solution of the following optimization problem:

$$\min_{y \in \mathbb{R}^m} \|z - Hy\|^2. \tag{2}$$

It is well known that if $\mathrm{rank}(H) = m$, then (2) yields a unique solution $y^\ast = (H^\top H)^{-1} H^\top z$.

2.2 Network

Denote

$$H = \begin{bmatrix} h_1^\top \\ h_2^\top \\ \vdots \\ h_N^\top \end{bmatrix}, \qquad z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix}.$$

We can rewrite (1) as

$$h_i^\top y = z_i, \quad i = 1, \dots, N.$$

Let $\mathrm{G} = (\mathrm{V}, \mathrm{E})$ be a constant, undirected and connected graph with the set of nodes $\mathrm{V} = \{1, 2, \dots, N\}$ and the set of edges $\mathrm{E} \subset \mathrm{V} \times \mathrm{V}$. Each node $i$ holds the equation $h_i^\top y = z_i$ and also holds a vector $x_i(t) \in \mathbb{R}^m$ that varies as a function of time $t$. Note that $x_i(t)$ will turn out to be part of the state of node $i$ at time $t$. Let $\mathrm{N}_i$ be the set of neighbor nodes that are connected to node $i$, i.e., $\mathrm{N}_i = \{j : (i,j) \in \mathrm{E}\}$. Define a diagonal matrix $D = \mathrm{diag}(|\mathrm{N}_1|, |\mathrm{N}_2|, \dots, |\mathrm{N}_N|)$ and an adjacency matrix $A$ of the graph $\mathrm{G}$ by $[A]_{ij} = 1$ if $(i,j) \in \mathrm{E}$ and $[A]_{ij} = 0$ otherwise. Then $L = D - A$ is the Laplacian of the graph $\mathrm{G}$.

2.3 Distributed Flows

Consider a cost function $U(\cdot) : \mathbb{R}^m \times \cdots \times \mathbb{R}^m \to \mathbb{R}$,

$$U(x_1, \dots, x_N) = \sum_{i=1}^N |h_i^\top x_i - z_i|^2. \tag{3}$$

Let $x(t) = [x_1^\top(t) \ \dots \ x_N^\top(t)]^\top$ and introduce $v(t) = [v_1^\top(t) \ \dots \ v_N^\top(t)]^\top$ with $v_i(t) \in \mathbb{R}^m$ for $i = 1, \dots, N$. The vector $v_i(t)$ is also held by node $i$, and the $2m$-dimensional vector $[x_i(t)^\top \ v_i(t)^\top]^\top$ represents the state of node $i$. We consider the following flow:

$$\begin{aligned} \dot{x} &= -(L \otimes I_m)v - \nabla U(x) \\ \dot{v} &= (L \otimes I_m)x. \end{aligned} \tag{4}$$

We term (4) an "Oscillation + Gradient Flow" because the equation

$$\begin{aligned} \dot{x} &= -(L \otimes I_m)v \\ \dot{v} &= (L \otimes I_m)x \end{aligned}$$

yields oscillating trajectories for $x(t)$, while $\dot{x} = -\nabla U(x)$ is a gradient flow. Note that in the flow (4), the state variable $[x_i^\top(t) \ v_i^\top(t)]^\top$ of node $i$ obeys the evolution

$$\begin{aligned} \dot{x}_i(t) &= -\sum_{j \in \mathrm{N}_i} \big(v_i(t) - v_j(t)\big) - \big(h_i h_i^\top x_i(t) - z_i h_i\big) \\ \dot{v}_i(t) &= \sum_{j \in \mathrm{N}_i} \big(x_i(t) - x_j(t)\big). \end{aligned}$$

Therefore, besides the equation $h_i^\top y = z_i$ that node $i$ possesses, it only needs to communicate with its neighbors to obtain their states in order to implement (4). The flow (4) is distributed in that sense.

2.4 Discussions

2.4.1 Relation to A-H-U Flow

Consider a constrained optimization problem

$$\begin{aligned} \min_x \ & f(x) \\ \text{s.t.} \ & Fx = b, \end{aligned} \tag{5}$$

where $f(\cdot) : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function, $F \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. The well-known Arrow-Hurwicz-Uzawa (A-H-U) flow introduced in [4] provides, under appropriate conditions, a continuous-time solver defined by

$$\begin{aligned} \dot{x} &= -\nabla_x f(x) - F^\top v \\ \dot{v} &= Fx - b. \end{aligned} \tag{6}$$

In particular, if $f$ is strictly convex and $F$ has full row rank, then (see [4, 36]) along the flow (6), $x(t)$ will converge to the optimal point of (5) and $v(t)$ will converge to a Lagrange multiplier of (5).
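To make the A-H-U dynamics (6) concrete, the following minimal sketch (our own illustration, not taken from [4]; it assumes numpy, a strictly convex quadratic cost with a full-row-rank constraint, and uses forward Euler with a small step purely for exposition) integrates the flow and recovers the constrained minimizer.

```python
import numpy as np

# Strictly convex quadratic cost f(x) = 0.5 x'Qx - c'x, constraint Fx = b.
Q = np.array([[3.0, 0.5],
              [0.5, 2.0]])          # positive definite, so f is strictly convex
c = np.array([1.0, 1.0])
F = np.array([[1.0, 1.0]])          # full row rank
b = np.array([1.0])

x, v = np.zeros(2), np.zeros(1)
dt = 1e-3
for _ in range(200_000):            # forward Euler integration of the flow (6)
    x_dot = -(Q @ x - c) - F.T @ v  # -grad f(x) - F'v
    v_dot = F @ x - b
    x, v = x + dt * x_dot, v + dt * v_dot

print(x, F @ x - b)   # x approaches the constrained minimizer; Fx - b -> 0
```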
As one can see, the flow (4) is a form of the A-H-U flow (6), with the cost function $f(x)$ being the given $U(x)$ and the constraint $Fx = b$ given by $(L \otimes I_m)x = 0$. However, the Laplacian $L$ is not a full-rank matrix, nor is $U(x)$ strictly convex. Therefore, the sufficiency results and analysis for the A-H-U flow established in [36] cannot be applied directly to the flow (4).

2.4.2 Relation to Wang-Elia/Gharesifard-Cortés Flows

In [35] and [12], an alternative distributed flow was proposed for solving

$$\begin{aligned} \min_{x = (x_1^\top \ \dots \ x_N^\top)^\top} \ & \sum_{i=1}^N f_i(x_i) \\ \text{s.t.} \ & x_i \in \mathbb{R}^m, \ i = 1, \dots, N, \\ & x_1 = \cdots = x_N, \end{aligned} \tag{7}$$

where each $f_i : \mathbb{R}^m \mapsto \mathbb{R}$ is a locally Lipschitz and convex function known to node $i$ only. Viewing the component $|h_i^\top x_i - z_i|^2$ in $U(x_1, \dots, x_N)$ as the $f_i$ in (7), the Wang-Elia/Gharesifard-Cortés flow [35, 12] reads as

$$\begin{aligned} \dot{x} &= -\alpha(L \otimes I_m)x - (L \otimes I_m)v - \tilde{H}x + z_H \\ \dot{v} &= (L \otimes I_m)x, \end{aligned} \tag{8}$$

where $\alpha$ is a positive number, $z_H = [z_1 h_1^\top \ \dots \ z_N h_N^\top]^\top$ and $\tilde{H} = \mathrm{diag}(h_1 h_1^\top, \dots, h_N h_N^\top)$, since by direct calculation we know $\nabla U(x) = \tilde{H}x - z_H$. With a fixed interaction graph and a suitable choice of $\alpha$, the flow (8) is a least squares solver for (1), even for balanced directed networks (Theorem 5.4, [12]). The system (4) under consideration in the current work certainly has a simpler structure compared to (8), which is desirable in large-scale applications. Moreover, it is also useful to establish a clear understanding of the convergence conditions of the system (4) under general conditions (which are currently missing in the literature), the more so considering the historical context.

3 Continuous Flow

In this section, we study the behavior of the flow (4) in terms of convergence of the $x_i(t)$ to a least squares solution, and present necessary and sufficient conditions for convergence.

3.1 Convergence Result

We present the following result.

Theorem 1. Assume that $N > m$ and $\mathrm{rank}(H) = m$. Let $y^\ast = (H^\top H)^{-1} H^\top z$ be the unique least squares solution of (1). Define $\mathrm{S}_L$ as the set of all complex eigenvectors of $L$, and for $\alpha = [\alpha[1] \ \alpha[2] \ \dots \ \alpha[N]]^\top \in \mathrm{S}_L$, with $\alpha[i]$ denoting the $i$-th entry, define

$$\mathrm{I}_\alpha := \{i : \alpha[i] \neq 0\}.$$

Then:

(i) If $\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\} = \mathbb{R}^m$ for all $\alpha \in \mathrm{S}_L$, there holds

$$\lim_{t \to \infty} x_i(t) = y^\ast, \quad i = 1, \dots, N,$$

along the flow (4). Further, $v(t)$ along (4) converges to a Lagrange multiplier associated with a solution of the optimization problem

$$\begin{aligned} \min_x \ & U(x) \\ \text{s.t.} \ & (L \otimes I_m)x = 0. \end{aligned} \tag{9}$$

(ii) If there exists $\alpha \in \mathrm{S}_L$ such that $\dim(\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\}) < m$, then there exist trajectories of $x(t)$ along (4) which do not converge.

Proof. Recall that $\nabla U(x) = \tilde{H}x - z_H$, where $\tilde{H}$ is the positive semi-definite matrix $\mathrm{diag}(h_1 h_1^\top, \dots, h_N h_N^\top)$ and $z_H = [z_1 h_1^\top \ \dots \ z_N h_N^\top]^\top$. We rewrite (4) as

$$\begin{aligned} \dot{x} &= -(L \otimes I_m)v - \tilde{H}x + z_H \\ \dot{v} &= (L \otimes I_m)x. \end{aligned} \tag{10}$$

Suppose there exists an equilibrium $(x^\ast, v^\ast)$ of (10), i.e.,

$$\begin{aligned} 0 &= -(L \otimes I_m)v^\ast - \tilde{H}x^\ast + z_H \\ 0 &= (L \otimes I_m)x^\ast. \end{aligned} \tag{11}$$

It is worth noting that (11) specifies exactly the Karush-Kuhn-Tucker conditions on $(x^\ast, v^\ast)$ for the optimization problem (9) [6]. Since $U(x)$ is a convex function and the constraints in (9) are equality constraints, Slater's condition holds [7]. Therefore the optimal values of the primal and dual problems coincide, i.e., $x^\ast$ is an optimal solution to (9), and any optimal solution of (9) must have the form $\mathbf{1} \otimes y^\ast$, where $y^\ast$ is a least squares solution to (1). We know $y^\ast$ is unique because $\mathrm{rank}(H) = m$; since $x^\ast = \mathbf{1} \otimes y^\ast$, $x^\ast$ is also unique. Note, however, that $v^\ast$ is not necessarily unique.
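Before continuing with the proof, the conclusion of Theorem 1(i) can be probed numerically. The following is our own minimal sketch (assuming numpy; it uses a path graph with $N = 3$, $m = 2$, and an $H$ for which the span condition of Theorem 1 holds, and integrates (10) by forward Euler with a small step, anticipating the discrete-time algorithm of Section 4).

```python
import numpy as np

N, m = 3, 2
L = np.array([[ 1., -1.,  0.],   # Laplacian of the path graph 1-2-3
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])       # rank(H) = m; h_1, h_3 alone span R^2
z = np.array([1.0, 2.0, 3.0])    # z not in colsp(H): a genuine least squares case

y_star = np.linalg.lstsq(H, z, rcond=None)[0]   # (H'H)^{-1} H'z

Lm = np.kron(L, np.eye(m))
H_tilde = np.zeros((N * m, N * m))
z_H = np.zeros(N * m)
for i in range(N):
    h = H[i]
    H_tilde[i*m:(i+1)*m, i*m:(i+1)*m] = np.outer(h, h)  # block diag(h_i h_i')
    z_H[i*m:(i+1)*m] = z[i] * h                         # stacked z_i h_i

x, v = np.zeros(N * m), np.zeros(N * m)
dt = 1e-3
for _ in range(500_000):          # forward Euler on the flow (10)
    x_dot = -Lm @ v - H_tilde @ x + z_H
    v_dot = Lm @ x
    x, v = x + dt * x_dot, v + dt * v_dot

print(x.reshape(N, m))            # each row approaches y_star
print(y_star)
```

We now return to the proof of Theorem 1.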
Define the variables $\hat{x} = x - x^\ast$, $\hat{v} = v - v^\ast$. Then

$$\begin{aligned} \dot{\hat{x}} &= \dot{x} - \dot{x}^\ast = -(L \otimes I_m)\hat{v} - \tilde{H}\hat{x} \\ \dot{\hat{v}} &= (L \otimes I_m)\hat{x}. \end{aligned} \tag{12}$$

Denote $\hat{u}(t) = [\hat{x}(t)^\top \ \hat{v}(t)^\top]^\top$ and

$$M = \begin{bmatrix} -\tilde{H} & -L \otimes I_m \\ L \otimes I_m & 0 \end{bmatrix}.$$

Then (12) is a linear system of the form $\dot{\hat{u}} = M\hat{u}$. Consider the following Lyapunov function:

$$V(\hat{x}, \hat{v}) = \frac{1}{2}\|\hat{u}\|^2 = \frac{1}{2}\big(\|\hat{x}\|^2 + \|\hat{v}\|^2\big).$$

Since

$$\dot{V} = -\hat{x}^\top (L \otimes I_m)\hat{v} - \hat{x}^\top \tilde{H}\hat{x} + \hat{v}^\top (L \otimes I_m)\hat{x} = -\hat{x}^\top \tilde{H}\hat{x} \leq 0, \tag{13}$$

$\hat{u}(t)$ is bounded for any finite initial value $\hat{u}(0) = [\hat{x}(0)^\top \ \hat{v}(0)^\top]^\top$. Therefore, we conclude:

C1. $\Re(\lambda) \leq 0$ for all $\lambda \in \sigma(M)$.

C2. If $\Re(\lambda) = 0$, then $\lambda$ has equal algebraic and geometric multiplicity.

(i). Suppose $\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\} = \mathbb{R}^m$ for all $\alpha \in \mathrm{S}_L$. We proceed to prove the convergence of $\hat{x}(t)$ and $\hat{v}(t)$. The proof contains two steps.

Step 1. We prove that $M$ has no purely imaginary eigenvalue if $\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\} = \mathbb{R}^m$ for all $\alpha \in \mathrm{S}_L$, using a contradiction argument. Thus suppose $\lambda = \imath r \neq 0$ with $r \in \mathbb{R}$ is an eigenvalue of $M$ with a corresponding eigenvector $\beta = [\beta_a^\top \ \beta_b^\top]^\top \in \mathbb{C}^{2Nm}$, where $\beta_a, \beta_b \in \mathbb{C}^{Nm}$. Let $\hat{u}(0) = \beta$. Then $\hat{u}(t) = e^{Mt}\hat{u}(0) = e^{\imath r t}\hat{u}(0)$, and therefore $\|\hat{u}(t)\|^2 = \|\hat{u}(0)\|^2$ for all $t$. On the other hand, according to (13),

$$\frac{d}{dt}\Big(\frac{1}{2}\|\hat{u}(t)\|^2\Big) = -\hat{x}^\top(t)\tilde{H}\hat{x}(t) = -\hat{x}^\top(0)e^{\imath rt}\tilde{H}e^{\imath rt}\hat{x}(0) = -e^{\imath 2rt}\beta_a^\top \tilde{H}\beta_a.$$

Consequently, there must hold $\tilde{H}\beta_a = 0$. Next, based on

$$\begin{bmatrix} -\tilde{H} & -L \otimes I_m \\ L \otimes I_m & 0 \end{bmatrix}\begin{bmatrix} \beta_a \\ \beta_b \end{bmatrix} = \imath r \begin{bmatrix} \beta_a \\ \beta_b \end{bmatrix},$$

we know

$$\begin{aligned} -(L \otimes I_m)\beta_b &= \imath r \beta_a \\ (L \otimes I_m)\beta_a &= \imath r \beta_b. \end{aligned} \tag{14}$$

Since $\beta \neq 0$, neither $\beta_a$ nor $\beta_b$ can be zero. By a simple calculation, we have

$$\begin{aligned} (L \otimes I_m)^2 \beta_a &= r^2 \beta_a \\ (L \otimes I_m)^2 \beta_b &= r^2 \beta_b, \end{aligned} \tag{15}$$

i.e., $\beta_a$ and $\beta_b$ are both eigenvectors of $(L \otimes I_m)^2$ corresponding to $r^2$. From (15), we know

$$(L^2 \otimes I_m)\beta_a = r^2 \beta_a.$$

Based on the properties of eigenvectors of the Kronecker product of two matrices (Theorem 13.12, [19]), there exist $(r^2, \alpha_a)$ and $\eta_a$ such that $L^2 \alpha_a = r^2 \alpha_a$ and $\beta_a = \alpha_a \otimes \eta_a$, with $\alpha_a \in \mathbb{C}^N$ and $\eta_a \in \mathbb{C}^m$. Since $L$ is symmetric positive semi-definite, $L^2 \alpha_a = r^2 \alpha_a$ implies $L\alpha_a = |r|\alpha_a$, i.e., $\alpha_a$ is an eigenvector of $L$ corresponding to the eigenvalue $|r|$. Denote

$$\beta_a = \begin{bmatrix} \beta_a^{[1]} \\ \beta_a^{[2]} \\ \vdots \\ \beta_a^{[N]} \end{bmatrix}, \quad \beta_a^{[i]} \in \mathbb{C}^m, \quad i = 1, 2, \dots, N,$$

and

$$\alpha_a = \begin{bmatrix} \alpha_a[1] \\ \alpha_a[2] \\ \vdots \\ \alpha_a[N] \end{bmatrix}, \quad \alpha_a[i] \in \mathbb{C}, \quad i = 1, 2, \dots, N.$$

It is apparent that $\beta_a^{[i]} = \alpha_a[i]\,\eta_a$ if $i \in \mathrm{I}_{\alpha_a}$ and $\beta_a^{[i]} = 0$ otherwise. Then, noting that

$$\tilde{H}\beta_a = \begin{bmatrix} \alpha_a[1]\, h_1 h_1^\top \eta_a \\ \alpha_a[2]\, h_2 h_2^\top \eta_a \\ \vdots \\ \alpha_a[N]\, h_N h_N^\top \eta_a \end{bmatrix} = 0,$$

we get $\alpha_a[i]\, h_i h_i^\top \eta_a = 0$ for $i = 1, 2, \dots, N$, which implies that

$$h_i^\top \eta_a = 0, \quad i \in \mathrm{I}_{\alpha_a}. \tag{16}$$

Because $\mathrm{span}\{h_i : i \in \mathrm{I}_{\alpha_a}\} = \mathbb{R}^m$, there must hold $\eta_a = 0$. In turn, $\beta_a$ must be zero, leading to $\beta_b = 0$ by (14), which contradicts $\beta \neq 0$. Therefore $M$ does not have purely imaginary eigenvalues.

Based on C1, C2 and the fact that $M$ has no purely imaginary eigenvalue, it follows that $\hat{x}(t)$ and $\hat{v}(t)$ converge.

Step 2. In this step, we establish the limits of $\hat{x}(t)$ and $\hat{v}(t)$ by studying the zero eigenspace of $M$, thereby obtaining the convergence property for $x(t)$ and $v(t)$. Suppose $\delta = [\delta_a^\top \ \delta_b^\top]^\top$ is an eigenvector of $M$ corresponding to the zero eigenvalue, with $\delta \in \mathbb{R}^{2Nm}$ and $\delta_a, \delta_b \in \mathbb{R}^{Nm}$, i.e., $M\delta = 0$. Consider a solution $\hat{u}(t)$ of (12) with $\hat{u}(0) = \delta$. We see from the derivative of the Lyapunov function and $M\delta = 0$ that

$$\tilde{H}\delta_a = 0, \qquad (L \otimes I_m)\delta_a = 0, \qquad (L \otimes I_m)\delta_b = 0.$$

Then there exist $\eta_a \in \mathbb{R}^m$ and $\eta_b \in \mathbb{R}^m$ such that $\delta_a = \mathbf{1} \otimes \eta_a$ and $\delta_b = \mathbf{1} \otimes \eta_b$.
Since $\tilde{H}\delta_a = 0$ with $\delta_a = \mathbf{1} \otimes \eta_a$, we have $h_i^\top \eta_a = 0$ for all $i$; as $\mathrm{rank}(H) = m$, this forces $\eta_a = 0$ and hence $\delta_a = 0$, i.e., $\delta$ must be of the form $\delta = [0 \ \delta_b^\top]^\top$ with $\delta_b = \mathbf{1} \otimes \eta_b$. Note that the algebraic and geometric multiplicities of the zero eigenvalue of $M$ are both equal to $m$. Now we decompose $M$ into its Jordan canonical form $M = TJT^{-1}$:

$$T = [\delta_1 \ \delta_2 \ \cdots \ \delta_m \ \cdots], \qquad T^{-1} = [\delta'_1 \ \delta'_2 \ \cdots \ \delta'_m \ \cdots]^\top,$$

where $\delta_i$ and $\delta'^\top_i$, $i = 1, 2, \dots, m$, are mutually orthogonal right and left eigenvectors of $M$, respectively, all corresponding to the zero eigenvalue and all of the form $\delta_i = [0 \ \delta_{ib}^\top]^\top$ and $\delta'^\top_i = [0 \ \delta'^\top_{ib}]$. Then

$$\lim_{t \to \infty} \hat{u}(t) = \sum_{i=1}^m \delta_i \delta'^\top_i \hat{u}(0),$$

which implies that

$$\lim_{t \to \infty} \hat{x}(t) = 0, \qquad \lim_{t \to \infty} \hat{v}(t) = W\hat{v}(0),$$

with $W = \sum_{i=1}^m \delta_{ib}\delta'^\top_{ib}$. Thus we can conclude that $x(t)$ converges to $x^\ast = \mathbf{1} \otimes y^\ast$, while $v(t)$ converges to a constant vector determined by the initial value $v(0)$, which can be obtained from

$$\lim_{t \to \infty} v(t) = v^\ast + \lim_{t \to \infty} \hat{v}(t) = v^\ast + W(v(0) - v^\ast) = (I_{Nm} - W)v^\ast + Wv(0).$$

Recall that $v^\ast$ was chosen to satisfy (11). This completes the proof of (i).

(ii). Suppose there exists $\alpha_a \in \mathrm{S}_L$ with $L\alpha_a = r\alpha_a$ such that $\dim(\mathrm{span}\{h_i : i \in \mathrm{I}_{\alpha_a}\}) < m$. Note that $r \neq 0$: since $\mathrm{G}$ is connected, $r = 0$ would force $\alpha_a$ to be a multiple of $\mathbf{1}$, giving $\mathrm{I}_{\alpha_a} = \mathrm{V}$ and $\mathrm{span}\{h_i : i \in \mathrm{I}_{\alpha_a}\} = \mathbb{R}^m$ by $\mathrm{rank}(H) = m$. There must exist $\eta_a \neq 0$ satisfying

$$h_i^\top \eta_a = 0, \quad i \in \mathrm{I}_{\alpha_a}.$$

Let $\beta = [\beta_a^\top \ \beta_b^\top]^\top$ with $\beta_a = \alpha_a \otimes \eta_a$ and $\beta_b = \dfrac{(L \otimes I_m)\beta_a}{\imath r}$. It is easy to check, noting that $\tilde{H}\beta_a = 0$ by the choice of $\eta_a$, that

$$M\beta = \begin{bmatrix} -\tilde{H} & -L \otimes I_m \\ L \otimes I_m & 0 \end{bmatrix}\begin{bmatrix} \beta_a \\ \beta_b \end{bmatrix} = \begin{bmatrix} -(L \otimes I_m)\beta_b \\ (L \otimes I_m)\beta_a \end{bmatrix} = \imath r \begin{bmatrix} \beta_a \\ \beta_b \end{bmatrix}. \tag{17}$$

Therefore, $M$ has a purely imaginary eigenvalue, and hence $x(t)$ and $v(t)$ do not converge for generic initial conditions. We have now completed the proof of Theorem 1. □

3.2 Generic Feasibility of H

Proposition 1. Let $L$ be the Laplacian of a graph $\mathrm{G}$ with $|\mathrm{I}_\alpha| \geq m$ for all $\alpha \in \mathrm{S}_L$. Then the convergence condition of Theorem 1, viz. $\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\} = \mathbb{R}^m$, is satisfied for generic $H \in \mathbb{R}^{N \times m}$; i.e., there does not exist a nonempty open subset of $\mathbb{R}^{N \times m}$ on which the convergence condition of Theorem 1 fails.

Proof. Let

$$H_{i_1, \dots, i_m} = \begin{bmatrix} h_{i_1}^\top \\ \vdots \\ h_{i_m}^\top \end{bmatrix}, \quad 1 \leq i_1 < \cdots < i_m \leq N.$$

Introduce

$$\mathrm{W} = \bigcup_{1 \leq i_1 < \cdots < i_m \leq N} \mathrm{W}_{i_1, \dots, i_m},$$

where

$$\mathrm{W}_{i_1, \dots, i_m} = \{H \in \mathbb{R}^{N \times m} : \det(H_{i_1, \dots, i_m}) = 0\}.$$

It can be noted that $\mathrm{W}$ contains the set of $H$ for which the convergence condition of Theorem 1 is not satisfied. Since $\mathrm{W}_{i_1, \dots, i_m} \neq \mathbb{R}^{N \times m}$, by the identity theorem for holomorphic functions [1], $\mathrm{W}_{i_1, \dots, i_m}$ does not contain a nonempty open subset of $\mathbb{R}^{N \times m}$; being a finite union of such sets, $\mathrm{W}$ does not contain a nonempty open subset of $\mathbb{R}^{N \times m}$ either. This completes the proof. □

3.3 Graph Feasibility

In this section, we consider several fundamental graphs to investigate the feasibility of the convergence condition presented in Theorem 1. Suppose $N > 2$. For a number of graphs we will first determine the minimum value of $|\mathrm{I}_\alpha|$; the collection of values and the implications for solvability of the least squares problem will be interpreted for all the graphs at the end of the calculations.
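The condition of Theorem 1 can also be checked computationally for a given graph and $H$. The following minimal sketch is our own (assuming numpy; the tolerance is illustrative, and for repeated Laplacian eigenvalues a basis check of this kind is only a partial test, since $\mathrm{S}_L$ then contains every nonzero vector of each eigenspace); it computes the index sets $\mathrm{I}_\alpha$ from the Laplacian's eigenvectors and tests whether $\mathrm{span}\{h_i : i \in \mathrm{I}_\alpha\} = \mathbb{R}^m$.

```python
import numpy as np

def check_theorem1_condition(L, H, tol=1e-9):
    """Check span{h_i : i in I_alpha} = R^m for a basis of eigenvectors of L.
    For simple eigenvalues this matches the condition of Theorem 1; for
    repeated eigenvalues it is only a partial (basis-dependent) test."""
    N, m = H.shape
    _, vecs = np.linalg.eigh(L)   # L symmetric: real orthonormal eigenvectors
    for k in range(N):
        alpha = vecs[:, k]
        I_alpha = np.where(np.abs(alpha) > tol)[0]   # indices with alpha[i] != 0
        if np.linalg.matrix_rank(H[I_alpha, :], tol=tol) < m:
            return False, alpha
    return True, None

# Path graph 1-2-3: the eigenvector (1, 0, -1) has I_alpha = {1, 3},
# so h_1 and h_3 alone must span R^2.
L_path = np.array([[ 1., -1.,  0.],
                   [-1.,  2., -1.],
                   [ 0., -1.,  1.]])
H_good = np.array([[1., 0.], [1., 1.], [0., 1.]])   # h_1, h_3 independent
H_bad  = np.array([[1., 0.], [0., 1.], [2., 0.]])   # h_1, h_3 parallel

print(check_theorem1_condition(L_path, H_good)[0])  # True
print(check_theorem1_condition(L_path, H_bad)[0])   # False: oscillation expected
```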
