arXiv:1301.6308v1 [math.OC] 27 Jan 2013

An Extragradient-Based Alternating Direction Method for Convex Minimization

Shiqian Ma* and Shuzhong Zhang†

January 26, 2013

Abstract

In this paper, we consider the problem of minimizing the sum of two convex functions subject to linear linking constraints. The classical alternating direction type methods usually assume that the two convex functions have relatively easy proximal mappings. However, many problems arising from statistics, image processing and other fields have the structure that only one of the two functions has an easy proximal mapping, while the other one is smooth and convex but does not have an easy proximal mapping. Therefore, the classical alternating direction methods cannot be applied. To solve this kind of problem, we propose in this paper an alternating direction method based on extragradients. Under the assumption that the smooth function has a Lipschitz continuous gradient, we prove that the proposed method returns an $\epsilon$-optimal solution within $O(1/\epsilon)$ iterations. We test the performance of different variants of the proposed method by solving the basis pursuit problem arising from compressed sensing. We then apply the proposed method to solve a new statistical model called fused logistic regression. Our numerical experiments show that the proposed method performs very well when solving the test problems.

Keywords: Alternating Direction Method; Extragradient; Iteration Complexity; Basis Pursuit; Fused Logistic Regression

Mathematics Subject Classification 2010: 90C25, 68Q25, 62J05

* Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong. Email: [email protected]
† Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455. Email: [email protected]. Research of this author was supported in part by NSF Grant CMMI-1161242.

1 Introduction

In this paper, we consider solving the following convex optimization problem:
\[
\begin{array}{rl}
\min_{x\in\mathbb{R}^n,\,y\in\mathbb{R}^p} & f(x)+g(y) \\
\text{s.t.} & Ax+By=b, \\
& x\in\mathcal{X},\ y\in\mathcal{Y},
\end{array} \tag{1.1}
\]
where $f$ and $g$ are convex functions, $A\in\mathbb{R}^{m\times n}$, $B\in\mathbb{R}^{m\times p}$, $b\in\mathbb{R}^m$, and $\mathcal{X}$ and $\mathcal{Y}$ are convex sets onto which projections can be computed easily. Problems of the form (1.1) arise in many applications in practice, and we will show some examples later. A recently very popular way to solve (1.1) is to apply the alternating direction method of multipliers (ADMM). A typical iteration of ADMM for solving (1.1) can be described as
\[
\left\{
\begin{array}{l}
x^{k+1} := \arg\min_{x\in\mathcal{X}} \mathcal{L}_\gamma(x,y^k;\lambda^k) \\
y^{k+1} := \arg\min_{y\in\mathcal{Y}} \mathcal{L}_\gamma(x^{k+1},y;\lambda^k) \\
\lambda^{k+1} := \lambda^k-\gamma(Ax^{k+1}+By^{k+1}-b),
\end{array}
\right. \tag{1.2}
\]
where the augmented Lagrangian function $\mathcal{L}_\gamma(x,y;\lambda)$ for (1.1) is defined as
\[
\mathcal{L}_\gamma(x,y;\lambda) := f(x)+g(y)-\langle\lambda,\,Ax+By-b\rangle+\frac{\gamma}{2}\|Ax+By-b\|^2, \tag{1.3}
\]
where $\lambda$ is the Lagrange multiplier associated with the linear constraint $Ax+By=b$ and $\gamma>0$ is a penalty parameter. The ADMM is closely related to operator splitting methods such as the Douglas-Rachford operator splitting method [7] and the Peaceman-Rachford operator splitting method [29] for finding a zero of the sum of two maximal monotone operators. In particular, it was shown by Gabay [12] that ADMM (1.2) is equivalent to applying the Douglas-Rachford operator splitting method to the dual problem of (1.1). The ADMM and operator splitting methods were subsequently studied extensively in the literature, and some generalized variants were proposed (see, e.g., [20, 10, 14, 8, 9]).
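To make the scheme (1.2) concrete, here is a minimal Python sketch of the generic ADMM loop. The subproblem solvers `argmin_x` and `argmin_y` are hypothetical user-supplied oracles (not part of the paper) that minimize the augmented Lagrangian over $x$ and $y$, respectively.

```python
import numpy as np

def admm(argmin_x, argmin_y, A, B, b, gamma, x0, y0, num_iters=100):
    """Generic ADMM loop (1.2) for min f(x)+g(y) s.t. Ax+By=b.

    argmin_x(y, lam) and argmin_y(x, lam) are user-supplied oracles that
    minimize the augmented Lagrangian L_gamma over x and y, respectively.
    """
    x, y = x0, y0
    lam = np.zeros_like(b)
    for _ in range(num_iters):
        x = argmin_x(y, lam)                     # x-subproblem
        y = argmin_y(x, lam)                     # y-subproblem
        lam = lam - gamma * (A @ x + B @ y - b)  # multiplier (dual) update
    return x, y, lam
```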
The ADMM was revisited recently because it was found to be very efficient for solving many sparse and low-rank optimization problems, such as compressed sensing [38], compressive imaging [35, 15], robust PCA [32], sparse inverse covariance selection [41, 30], sparse PCA [22] and semidefinite programming [36], etc. Moreover, the iteration complexity of ADMM (1.2) was recently established by He and Yuan [17] and Monteiro and Svaiter [24]. The recent survey paper by Boyd et al. [3] lists many interesting applications of ADMM in statistical learning and distributed optimization.

Note that the efficiency of ADMM (1.2) actually depends on whether the two subproblems in (1.2) can be solved efficiently. This requires that the following two problems can be solved efficiently for given $\tau>0$, $w\in\mathbb{R}^n$ and $z\in\mathbb{R}^p$:
\[
x := \arg\min_{x\in\mathcal{X}}\ f(x)+\frac{1}{2\tau}\|Ax-w\|^2 \tag{1.4}
\]
and
\[
y := \arg\min_{y\in\mathcal{Y}}\ g(y)+\frac{1}{2\tau}\|By-z\|^2. \tag{1.5}
\]
When $\mathcal{X}$ and $\mathcal{Y}$ are the whole spaces and $A$ and $B$ are identity matrices, (1.4) and (1.5) are known as the proximal mappings of the functions $f$ and $g$, respectively. Thus, in this case, ADMM (1.2) requires that the proximal mappings of $f$ and $g$ be easy to obtain. When $A$ and $B$ are not identity matrices, there are results on linearized ADMM (see, e.g., [38, 37, 42]) which linearize the quadratic penalty term in such a way that problems (1.4) and (1.5) still correspond to the proximal mappings of $f$ and $g$. The global convergence of the linearized ADMM is guaranteed under certain conditions on a linearization step size parameter.

There are two interesting problems that are readily solved by ADMM (1.2), since the involved functions have easy proximal mappings. One is the so-called robust principal component pursuit (RPCP) problem:
\[
\min_{X\in\mathbb{R}^{m\times n},\,Y\in\mathbb{R}^{m\times n}}\ \|X\|_*+\rho\|Y\|_1, \quad \text{s.t.}\ X+Y=M, \tag{1.6}
\]
where $\rho>0$ is a weighting parameter, $M\in\mathbb{R}^{m\times n}$ is a given matrix, $\|X\|_*$ is the nuclear norm of $X$, defined as the sum of the singular values of $X$, and $\|Y\|_1:=\sum_{i,j}|Y_{ij}|$ is the $\ell_1$ norm of $Y$. Problem (1.6) was studied by Candès et al. [4] and Chandrasekaran et al. [5] as a convex relaxation of the robust PCA problem. Note that the two involved functions, the nuclear norm $\|X\|_*$ and the $\ell_1$ norm $\|Y\|_1$, have easy proximal mappings (see, e.g., [23] and [16]). The other problem is the so-called sparse inverse covariance selection, also known as the graphical lasso problem [40, 1, 11]. This problem, which estimates a sparse inverse covariance matrix from sample data, can be formulated as
\[
\min_{X\in\mathbb{R}^{n\times n}}\ -\log\det(X)+\langle\hat\Sigma,X\rangle+\rho\|X\|_1, \tag{1.7}
\]
where the first convex function $-\log\det(X)+\langle\hat\Sigma,X\rangle$ is the negative log-likelihood function for the given sample covariance matrix $\hat\Sigma$, and the second convex function $\rho\|X\|_1$ is used to promote sparsity of the resulting solution. Problem (1.7) is of the form (1.1) because it can be rewritten equivalently as
\[
\min_{X\in\mathbb{R}^{n\times n},\,Y\in\mathbb{R}^{n\times n}}\ -\log\det(X)+\langle\hat\Sigma,X\rangle+\rho\|Y\|_1, \quad \text{s.t.}\ X-Y=0. \tag{1.8}
\]
Note that the involved function $-\log\det(X)$ has an easy proximal mapping (see, e.g., [30] and [41]).

However, there are many problems arising from statistics, machine learning and image processing that do not have easy subproblems (1.4) and (1.5), even when $A$ and $B$ are identity matrices. One such example is the so-called sparse logistic regression problem. Consider a given training set $\{a_i,b_i\}_{i=1}^m$, where $a_1,a_2,\ldots,a_m$ are the $m$ samples and $b_1,\ldots,b_m$ with $b_i\in\{-1,+1\}$, $i=1,\ldots,m$, are the $m$ binary class labels.
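The $\ell_1$-norm and nuclear-norm proximal mappings mentioned above have well-known closed forms (soft-thresholding and singular value thresholding). The following is a minimal sketch of these two mappings; it illustrates the standard formulas and is not code from the paper.

```python
import numpy as np

def prox_l1(v, tau):
    """Proximal mapping of tau*||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_nuclear(V, tau):
    """Proximal mapping of tau*||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)
```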
The likelihood function for these $m$ samples is $\prod_{i=1}^m \mathrm{Prob}(b_i\mid a_i)$, where
\[
\mathrm{Prob}(b\mid a) := \frac{1}{1+\exp(-b(a^\top x+c))}
\]
is the conditional probability of the label $b$ given the sample $a$; here $x\in\mathbb{R}^n$ is the weight vector, $c\in\mathbb{R}$ is the intercept, and $a^\top x+c=0$ defines a hyperplane in the feature space on which $\mathrm{Prob}(b\mid a)=0.5$. Moreover, $\mathrm{Prob}(b\mid a)>0.5$ if $a^\top x+c$ has the same sign as $b$, and $\mathrm{Prob}(b\mid a)<0.5$ otherwise. Sparse logistic regression (see [21]) can be formulated as the following convex optimization problem:
\[
\min_{x,c}\ \ell(x,c)+\alpha\|x\|_1, \tag{1.9}
\]
where $\ell(x,c)$ denotes the average logistic loss function, defined as
\[
\ell(x,c) := -\frac{1}{m}\log\prod_{i=1}^m \mathrm{Prob}(b_i\mid a_i) = \frac{1}{m}\sum_{i=1}^m \log\bigl(1+\exp(-b_i(a_i^\top x+c))\bigr),
\]
and the $\ell_1$ norm $\|x\|_1$ is imposed to promote sparsity of the weight vector $x$. If one wants to apply ADMM (1.2) to solve (1.9), one has to introduce a new variable $y$ and rewrite (1.9) as
\[
\min_{x,c,y}\ \ell(x,c)+\alpha\|y\|_1, \quad \text{s.t.}\ x-y=0. \tag{1.10}
\]
When ADMM (1.2) is applied to solve (1.10), although the subproblem with respect to $y$ is easily solvable (an $\ell_1$ shrinkage operation), the subproblem with respect to $(x,c)$ is difficult to solve because the proximal mapping of the logistic loss function $\ell(x,c)$ is not easily computable. Another example is the following fused logistic regression problem:
\[
\min_{x,c}\ \ell(x,c)+\alpha\|x\|_1+\beta\sum_{j=2}^n |x_j-x_{j-1}|. \tag{1.11}
\]
This problem cannot be solved by ADMM (1.2) either, again because of the difficulty of computing the proximal mapping of $\ell(x,c)$. We will discuss this example in more detail in Section 5.

But is it really crucial to compute the proximal mapping exactly in the ADMM scheme? After all, ADMM can be viewed as an approximate dual gradient ascent method. As such, computing the proximal mapping exactly is in some sense redundant, since on the dual side the iterates are updated based on the gradient ascent method. Without sacrificing the scale of approximation to optimality, an update on the primal side based on the gradient information (or at least part of it) is, by the principle of primal-dual symmetry, entirely appropriate. Our subsequent analysis indeed confirms this belief.

Our contribution. In this paper, we propose a new alternating direction method for solving (1.1). This new method requires only one of the functions in the objective to have an easy proximal mapping; the other function is merely required to be smooth. Note that the aforementioned examples, namely sparse logistic regression (1.9) and fused logistic regression (1.11), are both of this type. In each iteration, the proposed method involves only computing the proximal mapping of one function and the gradient of the other. Under the assumption that the smooth function has a Lipschitz continuous gradient, we prove that the proposed method finds an $\epsilon$-optimal solution to (1.1) within $O(1/\epsilon)$ iterations. We then compare the performance of some variants of the proposed method on the basis pursuit problem from compressed sensing. We also discuss in detail the fused logistic regression problem and show that our method can solve it effectively.
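Although the proximal mapping of $\ell(x,c)$ is hard to compute, the method proposed in the next section only requires its gradient, which is cheap. As a reference point, here is a minimal sketch of evaluating the average logistic loss $\ell(x,c)$ in (1.9) and its gradient; the function name and interface are ours, not the paper's.

```python
import numpy as np

def logistic_loss_and_grad(x, c, A, b):
    """Average logistic loss l(x,c) of (1.9) and its gradient.

    A is the m-by-n matrix whose rows are the samples a_i;
    b is the vector of labels in {-1, +1}.
    """
    m = A.shape[0]
    t = -b * (A @ x + c)                      # t_i = -b_i (a_i^T x + c)
    loss = np.mean(np.logaddexp(0.0, t))      # (1/m) sum_i log(1 + exp(t_i))
    sigma = np.exp(t - np.logaddexp(0.0, t))  # sigma_i = exp(t_i)/(1+exp(t_i))
    w = -b * sigma / m
    grad_x = A.T @ w                          # gradient with respect to x
    grad_c = np.sum(w)                        # gradient with respect to c
    return loss, grad_x, grad_c
```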
2 An Alternating Direction Method Based on Extragradient

In this section, we consider solving (1.1) where $f$ has an easy proximal mapping, while $g$ is smooth but does not have an easy proximal mapping. Note that in this case ADMM (1.2) cannot be applied to solve (1.1), since the solution of the second subproblem in (1.2) is not available. We propose the following extragradient-based alternating direction method (EGADM) to solve Problem (1.1). Starting with any initial point $y^0\in\mathcal{Y}$ and $\lambda^0\in\mathbb{R}^m$, a typical iteration of EGADM can be described as
\[
\left\{
\begin{array}{l}
x^{k+1} := \arg\min_{x\in\mathcal{X}} \mathcal{L}_\gamma(x,y^k;\lambda^k)+\frac{1}{2}\|x-x^k\|_H^2 \\
\bar y^{k+1} := [y^k-\gamma\nabla_y L(x^{k+1},y^k;\lambda^k)]_{\mathcal{Y}} \\
\bar\lambda^{k+1} := \lambda^k-\gamma(Ax^{k+1}+By^k-b) \\
y^{k+1} := [y^k-\gamma\nabla_y L(x^{k+1},\bar y^{k+1};\bar\lambda^{k+1})]_{\mathcal{Y}} \\
\lambda^{k+1} := \lambda^k-\gamma(Ax^{k+1}+B\bar y^{k+1}-b),
\end{array}
\right. \tag{2.1}
\]
where $[y]_{\mathcal{Y}}$ denotes the projection of $y$ onto $\mathcal{Y}$, $\mathcal{L}_\gamma(x,y;\lambda)$ is the augmented Lagrangian function for (1.1) as defined in (1.3), $H$ is a pre-specified positive semidefinite matrix, and $L(x,y;\lambda)$ is the Lagrangian function for (1.1), defined as
\[
L(x,y;\lambda) := f(x)+g(y)-\langle\lambda,\,Ax+By-b\rangle. \tag{2.2}
\]
Note that the first subproblem in (2.1) minimizes the augmented Lagrangian function plus a proximal term $\frac{1}{2}\|x-x^k\|_H^2$ with respect to $x$, i.e.,
\[
x^{k+1} := \arg\min_{x\in\mathcal{X}}\ f(x)-\langle\lambda^k,\,Ax+By^k-b\rangle+\frac{\gamma}{2}\|Ax+By^k-b\|^2+\frac{1}{2}\|x-x^k\|_H^2. \tag{2.3}
\]
In sparse and low-rank optimization problems, the proximal term $\frac{1}{2}\|x-x^k\|_H^2$ is usually imposed to cancel the effect of the matrix $A$ in the quadratic penalty term. One typical choice of $H$ is $H=0$ when $A$ is the identity, and $H=\tau I-\gamma A^\top A$ when $A$ is not the identity, where $\tau>\gamma\lambda_{\max}(A^\top A)$ and $\lambda_{\max}(A^\top A)$ denotes the largest eigenvalue of $A^\top A$. We assume that (2.3) is relatively easy to solve. Basically, when $A$ is an identity matrix and $f$ is a function arising from sparse optimization, such as the $\ell_1$ norm, $\ell_2$ norm, nuclear norm, etc., the subproblem (2.3) is usually easy to solve.

Because we do not impose any structure on the smooth convex function $g$, the following subproblem is not assumed to be easily solvable to optimality:
\[
\min_{y\in\mathcal{Y}}\ \mathcal{L}_\gamma(x,y;\lambda) \tag{2.4}
\]
for given $x$ and $\lambda$. Thus, in the extragradient-based ADM (2.1), we do not solve (2.4). Instead, we take gradient projection steps on the Lagrangian function $L(x,y;\lambda)$ for fixed $x$ and $\lambda$ to update $y$. Note that since the gradient of $L(x,y;\lambda)$ with respect to $\lambda$ is given by
\[
\nabla_\lambda L(x,y;\lambda) = -(Ax+By-b), \tag{2.5}
\]
the two updating steps for $\lambda$ in (2.1) can be interpreted as
\[
\bar\lambda^{k+1} := \lambda^k+\gamma\nabla_\lambda L(x^{k+1},y^k;\lambda^k) \tag{2.6}
\]
and
\[
\lambda^{k+1} := \lambda^k+\gamma\nabla_\lambda L(x^{k+1},\bar y^{k+1};\bar\lambda^{k+1}). \tag{2.7}
\]
Hence, by defining
\[
z := \begin{pmatrix} y\\ \lambda\end{pmatrix}, \quad
\bar z := \begin{pmatrix} \bar y\\ \bar\lambda\end{pmatrix}, \quad
F(x,z) := \begin{pmatrix} \nabla_y L(x,y;\lambda)\\ -\nabla_\lambda L(x,y;\lambda)\end{pmatrix}, \quad
\mathcal{Z} := \mathcal{Y}\times\mathbb{R}^m,
\]
we can rewrite (2.1) equivalently as
\[
\left\{
\begin{array}{l}
x^{k+1} := \arg\min_{x\in\mathcal{X}} \mathcal{L}_\gamma(x,y^k;\lambda^k)+\frac{1}{2}\|x-x^k\|_H^2 \\
\bar z^{k+1} := [z^k-\gamma F(x^{k+1},z^k)]_{\mathcal{Z}} \\
z^{k+1} := [z^k-\gamma F(x^{k+1},\bar z^{k+1})]_{\mathcal{Z}}.
\end{array}
\right. \tag{2.8}
\]
Now it is easy to see that the two steps for updating $z$ in (2.8) are gradient projection steps for the Lagrangian function $L(x^{k+1},y;\lambda)$ at $(y^k,\lambda^k)$ and $(\bar y^{k+1},\bar\lambda^{k+1})$, respectively. The steps for $y$ are gradient descent steps and the steps for $\lambda$ are gradient ascent steps, because the original problem (1.1) can be rewritten as the minimax problem
\[
\min_{x\in\mathcal{X},\,y\in\mathcal{Y}}\ \max_{\lambda}\ f(x)+g(y)-\langle\lambda,\,Ax+By-b\rangle, \tag{2.9}
\]
which minimizes the Lagrangian function with respect to $x$ and $y$ but maximizes it with respect to $\lambda$. Since there are two gradient steps for updating $z$, the second gradient step can be seen as an extragradient step. The idea of extragradient is not new. In fact, the extragradient method as we know it was originally proposed by Korpelevich for variational inequalities and saddle-point problems, and its convergence was proved there [18, 19]. For recent results on the convergence of extragradient type methods, we refer to [28] and the references therein. The iteration complexity of the extragradient method was analyzed by Nemirovski in [27]. Recently, Monteiro and Svaiter [25, 26, 24] studied the iteration complexity of the hybrid proximal extragradient method proposed by Solodov and Svaiter in [31] and its variants. More recently, Bonettini and Ruggiero studied a generalized extragradient method for the total variation based image restoration problem [2].
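The following is a minimal Python sketch of one EGADM iteration in the compact form (2.8), under the assumption that the user supplies the $x$-subproblem solver `argmin_x` (i.e., a solver for (2.3)), the gradient `grad_g` of $g$, and a projection `proj_Y` onto $\mathcal{Y}$; these names are placeholders of ours, not the paper's. It uses $\nabla_y L(x,y;\lambda)=\nabla g(y)-B^\top\lambda$ together with (2.5).

```python
import numpy as np

def egadm_step(x, y, lam, argmin_x, grad_g, proj_Y, A, B, b, gamma):
    """One EGADM iteration (2.1)/(2.8) for min f(x)+g(y) s.t. Ax+By=b."""
    x_new = argmin_x(y, lam)                                    # x-subproblem (2.3)
    # Predictor (gradient projection) step on the Lagrangian:
    y_bar = proj_Y(y - gamma * (grad_g(y) - B.T @ lam))
    lam_bar = lam - gamma * (A @ x_new + B @ y - b)
    # Corrector (extragradient) step, with gradients at the predictor point:
    y_new = proj_Y(y - gamma * (grad_g(y_bar) - B.T @ lam_bar))
    lam_new = lam - gamma * (A @ x_new + B @ y_bar - b)
    return x_new, y_new, lam_new
```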
3 Iteration Complexity

In this section, we analyze the iteration complexity of EGADM, i.e., (2.1). We show that, under the assumption that the smooth function $g$ has a Lipschitz continuous gradient, EGADM (2.1) finds an $\epsilon$-optimal solution (defined in Definition 3.1) to Problem (1.1) within $O(1/\epsilon)$ iterations.

The Lagrangian dual problem of (1.1) is
\[
\max_{\lambda}\ d(\lambda), \tag{3.1}
\]
where $d(\lambda)=\min_{x\in\mathcal{X},\,y\in\mathcal{Y}} L(x,y;\lambda)$. The $\epsilon$-optimal solution to Problem (1.1) is thus defined as follows.

Definition 3.1 We call $(\hat x,\hat y)\in\mathcal{X}\times\mathcal{Y}$ and $\hat\lambda\in\mathbb{R}^m$ a pair of $\epsilon$-optimal solutions to Problem (1.1) if the following holds:
\[
\max_{x\in\mathcal{X}^\star,\,y\in\mathcal{Y}^\star,\,\lambda\in\Lambda^\star}\ \bigl(L(\hat x,\hat y;\lambda)-L(x,y;\hat\lambda)\bigr) \le \epsilon, \tag{3.2}
\]
where $\mathcal{X}^\star\times\mathcal{Y}^\star$ is the optimal set of the primal problem (1.1) and $\Lambda^\star$ is the optimal set of the dual problem (3.1).

The $\epsilon$-optimality in Definition 3.1 measures the closeness of the solution to the optimal set in terms of the duality gap. This is validated by the following saddle-point theorem.

Theorem 3.1 (Saddle-Point Theorem) The pair $(x^\star,y^\star;\lambda^\star)$ with $(x^\star,y^\star)\in\mathcal{X}\times\mathcal{Y}$ and $\lambda^\star\in\mathbb{R}^m$ is a primal-dual optimal solution pair to (1.1) if and only if $(x^\star,y^\star)$ and $\lambda^\star$ satisfy
\[
L(x^\star,y^\star;\lambda) \le L(x^\star,y^\star;\lambda^\star) \le L(x,y;\lambda^\star), \quad \text{for all } x\in\mathcal{X},\ y\in\mathcal{Y},\ \lambda\in\mathbb{R}^m,
\]
i.e., $(x^\star,y^\star;\lambda^\star)$ is a saddle point of the Lagrangian function $L(x,y;\lambda)$.

It is well known that weak duality holds for the primal problem (1.1) and the dual problem (3.1), i.e., $d(\lambda)\le f(x)+g(y)$ for any feasible solutions $(x,y)$ and $\lambda$. In our particular case, the Lagrangian dual variable $\lambda$ is associated with a set of linear equality constraints, so strong duality holds if the primal problem has an optimal solution. Furthermore, we assume that the optimal set $\mathcal{X}^\star\times\mathcal{Y}^\star$ of the primal problem (1.1) and the optimal set $\Lambda^\star$ of the dual problem (3.1) are both bounded. This assumption indeed holds for a wide variety of problem classes (e.g., when the primal objective function is coercive and continuously differentiable).

Now we are ready to analyze the iteration complexity of EGADM (2.8), or equivalently (2.1), for an $\epsilon$-optimal solution in the sense of Definition 3.1. We first prove the following lemma.

Lemma 3.2 The sequence $\{x^{k+1},z^k,\bar z^k\}$ generated by (2.8) satisfies the following inequality:
\[
\begin{array}{rl}
& \langle\gamma F(x^{k+1},\bar z^{k+1}),\,\bar z^{k+1}-z^{k+1}\rangle-\tfrac{1}{2}\|z^k-z^{k+1}\|^2 \\
\le & \gamma^2\|F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k)\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^k\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^{k+1}\|^2.
\end{array} \tag{3.3}
\]

Proof. Note that the optimality conditions of the two subproblems for $z$ in (2.8) are given by
\[
\langle z^k-\gamma F(x^{k+1},z^k)-\bar z^{k+1},\,z-\bar z^{k+1}\rangle \le 0, \quad \forall z\in\mathcal{Z}, \tag{3.4}
\]
and
\[
\langle z^k-\gamma F(x^{k+1},\bar z^{k+1})-z^{k+1},\,z-z^{k+1}\rangle \le 0, \quad \forall z\in\mathcal{Z}. \tag{3.5}
\]
Letting $z=z^{k+1}$ in (3.4) and $z=\bar z^{k+1}$ in (3.5), and then summing the two resulting inequalities, we get
\[
\|z^{k+1}-\bar z^{k+1}\|^2 \le \gamma\langle F(x^{k+1},z^k)-F(x^{k+1},\bar z^{k+1}),\,z^{k+1}-\bar z^{k+1}\rangle, \tag{3.6}
\]
which implies
\[
\|z^{k+1}-\bar z^{k+1}\| \le \gamma\|F(x^{k+1},z^k)-F(x^{k+1},\bar z^{k+1})\|. \tag{3.7}
\]
Now we are able to prove (3.3).
We have
\[
\begin{array}{rl}
& \langle\gamma F(x^{k+1},\bar z^{k+1}),\,\bar z^{k+1}-z^{k+1}\rangle-\tfrac{1}{2}\|z^k-z^{k+1}\|^2 \\
= & \gamma\langle F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k),\,\bar z^{k+1}-z^{k+1}\rangle+\gamma\langle F(x^{k+1},z^k),\,\bar z^{k+1}-z^{k+1}\rangle-\tfrac{1}{2}\|z^k-z^{k+1}\|^2 \\
\le & \gamma\langle F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k),\,\bar z^{k+1}-z^{k+1}\rangle+\langle z^k-\bar z^{k+1},\,\bar z^{k+1}-z^{k+1}\rangle-\tfrac{1}{2}\|z^k-z^{k+1}\|^2 \\
= & \gamma\langle F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k),\,\bar z^{k+1}-z^{k+1}\rangle-\tfrac{1}{2}\|z^k\|^2+\langle z^k,\bar z^{k+1}\rangle+\langle\bar z^{k+1},\,z^{k+1}-\bar z^{k+1}\rangle-\tfrac{1}{2}\|z^{k+1}\|^2 \\
\le & \gamma\|F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k)\|\cdot\|\bar z^{k+1}-z^{k+1}\| \\
& +\bigl(-\tfrac{1}{2}\|z^k\|^2+\langle z^k,\bar z^{k+1}\rangle-\tfrac{1}{2}\|\bar z^{k+1}\|^2\bigr)+\bigl(-\tfrac{1}{2}\|\bar z^{k+1}\|^2+\langle\bar z^{k+1},z^{k+1}\rangle-\tfrac{1}{2}\|z^{k+1}\|^2\bigr) \\
\le & \gamma^2\|F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k)\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^k\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^{k+1}\|^2,
\end{array} \tag{3.8}
\]
where the first inequality is obtained by letting $z=z^{k+1}$ in (3.4) and the last inequality follows from (3.7). This completes the proof. $\Box$

We next prove the following lemma.

Lemma 3.3 Assume that $\nabla g(y)$ is Lipschitz continuous with Lipschitz constant $L_g$, i.e.,
\[
\|\nabla g(y_1)-\nabla g(y_2)\| \le L_g\|y_1-y_2\|, \quad \forall y_1,y_2\in\mathcal{Y}. \tag{3.9}
\]
By letting $\gamma\le 1/(2\hat L)$, where $\hat L:=\bigl(\max\{2L_g^2+\lambda_{\max}(B^\top B),\,2\lambda_{\max}(B^\top B)\}\bigr)^{1/2}$, the following inequality holds:
\[
\langle\gamma F(x^{k+1},\bar z^{k+1}),\,\bar z^{k+1}-z^{k+1}\rangle-\frac{1}{2}\|z^k-z^{k+1}\|^2 \le 0. \tag{3.10}
\]

Proof. For any $z_1\in\mathcal{Z}$ and $z_2\in\mathcal{Z}$, we have
\[
\begin{array}{rl}
\|F(x^{k+1},z_1)-F(x^{k+1},z_2)\|^2
& = \left\|\begin{pmatrix}(\nabla g(y_1)-B^\top\lambda_1)-(\nabla g(y_2)-B^\top\lambda_2)\\ (Ax^{k+1}+By_1-b)-(Ax^{k+1}+By_2-b)\end{pmatrix}\right\|^2 \\
& = \|(\nabla g(y_1)-\nabla g(y_2))-B^\top(\lambda_1-\lambda_2)\|^2+\|B(y_1-y_2)\|^2 \\
& \le 2\|\nabla g(y_1)-\nabla g(y_2)\|^2+2\|B^\top(\lambda_1-\lambda_2)\|^2+\|B(y_1-y_2)\|^2 \\
& \le 2L_g^2\|y_1-y_2\|^2+2\lambda_{\max}(B^\top B)\|\lambda_1-\lambda_2\|^2+\lambda_{\max}(B^\top B)\|y_1-y_2\|^2 \\
& \le \max\{2L_g^2+\lambda_{\max}(B^\top B),\,2\lambda_{\max}(B^\top B)\}\left\|\begin{pmatrix}y_1-y_2\\ \lambda_1-\lambda_2\end{pmatrix}\right\|^2 \\
& = \hat L^2\|z_1-z_2\|^2,
\end{array}
\]
where the second inequality is due to (3.9) and the last equality is from the definition of $\hat L$. Thus, $F(x^{k+1},z)$ is Lipschitz continuous in $z$ with Lipschitz constant $\hat L$. Since $\gamma\le 1/(2\hat L)$, we have the following inequality:
\[
\begin{array}{rl}
& \gamma^2\|F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k)\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^k\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^{k+1}\|^2 \\
\le & \gamma^2\|F(x^{k+1},\bar z^{k+1})-F(x^{k+1},z^k)\|^2-\tfrac{1}{2}\|\bar z^{k+1}-z^k\|^2 \\
\le & \bigl(\gamma^2\hat L^2-\tfrac{1}{2}\bigr)\|\bar z^{k+1}-z^k\|^2 \\
\le & 0,
\end{array}
\]
which, combined with (3.3), yields (3.10). $\Box$

We further prove the following lemma.

Lemma 3.4 Under the same assumptions as in Lemma 3.3, the following holds:
\[
\frac{1}{2}\|z-z^{k+1}\|^2-\frac{1}{2}\|z-z^k\|^2 \le \langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-\bar z^{k+1}\rangle. \tag{3.11}
\]

Proof. Adding
\[
\langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-z^{k+1}\rangle-\frac{1}{2}\|z^{k+1}-z^k\|^2
\]
to both sides of (3.5), we get
\[
\langle z^k-z^{k+1},\,z-z^{k+1}\rangle-\frac{1}{2}\|z^{k+1}-z^k\|^2 \le \langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-z^{k+1}\rangle-\frac{1}{2}\|z^{k+1}-z^k\|^2. \tag{3.12}
\]
Notice that the left-hand side of (3.12) is equal to $\frac{1}{2}\|z-z^{k+1}\|^2-\frac{1}{2}\|z-z^k\|^2$. Thus we have
\[
\begin{array}{rl}
\frac{1}{2}\|z-z^{k+1}\|^2-\frac{1}{2}\|z-z^k\|^2
& \le \langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-z^{k+1}\rangle-\frac{1}{2}\|z^{k+1}-z^k\|^2 \\
& = \langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-\bar z^{k+1}\rangle+\langle\gamma F(x^{k+1},\bar z^{k+1}),\,\bar z^{k+1}-z^{k+1}\rangle-\frac{1}{2}\|z^{k+1}-z^k\|^2 \\
& \le \langle\gamma F(x^{k+1},\bar z^{k+1}),\,z-\bar z^{k+1}\rangle,
\end{array} \tag{3.13}
\]
where the last inequality is due to (3.10). $\Box$
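The step size bound $\gamma\le 1/(2\hat L)$ used in Lemmas 3.3 and 3.4 is computable directly from $L_g$ and $B$. The following is a minimal sketch of this computation; the function name is ours, not the paper's.

```python
import numpy as np

def egadm_step_size(L_g, B):
    """Return gamma = 1/(2*L_hat) with
    L_hat = sqrt(max(2*L_g^2 + lambda_max(B^T B), 2*lambda_max(B^T B)))."""
    lam_max = np.linalg.eigvalsh(B.T @ B).max()  # largest eigenvalue of B^T B
    L_hat = np.sqrt(max(2.0 * L_g**2 + lam_max, 2.0 * lam_max))
    return 1.0 / (2.0 * L_hat)
```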
We now give the $O(1/\epsilon)$ iteration complexity of (2.1) (or, equivalently, (2.8)) for obtaining an $\epsilon$-optimal solution to Problem (1.1).

Theorem 3.5 Consider Algorithm EGADM (2.1) and its sequence of iterates. For any integer $N>0$, define
\[
\tilde x^N := \frac{1}{N}\sum_{k=1}^N x^{k+1}, \quad \tilde y^N := \frac{1}{N}\sum_{k=1}^N \bar y^{k+1}, \quad \tilde\lambda^N := \frac{1}{N}\sum_{k=1}^N \bar\lambda^{k+1}.
\]
Suppose that the optimal solution sets of the primal problem (1.1) and the dual problem (3.1) are both bounded. Moreover, assume that $\nabla g(y)$ is Lipschitz continuous with Lipschitz constant $L_g$, and that we choose $\gamma\le 1/(2\hat L)$, where $\hat L:=\bigl(\max\{2L_g^2+\lambda_{\max}(B^\top B),\,2\lambda_{\max}(B^\top B)\}\bigr)^{1/2}$. Furthermore, we choose $H:=0$ if $A$ is an identity matrix, and $H:=\tau I-\gamma A^\top A$ when $A$ is not the identity, where $\tau>\gamma\lambda_{\max}(A^\top A)$. Then the following inequality holds:
\[
\max_{x\in\mathcal{X}^\star,\,y\in\mathcal{Y}^\star,\,\lambda\in\Lambda^\star}\ \bigl(L(\tilde x^N,\tilde y^N;\lambda)-L(x,y;\tilde\lambda^N)\bigr) \le \frac{1}{2\gamma N}\max_{z\in\mathcal{Z}^\star}\|z-z^0\|^2+\frac{1}{2N}\max_{x\in\mathcal{X}^\star}\|x-x^0\|^2,
\]
where $\mathcal{Z}^\star:=\mathcal{Y}^\star\times\Lambda^\star$. This implies that when $N=O(1/\epsilon)$, $(\tilde x^N,\tilde y^N,\tilde\lambda^N)$ is an $\epsilon$-optimal solution to Problem (1.1); i.e., the iteration complexity of (2.1) (or, equivalently, (2.8)) for obtaining an $\epsilon$-optimal solution to Problem (1.1) is $O(1/\epsilon)$.

Proof. The optimality conditions of the subproblem for $x$ in (2.1) are given by
\[
\langle\partial f(x^{k+1})-A^\top\lambda^k+\gamma A^\top(Ax^{k+1}+By^k-b)+H(x^{k+1}-x^k),\,x-x^{k+1}\rangle \ge 0, \quad \forall x\in\mathcal{X}. \tag{3.14}
\]
By using the updating formula for $\bar\lambda^{k+1}$ in (2.1), i.e.,
\[
\bar\lambda^{k+1} := \lambda^k+\gamma\nabla_\lambda L(x^{k+1},y^k;\lambda^k) = \lambda^k-\gamma(Ax^{k+1}+By^k-b),
\]
