
Convex analysis and optimization: Solutions PDF

191 Pages·2005·0.903 MB·English


Corrections for the book CONVEX ANALYSIS AND OPTIMIZATION, Athena Scientific, 2003, by Dimitri P. Bertsekas

Last Changed: 5/3/04

p. 3 (+22) Change “as the union of the closures of all line segments” to “as the closure of the union of all line segments”
p. 37 (-2) Change “Every $x$” to “Every $x \neq 0$”
p. 38 (+1) Change “Every $x$ in” to “Every $x \notin X$ that belongs to”
p. 38 (+19) Change “i.e.,” to “with $x_1, \ldots, x_m \in \Re^n$ and $m \geq 2$, i.e.,”
p. 63 (+4, +6, +7, +19) Change four times “$c'y$” to “$a'y$”
p. 67 (+3) Change “$y \in AC$” to “$y \in AC$”
p. 70 (+9) Change “[BeN02]” to “[NeB02]”
p. 110 (+3 after the figure caption) Change “... does not belong to the interior of $C$” to “... does not belong to the interior of $C$ and hence does not belong to the interior of $\mathrm{cl}(C)$ [cf. Prop. 1.4.3(b)]”
p. 148 (-8) Change “$\{x \mid r(x) \leq \gamma\}$” to “$\{z \mid r(z) \leq \gamma\}$”
p. 213 (-6) Change “remaining vectors $v_j$, $j \neq i$.” to “vectors $v_j$ with $v_j \neq v_i$.”
p. 219 (+3) Change “$f_i : C \mapsto \Re$” to “$f_i : \Re^n \mapsto \Re$”
p. 265 (+10) Change “$d/\|d\|$” to “$-d/\|d\|$”
p. 268 (-3) Change “$j \in A(x^*)$” to “$j \notin A(x^*)$”
p. 338 (+17) Change “Section 5.2” to “Section 5.3”
p. 384 (+6) Change “convex, possibly nonsmooth functions” to “smooth functions, and convex (possibly nonsmooth) functions”
p. 446 (+6 and +8) Interchange “... constrained problem (7.16)” and “... penalized problem (7.19)”
p. 458 (+13) Change “... as well real-valued” to “... as well as real-valued”
p. 458 (-10) Change “We will focus on this ... dual functions.” to “In this case, the dual problem can be solved using gradient-like algorithms for differentiable optimization (see e.g., Bertsekas [Ber99a]).”

Convex Analysis and Optimization
Chapter 1 Solutions

Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar
Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts
http://www.athenasc.com

LAST UPDATE March 24, 2004

CHAPTER 1: SOLUTION MANUAL

1.1

Assume that $C$ is convex. Then, clearly $(\lambda_1 + \lambda_2)C \subset \lambda_1 C + \lambda_2 C$; this is true even if $C$ is not convex. To show the reverse inclusion, note that a vector $x$ in $\lambda_1 C + \lambda_2 C$ is of the form $x = \lambda_1 x_1 + \lambda_2 x_2$, where $x_1, x_2 \in C$. By convexity of $C$, we have

$$\frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2 \in C,$$

and it follows that

$$x = \lambda_1 x_1 + \lambda_2 x_2 \in (\lambda_1 + \lambda_2)C.$$

Hence $\lambda_1 C + \lambda_2 C \subset (\lambda_1 + \lambda_2)C$.

For a counterexample when $C$ is not convex, let $C$ be a set in $\Re^n$ consisting of two vectors, $0$ and $x \neq 0$, and let $\lambda_1 = \lambda_2 = 1$. Then evidently $C$ is not convex, and $(\lambda_1 + \lambda_2)C = 2C = \{0, 2x\}$ while $\lambda_1 C + \lambda_2 C = C + C = \{0, x, 2x\}$, showing that $(\lambda_1 + \lambda_2)C \neq \lambda_1 C + \lambda_2 C$.

1.2 (Properties of Cones)

(a) Let $x \in \cap_{i \in I} C_i$ and let $\alpha$ be a positive scalar. Since $x \in C_i$ for all $i \in I$ and each $C_i$ is a cone, the vector $\alpha x$ belongs to $C_i$ for all $i \in I$. Hence, $\alpha x \in \cap_{i \in I} C_i$, showing that $\cap_{i \in I} C_i$ is a cone.

(b) Let $x \in C_1 \times C_2$ and let $\alpha$ be a positive scalar. Then $x = (x_1, x_2)$ for some $x_1 \in C_1$ and $x_2 \in C_2$, and since $C_1$ and $C_2$ are cones, it follows that $\alpha x_1 \in C_1$ and $\alpha x_2 \in C_2$. Hence, $\alpha x = (\alpha x_1, \alpha x_2) \in C_1 \times C_2$, showing that $C_1 \times C_2$ is a cone.

(c) Let $x \in C_1 + C_2$ and let $\alpha$ be a positive scalar. Then, $x = x_1 + x_2$ for some $x_1 \in C_1$ and $x_2 \in C_2$, and since $C_1$ and $C_2$ are cones, $\alpha x_1 \in C_1$ and $\alpha x_2 \in C_2$. Hence, $\alpha x = \alpha x_1 + \alpha x_2 \in C_1 + C_2$, showing that $C_1 + C_2$ is a cone.

(d) Let $x \in \mathrm{cl}(C)$ and let $\alpha$ be a positive scalar. Then, there exists a sequence $\{x_k\} \subset C$ such that $x_k \to x$, and since $C$ is a cone, $\alpha x_k \in C$ for all $k$. Furthermore, $\alpha x_k \to \alpha x$, implying that $\alpha x \in \mathrm{cl}(C)$. Hence, $\mathrm{cl}(C)$ is a cone.

(e) First we prove that $A \cdot C$ is a cone, where $A$ is a linear transformation and $A \cdot C$ is the image of $C$ under $A$. Let $z \in A \cdot C$ and let $\alpha$ be a positive scalar. Then, $Ax = z$ for some $x \in C$, and since $C$ is a cone, $\alpha x \in C$. Because $A(\alpha x) = \alpha z$, the vector $\alpha z$ is in $A \cdot C$, showing that $A \cdot C$ is a cone.
Next we prove that the inverse image $A^{-1} \cdot C$ of $C$ under $A$ is a cone. Let $x \in A^{-1} \cdot C$ and let $\alpha$ be a positive scalar. Then $Ax \in C$, and since $C$ is a cone, $\alpha Ax \in C$. Thus, the vector $A(\alpha x)$ is in $C$, implying that $\alpha x \in A^{-1} \cdot C$, and showing that $A^{-1} \cdot C$ is a cone.

1.3 (Lower Semicontinuity under Composition)

(a) Let $\{x_k\} \subset \Re^n$ be a sequence of vectors converging to some $x \in \Re^n$. By continuity of $f$, it follows that $\{f(x_k)\} \subset \Re^m$ converges to $f(x) \in \Re^m$, so that by lower semicontinuity of $g$, we have

$$\liminf_{k \to \infty} g\big(f(x_k)\big) \geq g\big(f(x)\big).$$

Hence, $h$ is lower semicontinuous.

(b) Assume, to arrive at a contradiction, that $h$ is not lower semicontinuous at some $x \in \Re^n$. Then, there exists a sequence $\{x_k\} \subset \Re^n$ converging to $x$ such that

$$\liminf_{k \to \infty} g\big(f(x_k)\big) < g\big(f(x)\big).$$

Let $\{x_k\}_K$ be a subsequence attaining the above limit inferior, i.e.,

$$\lim_{k \to \infty,\ k \in K} g\big(f(x_k)\big) = \liminf_{k \to \infty} g\big(f(x_k)\big) < g\big(f(x)\big). \tag{1.1}$$

Without loss of generality, we may assume that

$$g\big(f(x_k)\big) < g\big(f(x)\big), \qquad \forall\ k \in K.$$

Since $g$ is monotonically nondecreasing, it follows that

$$f(x_k) < f(x), \qquad \forall\ k \in K,$$

which together with the fact $\{x_k\}_K \to x$ and the lower semicontinuity of $f$ yields

$$f(x) \leq \liminf_{k \to \infty,\ k \in K} f(x_k) \leq \limsup_{k \to \infty,\ k \in K} f(x_k) \leq f(x),$$

showing that $\{f(x_k)\}_K \to f(x)$. By our choice of the sequence $\{x_k\}_K$ and by lower semicontinuity of $g$, it follows that

$$\lim_{k \to \infty,\ k \in K} g\big(f(x_k)\big) = \liminf_{k \to \infty,\ k \in K} g\big(f(x_k)\big) \geq g\big(f(x)\big),$$

contradicting Eq. (1.1). Hence, $h$ is lower semicontinuous.

As an example showing that the assumption that $g$ is monotonically nondecreasing is essential, consider the functions

$$f(x) = \begin{cases} 0 & \text{if } x \leq 0, \\ 1 & \text{if } x > 0, \end{cases}$$

and $g(x) = -x$. Then

$$g\big(f(x)\big) = \begin{cases} 0 & \text{if } x \leq 0, \\ -1 & \text{if } x > 0, \end{cases}$$

which is not lower semicontinuous at $0$.

1.4 (Convexity under Composition)

(a) Let $x, y \in C$ and let $\alpha \in [0,1]$.
Then we have

$$\begin{aligned}
h\big(\alpha x + (1-\alpha)y\big) &= g\Big(f\big(\alpha x + (1-\alpha)y\big)\Big) \\
&\leq g\big(\alpha f(x) + (1-\alpha)f(y)\big) \\
&\leq \alpha g\big(f(x)\big) + (1-\alpha) g\big(f(y)\big) \\
&= \alpha h(x) + (1-\alpha)h(y),
\end{aligned}$$

where the first inequality above follows from the convexity of $f$ and the monotonicity of $g$, while the second inequality follows from the convexity of $g$.

If $g$ is monotonically increasing and $f$ is strictly convex, then the first inequality in the preceding relation is strict whenever $x \neq y$ and $\alpha \in (0,1)$, showing that $h$ is strictly convex.

(b) Let $x, y \in \Re^n$ and let $\alpha \in [0,1]$. Then, by the definitions of $h$ and $f$, we have

$$\begin{aligned}
h\big(\alpha x + (1-\alpha)y\big) &= g\Big(f\big(\alpha x + (1-\alpha)y\big)\Big) \\
&= g\Big(f_1\big(\alpha x + (1-\alpha)y\big), \ldots, f_m\big(\alpha x + (1-\alpha)y\big)\Big) \\
&\leq g\big(\alpha f_1(x) + (1-\alpha)f_1(y), \ldots, \alpha f_m(x) + (1-\alpha)f_m(y)\big) \\
&= g\Big(\alpha\big(f_1(x), \ldots, f_m(x)\big) + (1-\alpha)\big(f_1(y), \ldots, f_m(y)\big)\Big) \\
&\leq \alpha g\big(f_1(x), \ldots, f_m(x)\big) + (1-\alpha) g\big(f_1(y), \ldots, f_m(y)\big) \\
&= \alpha g\big(f(x)\big) + (1-\alpha) g\big(f(y)\big) \\
&= \alpha h(x) + (1-\alpha)h(y),
\end{aligned}$$

where the first inequality follows by convexity of each $f_i$ and monotonicity of $g$, while the second inequality follows by convexity of $g$.

1.5 (Examples of Convex Functions)

(a) It can be seen that $f_1$ is twice continuously differentiable over $X$ and its Hessian matrix is given by

$$\nabla^2 f_1(x) = \frac{f_1(x)}{n^2}
\begin{bmatrix}
\frac{1-n}{x_1^2} & \frac{1}{x_1 x_2} & \cdots & \frac{1}{x_1 x_n} \\
\frac{1}{x_2 x_1} & \frac{1-n}{x_2^2} & \cdots & \frac{1}{x_2 x_n} \\
\vdots & & \ddots & \vdots \\
\frac{1}{x_n x_1} & \frac{1}{x_n x_2} & \cdots & \frac{1-n}{x_n^2}
\end{bmatrix}$$

for all $x = (x_1, \ldots, x_n) \in X$. From this, direct computation shows that for all $z = (z_1, \ldots, z_n) \in \Re^n$ and $x = (x_1, \ldots, x_n) \in X$, we have

$$z' \nabla^2 f_1(x) z = \frac{f_1(x)}{n^2} \left( \left( \sum_{i=1}^n \frac{z_i}{x_i} \right)^2 - n \sum_{i=1}^n \left( \frac{z_i}{x_i} \right)^2 \right).$$

Note that this quadratic form is nonnegative for all $z \in \Re^n$ and $x \in X$, since $f_1(x) < 0$, and for any real numbers $\alpha_1, \ldots, \alpha_n$, we have

$$(\alpha_1 + \cdots + \alpha_n)^2 \leq n(\alpha_1^2 + \cdots + \alpha_n^2),$$

in view of the fact that $2\alpha_j \alpha_k \leq \alpha_j^2 + \alpha_k^2$. Hence, $\nabla^2 f_1(x)$ is positive semidefinite for all $x \in X$, and it follows from Prop. 1.2.6(a) that $f_1$ is convex.

(b) We show that the Hessian of $f_2$ is positive semidefinite at all $x \in \Re^n$. Let $\beta(x) = e^{x_1} + \cdots + e^{x_n}$. Then a straightforward calculation yields

$$z' \nabla^2 f_2(x) z = \frac{1}{2\beta(x)^2} \sum_{i=1}^n \sum_{j=1}^n e^{(x_i + x_j)} (z_i - z_j)^2 \geq 0, \qquad \forall\ z \in \Re^n.$$

Hence by Prop. 1.2.6, $f_2$ is convex.

(b) The function $f_3(x) = \|x\|^p$ can be viewed as a composition $g\big(f(x)\big)$ of the scalar function $g(t) = t^p$ with $p \geq 1$ and the function $f(x) = \|x\|$. In this case, $g$ is convex and monotonically increasing over the nonnegative axis, the set of values that $f$ can take, while $f$ is convex over $\Re^n$ (since any vector norm is convex, see the discussion preceding Prop. 1.2.4). Using Exercise 1.4, it follows that the function $f_3(x) = \|x\|^p$ is convex over $\Re^n$.

(c) The function $f_4(x) = \frac{1}{f(x)}$ can be viewed as a composition $g\big(h(x)\big)$ of the function $g(t) = -\frac{1}{t}$ for $t < 0$ and the function $h(x) = -f(x)$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically increasing in the set $\{t \mid t < 0\}$, while $h$ is convex over $\Re^n$. Using Exercise 1.4, it follows that the function $f_4(x) = \frac{1}{f(x)}$ is convex over $\Re^n$.

(d) The function $f_5(x) = \alpha f(x) + \beta$ can be viewed as a composition $g\big(f(x)\big)$ of the function $g(t) = \alpha t + \beta$, where $t \in \Re$, and the function $f(x)$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically increasing over $\Re$ (since $\alpha \geq 0$), while $f$ is convex over $\Re^n$. Using Exercise 1.4, it follows that the function $f_5(x) = \alpha f(x) + \beta$ is convex over $\Re^n$.

(e) The function $f_6(x) = e^{\beta x'Ax}$ can be viewed as a composition $g\big(f(x)\big)$ of the function $g(t) = e^{\beta t}$ for $t \in \Re$ and the function $f(x) = x'Ax$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically increasing over $\Re$, while $f$ is convex over $\Re^n$ (since $A$ is positive semidefinite). Using Exercise 1.4, it follows that the function $f_6(x) = e^{\beta x'Ax}$ is convex over $\Re^n$.
(f) This part is straightforward using the definition of a convex function.

1.6 (Ascent/Descent Behavior of a Convex Function)

(a) Let $x_1, x_2, x_3$ be three scalars such that $x_1 < x_2 < x_3$. Then we can write $x_2$ as a convex combination of $x_1$ and $x_3$ as follows

$$x_2 = \frac{x_3 - x_2}{x_3 - x_1} x_1 + \frac{x_2 - x_1}{x_3 - x_1} x_3,$$

so that by convexity of $f$, we obtain

$$f(x_2) \leq \frac{x_3 - x_2}{x_3 - x_1} f(x_1) + \frac{x_2 - x_1}{x_3 - x_1} f(x_3).$$

This relation and the fact

$$f(x_2) = \frac{x_3 - x_2}{x_3 - x_1} f(x_2) + \frac{x_2 - x_1}{x_3 - x_1} f(x_2),$$

imply that

$$\frac{x_3 - x_2}{x_3 - x_1} \big(f(x_2) - f(x_1)\big) \leq \frac{x_2 - x_1}{x_3 - x_1} \big(f(x_3) - f(x_2)\big).$$

By multiplying the preceding relation by $x_3 - x_1$ and dividing it by $(x_3 - x_2)(x_2 - x_1)$, we obtain

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \leq \frac{f(x_3) - f(x_2)}{x_3 - x_2}.$$

(b) Let $\{x_k\}$ be an increasing scalar sequence, i.e., $x_1 < x_2 < x_3 < \cdots$. Then according to part (a), we have for all $k$

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \leq \frac{f(x_3) - f(x_2)}{x_3 - x_2} \leq \cdots \leq \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}. \tag{1.2}$$

Since $\big(f(x_k) - f(x_{k-1})\big)/(x_k - x_{k-1})$ is monotonically nondecreasing, we have

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} \to \gamma, \tag{1.3}$$

where $\gamma$ is either a real number or $\infty$. Furthermore,

$$\frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k} \leq \gamma, \qquad \forall\ k. \tag{1.4}$$

We now show that $\gamma$ is independent of the sequence $\{x_k\}$. Let $\{y_j\}$ be any increasing scalar sequence. For each $j$, choose $x_{k_j}$ such that $y_j < x_{k_j}$ and $x_{k_1} < x_{k_2} < \cdots < x_{k_j}$, so that we have $y_j < y_{j+1} < x_{k_{j+1}} < x_{k_{j+2}}$. By part (a), it follows that

$$\frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \leq \frac{f(x_{k_{j+2}}) - f(x_{k_{j+1}})}{x_{k_{j+2}} - x_{k_{j+1}}},$$

and letting $j \to \infty$ yields

$$\lim_{j \to \infty} \frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \leq \gamma.$$

Similarly, by exchanging the roles of $\{x_k\}$ and $\{y_j\}$, we can show that

$$\lim_{j \to \infty} \frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \geq \gamma.$$

Thus the limit in Eq. (1.3) is independent of the choice for $\{x_k\}$, and Eqs. (1.2) and (1.4) hold for any increasing scalar sequence $\{x_k\}$.

We consider separately each of the three possibilities $\gamma < 0$, $\gamma = 0$, and $\gamma > 0$. First, suppose that $\gamma < 0$, and let $\{x_k\}$ be any increasing sequence. By using Eq. (1.4), we obtain

$$\begin{aligned}
f(x_k) &= \sum_{j=1}^{k-1} \frac{f(x_{j+1}) - f(x_j)}{x_{j+1} - x_j} (x_{j+1} - x_j) + f(x_1) \\
&\leq \sum_{j=1}^{k-1} \gamma (x_{j+1} - x_j) + f(x_1) \\
&= \gamma (x_k - x_1) + f(x_1),
\end{aligned}$$

and since $\gamma < 0$ and $x_k \to \infty$, it follows that $f(x_k) \to -\infty$. To show that $f$ decreases monotonically, pick any $x$ and $y$ with $x < y$, and consider the sequence $x_1 = x$, $x_2 = y$, and $x_k = y + k$ for all $k \geq 3$. By using Eq. (1.4) with $k = 1$, we have

$$\frac{f(y) - f(x)}{y - x} \leq \gamma < 0,$$

so that $f(y) - f(x) < 0$. Hence $f$ decreases monotonically to $-\infty$, corresponding to case (1).

Suppose now that $\gamma = 0$, and let $\{x_k\}$ be any increasing sequence. Then, by Eq. (1.4), we have $f(x_{k+1}) - f(x_k) \leq 0$ for all $k$. If $f(x_{k+1}) - f(x_k) < 0$ for all $k$, then $f$ decreases monotonically. To show this, pick any $x$ and $y$ with $x < y$, and consider a new sequence given by $y_1 = x$, $y_2 = y$, and $y_k = x_{K+k-3}$ for all $k \geq 3$, where $K$ is large enough so that $y < x_K$. By using Eqs. (1.2) and (1.4) with $\{y_k\}$, we have

$$\frac{f(y) - f(x)}{y - x} \leq \frac{f(x_{K+1}) - f(x_K)}{x_{K+1} - x_K} < 0,$$

implying that $f(y) - f(x) < 0$. Hence $f$ decreases monotonically, and it may decrease to $-\infty$ or to a finite value, corresponding to cases (1) or (2), respectively.

If for some $K$ we have $f(x_{K+1}) - f(x_K) = 0$, then by Eqs. (1.2) and (1.4) where $\gamma = 0$, we obtain $f(x_k) = f(x_K)$ for all $k \geq K$. To show that $f$ stays at the value $f(x_K)$ for all $x \geq x_K$, choose any $x$ such that $x > x_K$, and define $\{y_k\}$ as $y_1 = x_K$, $y_2 = x$, and $y_k = x_{N+k-3}$ for all $k \geq 3$, where $N$ is large enough so that $x < x_N$. By using Eqs. (1.2) and (1.4) with $\{y_k\}$, we have

$$\frac{f(x) - f(x_K)}{x - x_K} \leq \frac{f(x_N) - f(x)}{x_N - x} \leq 0,$$

so that $f(x) \leq f(x_K)$ and $f(x_N) \leq f(x)$. Since $f(x_K) = f(x_N)$, we have $f(x) = f(x_K)$. Hence $f(x) = f(x_K)$ for all $x \geq x_K$, corresponding to case (3).

Finally, suppose that $\gamma > 0$, and let $\{x_k\}$ be any increasing sequence. Since $\big(f(x_k) - f(x_{k-1})\big)/(x_k - x_{k-1})$ is nondecreasing and tends to $\gamma$ [cf. Eqs. (1.3) and (1.4)], there is a positive integer $K$ and a positive scalar $\epsilon$ with $\epsilon < \gamma$ such that

$$\epsilon \leq \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}, \qquad \forall\ k \geq K. \tag{1.5}$$

Therefore, for all $k > K$

$$f(x_k) = \sum_{j=K}^{k-1} \frac{f(x_{j+1}) - f(x_j)}{x_{j+1} - x_j} (x_{j+1} - x_j) + f(x_K) \geq \epsilon (x_k - x_K) + f(x_K),$$

implying that $f(x_k) \to \infty$. To show that $f(x)$ increases monotonically to $\infty$ for all $x \geq x_K$, pick any $x < y$ satisfying $x_K < x < y$, and consider a sequence given by $y_1 = x_K$, $y_2 = x$, $y_3 = y$, and $y_k = x_{N+k-4}$ for $k \geq 4$, where $N$ is large enough so that $y < x_N$. By using Eq. (1.5) with $\{y_k\}$, we have

$$\epsilon \leq \frac{f(y) - f(x)}{y - x}.$$

Thus $f(x)$ increases monotonically to $\infty$ for all $x \geq x_K$, corresponding to case (4) with $x = x_K$.

1.7 (Characterization of Differentiable Convex Functions)

If $f$ is convex, then by Prop. 1.2.5(a), we have

$$f(y) \geq f(x) + \nabla f(x)'(y - x), \qquad \forall\ x, y \in C.$$

By exchanging the roles of $x$ and $y$ in this relation, we obtain

$$f(x) \geq f(y) + \nabla f(y)'(x - y), \qquad \forall\ x, y \in C,$$

and by adding the preceding two inequalities, it follows that

$$\big(\nabla f(x) - \nabla f(y)\big)'(x - y) \geq 0. \tag{1.6}$$

Conversely, let Eq. (1.6) hold, and let $x$ and $y$ be two points in $C$. Define the function $h : \Re \mapsto \Re$ by

$$h(t) = f\big(x + t(y - x)\big).$$

Consider some $t, t' \in [0,1]$ such that $t < t'$. By convexity of $C$, we have that $x + t(y - x)$ and $x + t'(y - x)$ belong to $C$. Using the chain rule and Eq. (1.6), we have

$$\left( \frac{dh(t')}{dt} - \frac{dh(t)}{dt} \right)(t' - t) = \Big( \nabla f\big(x + t'(y - x)\big) - \nabla f\big(x + t(y - x)\big) \Big)'(y - x)(t' - t) \geq 0.$$

Thus, $dh/dt$ is nondecreasing on $[0,1]$ and for any $t \in (0,1)$, we have

$$\frac{h(t) - h(0)}{t} = \frac{1}{t} \int_0^t \frac{dh(\tau)}{d\tau}\, d\tau \leq \frac{dh(t)}{dt} \leq \frac{1}{1-t} \int_t^1 \frac{dh(\tau)}{d\tau}\, d\tau = \frac{h(1) - h(t)}{1 - t}.$$

Equivalently,

$$t\,h(1) + (1 - t)\,h(0) \geq h(t),$$

and from the definition of $h$, we obtain

$$t f(y) + (1 - t) f(x) \geq f\big(ty + (1 - t)x\big).$$

Since this inequality has been proved for arbitrary $t \in [0,1]$ and $x, y \in C$, we conclude that $f$ is convex.

1.8 (Characterization of Twice Continuously Differentiable Convex Functions)

Suppose that $f : \Re^n \mapsto \Re$ is convex over $C$. We first show that for all $x \in \mathrm{ri}(C)$ and $y \in S$, we have $y' \nabla^2 f(x) y \geq 0$.
Assume, to arrive at a contradiction, that there exists some $\bar{x} \in \mathrm{ri}(C)$ such that for some $y \in S$, we have

$$y' \nabla^2 f(\bar{x}) y < 0.$$

Without loss of generality, we may assume that $\|y\| = 1$. Using the continuity of $\nabla^2 f$, we see that there is an open ball $B(\bar{x}, \epsilon)$ centered at $\bar{x}$ with radius $\epsilon$ such that $B(\bar{x}, \epsilon) \cap \mathrm{aff}(C) \subset C$ [since $\bar{x} \in \mathrm{ri}(C)$], and

$$y' \nabla^2 f(x) y < 0, \qquad \forall\ x \in B(\bar{x}, \epsilon). \tag{1.7}$$

By Prop. 1.1.13(a), for all positive scalars $\alpha$ with $\alpha < \epsilon$, we have

$$f(\bar{x} + \alpha y) = f(\bar{x}) + \alpha \nabla f(\bar{x})'y + \frac{\alpha^2}{2}\, y' \nabla^2 f(\bar{x} + \bar{\alpha} y) y,$$

for some $\bar{\alpha} \in [0, \alpha]$. Furthermore, $\|(\bar{x} + \bar{\alpha} y) - \bar{x}\| < \epsilon$ [since $\|y\| = 1$ and $\bar{\alpha} < \epsilon$]. Hence, from Eq. (1.7), it follows that

$$f(\bar{x} + \alpha y) < f(\bar{x}) + \alpha \nabla f(\bar{x})'y, \qquad \forall\ \alpha \in (0, \epsilon).$$

On the other hand, by the choice of $\epsilon$ and the assumption that $y \in S$, the vectors $\bar{x} + \alpha y$ are in $C$ for all $\alpha$ with $\alpha \in [0, \epsilon)$, which is a contradiction in view of the convexity of $f$ over $C$. Hence, we have $y' \nabla^2 f(x) y \geq 0$ for all $y \in S$ and all $x \in \mathrm{ri}(C)$.

Next, let $x$ be a point in $C$ that is not in the relative interior of $C$. Then, by the Line Segment Principle, there is a sequence $\{x_k\} \subset \mathrm{ri}(C)$ such that $x_k \to x$. As seen above, $y' \nabla^2 f(x_k) y \geq 0$ for all $y \in S$ and all $k$, which together with the continuity of $\nabla^2 f$ implies that

$$y' \nabla^2 f(x) y = \lim_{k \to \infty} y' \nabla^2 f(x_k) y \geq 0, \qquad \forall\ y \in S.$$

It follows that $y' \nabla^2 f(x) y \geq 0$ for all $x \in C$ and $y \in S$.

Conversely, assume that $y' \nabla^2 f(x) y \geq 0$ for all $x \in C$ and $y \in S$. By Prop. 1.1.13(a), for all $x, z \in C$ we have

$$f(z) = f(x) + (z - x)' \nabla f(x) + \frac{1}{2} (z - x)' \nabla^2 f\big(x + \alpha(z - x)\big)(z - x)$$

for some $\alpha \in [0,1]$. Since $x, z \in C$, we have that $(z - x) \in S$, and using the convexity of $C$ and our assumption, it follows that

$$f(z) \geq f(x) + (z - x)' \nabla f(x), \qquad \forall\ x, z \in C.$$

From Prop. 1.2.5(a), we conclude that $f$ is convex over $C$.
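
The criterion of Exercise 1.8 can be illustrated numerically. The sketch below is not part of the original solutions: the function $f(x) = x_1^2 - x_2^2$, the set $C = \{x \mid x_2 = 0\}$, and all helper names are assumptions chosen for illustration. Here the subspace $S$ parallel to $\mathrm{aff}(C)$ is spanned by $(1, 0)$, so the Hessian only needs to be nonnegative along that direction, even though it is indefinite on $\Re^2$.

```python
def f(x):
    # Hypothetical example function (an assumption, not from the text):
    # f(x) = x1^2 - x2^2, which is not convex over all of R^2.
    return x[0] ** 2 - x[1] ** 2

# Constant Hessian of f, written out by hand.
HESSIAN = [[2.0, 0.0], [0.0, -2.0]]

def quad_form(H, y):
    """Return y' H y for a square matrix H given as nested lists."""
    n = len(y)
    return sum(y[i] * H[i][j] * y[j] for i in range(n) for j in range(n))

def midpoint_convex_on(points):
    """Check f((x + z) / 2) <= (f(x) + f(z)) / 2 for all pairs in `points`."""
    for x in points:
        for z in points:
            mid = [(a + b) / 2 for a, b in zip(x, z)]
            if f(mid) > (f(x) + f(z)) / 2 + 1e-12:
                return False
    return True

# Directions in S, the subspace parallel to aff(C), with C = {x | x2 = 0}.
s_directions = [[t, 0.0] for t in (-2.0, -0.5, 1.0, 3.0)]
psd_on_S = all(quad_form(HESSIAN, y) >= 0 for y in s_directions)

# Sample points of C, and the midpoint inequality restricted to them.
points_in_C = [[t, 0.0] for t in (-3.0, -1.0, 0.0, 2.0, 5.0)]
convex_on_C = midpoint_convex_on(points_in_C)

# Outside S the curvature is negative, and convexity fails on R^2.
negative_direction = quad_form(HESSIAN, [0.0, 1.0])
convex_on_pair_off_C = midpoint_convex_on([[0.0, 1.0], [0.0, -1.0]])

print(psd_on_S, convex_on_C, negative_direction, convex_on_pair_off_C)
```

The midpoint test is only a finite spot check on sampled points, not a proof of convexity; it is meant to show that restricting the Hessian condition to directions $y \in S$ is what matters when $C$ has empty interior.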
