Noname manuscript No. (will be inserted by the editor)

arXiv:1501.02011v1 [math.OC] 9 Jan 2015

Two globally convergent nonmonotone trust-region methods for unconstrained optimization

Masoud Ahookhosh · Susan Ghaderi

Abstract This paper addresses some trust-region methods equipped with nonmonotone strategies for solving nonlinear unconstrained optimization problems. More specifically, the importance of using nonmonotone techniques in nonlinear optimization is motivated, then two new nonmonotone terms are proposed, and their combinations into the traditional trust-region framework are studied. The global convergence to first- and second-order stationary points and local superlinear and quadratic convergence rates for both algorithms are established. Numerical experiments on the CUTEst test collection of unconstrained problems and some highly nonlinear test functions are reported, where a comparison among state-of-the-art nonmonotone trust-region methods shows the efficiency of the proposed nonmonotone schemes.

Keywords Unconstrained optimization · Black-box oracle · Trust-region method · Nonmonotone strategy · Global convergence

Mathematics Subject Classification (2000) 90C30 · 65K05

M. Ahookhosh: Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria. E-mail: [email protected]
S. Ghaderi: Department of Mathematics, Faculty of Science, Razi University, Kermanshah, Iran. E-mail: [email protected]

1 Introduction

In this paper we consider the unconstrained minimization problem

  minimize f(x)  subject to x ∈ R^n,   (1)

where f : R^n → R is a real-valued nonlinear function, which is bounded and continuously differentiable. We suppose that a first- or second-order black-box oracle of f is available.

Motivation & history. Trust-region methods, also called restricted step methods [21], are a class of iterative schemes developed to solve convex or nonconvex optimization problems, see, for example, [13]. They have also been developed for nonsmooth problems, see [13, 16, 45, 23]. Trust-region methods have strong convergence properties, are reliable and robust in computation, and can handle ill-conditioned problems, cf. [34, 35]. Let x_k be the current iterate. In the trust-region framework, the objective f is approximated by a simple model in a specific region around x_k in which the model is an acceptable approximation of the original objective; this region is called the region of trust. Afterward, the model is minimized subject to the trust-region constraint to find a new trial step d_k. Here, a simple model means one that can be minimized much more easily than the original objective function. If the model is an adequate approximation of the objective function within the trust region, then the point x_{k+1} = x_k + d_k is accepted by the trust-region method and the region can be expanded for the next iteration; conversely, if the approximation is poor, the region is contracted and the model is minimized within the contracted region. This scheme is continued until a trial step d_k is found that guarantees an acceptable agreement between the model and the objective function.

Several quadratic and non-quadratic models have been proposed to approximate the objective function in optimization, see [14, 22, 36, 39]; however, the conic and quadratic models are more popular, see [17, 37].
If the approximated model is quadratic, i.e.,

  q_k(d) := f_k + g_k^T d + (1/2) d^T B_k d,   (2)

where f_k = f(x_k), g_k = ∇f(x_k), and B_k ≈ ∇²f(x_k), the trust-region method can be considered as a globally convergent generalization of the classical Newton's method. The trust-region subproblem is then defined by

  minimize q_k(d)  subject to ‖d‖ ≤ δ_k.   (3)

Hence the trust region is commonly a norm ball C defined by

  C := {d ∈ R^n | ‖d‖ ≤ δ_k},

where δ_k > 0 is a real number called the trust-region radius, and ‖·‖ is any norm on R^n, cf. [29]. Since C is compact and the model is continuous, the trust-region subproblem attains its minimizer on the set C. Most of the computational cost of trust-region methods relates to minimizing the model over the trust region C, so finding efficient schemes for solving (3) has received much attention during the past few decades, see [19, 20, 25, 31, 38]. Once the step d_k is computed, the quality of the model in the trust region is evaluated by the ratio of the actual reduction of the objective, f_k − f(x_k + d_k), to the predicted reduction of the model, q_k(0) − q_k(d_k), i.e.,

  r_k = (f_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)).   (4)

For a prescribed positive constant µ_1 ∈ (0, 1], if r_k ≥ µ_1, the model provides a reasonable approximation, the step is accepted, i.e., x_{k+1} = x_k + d_k, and the trust region C can be expanded for the next step. Otherwise, the trust region C is contracted by decreasing the radius δ_k, and the subproblem (3) is solved in the reduced region. This scheme is continued until the step d_k is accepted by the trust-region test r_k ≥ µ_1. Our discussion can be summarized in the following algorithm:

Algorithm 1: TTR (traditional trust-region algorithm)

Input: x_0 ∈ R^n, B_0 ∈ R^{n×n}, k_max; 0 < µ_1 ≤ µ_2 ≤ 1, 0 < ρ_1 ≤ 1 ≤ ρ_2, ε > 0;
Output: x_b; f_b;
 1 begin
 2   δ_0 ← ‖g_0‖; k ← 0;
 3   while ‖g_k‖ ≥ ε & k ≤ k_max do
 4     solve the subproblem (3) to specify d_k;
 5     x̂_k ← x_k + d_k; compute f(x̂_k);
 6     determine r_k using (4);
 7     while r_k < µ_1 do
 8       δ_k ← ρ_1 δ_k;
 9       solve the subproblem (3) to specify d_k;
10       x̂_k ← x_k + d_k; compute f(x̂_k);
11       determine r_k using (4);
12     end
13     x_{k+1} ← x̂_k;
14     if r_k ≥ µ_2 then
15       δ_{k+1} ← ρ_2 δ_k;
16     end
17     update B_{k+1}; k ← k + 1;
18   end
19   x_b ← x_k; f_b ← f_k;
20 end

In Algorithm 1, it follows from r_k ≥ µ_1 and q_k(0) − q_k(d_k) > 0 that

  f_k − f_{k+1} ≥ µ_1 (q_k(0) − q_k(d_k)) > 0,

implying f_{k+1} ≤ f_k. This means that the sequence of function values {f_k} is monotonically decreasing, which is why the traditional trust-region method is also called the monotone trust-region method. This feature seems natural for minimization schemes; however, it slows down the convergence of TTR to a minimizer if the objective involves a curved narrow valley, see [1, 27]. To observe the effect of nonmonotonicity on TTR, we study the next example.

Example 1 Consider the two-dimensional Nesterov–Chebyshev–Rosenbrock function, cf. [28],

  f(x_1, x_2) = (1/4)(x_1 − 1)² + (x_2 − 2x_1² + 1)²,

where we solve the problem (1) by Newton's method and TTR with the initial point x_0 = (−0.61, −1). It is clear that (1, 1) is the optimizer. The implementation indicates that Newton's method needs 7 iterations and 8 function evaluations, while the monotone trust-region method needs 22 iterations and 24 function evaluations. We depict the contour plot of the objective and the iterations, as well as a diagram of function values versus iterations, for these two algorithms in Figure 1.
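For concreteness, the following is a minimal Python sketch of Algorithm 1; it is our own illustration, not code from the paper. The subproblem (3) is solved only approximately, via the Cauchy point (the paper leaves the choice of solver open, see [19, 20, 25, 31, 38]), and the function names and default parameter values are illustrative assumptions, not values prescribed by the authors.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Approximate solution of (3): minimize q_k along -g inside the ball.

    Any solver satisfying the sufficient-reduction condition used later in
    the paper would do (dogleg, Steihaug-Toint, ...); the Cauchy point is
    simply the shortest to state.
    """
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                  # nonconvex along -g: step to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def ttr(f, grad, hess, x0, eps=1e-5, kmax=500,
        mu1=0.1, mu2=0.9, rho1=0.5, rho2=2.0):
    """Monotone trust-region method following Algorithm 1 (TTR)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    delta = np.linalg.norm(g)                      # delta_0 = ||g_0||, Line 2
    k = 0
    while np.linalg.norm(g) >= eps and k <= kmax:
        B = hess(x)
        fx = f(x)
        d = cauchy_point(g, B, delta)
        pred = -(g @ d + 0.5 * d @ B @ d)          # q_k(0) - q_k(d)
        r = (fx - f(x + d)) / pred                 # ratio (4)
        while r < mu1:                             # inner cycle, Lines 7-12
            delta *= rho1
            d = cauchy_point(g, B, delta)
            pred = -(g @ d + 0.5 * d @ B @ d)
            r = (fx - f(x + d)) / pred
        x = x + d                                  # accept the trial point
        if r >= mu2:                               # very successful iteration
            delta *= rho2
        g = grad(x)
        k += 1
    return x, f(x)
```

One can mimic Example 1 by calling `ttr` on f(x_1, x_2) = (1/4)(x_1 − 1)² + (x_2 − 2x_1² + 1)² with x_0 = (−0.61, −1); the iteration counts will generally differ from those reported above, since they depend on the subproblem solver and the parameter choices.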
Subfigure (a) of Figure 1 shows that the iterations of TTR follow the bottom of the valley, in contrast to those of Newton's method, which can go up and down to reach the ε-solution with the accuracy parameter ε = 10⁻⁵. We see that Newton's method attains larger steps compared with those of TTR. Subfigure (b) of Figure 1 illustrates function values versus iterations for both algorithms, showing that the function values of TTR decrease monotonically, while those of Newton's method fluctuate nonmonotonically.

Fig. 1: A comparison between Newton's method and TTR: Subfigure (a) illustrates the contour plot of the two-dimensional Nesterov–Chebyshev–Rosenbrock function and the iterations of Newton and TTR; Subfigure (b) shows the diagram of function values versus iterations.

In general, monotonicity may result in slow iterative schemes for highly nonlinear or badly scaled problems. To avoid this algorithmic limitation, the idea of nonmonotone strategies was proposed; it can be traced back to the watchdog technique for overcoming the Maratos effect in constrained optimization [12]. To improve the performance of Armijo's line search, Grippo et al. in 1986 [27] proposed the modified Armijo rule

  f(x_k + α_k d_k) ≤ f_{l(k)} + σ α_k g_k^T d_k,  k = 0, 1, 2, ···,

with the step-size α_k > 0, σ ∈ (0, 1/2), and

  f_{l(k)} = max_{0 ≤ j ≤ m(k)} {f_{k−j}},   (5)

where m(0) = 0 and m(k) ≤ min{m(k−1) + 1, N} for a nonnegative integer N. It was shown that the associated scheme is globally convergent, and numerical results reported in Grippo et al. [27] and Toint [40] showed the effectiveness of the proposed idea. Motivated by these results, nonmonotone strategies have received much attention during the past few decades. For example, in 2004, Zhang & Hager [46] proposed the nonmonotone term

  C_0 = f_0,  C_k = (η_{k−1} Q_{k−1} C_{k−1} + f(x_k)) / Q_k  if k ≥ 1,
  Q_0 = 1,   Q_k = η_{k−1} Q_{k−1} + 1                        if k ≥ 1,   (6)

where 0 ≤ η_min ≤ η_{k−1} ≤ η_max ≤ 1. Recently, Mo et al. [30] and Ahookhosh et al. [3] studied the nonmonotone term

  D_k = f_k  if k = 1,   D_k = η_k D_{k−1} + (1 − η_k) f_k  if k ≥ 2,   (7)

where η_k ∈ [η_min, η_max], η_min ∈ [0, 1], and η_max ∈ [η_min, 1]. More recently, Amini et al. [7] proposed the nonmonotone term

  R_k = η_k f_{l(k)} + (1 − η_k) f_k,   (8)

where 0 ≤ η_min ≤ η_max ≤ 1 and η_k ∈ [η_min, η_max]. In all cases it was proved that the schemes are globally convergent and perform better than their monotone counterparts.

Nonmonotone strategies being as important for trust-region methods as they are for inexact line-search techniques, their combination with the trust-region framework is of clear interest. Historically, the first nonmonotone trust-region method was proposed in 1993 by Deng et al. [15] for unconstrained optimization. Under some classical assumptions, global convergence and a local superlinear convergence rate were established. Nonmonotone trust-region methods were also studied by several authors such as Toint [41], Xiao & Zhou [43], Xiao & Chu [44], Zhou & Xiao [47], Ahookhosh & Amini [2], Amini & Ahookhosh [6], and Mo et al. [30]. Recently, Ahookhosh & Amini [1] and Ahookhosh et al. [4] proposed two nonmonotone trust-region methods using the nonmonotone term (8). Theoretical results were reported, and numerical results showed the efficiency of the proposed nonmonotone methods.
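To make the four reference terms concrete, here is a small bookkeeping sketch of our own that tracks f_{l(k)} of (5), the Zhang–Hager pair (C_k, Q_k) of (6), D_k of (7), and R_k of (8) along an iteration. The class name is ours, and η is held fixed for simplicity, whereas the cited papers let η_k vary within [η_min, η_max].

```python
from collections import deque

class NonmonotoneTerms:
    """Track the reference values (5)-(8) along an iteration (our sketch)."""
    def __init__(self, f0, N=10, eta=0.85):
        self.window = deque([f0], maxlen=N + 1)  # last m(k)+1 <= N+1 values, for (5)
        self.C, self.Q = f0, 1.0                 # Zhang-Hager pair (6)
        self.D = f0                              # averaged term (7)
        self.eta = eta

    def update(self, fk):
        q_prev = self.Q
        self.Q = self.eta * q_prev + 1.0                     # Q_k of (6)
        self.C = (self.eta * q_prev * self.C + fk) / self.Q  # C_k of (6)
        self.D = self.eta * self.D + (1 - self.eta) * fk     # D_k of (7)
        self.window.append(fk)

    @property
    def f_lk(self):
        return max(self.window)                  # f_{l(k)} of (5)

    @property
    def R(self):
        return self.eta * self.f_lk + (1 - self.eta) * self.window[-1]  # (8)
```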
Content. In this paper we propose a trust-region method equipped with two novel nonmonotone terms. More precisely, we first establish two nonmonotone terms and then combine them with Algorithm 1 to construct two nonmonotone trust-region algorithms. If k ≥ N, the new nonmonotone terms are defined by a convex combination of the last N successful function values; if k < N, either a convex combination of the k successful function values computed so far or f_{l(k)} is used. The global convergence to first- and second-order stationary points is established under some classical assumptions. Moreover, local superlinear and quadratic convergence rates for the proposed methods are studied. Numerical results regarding experiments on some highly nonlinear problems and on 112 unconstrained test problems from the CUTEst test collection [24] are reported, indicating the efficiency of the proposed nonmonotone terms.

The remainder of the paper is organized as follows. In Section 2 we propose new nonmonotone terms and their combination with the trust-region framework. The global convergence of the proposed methods is given in Section 3. Numerical results are reported in Section 4. Finally, some conclusions are given in Section 5.

2 Novel nonmonotone terms and algorithm

In this section we first present two novel nonmonotone terms and then combine them into the trust-region framework to introduce two nonmonotone trust-region algorithms for solving the unconstrained optimization problem (1).

We first assume that k denotes the current iteration and N ∈ ℕ is a constant. The main idea is to construct a nonmonotone term determined by a convex combination of the last k successful function values if k < N and of the last N successful function values if k ≥ N. In other words, we construct new terms using the function values collected in the set

  F_k := {f_0, f_1, ···, f_k}  if k < N,   F_k := {f_{k−N+1}, f_{k−N+2}, ···, f_k}  if k ≥ N,   (9)

which should be updated in each iteration. To this end, motivated by the term (8), we construct T_k using the subsequent procedure

  T_0 = f_0                                                                                if k = 0,
  T_1 = (1−η_0) f_1 + η_0 f_0                                                              if k = 1,
  T_2 = (1−η_1) f_2 + η_1 (1−η_0) f_1 + η_1 η_0 f_0                                        if k = 2,
    ⋮
  T_{N−1} = (1−η_{N−2}) f_{N−1} + η_{N−2}(1−η_{N−3}) f_{N−2} + ··· + η_{N−2}···η_0 f_0     if k = N−1,
  T_N = (1−η_{N−1}) f_N + η_{N−1}(1−η_{N−2}) f_{N−1} + ··· + η_{N−1}···η_0 f_0             if k = N,

where η_i ∈ [0, 1), for i = 0, 1, ···, N−1, are some weight parameters. Hence the new term is generated by

  T_k := (1−η_{k−1}) f_k + η_{k−1} T_{k−1}                                                  if k < N,
  T_k := (1−η_{k−1}) f_k + η_{k−1}(1−η_{k−2}) f_{k−1} + ··· + η_{k−1}···η_{k−N} f_{k−N}     if k ≥ N,   (10)

where T_0 = f_0 and η_i ∈ [0, 1) for i = 0, 1, ···, k−1. To show that T_k is a convex combination of the collected function values F_k, it is enough to show that the multipliers sum to unity. For k ≥ N, the definition of T_k implies

  (1−η_{k−1}) + η_{k−1}(1−η_{k−2}) + ··· + η_{k−1}···η_{k−N+1}(1−η_{k−N}) + η_{k−1}···η_{k−N} = 1.   (11)

For k < N, a similar summation of the multipliers is equal to one. Therefore, the generated term T_k is a convex combination of the elements of F_k.

The procedure for defining T_k clearly implies that the set F_k should be updated and saved in each iteration. Moreover, N(N+1)/2 multiplications are required to compute T_k. To avoid saving F_k and to decrease the number of multiplications, we derive a recursive formula for (10).
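The construction (10)-(11) can be checked mechanically. The following helper, a sketch of our own, evaluates T_k directly as a convex combination and asserts the telescoping identity (11); it reuses a running product of weights, but still touches all stored values on every call, which motivates the O(1) recursion derived next.

```python
def T_direct(fvals, etas):
    """Evaluate T_k of (10) directly as a convex combination (our helper).

    fvals lists the stored function values oldest first (f_{k-N}, ..., f_k
    for k >= N, or f_0, ..., f_k for k < N); etas lists the matching
    weights, one fewer than fvals.
    """
    assert len(etas) == len(fvals) - 1
    coeffs, prod = [], 1.0
    for eta in reversed(etas):             # newest weight eta_{k-1} first
        coeffs.append(prod * (1.0 - eta))  # multiplier eta_{k-1}...(1 - eta_j)
        prod *= eta
    coeffs.append(prod)                    # the oldest value keeps eta_{k-1}...eta_{k-N}
    assert abs(sum(coeffs) - 1.0) < 1e-12  # the telescoping identity (11)
    return sum(c * f for c, f in zip(coeffs, reversed(fvals)))
```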
From the definition of T_k, for k ≥ N, it follows that

  T_k − η_{k−1} T_{k−1}
    = (1−η_{k−1}) f_k + η_{k−1}(1−η_{k−2}) f_{k−1} + ··· + η_{k−1}···η_{k−N} f_{k−N}
      − η_{k−1}(1−η_{k−2}) f_{k−1} − ··· − η_{k−1}···η_{k−N}(1−η_{k−N−1}) f_{k−N} − η_{k−1}···η_{k−N−1} f_{k−N−1}
    = (1−η_{k−1}) f_k + η_{k−1}η_{k−2}···η_{k−N−1} (f_{k−N} − f_{k−N−1})
    = (1−η_{k−1}) f_k + ξ_k (f_{k−N} − f_{k−N−1}),

where ξ_k := η_{k−1}η_{k−2}···η_{k−N−1}. For k ≥ N, this equation leads to

  T_k = (1−η_{k−1}) f_k + η_{k−1} T_{k−1} + ξ_k (f_{k−N} − f_{k−N−1}),   (12)

which requires saving only f_{k−N} and f_{k−N−1} and needs only three multiplications. Moreover, the definition of ξ_k implies

  ξ_k = η_{k−1}η_{k−2}···η_{k−N−1} = (η_{k−1}/η_{k−N−2}) η_{k−2}η_{k−3}···η_{k−N−2} = (η_{k−1}/η_{k−N−2}) ξ_{k−1}.   (13)

If ξ_k is recursively updated by (13), then by (10) and (12) a new nonmonotone term is defined by

  T_k := f_k + η_{k−1}(T_{k−1} − f_k)  if k < N,   T_k := max{T_k, f_k}  if k ≥ N,   (14)

where, for k ≥ N, the T_k inside the max is computed by the recursion (12), and the max term is added to guarantee T_k ≥ f_k.

As discussed in Section 1, nonmonotone schemes perform better when they use a stronger nonmonotone term far away from the optimizer and a weaker one close to it. This motivates us to consider a new version of the derived nonmonotone term that uses f_{l(k)} whenever k < N. More precisely, the second nonmonotone term is defined by

  T_k = f_{l(k)}  if k < N,   T_k = max{T_k, f_k}  if k ≥ N,   (15)

where ξ_k is defined by (13). It is clear that this new term uses the stronger term f_{l(k)}, defined by (5), for the first k < N iterations and then employs the relaxed convex term proposed above.

Now, to employ the proposed nonmonotone terms in the trust-region framework, it is enough to replace the ratio r_k of (4) by the nonmonotone ratio

  r̂_k = (T_k − f(x_k + d)) / (q_k(0) − q_k(d)),   (16)

where T_k is defined by (14) or (15). Hence in the trust-region framework we replace (4) by (16). Notice that if r̂_k ≥ µ_1, then

  T_k − f_{k+1} ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ 0.

This implies that f_{k+1} can be larger than f_k; however, the elements of {f_k} cannot increase arbitrarily, and the maximum increase is controlled by the nonmonotone term T_k. Moreover, the definitions (14) and (15) imply that r̂_k ≥ r_k, increasing the possibility of attaining larger steps for nonmonotone schemes compared with monotone ones.

The above discussion leads to the following nonmonotone trust-region algorithm:

Algorithm 2: NMTR (nonmonotone trust-region algorithm)

Input: x_0 ∈ R^n, B_0 ∈ R^{n×n}, k_max; 0 < µ_1 ≤ µ_2 ≤ 1, 0 < ρ_1 ≤ 1 ≤ ρ_2, ε > 0;
Output: x_b; f_b;
 1 begin
 2   δ_0 ← ‖g_0‖; k ← 0;
 3   while ‖g_k‖ ≥ ε & k ≤ k_max do
 4     solve the subproblem (3) to specify d_k;
 5     x̂_k ← x_k + d_k; compute f(x̂_k);
 6     determine r̂_k using (16);
 7     while r̂_k < µ_1 do
 8       δ_k ← ρ_1 δ_k;
 9       solve the subproblem (3) to specify d_k;
10       x̂_k ← x_k + d_k; compute f(x̂_k);
11       determine r̂_k using (16);
12     end
13     x_{k+1} ← x̂_k;
14     if r̂_k ≥ µ_2 then
15       δ_{k+1} ← ρ_2 δ_k;
16     end
17     update B_{k+1}; k ← k + 1;
18     update T_{k+1};
19     update η_{k+1};
20   end
21   x_b ← x_k; f_b ← f_k;
22 end

In Algorithm 2, if r̂_k ≥ µ_1 (Line 7), the iteration is called a successful iteration, and if r̂_k ≥ µ_2 (Line 14), it is called a very successful iteration. In addition, the loop from Line 3 to Line 20 is called the outer cycle, and the loop from Line 7 to Line 12 is called the inner cycle.
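The following is a minimal sketch of how the recursion (12)-(14) might be maintained with O(1) work per iteration, under our own simplifying assumption of a constant weight η_k ≡ η, so that ξ_k = η^{N+1} and the quotient update (13) becomes trivial; with varying weights one would store the last N+1 weights and apply (13) directly. The class name and the handling of the boundary index k = N are our choices.

```python
from collections import deque

class RecursiveT:
    """Maintain the nonmonotone term of (12)-(14) in O(1) time per step."""
    def __init__(self, f0, N=10, eta=0.85):
        self.N, self.eta = N, eta
        self.T = f0                              # T_0 = f_0
        self.hist = deque([f0], maxlen=N + 2)    # keeps f_{k-N-1}, ..., f_k
        self.k = 0

    def update(self, fk):
        self.k += 1
        self.hist.append(fk)
        # convex recursion T_k = (1 - eta) f_k + eta T_{k-1}, cf. (10)
        self.T = (1 - self.eta) * fk + self.eta * self.T
        if self.k > self.N:
            # remove the influence of f_{k-N-1}, which left the window: (12)
            xi = self.eta ** (self.N + 1)        # xi_k of (13), constant here
            self.T += xi * (self.hist[1] - self.hist[0])
        if self.k >= self.N:
            self.T = max(self.T, fk)             # safeguard of (14): T_k >= f_k
        return self.T
```

In Algorithm 2 above, r̂_k of (16) would be computed from such a running T_k, with `update` called once per accepted iterate; the variant (15) would additionally return f_{l(k)} instead of the convex recursion while k < N.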
3 Convergence analysis

This section concerns the global convergence to first- and second-order stationary points of the sequence {x_k} generated by Algorithm 2. More precisely, we intend to prove that every limit point x* of the sequence {x_k} satisfies the condition g(x*) = 0, and that there exists a point x* satisfying g(x*) = 0 at which H(x*) is positive semidefinite. Furthermore, we show that Algorithm 2 is well-defined, which means that the inner cycle of the algorithm is left after a finite number of internal iterations, and then prove its global convergence. Moreover, local superlinear and quadratic convergence rates are investigated under some classical assumptions.

To prove the global convergence of the sequence {x_k} generated by Algorithm 2, we make the following assumptions:

(H1) The objective function f is continuously differentiable and has a lower bound on the upper level set L(x_0) = {x ∈ R^n | f(x) ≤ f(x_0)}.
(H2) The sequence {B_k} is uniformly bounded, i.e., there exists a constant M > 0 such that ‖B_k‖ ≤ M for all k ∈ ℕ.
(H3) There exists a constant c > 0 such that the trial step d_k satisfies ‖d_k‖ ≤ c‖g_k‖.

We also assume that the decrease in the model q_k is at least a fraction of the decrease obtained by the Cauchy point, guaranteeing that there exists a constant β ∈ (0, 1) such that

  q_k(0) − q_k(d_k) ≥ β ‖g_k‖ min{δ_k, ‖g_k‖/‖B_k‖},   (17)

for all k. This condition is called the sufficient reduction condition. Inequality (17) implies that d_k ≠ 0 whenever g_k ≠ 0. It is noticeable that there are several schemes for solving the trust-region subproblem (3) such that (17) is valid, see, for example, [13, 32].
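As a sanity check of (17), the Cauchy point from the TTR sketch in Section 1 is classically known to satisfy the sufficient reduction condition with β = 1/2 when ‖B_k‖ is the spectral norm. The randomized test below is our own and reuses `cauchy_point` from that sketch.

```python
import numpy as np

# Randomized spot-check (ours, not from the paper) of (17) with beta = 1/2.
rng = np.random.default_rng(1)
for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = (A + A.T) / 2                      # symmetric, possibly indefinite B_k
    g = rng.standard_normal(5)
    delta = 0.1 + abs(rng.standard_normal())
    d = cauchy_point(g, B, delta)          # reused from the TTR sketch
    pred = -(g @ d + 0.5 * d @ B @ d)      # q_k(0) - q_k(d)
    gnorm = np.linalg.norm(g)
    bound = 0.5 * gnorm * min(delta, gnorm / np.linalg.norm(B, 2))
    assert pred >= bound - 1e-9
```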
Lemma 2 Suppose that the sequence {x_k} is generated by Algorithm 2; then

  |f_k − f(x_k + d_k) − (q_k(0) − q_k(d_k))| ≤ O(‖d_k‖²).

Proof The proof can be found in [12]. ⊓⊔

Lemma 3 Suppose that the sequence {x_k} is generated by Algorithm 2; then

  f_k ≤ T_k ≤ f_{l(k)},   (18)

for all k ∈ ℕ ∪ {0}.

Proof For k ≤ N, we consider two cases: (i) T_k is defined by (14); (ii) T_k is defined by (15). In Case (i), Lemma 2.1 in [3], the inequality f_i ≤ f_{l(k)} for i = 0, 1, ···, k, and the fact that the multipliers in T_k sum to one give the result. Case (ii) is evident from (15).

For k ≥ N, if T_k = f_k, the result is evident. Otherwise, since

  (1−η_{k−1}) + η_{k−1}(1−η_{k−2}) + ··· + η_{k−1}···η_{k−N+1}(1−η_{k−N}) + η_{k−1}···η_{k−N} = 1,   (19)

the inequality f_i ≤ f_{l(k)}, for i = k−N+1, ···, k, and (10) imply

  f_k ≤ T_k = (1−η_{k−1}) f_k + η_{k−1}(1−η_{k−2}) f_{k−1} + ··· + η_{k−1}···η_{k−N} f_{k−N}
           ≤ [(1−η_{k−1}) + η_{k−1}(1−η_{k−2}) + ··· + η_{k−1}···η_{k−N}] f_{l(k)} = f_{l(k)},

giving the result. ⊓⊔
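The mechanism behind the upper half of the sandwich (18) can also be observed numerically: because the multipliers of (10) sum to one by (11), T_k always lies between the minimum and maximum of the stored values, and the maximum is exactly f_{l(k)} when taken over the same window. The snippet below is our own test, reusing the `T_direct` helper from Section 2.

```python
import random

# Numerical illustration (ours) of the convexity argument in Lemma 3.
random.seed(0)
N = 6
fvals = [random.uniform(-5.0, 5.0) for _ in range(200)]
etas = [random.uniform(0.0, 0.95) for _ in range(199)]
for k in range(N, 200):
    window = fvals[k - N:k + 1]            # f_{k-N}, ..., f_k
    Tk = T_direct(window, etas[k - N:k])   # N weights for N+1 values
    assert min(window) - 1e-12 <= Tk <= max(window) + 1e-12
```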
Lemma 4 Suppose that the sequence {x_k} is generated by Algorithm 2; then the sequence {f_{l(k)}} is decreasing.

Proof The condition (18) implies that T_k ≤ f_{l(k)}. If x_{k+1} is accepted by Algorithm 2, then

  (f_{l(k)} − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) ≥ (T_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) ≥ µ_1,

leading to

  f_{l(k)} − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ 0, for all k ∈ ℕ,

implying

  f_{l(k)} ≥ f_{k+1}, for all k ∈ ℕ.   (20)

Now, if k ≥ N, by using m(k+1) ≤ m(k) + 1 and (20), we get

  f_{l(k+1)} = max_{0 ≤ j ≤ m(k+1)} {f_{k−j+1}} ≤ max_{0 ≤ j ≤ m(k)+1} {f_{k−j+1}} = max{f_{l(k)}, f_{k+1}} ≤ f_{l(k)}.

For k < N, it is obvious that m(k) = k. Since, for any k, f_k ≤ f_0, it is clear that f_{l(k)} = f_0. Therefore, in both cases, the sequence {f_{l(k)}} is decreasing. ⊓⊔

Lemma 5 Suppose that (H1) holds and the sequence {x_k} is generated by Algorithm 2; then L(x_0) contains {x_k}.

Proof The definition of T_k indicates that T_0 = f_0. By induction, we assume that x_i ∈ L(x_0), for all i = 1, 2, ···, k, and then prove that x_{k+1} ∈ L(x_0). From (18), we get

  f_{k+1} ≤ T_{k+1} ≤ f_{l(k+1)} ≤ f_{l(k)} ≤ f_0,

implying that L(x_0) contains the sequence {x_k}. ⊓⊔

Corollary 6 Suppose that (H1) holds and the sequence {x_k} is generated by Algorithm 2. Then the sequence {f_{l(k)}} is convergent.

Proof The assumption (H1) and Lemma 4 imply that there exists a constant λ such that

  λ ≤ f_{k+n} ≤ f_{l(k+n)} ≤ ··· ≤ f_{l(k+1)} ≤ f_{l(k)},

for all n ∈ ℕ. This implies that the sequence {f_{l(k)}} is convergent. ⊓⊔

Lemma 7 Suppose that (H1)-(H3) hold and the sequence {x_k} is generated by Algorithm 2; then

  lim_{k→∞} f_{l(k)} = lim_{k→∞} f_k.   (21)

Proof The condition (18) and Lemma 7 of [1] imply that the result is valid. ⊓⊔

Corollary 8 Suppose that (H1)-(H3) hold and the sequence {x_k} is generated by Algorithm 2; then

  lim_{k→∞} T_k = lim_{k→∞} f_k.   (22)

Proof From (18) and Lemma 7, the result is obtained. ⊓⊔

Lemma 9 Suppose that (H1) and (H2) hold, and the sequence {x_k} is generated by Algorithm 2. Then, if ‖g_k‖ ≥ ε > 0, we have:
(i) the inner cycle of Algorithm 2 is well-defined;
(ii) for any k, there exists a nonnegative integer p such that x_{k+p+1} is a very successful iteration.

Proof (i) Let t_k denote the internal iteration counter in step k, and let d_k^{t_k} and δ_k^{t_k} respectively denote the solution of the subproblem (3) and the corresponding trust-region radius in internal iteration t_k. The fact that ‖g_k‖ ≥ ε > 0, (H2), and (17) imply

  q_k(0) − q_k(d_k^{t_k}) ≥ β ‖g_k‖ min{δ_k^{t_k}, ‖g_k‖/‖B_k‖} ≥ βε min{δ_k^{t_k}, ε/M}.   (23)

Then Line 8 of Algorithm 2 implies

  lim_{t_k→∞} δ_k^{t_k} = 0.

From this, Lemma 2, and (23), we obtain

  |r_k − 1| = |(f_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) − 1|
            = |(f_k − f(x_k + d_k^{t_k}) − (q_k(0) − q_k(d_k^{t_k}))) / (q_k(0) − q_k(d_k^{t_k}))|
            ≤ O(‖d_k^{t_k}‖²) / (βε min{δ_k^{t_k}, ε/M}) ≤ O((δ_k^{t_k})²) / (βε min{δ_k^{t_k}, ε/M}) → 0  (t_k → ∞),

implying that there exists a positive integer k_0 such that for k ≥ k_0 we have r_k ≥ µ_1. This and (18) lead to

  r̂_k = (T_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) ≥ (f_k − f(x_k + d_k^{t_k})) / (q_k(0) − q_k(d_k^{t_k})) ≥ µ_1,

implying that the inner cycle is well-defined.

(ii) Assume that there exists a positive integer k such that for every nonnegative integer p the point x_{k+p+1} is not very successful. Hence, for any p = 0, 1, 2, ···, we get

  r̂_{k+p} < µ_2.

The fact that ‖g_{k+p}‖ ≥ ε > 0, (H2), and (17) imply

  T_{k+p} − f(x_{k+p} + d_{k+p}) ≥ µ_1 (q_{k+p}(0) − q_{k+p}(d_{k+p}))
    ≥ βµ_1 ‖g_{k+p}‖ min{δ_{k+p}, ‖g_{k+p}‖/‖B_{k+p}‖} ≥ βµ_1 ε min{δ_{k+p}, ε/M}.   (24)

By using (22) and (24), we can write

  lim_{p→∞} δ_{k+p} = 0.   (25)

From Lemma 2, (25), and (23), we obtain

  |r_{k+p} − 1| = |(f(x_{k+p}) − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) − 1|
     = |(f(x_{k+p}) − f(x_{k+p} + d_{k+p}) − (q_{k+p}(0) − q_{k+p}(d_{k+p}))) / (q_{k+p}(0) − q_{k+p}(d_{k+p}))|
     ≤ O(‖d_{k+p}‖²) / (βε min{δ_{k+p}, ε/M}) ≤ O(δ_{k+p}²) / (βε min{δ_{k+p}, ε/M}) → 0  (p → ∞).

Then, for a sufficiently large p, we get r_{k+p} ≥ µ_2, leading to

  (T_{k+p} − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) ≥ (f(x_{k+p}) − f(x_{k+p} + d_{k+p})) / (q_{k+p}(0) − q_{k+p}(d_{k+p})) ≥ µ_2,

implying r̂_{k+p} ≥ µ_2 for a sufficiently large p. This contradicts the assumption r̂_{k+p} < µ_2, giving the result. ⊓⊔

Lemma 9(i) implies that the inner cycle is left after a finite number of internal iterations, and Lemma 9(ii) implies that if the current iterate is not a first-order stationary point, then at least one very successful iteration point exists, i.e., the trust-region radius δ_k can be enlarged. The next result gives the global convergence of the sequence {x_k} of Algorithm 2.
Theorem 10 Suppose that (H1) and (H2) hold, and suppose the sequence {x_k} is generated by Algorithm 2. Then

  liminf_{k→∞} ‖g_k‖ = 0.   (26)

Proof We consider two cases: (i) Algorithm 2 has finitely many very successful iterations; (ii) Algorithm 2 has infinitely many very successful iterations.

In Case (i), we suppose that k_0 is the largest index of the very successful iterations. If ‖g_{k_0+1}‖ > 0, then Lemma 9(ii) implies that there exists a very successful iteration with a larger index than k_0, a contradiction to the definition of k_0.

In Case (ii), by contradiction, we assume that there exist constants ε > 0 and K > 0 such that

  ‖g_k‖ ≥ ε,   (27)

for all k ≥ K. If x_{k+1} is a successful iteration and k ≥ K, then by using (H2), (17), and (27), we get

  T_k − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ βµ_1 ‖g_k‖ min{δ_k, ‖g_k‖/‖B_k‖} ≥ βµ_1 ε min{δ_k, ε/M} ≥ 0.   (28)

It follows from this inequality and (22) that

  lim_{k→∞} δ_k = 0.   (29)

Since Algorithm 2 has infinitely many very successful iterations, Lemma 9(ii) and (27) imply that the sequence {x_k} involves infinitely many very successful iterations in which the trust region is enlarged, which is a contradiction with (29). This implies that the result is valid. ⊓⊔

Theorem 11 Suppose that (H1) and (H2) hold, and the sequence {x_k} is generated by Algorithm 2. Then

  lim_{k→∞} ‖g_k‖ = 0.   (30)

Moreover, no limit point of the sequence {x_k} is a local maximizer of f.

Proof By contradiction, we assume lim_{k→∞} ‖g_k‖ ≠ 0. Hence there exist ε > 0 and an infinite subsequence of {x_k}, indexed by {t_i}, such that

  ‖g_{t_i}‖ ≥ 2ε > 0,   (31)

for all i ∈ ℕ. Theorem 10 ensures the existence, for each t_i, of a first successful iteration r(t_i) > t_i such that ‖g_{r(t_i)}‖ < ε. Writing r_i = r(t_i), there hence exists another subsequence, indexed by {r_i}, such that

  ‖g_k‖ ≥ ε for t_i ≤ k < r_i,  ‖g_{r_i}‖ < ε.   (32)

We now restrict our attention to the sequence of successful iterations whose indices are in the set

  κ = {k ∈ ℕ | t_i ≤ k < r_i}.

Using (32), for every k ∈ κ, (28) holds. It follows from (22) and (28) that

  lim_{k→∞} δ_k = 0,   (33)

for k ∈ κ. Now, using (H2), (17), and ‖g_k‖ ≥ ε, the condition (23) holds for k ∈ κ. This, Lemma 2, and (33) lead to

  |r_k − 1| = |(f_k − f(x_k + d_k)) / (q_k(0) − q_k(d_k)) − 1| = |(f_k − f(x_k + d_k) − (q_k(0) − q_k(d_k))) / (q_k(0) − q_k(d_k))|
            ≤ O(‖d_k‖²) / (βε min{δ_k, ε/M}) ≤ O(δ_k²) / (βε δ_k) → 0  (k → ∞, k ∈ κ).

Thus, for a sufficiently large k+1 ∈ κ, we get

  f_k − f(x_k + d_k) ≥ µ_1 (q_k(0) − q_k(d_k)) ≥ βµ_1 ‖g_k‖ min{δ_k, ‖g_k‖/‖B_k‖} ≥ βµ_1 ε min{δ_k, ε/M}.   (34)

The condition (33) implies that δ_k ≤ ε/M for sufficiently large k. Hence, for a sufficiently large k ∈ κ, we obtain

  δ_k ≤ (1/(βµ_1 ε)) (f_k − f_{k+1}).   (35)

Then (18) and (35) imply

  ‖x_{t_i} − x_{r_i}‖ ≤ Σ_{j=t_i, j∈κ}^{r_i−1} ‖x_j − x_{j+1}‖ ≤ Σ_{j=t_i, j∈κ}^{r_i−1} δ_j ≤ (1/(βµ_1 ε)) (f_{t_i} − f_{r_i}) ≤ (1/(βµ_1 ε)) (T_{t_i} − f_{r_i}),   (36)

for a sufficiently large i. Now, Corollary 8 implies

  0 ≤ lim_{i→∞} ‖x_{t_i} − x_{r_i}‖ ≤ lim_{i→∞} (1/(βµ_1 ε)) (T_{t_i} − f_{r_i}) = 0,

leading to

  lim_{i→∞} ‖x_{t_i} − x_{r_i}‖ = 0.

Since the gradient is continuous, we get

  lim_{i→∞} ‖g_{t_i} − g_{r_i}‖ = 0.   (37)

In view of the definitions of {t_i} and {r_i}, which guarantee ‖g_{t_i} − g_{r_i}‖ ≥ ε, this is impossible. Therefore, no subsequence satisfying (31) exists, giving the result. To see that no limit point of the sequence {x_k} is a local maximizer of f, see [27]. ⊓⊔

The next result gives the global convergence of the sequence generated by Algorithm 2 to second-order stationary points. To this end, similar to [15], an additional assumption is needed:

(H4) If λ_min(B_k) represents the smallest eigenvalue of the symmetric matrix B_k, then there exists a positive scalar c_3 such that

  q_k(0) − q_k(d_k) ≥ c_3 λ_min(B_k) δ_k².
