An Online Convex Optimization Approach to Dynamic Network Resource Allocation

Tianyi Chen, Student Member, IEEE, Qing Ling, Senior Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE

arXiv:1701.03974v2 [cs.SY] 27 Jan 2017

Work in this paper was supported by NSF 1509040, 1508993, 1509005, NSF China 61573331, NSF Anhui 1608085QF130, and CAS-XDA06011203. T. Chen and G. B. Giannakis are with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA. Emails: {chen3827, georgios}@umn.edu. Q. Ling is with the Department of Automation, University of Science and Technology of China, Hefei, Anhui 230026, China. Email: qin- [email protected]

Abstract: Existing approaches to online convex optimization (OCO) make sequential one-slot-ahead decisions, which lead to (possibly adversarial) losses that drive subsequent decision iterates. Their performance is evaluated by the so-called regret that measures the difference of losses between the online solution and the best yet fixed overall solution in hindsight. The present paper deals with online convex optimization involving adversarial loss functions and adversarial constraints, where the constraints are revealed after making decisions, and can tolerate instantaneous violations but must be satisfied in the long term. Performance of an online algorithm in this setting is assessed by: i) the difference of its losses relative to the best dynamic solution with one-slot-ahead information of the loss function and the constraint (that is here termed dynamic regret); and, ii) the accumulated amount of constraint violations (that is here termed dynamic fit). In this context, a modified online saddle-point (MOSP) scheme is developed, and proved to simultaneously yield sub-linear dynamic regret and fit, provided that the accumulated variations of per-slot minimizers and constraints are sub-linearly growing with time. MOSP is also applied to the dynamic network resource allocation task, and it is compared with the well-known stochastic dual gradient method. Under various scenarios, numerical experiments demonstrate the performance gain of MOSP relative to the state-of-the-art.

Index Terms: Constrained optimization, primal-dual method, online convex optimization, network resource allocation.

I. INTRODUCTION

Online convex optimization (OCO) is an emerging methodology for sequential inference with well documented merits especially when the sequence of convex costs varies in an unknown and possibly adversarial manner [1]–[3]. Starting from the seminal papers [1] and [2], most of the early works evaluate OCO algorithms with a static regret, which measures the difference of costs (a.k.a. losses) between the online solution and the overall best static solution in hindsight. If an algorithm incurs static regret that increases sub-linearly with time, then its performance loss averaged over an infinite time horizon goes to zero; see also [3], [4], and references therein.

However, static regret is not a comprehensive performance metric [5]. Take online parameter estimation as an example. When the true parameter varies over time, a static benchmark (time-invariant estimator) itself often performs poorly so that achieving sub-linear static regret is no longer attractive. Recent works [5]–[8] extend the analysis of static regret to that of dynamic regret, where the performance of an OCO algorithm is benchmarked by the best dynamic solution with a-priori information on the one-slot-ahead cost function. Sub-linear dynamic regret is proved to be possible, if the dynamic environment changes slowly enough for the accumulated variation of either costs or per-slot minimizers to be sub-linearly increasing with respect to the time horizon. When the per-slot costs depend on previous decisions, the so-termed competitive difference can be employed as an alternative to the static regret [9], [10].

The aforementioned works [5]–[10] deal with dynamic costs focusing on problems with time-invariant constraints that must be strictly satisfied, but do not allow for instantaneous violations of the constraints. The long-term effect of such instantaneous violations was studied in [11], where an online algorithm with sub-linear static regret and sub-linear accumulated constraint violation was also developed. The regret bounds in [11] have been improved in [12] and [13]. Decentralized optimization with consensus constraints, as a special case of having long-term but time-invariant constraints, has been studied in [14]–[16]. Nevertheless, [11]–[16] do not deal with OCO under time-varying adversarial constraints.

In this context, the present paper considers OCO with time-varying constraints that must be satisfied in the long term. Under this setting, the learner first takes an action without knowing a-priori either the adversarial cost or the time-varying constraint, which are revealed by nature subsequently. Its performance is evaluated by: i) dynamic regret that is the optimality loss relative to a sequence of instantaneous minimizers with known costs and constraints; and, ii) dynamic fit that accumulates constraint violations incurred by the online learner due to the lack of knowledge about future constraints. We compare the OCO setting here with those of existing works in Table I.

We further introduce a modified online saddle-point (MOSP) method in this novel OCO framework, where the learner deals with time-varying costs as well as time-varying but long-term constraints. We analytically establish that MOSP simultaneously achieves sub-linear dynamic regret and fit, provided that the accumulated variations of both minimizers and constraints grow sub-linearly. This result provides valuable insights for OCO with long-term constraints: When the dynamic environment comprising both costs and constraints does not change on average, the online decisions provided by MOSP are as good as the best dynamic solution over a long time horizon.

TABLE I
A SUMMARY OF RELATED WORKS ON DISCRETE TIME OCO

Reference                 Type of benchmark    Long-term constraint    Adversarial constraint
[1]                       Static and dynamic   No                      No
[2]–[4]                   Static               No                      No
[5], [8], [17]            Dynamic              No                      No
[6], [7]                  Dynamic              No                      No
[9], [10]                 Dynamic              No                      No
[11], [12], [14], [15]    Static               Yes                     No
[16]                      Dynamic              Yes                     No
This work                 Dynamic              Yes                     Yes

To demonstrate the impact of these results, we further apply the proposed MOSP approach to a dynamic network resource allocation task, where online management of resources is sought without knowing future network states. Existing algorithms include first- and second-order methods in the dual domain [18]–[23], which are tailored for time-invariant deterministic formulations. To capture the temporal variations of network resources, stochastic formulation of network resource allocation has been extensively pursued since the seminal work of [24]; see also the celebrated stochastic dual gradient method in [25], [26]. These stochastic approximation-based approaches assume that the time-varying costs are i.i.d. or generally samples from a stationary ergodic stochastic process [27]–[29]. However, performance of most stochastic schemes is established in an asymptotic sense, considering the ensemble of per slot averages or infinite samples across time. Clearly, stationarity may not hold in practice, especially when the stochastic process involves human participation. Inheriting merits of the OCO framework, the proposed MOSP approach operates in a fully online mode without requiring non-causal information, and further admits finite-sample performance analysis under a sequence of non-stochastic, or even adversarial costs and constraints.

Relative to existing works, the main contributions of the present paper are summarized as follows.

c1) We generalize the standard OCO framework with only adversarial costs in [1]–[4] to account for both adversarial costs and constraints. Different from the regret analysis in [11]–[15], performance here is established relative to the best dynamic benchmark, via metrics that we term dynamic regret and fit.

c2) We develop an MOSP algorithm to tackle this novel OCO problem, and analytically establish that MOSP yields simultaneously sub-linear dynamic regret and fit, provided that the accumulated variations of per-slot minimizers and constraints are sub-linearly growing with time.

c3) Our novel OCO approach is tailored for dynamic resource allocation tasks, where MOSP is compared with the popular stochastic dual gradient approach. Relative to the latter, MOSP remains operational in a broader practical setting without probabilistic assumptions. Numerical tests demonstrate the gain of MOSP over state-of-the-art alternatives.

Outline. The rest of the paper is organized as follows. The OCO problem with long-term constraints is formulated, and the relevant performance metrics are introduced in Section II. The MOSP algorithm and its performance analysis are presented in Section III. Application of the novel OCO framework and the MOSP algorithm in network resource allocation, as well as corresponding numerical tests, are provided in Section IV. Section V concludes the paper.

Notation. $\mathbb{E}$ denotes expectation, $\mathbb{P}$ stands for probability, $(\cdot)^\top$ stands for vector and matrix transposition, and $\|x\|$ denotes the $\ell_2$-norm of a vector $x$. Inequalities for vectors, e.g., $x > 0$, are defined entry-wise. The positive projection operator is defined as $[a]^+ := \max\{a, 0\}$, also entry-wise.

II. OCO WITH LONG-TERM TIME-VARYING CONSTRAINTS

In this section, we introduce the generic OCO formulation with long-term time-varying constraints, along with pertinent metrics to evaluate an OCO algorithm.

A. Problem formulation

We begin with the classical OCO setting, where constraints are time-invariant and must be strictly satisfied. OCO can be viewed as a repeated game between a learner and nature [1]–[3]. Consider that time is discrete and indexed by $t$. Per slot $t$, a learner selects an action $x_t$ from a convex set $\mathcal{X} \subseteq \mathbb{R}^I$, and subsequently nature chooses a (possibly adversarial) loss function $f_t(\cdot): \mathbb{R}^I \to \mathbb{R}$ through which the learner incurs a loss $f_t(x_t)$. The convex set $\mathcal{X}$ is a-priori known and fixed over the entire time horizon. Although this standard OCO setting is appealing to various applications such as online regression and classification [1]–[3], it does not account for potential variations of (possibly unknown) constraints, and does not deal with constraints that can possibly be satisfied in the long term rather than on a slot-by-slot basis. Online optimization with time-varying and long-term constraints is well motivated for applications such as navigation, tracking, and dynamic resource allocation [13], [25], [26], [30]. Taking resource allocation as an example, time-varying long-term constraints are usually imposed to tolerate instantaneous violations when available resources cannot satisfy user requests, and hence allow flexible adaptation of online decisions to temporal variations of resource availability.

To broaden the applicability of OCO to these scenarios, we consider that per slot $t$, a learner selects an action $x_t$ from a known and fixed convex set $\mathcal{X} \subseteq \mathbb{R}^I$, and then nature reveals not only a loss function $f_t(\cdot): \mathbb{R}^I \to \mathbb{R}$ but also a time-varying (possibly adversarial) penalty function $g_t(\cdot): \mathbb{R}^I \to \mathbb{R}^I$. This function leads to a time-varying constraint $g_t(x) \le 0$, which is driven by the unknown dynamics in various applications, e.g., on-demand data request arrivals in resource allocation. Different from the known and fixed set $\mathcal{X}$, the time-varying constraint $g_t(x) \le 0$ can vary arbitrarily or even adversarially from slot to slot. It is revealed only after the learner makes her/his decision, and hence it is hard to satisfy in every time slot. Therefore, the goal in this context is to find a sequence of online solutions $\{x_t \in \mathcal{X}\}$ that minimizes the aggregate loss, and ensures that the constraints $\{g_t(x_t) \le 0\}$ are satisfied in the long term on average. Specifically, we aim to solve the following online optimization problem

$$\min_{\{x_t \in \mathcal{X}, \forall t\}} \; \sum_{t=1}^{T} f_t(x_t) \tag{1a}$$

$$\text{s. t.} \quad \sum_{t=1}^{T} g_t(x_t) \le 0 \tag{1b}$$

where $T$ is the time horizon, $x_t \in \mathbb{R}^I$ is the decision variable, $f_t$ is the cost function, $g_t := [g_t^1, \ldots, g_t^I]^\top$ denotes the constraint function with $i$th entry $g_t^i: \mathbb{R}^I \to \mathbb{R}$, and $\mathcal{X} \subseteq \mathbb{R}^I$ is a convex set. The formulation (1) extends the standard OCO framework to accommodate adversarial time-varying constraints that must be satisfied in the long term. Complemented by algorithm development and performance analysis to be carried out in the following sections, the main contribution of the present paper is incorporation of long-term and time-varying constraints to markedly broaden the scope of OCO.

B. Performance and feasibility metrics

Regarding performance of online decisions $\{x_t\}_{t=1}^T$, static regret is adopted as a metric by standard OCO schemes, under time-invariant and strictly satisfied constraints. The static regret measures the difference between the online loss of an OCO algorithm and that of the best fixed solution in hindsight [1]–[3]. Extending the definition of static regret over $T$ slots to accommodate time-varying constraints, it can be written as (see also [13])

$$\mathrm{Reg}_T^{s} := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*) \tag{2}$$

where the best static solution $x^*$ is obtained as

$$x^* \in \arg\min_{x \in \mathcal{X}} \; \sum_{t=1}^{T} f_t(x) \quad \text{s. t.} \quad g_t(x) \le 0, \; \forall t. \tag{3}$$

A desirable OCO algorithm in this case is the one yielding a sub-linear regret [11], [12], meaning $\mathrm{Reg}_T^{s} = o(T)$. Consequently, $\lim_{T \to \infty} \mathrm{Reg}_T^{s}/T = 0$ implies that the algorithm is "on average" no-regret, or in other words, not worse asymptotically than the best fixed solution $x^*$. Though widely used in various OCO applications, the aforementioned static regret metric relies on a rather coarse benchmark, which may be less useful especially in dynamic settings. For instance, [5, Example 2] shows that the gap between the best static and the best dynamic benchmark can be as large as $O(T)$. Furthermore, since the time-varying constraint $g_t(x_t) \le 0$ is not observed before making a decision $x_t$, its feasibility cannot be checked instantaneously.

In response to the quest for improved benchmarks in this dynamic setup, two metrics are considered here: dynamic regret and dynamic fit. The notion of dynamic regret (also termed tracking regret or adaptive regret) has been recently introduced in [5]–[8] to offer a competitive performance measure of OCO algorithms under time-invariant constraints. We adopt it in the setting of (1) by incorporating time-varying constraints

$$\mathrm{Reg}_T^{d} := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^*) \tag{4}$$

where the benchmark is now formed via a sequence of best dynamic solutions $\{x_t^*\}$ for the instantaneous cost minimization problem subject to the instantaneous constraint, namely

$$x_t^* \in \arg\min_{x \in \mathcal{X}} \; f_t(x) \quad \text{s. t.} \quad g_t(x) \le 0. \tag{5}$$

Clearly, the dynamic regret is never smaller than the static regret in (2), i.e., $\mathrm{Reg}_T^{s} \le \mathrm{Reg}_T^{d}$, because $\sum_{t=1}^{T} f_t(x^*)$ is always no smaller than $\sum_{t=1}^{T} f_t(x_t^*)$ according to the definitions of $x^*$ and $x_t^*$. Hence, a sub-linear dynamic regret implies a sub-linear static regret, but not vice versa.

To ensure feasibility of online decisions, the notion of dynamic fit is introduced to measure the accumulated violation of constraints; under time-invariant long-term constraints [11], [12], [14] or under time-varying constraints [13]. It is defined as

$$\mathrm{Fit}_T^{d} := \left\| \left[ \sum_{t=1}^{T} g_t(x_t) \right]^+ \right\|. \tag{6}$$

Observe that the dynamic fit is zero if the accumulated violation $\sum_{t=1}^{T} g_t(x_t)$ is entry-wise less than zero. However, enforcing $\sum_{t=1}^{T} g_t(x_t) \le 0$ is different from restricting $x_t$ to meet $g_t(x_t) \le 0$ in each and every slot. While the latter readily implies the former, the long-term (aggregate) constraint allows adaptation of online decisions to the environment dynamics; as a result, it is tolerable to have $g_t(x_t) \ge 0$ and $g_{t+1}(x_{t+1}) \le 0$.

An ideal algorithm in this broader OCO framework is the one that achieves both sub-linear dynamic regret and sub-linear dynamic fit. A sub-linear dynamic regret implies "no-regret" relative to the clairvoyant dynamic solution on the long-term average; i.e., $\lim_{T \to \infty} \mathrm{Reg}_T^{d}/T = 0$; and a sub-linear dynamic fit indicates that the online strategy is also feasible on average; i.e., $\lim_{T \to \infty} \mathrm{Fit}_T^{d}/T = 0$. Unfortunately, the sub-linear dynamic regret is not achievable in general, even under the special case of (1) where the time-varying constraint is absent [5]. For this reason, we aim at designing and analyzing an online strategy that generates a sequence $\{x_t\}_{t=1}^T$ ensuring sub-linear dynamic regret and fit, under mild conditions that must be satisfied by the cost and constraint variations.

III. MODIFIED ONLINE SADDLE-POINT (MOSP) METHOD

In this section, a modified online saddle-point method is developed to solve (1), and its performance and feasibility are analyzed using the dynamic regret and fit metrics.

A. Algorithm development

Consider now the per-slot problem (5), which contains the current objective $f_t(x)$, the current constraint $g_t(x) \le 0$, and a time-invariant constraint set $\mathcal{X}$. With $\lambda \in \mathbb{R}^I_+$ denoting the Lagrange multiplier associated with the time-varying constraint, the online (partial) Lagrangian of (5) can be expressed as

$$\mathcal{L}_t(x, \lambda) := f_t(x) + \lambda^\top g_t(x) \tag{7}$$

where $x \in \mathcal{X}$ remains implicit. For the online Lagrangian (7), we introduce a modified online saddle-point (MOSP) approach, which takes a modified descent step in the primal domain, and a dual ascent step at each time slot $t$. Specifically, given the previous primal iterate $x_{t-1}$ and the current dual iterate $\lambda_t$ at each slot $t$, the current decision $x_t$ is the minimizer of the following optimization problem

$$\min_{x \in \mathcal{X}} \; \nabla^\top f_{t-1}(x_{t-1})(x - x_{t-1}) + \lambda_t^\top g_{t-1}(x) + \frac{\|x - x_{t-1}\|^2}{2\alpha} \tag{8}$$

where $\alpha$ is a positive stepsize, and $\nabla f_{t-1}(x_{t-1})$ is the gradient¹ of primal objective $f_{t-1}(x)$ at $x = x_{t-1}$. After the current decision $x_t$ is made, $f_t(x)$ and $g_t(x)$ are observed, and the dual update takes the form

$$\lambda_{t+1} = \left[ \lambda_t + \mu \nabla_\lambda \mathcal{L}_t(x_t, \lambda_t) \right]^+ = \left[ \lambda_t + \mu g_t(x_t) \right]^+ \tag{9}$$

where $\mu$ is also a positive stepsize, and $\nabla_\lambda \mathcal{L}_t(x_t, \lambda_t) = g_t(x_t)$ is the gradient of online Lagrangian (7) with respect to (w.r.t.) $\lambda$ at $\lambda = \lambda_t$.

¹ One can replace the gradient by one of the sub-gradients when $f_t(x)$ is non-differentiable. The performance analysis still holds true for this case.

Algorithm 1 Modified online saddle-point (MOSP) method
1: Initialize: primal iterate $x_0$, dual iterate $\lambda_1$, and proper stepsizes $\alpha$ and $\mu$.
2: for $t = 1, 2, \ldots$ do
3:   Update primal variable $x_t$ by solving (8).
4:   Observe the current cost $f_t(x)$ and constraint $g_t(x)$.
5:   Update the dual variable $\lambda_{t+1}$ via (9).
6: end for

Remark 1. The primal gradient step of the classical saddle-point approach in [11], [13], [14] is tantamount to minimizing a first-order approximation of $\mathcal{L}_{t-1}(x, \lambda_t)$ at $x = x_{t-1}$ plus a proximal term $\|x - x_{t-1}\|^2/(2\alpha)$. We call the primal-dual recursion (8) and (9) a modified online saddle-point approach, since the primal update (8) is not an exact gradient step when the constraint $g_t(x)$ is nonlinear w.r.t. $x$. However, when $g_t(x)$ is linear, (8) and (9) reduce to the approach in [11], [13], [14]. Similar to the primal update of OCO with long-term but time-invariant constraints in [12], the minimization in (8) penalizes the exact constraint violation $g_t(x)$ instead of its first-order approximation, which improves control of constraint violations and facilitates performance analysis of MOSP.

B. Performance analysis

We proceed to show that for MOSP, the dynamic regret in (4) and the dynamic fit in (6) are both sub-linearly increasing if the variations of the per-slot minimizers and the constraints are small enough. Before formally stating this result, we assume that the following conditions are satisfied.

Assumption 1. For every $t$, the cost function $f_t(x)$ and the time-varying constraint $g_t(x)$ in (1) are convex.

Assumption 2. For every $t$, $f_t(x)$ has bounded gradient on $\mathcal{X}$; i.e., $\|\nabla f_t(x)\| \le G$, $\forall x \in \mathcal{X}$; and $g_t(x)$ is bounded on $\mathcal{X}$; i.e., $\|g_t(x)\| \le M$, $\forall x \in \mathcal{X}$.

Assumption 3. The radius of the convex feasible set $\mathcal{X}$ is bounded; i.e., $\|x - y\| \le R$, $\forall x, y \in \mathcal{X}$.

Assumption 4. There exists a constant $\epsilon > 0$, and an interior point $\tilde{x} \in \mathcal{X}$ such that $g_t(\tilde{x}) \le -\epsilon \mathbf{1}$, $\forall t$.

Assumption 1 is necessary for regret analysis in the OCO setting. Assumption 2 bounds primal and dual gradients per slot, which is also typical in OCO [3], [6], [12], [14]. Assumption 3 restricts the action set to be bounded. Assumption 4 is Slater's condition, which guarantees the existence of a bounded Lagrange multiplier [31].

Under these assumptions, we are on track to first provide an upper bound for the dynamic fit.

Theorem 1. Define the maximum variation of consecutive constraints as

$$\bar{V}(g) := \max_t V(g_t), \quad \text{with} \quad V(g_t) := \max_{x \in \mathcal{X}} \left\| \left[ g_{t+1}(x) - g_t(x) \right]^+ \right\| \tag{10}$$

and assume the slack constant $\epsilon$ in Assumption 4 to be larger than the maximum variation²; i.e., $\epsilon > \bar{V}(g)$. Then under Assumptions 1-4 and the dual variable initialization $\lambda_1 = 0$, the dual iterate for the MOSP recursion (8)-(9) is bounded by

$$\|\lambda_t\| \le \|\bar{\lambda}\| := \mu M + \frac{2GR + R^2/(2\alpha) + (\mu M^2)/2}{\epsilon - \bar{V}(g)}, \quad \forall t \tag{11}$$

and the dynamic fit in (6) is upper-bounded by

$$\mathrm{Fit}_T^{d} \le \frac{\|\lambda_{T+1}\|}{\mu} \le \frac{\|\bar{\lambda}\|}{\mu} = M + \frac{2GR/\mu + R^2/(2\alpha\mu) + M^2/2}{\epsilon - \bar{V}(g)} \tag{12}$$

where $G$, $M$, $R$, and $\epsilon$ are as in Assumptions 2-4.

Proof. See Appendix A.

² This equivalently requires $\epsilon := \min_{i,t} \max_{x \in \mathcal{X}} [-g_t^i(x)]^+ > \max_{x \in \mathcal{X}} \left\| [g_{t+1}(x) - g_t(x)]^+ \right\|$, which is valid when the region defined by $g_t(x) \le 0$ is large enough, or, the trajectory of $g_t(x)$ is smooth enough across time.

Theorem 1 asserts that under a mild condition on the time-varying constraints, $\|\lambda_t\|$ is uniformly upper-bounded, and more importantly, its scaled version $\|\lambda_{T+1}\|/\mu$ upper bounds the dynamic fit. Observe that with a fixed primal stepsize $\alpha$, $\mathrm{Fit}_T^{d}$ is in the order of $O(1/\mu)$, thus a larger dual stepsize essentially enables a better satisfaction of long-term constraints. In addition, a smaller $\bar{V}(g)$ leads to a smaller dynamic fit, which also makes sense intuitively.

In the next theorem, we further bound the dynamic regret.

Theorem 2. Under Assumptions 1-4 and the dual variable initialization $\lambda_1 = 0$, the MOSP recursion (8)-(9) yields a dynamic regret

$$\mathrm{Reg}_T^{d} \le \frac{R\, V(\{x_t^*\}_{t=1}^T)}{\alpha} + \|\bar{\lambda}\|\, V(\{g_t\}_{t=1}^T) + \frac{R^2}{2\alpha} + \frac{\alpha G^2 T}{2} + \frac{\mu M^2 (T+1)}{2} \tag{13}$$

where $V(\{x_t^*\}_{t=1}^T)$ is the accumulated variation of the per-slot minimizers $x_t^*$ defined as

$$V(\{x_t^*\}_{t=1}^T) := \sum_{t=1}^{T} \|x_t^* - x_{t-1}^*\| \tag{14}$$

and $V(\{g_t\}_{t=1}^T)$ is the accumulated variation of consecutive constraints

$$V(\{g_t\}_{t=1}^T) := \sum_{t=1}^{T} V(g_t) = \sum_{t=1}^{T} \max_{x \in \mathcal{X}} \left\| \left[ g_{t+1}(x) - g_t(x) \right]^+ \right\|. \tag{15}$$

Proof. See Appendix B.

Theorem 2 asserts that MOSP's dynamic regret is upper-bounded by a constant depending on the accumulated variations of per-slot minimizers and time-varying constraints as well as the primal and dual stepsizes. While the dynamic regret in the current form (13) is hard to grasp, the next corollary shall demonstrate that $\mathrm{Reg}_T^{d}$ can be very small.

Based on Theorems 1-2, we are ready to establish that under the mild conditions for the accumulated variation of constraints and minimizers, the dynamic regret and fit are sub-linearly increasing with $T$.

Corollary 1. Under Assumptions 1-4 and the dual variable initialization $\lambda_1 = 0$, if the primal and dual stepsizes are chosen such that $\alpha = \mu = O(T^{-\frac{1}{3}})$, then the dynamic fit is upper-bounded by

$$\mathrm{Fit}_T^{d} = O\!\left( M + \frac{2GR\,T^{1/3} + R^2 T^{2/3}/2 + M^2/2}{\epsilon - \bar{V}(g)} \right) = O(T^{\frac{2}{3}}). \tag{16}$$

In addition, if the temporal variations of optimal arguments and constraints satisfy $V(\{x_t^*\}_{t=1}^T) = o(T^{\frac{2}{3}})$ and $V(\{g_t\}_{t=1}^T) = o(T^{\frac{2}{3}})$, then the dynamic regret is sub-linearly increasing, i.e.,

$$\mathrm{Reg}_T^{d} = o(T). \tag{17}$$

Proof. Plugging $\alpha = \mu = O(T^{-\frac{1}{3}})$ into (12), the bound in (16) readily follows. Likewise, we have from (13) that

$$\mathrm{Reg}_T^{d} = O\!\left( R\, V(\{x_t^*\}_{t=1}^T)\, T^{\frac{1}{3}} + \|\bar{\lambda}\|\, V(\{g_t\}_{t=1}^T) + \frac{R^2 T^{\frac{1}{3}}}{2} + \frac{G^2 T^{\frac{2}{3}}}{2} + \frac{M^2 T^{\frac{2}{3}}}{2} \right). \tag{18}$$

Considering the upper bound on the dual iterates in (11), it follows that $\|\bar{\lambda}\| = O(\mu + 1/\alpha) = O(T^{\frac{1}{3}})$, which implies that

$$\mathrm{Reg}_T^{d} = O\!\left( \max\left\{ V(\{x_t^*\}_{t=1}^T)\, T^{\frac{1}{3}}, \; V(\{g_t\}_{t=1}^T)\, T^{\frac{1}{3}}, \; T^{\frac{2}{3}} \right\} \right). \tag{19}$$

Therefore, we deduce that $\mathrm{Reg}_T^{d} = o(T)$, if $V(\{x_t^*\}_{t=1}^T) = o(T^{\frac{2}{3}})$ and $V(\{g_t\}_{t=1}^T) = o(T^{\frac{2}{3}})$.

Observe that the sub-linear regret and fit in Corollary 1 are achieved under a slightly "strict" condition that $V(\{x_t^*\}_{t=1}^T) = o(T^{\frac{2}{3}})$ and $V(\{g_t\}_{t=1}^T) = o(T^{\frac{2}{3}})$. The next corollary shows that this condition can be further relaxed if a-priori knowledge of the time-varying environment is available.

Corollary 2. Consider Assumptions 1-4 are satisfied, and the dual variable is initialized as $\lambda_1 = 0$. If there exists a constant $\beta \in [0, 1)$ such that the temporal variations satisfy $V(\{x_t^*\}_{t=1}^T) = o(T^\beta)$ and $V(\{g_t\}_{t=1}^T) = o(T^\beta)$, then choosing the primal and dual stepsizes as $\alpha = \mu = O(T^{\frac{\beta - 1}{2}})$ leads to the dynamic fit

$$\mathrm{Fit}_T^{d} = O\!\left( T^{\frac{1-\beta}{2}} + T^{1-\beta} \right) = O(T^{1-\beta}) = o(T) \tag{20}$$

and the corresponding dynamic regret

$$\mathrm{Reg}_T^{d} = O\!\left( T^{\frac{\beta+1}{2}} + T^{\frac{1-\beta}{2}} + T^{\frac{\beta+1}{2}} \right) = O(T^{\frac{\beta+1}{2}}) = o(T). \tag{21}$$

Corollary 2 provides valuable insights for choosing optimal stepsizes in non-stationary settings. Specifically, adjusting stepsizes to match the variability of the dynamic environment is the key to achieving the optimal performance in terms of dynamic regret and fit. Intuitively, when the variation of the environment is fast (a larger $\beta$), slowly decaying stepsizes (thus larger stepsizes) can better track the potential changes.

Remark 2. Theorems 1 and 2 are in the spirit of the recent work [5], [8], [17], where the regret bounds are established with respect to a dynamic benchmark in either deterministic or stochastic settings. However, [5], [8], [17] do not account for long-term and time-varying constraints, while the dynamic regret analysis is generalized here to the setting with long-term constraints. Interestingly, sub-linear dynamic regret and fit can be achieved when the dynamic environment consisting of the per-slot minimizer and the time-varying constraint does not vary on average, that is, $V(\{x_t^*\}_{t=1}^T)$ and $V(\{g_t\}_{t=1}^T)$ are sub-linearly increasing over $T$.

C. Beyond dynamic regret

Although the dynamic benchmark in (4) is more competitive than the static one in (2), it is worth noting that the sequence of the per-slot minimizer $x_t^*$ in (5) is not the optimal solution to problem (1). Defining the sequence of optimal solutions to (1) as $\{x_t^{\rm off}\}_{t=1}^T$, it is instructive to see that computing each minimizer $x_t^*$ in (5) only requires one-slot-ahead information (namely, $f_t(x)$ and $g_t(x)$), while computing each $x_t^{\rm off}$ within $\{x_t^{\rm off}\}_{t=1}^T$ requires information over the entire time horizon (that is, $\{f_t(x)\}_{t=1}^T$ and $\{g_t(x)\}_{t=1}^T$). For this reason, we use the superscript "off" in $\{x_t^{\rm off}\}_{t=1}^T$ to emphasize that this solution comes from offline computation with information over $T$ slots. Note that for the cases without long-term constraints [5]–[8], the sequence of offline solutions $\{x_t^{\rm off}\}_{t=1}^T$ coincides with the sequence of per-slot minimizers $\{x_t^*\}_{t=1}^T$.

Regarding feasibility, $\{x_t^{\rm off}\}_{t=1}^T$ exactly satisfies the long-term constraint (1b), while the solution of MOSP satisfies (1b) on average under mild conditions (cf. Corollary 1). For optimality, the cost of the online decisions $\{x_t\}_{t=1}^T$ attained by MOSP is further benchmarked by the offline solutions $\{x_t^{\rm off}\}_{t=1}^T$. To this end, define MOSP's optimality gap as

$$\mathrm{OptGap}_T^{\rm off} := \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^{\rm off}). \tag{22a}$$

Intuitively, if $\{x_t^{\rm off}\}_{t=1}^T$ are close to $\{x_t^*\}_{t=1}^T$, the dynamic regret $\mathrm{Reg}_T^{d}$ is able to provide an accurate performance measure in the sense of $\mathrm{OptGap}_T^{\rm off}$. Specifically, one can decompose the optimality gap as

$$\mathrm{OptGap}_T^{\rm off} = \underbrace{\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^*)}_{U_1} + \underbrace{\sum_{t=1}^{T} f_t(x_t^*) - \sum_{t=1}^{T} f_t(x_t^{\rm off})}_{U_2} \tag{22b}$$

where $U_1$ corresponds to the dynamic regret $\mathrm{Reg}_T^{d}$ in (4) capturing the regret relative to the sequence of per-slot minimizers with one-slot-ahead information, and $U_2$ is the difference between the performance of per-slot minimizers and the offline optimal solutions. Although the second term appears difficult to quantify, we will show next that $U_2$ is driven by the accumulated variation of the dual functions associated with the instantaneous problems (5).

To this end, consider the dual function of the instantaneous primal problem (5), which can be expressed by minimizing the online Lagrangian in (7) at time $t$, namely [31]

$$\mathcal{D}_t(\lambda) := \min_{x \in \mathcal{X}} \mathcal{L}_t(x, \lambda) = \min_{x \in \mathcal{X}} f_t(x) + \lambda^\top g_t(x). \tag{23}$$

Likewise, the dual function of (1) over the entire horizon is

$$\mathcal{D}(\lambda) := \min_{\{x_t \in \mathcal{X}, \forall t\}} \sum_{t=1}^{T} \mathcal{L}_t(x_t, \lambda) \overset{(a)}{=} \sum_{t=1}^{T} \min_{x_t \in \mathcal{X}} \mathcal{L}_t(x_t, \lambda) \overset{(b)}{=} \sum_{t=1}^{T} \mathcal{D}_t(\lambda) \tag{24}$$

where equality (a) holds since the minimization is separable across the summand at time $t$, and equality (b) is due to the definition of the per-slot dual function in (23). As the primal problems (1) and (5) are both convex, Slater's condition in Assumption 4 implies that strong duality holds. Accordingly, $U_2$ in (22b) can be written as

$$\sum_{t=1}^{T} f_t(x_t^*) - \sum_{t=1}^{T} f_t(x_t^{\rm off}) = \sum_{t=1}^{T} \max_{\lambda \ge 0} \mathcal{D}_t(\lambda) - \max_{\lambda \ge 0} \sum_{t=1}^{T} \mathcal{D}_t(\lambda) \tag{25}$$

which is the difference between the dual objective of the static best solution, i.e., $\lambda^* \in \arg\max_{\lambda \ge 0} \sum_{t=1}^{T} \mathcal{D}_t(\lambda)$, and that of the per-slot best solution for (23), i.e., $\lambda_t^* \in \arg\max_{\lambda \ge 0} \mathcal{D}_t(\lambda)$. Leveraging this special property of the dual problem, we next establish that $U_2$ can be bounded by the variation of the dual function, thus providing an estimate of the optimality gap (22a).

Proposition 1. Define the variation of the dual function (23) from time $t$ to $t+1$ as

$$V(\mathcal{D}_t) := \max_{\lambda \ge 0} \left\| \mathcal{D}_{t+1}(\lambda) - \mathcal{D}_t(\lambda) \right\| \tag{26}$$

and the total variation over the time horizon $T$ as $V(\{\mathcal{D}_t\}_{t=1}^T) := \sum_{t=1}^{T} V(\mathcal{D}_t)$. Then the cost difference between the best offline solution and the best dynamic solution satisfies

$$\sum_{t=1}^{T} f_t(x_t^*) - \sum_{t=1}^{T} f_t(x_t^{\rm off}) \le 2T\, V(\{\mathcal{D}_t\}_{t=1}^T) \tag{27}$$

where $x_t^*$ is the minimizer of the instantaneous problem (5), and $x_t^{\rm off}$ solves (1) with all future information available. Combined with (22b), it readily follows that

$$\mathrm{OptGap}_T^{\rm off} \le \mathrm{Reg}_T^{d} + 2T\, V(\{\mathcal{D}_t\}_{t=1}^T) \tag{28}$$

where $\mathrm{Reg}_T^{d}$ is defined in (4), and $\mathrm{OptGap}_T^{\rm off}$ in (22).

Proof. Instead of going to the primal domain, we upper bound $U_2$ via the dual representation in (25). Letting $\tilde{t}$ denote any slot in $\mathcal{T} := \{1, \ldots, T\}$, we have

$$\sum_{t \in \mathcal{T}} \max_{\lambda \ge 0} \mathcal{D}_t(\lambda) - \max_{\lambda \ge 0} \sum_{t \in \mathcal{T}} \mathcal{D}_t(\lambda) \le \sum_{t \in \mathcal{T}} \left( \mathcal{D}_t(\lambda_t^*) - \mathcal{D}_t(\lambda_{\tilde{t}}^*) \right) \le T \max_{t \in \mathcal{T}} \left\{ \mathcal{D}_t(\lambda_t^*) - \mathcal{D}_t(\lambda_{\tilde{t}}^*) \right\}. \tag{29}$$

The first inequality comes from the definition $\lambda_t^* \in \arg\max_{\lambda \ge 0} \mathcal{D}_t(\lambda)$. Note that if $\max_{t \in \mathcal{T}} \{\mathcal{D}_t(\lambda_t^*) - \mathcal{D}_t(\lambda_{\tilde{t}}^*)\} \le 2V(\{\mathcal{D}_t\}_{t=1}^T)$, the proposition readily follows from (29). We will prove this inequality by contradiction. Assume there exists a slot $t_0 \in \mathcal{T}$ such that $\mathcal{D}_{t_0}(\lambda_{t_0}^*) - \mathcal{D}_{t_0}(\lambda_{\tilde{t}}^*) > 2V(\{\mathcal{D}_t\}_{t=1}^T)$, which implies that

$$\mathcal{D}_{\tilde{t}}(\lambda_{\tilde{t}}^*) \overset{(a)}{\le} \mathcal{D}_{t_0}(\lambda_{\tilde{t}}^*) + V(\{\mathcal{D}_t\}_{t=1}^T) \overset{(b)}{<} \mathcal{D}_{t_0}(\lambda_{t_0}^*) - V(\{\mathcal{D}_t\}_{t=1}^T) \overset{(c)}{\le} \mathcal{D}_{\tilde{t}}(\lambda_{t_0}^*), \quad \forall \tilde{t} \in \mathcal{T} \tag{30}$$

where inequalities (a) and (c) come from the fact that $V(\{\mathcal{D}_t\}_{t=1}^T)$ is the accumulated variation over $T$ slots, and hence $\max_{t_1, t_2 \in \mathcal{T}} \|\mathcal{D}_{t_1}(\lambda) - \mathcal{D}_{t_2}(\lambda)\| \le V(\{\mathcal{D}_t\}_{t=1}^T)$, while (b) is due to the hypothesis above. Note that $\mathcal{D}_{\tilde{t}}(\lambda_{\tilde{t}}^*) < \mathcal{D}_{\tilde{t}}(\lambda_{t_0}^*)$ in (30) contradicts the fact that $\lambda_{\tilde{t}}^*$ is the maximizer of $\mathcal{D}_{\tilde{t}}(\lambda)$. Therefore, we have $\mathcal{D}_t(\lambda_t^*) - \mathcal{D}_t(\lambda_{\tilde{t}}^*) \le 2V(\{\mathcal{D}_t\}_{t=1}^T)$, which completes the proof.

The following remark provides an approach to improving the bound in Proposition 1.

Remark 3. Although the optimality gap in (28) appears to be at least linear w.r.t. $T$, one can use the "restarting" trick for dual variables, similar to that for primal variables in the unconstrained case; see e.g., [5]. Specifically, if the total variation $V(\{\mathcal{D}_t\}_{t=1}^T)$ is known a-priori, one can divide the entire time horizon $\mathcal{T} := \{1, \ldots, T\}$ into $\lceil T/\Delta_T \rceil$ sub-horizons (each with $\Delta_T = o\big(T/V(\{\mathcal{D}_t\}_{t=1}^T)\big)$ slots), and restart the dual iterate $\lambda$ at the beginning of each sub-horizon. By assuming that $V(\{\mathcal{D}_t\}_{t=1}^T)$ is sub-linear w.r.t. $T$, one can guarantee that $\Delta_T \ge 1$ always exists. In this case, the bound in (28) can be improved by

$$\mathrm{OptGap}_T^{\rm off} \le \lceil T/\Delta_T \rceil\, \mathrm{Reg}_{\Delta_T}^{d} + 2\Delta_T\, V(\{\mathcal{D}_t\}_{t=1}^T) \tag{31}$$

where the two summands are sub-linear w.r.t. $T$ provided that $\mathrm{Reg}^{d}$ over each sub-horizon is sub-linear; i.e., $\mathrm{Reg}_{\Delta_T}^{d} = o(\Delta_T)$. Interested readers are referred to [5] for details of this restarting trick, which are omitted here due to space limitation.

IV. APPLICATION TO NETWORK RESOURCE ALLOCATION

In this section, we solve the network resource allocation problem within the OCO framework, and present numerical experiments to demonstrate the merits of our MOSP solver.

A. Online network resource allocation

Consider the resource allocation problem over a cloud network [30], which is represented by a directed graph $\mathcal{G} = (\mathcal{I}, \mathcal{E})$ with node set $\mathcal{I}$ and edge set $\mathcal{E}$, where $|\mathcal{I}| = I$ and $|\mathcal{E}| = E$. Nodes considered here include mapping nodes collected in the set $\mathcal{J} = \{1, \ldots, J\}$, and data centers collected in the set $\mathcal{K} = \{1, \ldots, K\}$; i.e., we have $\mathcal{I} = \mathcal{J} \cup \mathcal{K}$.

Per time $t$, each mapping node $j$ receives an exogenous data request $b_t^j$, and forwards the amount $x_t^{jk}$ to each data center $k$ in accordance with bandwidth availability. Each data center $k$ schedules workload $y_t^k$ according to its resource availability. Regarding $y_t^k$ as the weight of a virtual outgoing edge $(k, *)$ from data center $k$, edge set $\mathcal{E} := \{(j,k), \forall j \in \mathcal{J}, k \in \mathcal{K}\} \cup \{(k,*), \forall k \in \mathcal{K}\}$ contains all the links connecting mapping nodes with data centers, and all the "virtual" edges coming out of the data centers. The $I \times E$ node-incidence matrix is formed with the $(i,e)$-th entry

$$A_{(i,e)} = \begin{cases} 1, & \text{if link } e \text{ enters node } i \\ -1, & \text{if link } e \text{ leaves node } i \\ 0, & \text{else.} \end{cases} \tag{32}$$

For compactness, collect the data workloads across edges $e = (i,j) \in \mathcal{E}$ in a resource allocation vector $x_t := [x_t^{11}, \ldots, x_t^{JK}, y_t^1, \ldots, y_t^K]^\top \in \mathbb{R}^E$, and the exogenous load arrival rates of all nodes in a vector $b_t := [b_t^1, \ldots, b_t^J, 0, \ldots, 0]^\top \in \mathbb{R}^I_+$. Then, the aggregate (endogenous plus exogenous) workloads of all nodes are given by $A x_t + b_t$. When the $i$-th entry of $A x_t + b_t$ is positive, there is service residual at node $i$; otherwise, node $i$ over-serves the current workload arrival. Assume that each data center and mapping node has a local data queue to buffer unserved workloads [25]. With [...] and the resource capability of data center $k$ is $\bar{y}^k$, which can be compactly expressed by $x \in \mathcal{X}$ with $\mathcal{X} := \{0 \le x \le \bar{x}\}$ and $\bar{x} := [\bar{x}^{11}, \ldots, \bar{x}^{JK}, \bar{y}^1, \ldots, \bar{y}^K]^\top$. The overall system diagram is depicted in Fig. 1.

Fig. 1. A diagram of online network resource allocation (nodes labeled "Mapping node j" and "Data center k"). Per time $t$, mapping node $j$ has an exogenous workload $b_t^j$ plus that stored in the queue $q_t^j$, and schedules workload $x_t^{jk}$ to data center $k$. Data center $k$ serves an amount of workload $y_t^k$ out of the assigned $\sum_{j=1}^{J} x_t^{jk}$ as well as that stored in its queue $q_t^{J+k}$. The thickness of each edge is proportional to its capacity.

For each data center, the power cost $f_t^k(y_t^k) := f^k(y_t^k; \theta_t^k)$ depends on a time-varying parameter $\theta_t^k$, which captures the energy price and the renewable generation at data center $k$ during slot $t$. The bandwidth cost $f_t^{jk}(x_t^{jk}) := f^{jk}(x_t^{jk}; \theta_t^{jk})$ characterizes the transmission delay and is parameterized by a time-varying scalar $\theta_t^{jk}$. Scalars $\theta_t^k$ and $\theta_t^{jk}$ can be readily extended to vector forms. To keep the exposition simple, we use scalars to represent time-varying factors at nodes and edges.

Per slot $t$, the instantaneous cost $f_t(x_t)$ aggregates the costs of power consumed at all data centers plus the bandwidth costs at all links, namely

$$f_t(x_t) := \underbrace{\sum_{k \in \mathcal{K}} f_t^k(y_t^k)}_{\text{power cost}} + \underbrace{\sum_{j \in \mathcal{J}} \sum_{k \in \mathcal{K}} f_t^{jk}(x_t^{jk})}_{\text{bandwidth cost}} \tag{33}$$

where the objective can be also written as $f_t(x_t) := f(x_t; \theta_t)$ with $\theta_t := [\theta_t^1, \ldots, \theta_t^K, \theta_t^{11}, \ldots, \theta_t^{JK}]^\top$ concatenating all time-varying parameters. Aiming to minimize the accumulated cost while serving all workloads, the optimal workload routing and allocation strategy in this cloud network is the solution of the following optimization problem

$$\min_{\{x_t \in \mathcal{X}, \forall t\}} \; \sum_{t=1}^{T} f_t(x_t) \quad \text{s. t.} \quad q_{t+1} = [q_t + A x_t + b_t]^+, \; \forall t, \quad q_1 \ge 0, \; q_{T+1} = 0 \tag{34}$$

where $q_1$ is the given initial queue length, and $q_{T+1} = 0$ guarantees that all workloads arrived have been served at the end of the scheduling horizon. Note that (34) is time-coupled, and generally challenging to solve without information of future workload arrivals and time-varying cost functions. Therefore, we reformulate (34) to fit our OCO formulation (1) by relaxing the queue recursion in (34), namely

$$q_{T+1} \ge q_T + A x_T + b_T \ge q_1 + \sum_{t=1}^{T} (A x_t + b_t) \tag{35}$$

which readily leads to $\sum_{t=1}^{T} (A x_t + b_t) \le q_{T+1} - q_1 \le 0$, since $q_1 \ge 0$ and $q_{T+1} = 0$. Therefore, instead of solving (34), we aim to tackle a relaxed problem that is in the form of OCO with long-term constraints, given by

$$\min_{\{x_t \in \mathcal{X}, \forall t\}} \; \sum_{t=1}^{T} f_t(x_t) \quad \text{s. t.} \quad \sum_{t=1}^{T} (A x_t + b_t) \le 0 \tag{36}$$

where the workload flow conservation constraint $A x_t + b_t \le 0$ must be satisfied in the long term rather than slot-by-slot. Clearly, (36) is in the form of (1). Therefore, the MOSP algorithm of Section III can be leveraged to solve (36) in an online fashion, with provable performance and feasibility guarantees.
Specifically, with g (x ) = Ax +b , the primal t t t t q := [q1,...,qJ+K](cid:62) collecting the queue lengths at each update (8) boils down to a simple gradient update x = mtapping tnode antd data center, the queue update is q = P (cid:0)x −α∇f (x )−αA(cid:62)λ (cid:1), where P (·) defitnes t+1 X t−1 t−1 t−1 t X [q +Ax +b ]+, where [·]+ ensures that the queue length is projection onto the convex set X. The dual update (9) is t t t always non-negative. The bandwidth limit of link (j,k) is x¯jk, λ =(cid:2)λ +µ(Ax +b )(cid:3)+, which can be nicely regarded t+1 t t t 8 Algorithm2DistributedMOSPfornetworkresourceallocation observes a realization ξ of the random variable Ξ, and then t 1: Initialize: primal iterate x0, dual iterate λ1, and proper (stochastically) selects an action xt ∈X. However, in contrast stepsizes α and µ. to minimizing the observed cost in the OCO setting, the goal 2: for t=1,2... do of the stochastic resource allocation is usually to minimize the 3: Each mapping node j performs (37a) and each data limiting average of the expected cost subject to the so-termed ccccenter k runs (37c). stability constraint, namely 4: Mapping nodes and data centers observe local costs T cccand workload arrivals. 1 (cid:88) min lim E[f (x )] (38a) 5: Each mapping node j performs (37b) and each data {xt∈X,qt,∀t} T→∞T t=1 t t ccccenter k performs (37d). s. t. q =[q +Ax +b ]+, ∀t (38b) 6: Mapping nodes (data centers) send multipliers to all t+1 t t t cccneighboring data centers (mapping nodes). 1 (cid:88)T lim E[q ]≤0 (38c) 7: end for T→∞T t t=1 where he expectation in (38a) is taken over Ξ and the ran- as a scaled version of the relaxed queue dynamics in (34), with domness of x and q induced by all possible sample paths t t q =λ /µ. t t {ξ ,...,ξ }via(38b);andthestabilityconstraint(38c)implies 1 t In addition to simple closed-form updates, MOSP can also a finite bound on the accumulated constraint violation. 
In afford a fully decentralized implementation by exploiting the contrast to the observed costs in (34), each decision x is t problem structure of network resource allocation, where each evaluated by all possible realizations in Ξ here. However, as mapping node or data center decides the amounts on all its q in (38b) couples the optimization variables over an infinite t outgoing links, and only exchanges information with its one- time horizon, (38) is intractable in general. hop neighbors. Per time slot t, the primal update at mapping Priorworks[25],[26],[30],[34]havedemonstratedthat(38) node j includes variables on all its outgoing links, given by can be tackled via a tractable stationary relaxation, given by (cid:104) (cid:16) (cid:17)(cid:105)x¯jk xjk= xjk −α∇fjk (xjk )−α λk−λj , ∀k ∈K T t t−1 t−1 t−1 t t 1 (cid:88) 0 min lim E[f (x )] (39a) and the dual update reduces to (37a) {xt∈X,∀t} T→∞T t=1 t t T (cid:34) (cid:32) (cid:33)(cid:35)+ s. t. lim 1 (cid:88)E[Ax +b ]≤0 (39b) λj = λj +µ bj −(cid:88)xjk . (37b) T→∞T t t t+1 t t t t=1 k∈K wherethetime-couplingconstraints(38b)and(38c)arerelaxed Likewise, for data center k, the primal update becomes to the limiting average constraint (39b). Such a relaxation can  y¯k beverifiedsimilartothequeuerelaxationin(35);seealso[25]. ytk =ytk−1−α∇ftk−1(ytk−1)−α(cid:88)(λkt −λjt) (37c) Note that (39) is still challenging since it involves expectations inbothcostsandconstraints,andthedistributionofΞisusually j∈J 0 unknown.Evenifthejointprobabilitydistributionfunctionwere where [·]y¯k:=min{y¯k,max{·,0}}, and the dual recursion is available, finding the expectations would not scale with the 0   + dimensionalityofΞ.Acommonremedyistousethestochastic (cid:88) dual gradient (SDG) iteration (a.k.a. Lyapunov optimization) λkt+1 =λkt +µ xjtk−ytk . (37d) [24], [25], [30]. 
Specifically, with λ ∈ RI denoting the + j∈J multipliers associated with the expectation constraint (39b), the Distributed MOSP for online network resource allocation is SDGmethodfirstobservesonerealizationξ ateachslott,and t summarized in Algorithm 2. then performs the dual update as (cid:2) (cid:3)+ B. Revisiting stochastic dual (sub)gradient λt+1 = λt+µ(Axt+bt) , ∀t (40) ThedynamicnetworkresourceallocationprobleminSection where λ is the dual iterate at time t, Ax + b is the t t t IV-Ahassofarbeenstudiedinthestochasticsetting[30],[32]. stochastic dual gradient, and µ is a positive (and typically Classical approaches include Lyapunov optimization [24], [25] constant) stepsize. The actual allocation or the primal variable andthestochasticdual(sub)gradientmethod[26],bothofwhich x appearing in (40) needs be found by solving the following t rely on stochastic approximation (SA) [27]. In the context of sub-problems, one per slot t stochasticoptimization,thetime-varyingvectors{ξ }withξ := t t [θ(cid:62),b(cid:62)](cid:62) appearing in the cost and constraint are assumed to x ∈argminf (x)+λ(cid:62)(Ax+b ). (41) t t t t t t beindependentrealizationsofarandomvariableΞ.3 InanSA- x∈X basedstochasticoptimizationalgorithm,pertimet,apolicyfirst For the considered network resource allocation problem, SDG in (40)-(41) entails a well-known cost-delay tradeoff 3Extension is also available when {ξt} constitute a sample path from an [25]. Specifically, with f∗ denoting the optimal objective ergodicstochasticprocess{Ξt},whichconvergestoastationarydistribution; seee.g.,[29],[33]. (39), SDG can achieve an O(µ)-optimal solution such that 9 lim (1/T)(cid:80)T E[f (x )] ≤ f∗ + O(µ), and guaran- ×104 T→∞ t=1 t t 5 tee queue lengths4 satisfying lim (1/T)(cid:80)T E[(cid:107)q (cid:107)]= T→∞ t=1 t O(1/µ). Therefore, reducing the optimality gap O(µ) will st4 essentially increase the average network delay O(1/µ). o c e Remark4. 
TheoptimalityofSDGisestablishedrelativetothe ag3 r offline optimal solution of (39), which can be thought as the e v tIinmteer-easvtienrgaglye tohpotuimgha,littyhegaopptiimna(l2it2ya)guapnduerndtehretOheCOstoscehtatisntgic. me-a2 OODDGG((µµOODDGG==10.)5) setting is equivalent to the (expected) dynamic regret (4), since Ti1 MOSP Per-slotoptimal tzheeroir. (Teoxpseecetethdi)s,dnifofeteretnhcaet EV[f({(Ex[D)]ta]}nTtd=E1)[Ainx(+28b) ]readruecteismteo- 0 Offlineoptimal t t 0 100 200 300 400 500 invariant, hence the dual problem of each per-slot subproblem Time in (39) is time-invariant. This reduction means that the SDG solver of the dynamic problem in (38) leverages its inherent Fig.2. Time-averagecostforCase1. stationarity (through the stationary dual problem), in contrast to the non-stationary nature of the OCO framework. wherepk istheenergypriceatdatacenterkattimet,andcjk is t Remark5. Belowwehighlightseveraldifferencesofthenovel theper-unitbandwidthcostfortransmittingfrommappingnode MOSP in Algorithm 2 with the SDG recursion in (40)-(41) for j to data center k. With the bandwidth limit x¯jk uniformly the dynamic network resource allocation task. randomly generated within [10,100], we set the bandwidth (D1) From an operational perspective, SDG observes the cost of each link (j,k) as cjk = 40/x¯jk,∀j,k. The resource current state ξ first, and then performs the resource allocation capacities {y¯k,∀k} at all data centers are uniformly randomly t decision x accordingly. Therefore, at the beginning of slot t, generatedfrom[100,200].Weconsiderthefollowingtwocases t SDG needs to precisely know the non-causal information ξ . for the time-varying parameters {pk,∀t,k} and {bj,∀t,j}: t t t InheritingthemeritsofOCO,ontheotherhand,MOSPoperates Case 1) Parameters {pk,∀t,k} and {bj,∀t,j} are inde- t t in a fully predictive mode, which decides x without knowing pendently drawn from time-invariant distributions. 
Specifically, t the cost f (x) and the constraint g (x) (or ξ ) at time t. This pk is uniformly distributed over [1,3], and the delay-tolerant t t t t feature of MOSP is of major practical importance when costs workload bj arrives at each mapping node j according to a t and availability of resources are not available at the point of uniform distribution over [50,150]. making decisions; e.g., online demand response in smart grids Case 2) Parameters {pk,∀t,k} and {bj,∀t,j} are generated t t [36] and resource allocation in wireless networking [37]. according to non-stationary stochastic processes. Specifically, (D2) From a computational point of view, MOSP reduces to pk =sin(πt/12)+nk withi.i.d.noisenk uniformlydistributed t t t asimplesaddle-pointrecursionwithprimal(projected)gradient over [1,3], while bj = 50sin(πt/12)+vj with i.i.d. noise vj t t t descent and dual gradient ascent for the network resource uniformly distributed over [99,101]. allocation problem, both of which incur affordable complexity. Finally, with time horizon T = 500, the stepsize in (37a) However, the primal update of SDG in (41) generally requires and (37c) is set to α =0.05/T1/3, and for (37b) and (37d) to solving a convex program per time slot t, which leads to much µ=50/T1/3. MOSP is benchmarked by three strategies: SDG higher computational complexity in general. inSectionIV-B,thesequenceofper-slotbestminimizersin(3), (D3) With regards to the theoretical claims, the time-varying and the offline optimal solution that solves (1) at once with all vector ξ in SDG typically requires a rather restrictive prob- futurecostsandconstraintsavailable.Notethatatthebeginning t abilistic assumption, to establish SDG optimality in either of each slot t, the exact prices {pk,∀k} and demands {bj,∀j} t t the ensemble average [25] or the limiting ergodic average for the coming slot are generally not available in practice [36]– sense [33]. In contrast, leveraging the OCO framework, MOSP [39]. 
Since the original SDG updates (40) and (41) require admits finite-sample performance analysis with non-stochastic non-causal knowledge of {pk,∀k} and {bj,∀j} to decide x , t t t observed costs and constraints, which can even be adversarial. we modify them for fairness in this online setting by using the pricesanddemandsatslott−1toobtainx .Inthiscasethatwe t C. Numerical experiments wetermonlinedualgradient(ODG),theperformanceguarantee Inthissection,weprovidenumericalteststodemonstratethe of SDG may not hold. Nevertheless, as shown in the next, merits of the proposed MOSP algorithm in the application of different constant stepsizes for ODG’s dual update in (40) still dynamicnetworkresourceallocation.Considerthegeographical lead to quite different performance and feasibility behaviors. workload routing and allocation task in (36) with J = 10 For this reason, ODG is studied under stepsizes µODG = 0.5 mapping nodes and K = 10 data centers. The instantaneous and 1. network cost in (33) is Figs. 2-4 show the test results for Case 1 under i.i.d. costs (cid:88) (cid:88) (cid:88) andconstraints.Clearly,MOSPinFig.2convergestoasmaller f (x ):= pk(yk)2+ cjk(xjk)2 (42) t t t t t time-average cost than ODG with the two stepsizes. The time- k∈K j∈Jk∈K average cost of MOSP is slightly higher than the per-slot optimalsolution,aswellastheofflineoptimalsolutionwithall 4AccordingtoLittle’slaw[35],thetime-averagedelayisproportionaltothe time-averagequeuelengthgiventhearrivalrate. information of the costs and constraints available over horizon 10 ×106 ×104 12 6 MOSP ODG(µODG=1) 10 ODG(µODG=0.5) 5 ODG(µODG=0.5) c regret 68 ODG(µODG=1) rage cost34 PMOefflOr-iSsnlPeotoopptitmimaall mi ve a 4 a n e-2 y m D 2 Ti1 0 0 0 100 200 300 400 500 0 100 200 300 400 500 Time Time Fig.3. DynamicregretforCase1. Fig.5. Time-averagecostforCase2. 
×106 15 4000 MOSP ODG(µODG=0.5) 3000 et10 ODG(µODG=1) r mic fit2000 mic reg 5 na na y y D D 0 1000 MOSP ODG(µODG=0.5) ODG(µODG=1) -5 0 0 100 200 300 400 500 0 100 200 300 400 500 Time Time Fig.6. DynamicregretforCase2. Fig.4. DynamicfitforCase1. T. Fig. 3 confirms the conclusion made from Fig. 2, where than that of ODG with µODG =0.5, and comparable to that of the dynamic regret (cf. (4)) of MOSP grows much slower ODG with µODG = 1. Therefore, in this non-stationary case, than that of ODG. Regarding the dynamic fit (cf. (6)), Fig. MOSP also significantly outperforms ODG in both dynamic 4 demonstrates that ODG with µ = 1 has a smaller fit regret and fit. ODG than that of µ = 0.5, and similar to the dynamic fit of ODG MOSP. According to the well-known trade-off between cost V. CONCLUDINGREMARKS (optimality)anddelay(constraintviolations)in[25],increasing OCO with both adversarial costs and constraints has been µ will improve the dynamic fit of ODG but degrade its studied in this paper. Different from existing works, the focus ODG dynamic regret. Therefore, MOSP is favorable in Case 1 since is on a setting where some of the constraints are revealed after it has much smaller regret when its dynamic fit is similar to taking actions, they are tolerable to instantaneous violations, that of ODG with µ = 1. It is worth mentioning that but must be satisfied on average. Performance of the novel ODG theoretically speaking, the dynamic regret of MOSP may not OCOalgorithmismeasuredby:i)thedifferenceofitsobjective be sub-linear in this i.i.d. case, since the accumulated cost relative to the best dynamic solution with one-slot-ahead infor- and constraint variation is not necessarily small enough (cf. mation of the cost and the constraint (dynamic regret); and, ii) Theorem 2). However, MOSP is robust in this aspect at least its accumulated amount of constraint violations (dynamic fit). for the numerical tests we carried. 
It has beenshown that the proposed MOSP algorithm adapts to Simulation tests using non-stationary costs and constraints the considered OCO setting with adversarial constraints. Under areshowninFigs.5-7.DifferentfromCase1,thetime-average standard assumptions, MOSP simultaneously yields sub-linear cost of MOSP is not only smaller than ODG, but also smaller dynamic regret and fit, if the accumulated variations of the per- thantheper-slotoptimumobtainedvia(3);seeFig.5.Asimilar slot minimizers and adversarial constraints are sub-linearly in- conclusion can be also drawn through the growths of dynamic creasing with time. Algorithm design and performance analysis regretinFig.6.Fromahighlevel,thisisbecausethedifference in this novel OCO setting, under adversarial constraints and between the cost of the per-slot minimizers and that of the with a dynamic benchmark, broaden the applicability of OCO offline solutions is no longer small in the non-stationary case. toawiderapplicationregime,whichincludesdynamicnetwork RegardingFig.7,bothODGandMOSPhavefinitedynamicfits resource allocation and online demand response in smart grids. in the sense that the accumulated constraint violations do not Numerical tests demonstrated that the proposed algorithm out- increase with time. The dynamic fit of MOSP is much smaller performs state-of-the-art alternatives under different scenarios.
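As an illustration of the compact MOSP recursion of Section IV-A, namely the projected primal step $x_t = P_{\mathcal{X}}(x_{t-1} - \alpha\nabla f_{t-1}(x_{t-1}) - \alpha A^\top\lambda_t)$ followed by the dual step $\lambda_{t+1} = [\lambda_t + \mu(Ax_t + b_t)]^+$, the following is a minimal sketch in plain Python. The function names `incidence_matrix` and `mosp_step`, the node and edge ordering, and the toy numbers are assumptions made for illustration only; they are not taken from the paper, and the sketch is centralized rather than the distributed form of Algorithm 2.

```python
def incidence_matrix(J, K):
    """Node-incidence matrix in the sense of (32): +1 if an edge enters a node,
    -1 if it leaves, 0 otherwise.

    Assumed ordering (illustrative): nodes are the J mapping nodes followed by
    the K data centers; edges are the links (j, k) in row-major order followed
    by the K virtual edges (k, *).
    """
    n_nodes, n_edges = J + K, J * K + K
    A = [[0] * n_edges for _ in range(n_nodes)]
    for j in range(J):
        for k in range(K):
            e = j * K + k
            A[j][e] = -1          # link (j, k) leaves mapping node j
            A[J + k][e] = 1       # and enters data center k
    for k in range(K):
        A[J + k][J * K + k] = -1  # virtual edge (k, *) leaves data center k
    return A


def mosp_step(x_prev, lam, grad_prev, A, b, x_max, alpha, mu):
    """One primal-dual iteration of the compact recursion.

    x_prev    : previous allocation x_{t-1}
    lam       : current multipliers lambda_t
    grad_prev : gradient of f_{t-1} evaluated at x_{t-1}
    b         : workload arrivals b_t, revealed only after x_t is chosen
    x_max     : upper limits of the box X = {0 <= x <= x_max}
    """
    n_nodes, n_edges = len(A), len(A[0])
    # A^T lambda_t: the multiplier difference seen by each edge.
    at_lam = [sum(A[i][e] * lam[i] for i in range(n_nodes)) for e in range(n_edges)]
    # Primal step: gradient descent followed by projection onto the box.
    x = [min(x_max[e], max(0.0, x_prev[e] - alpha * grad_prev[e] - alpha * at_lam[e]))
         for e in range(n_edges)]
    # Dual step: projected ascent on the revealed constraint A x_t + b_t.
    residual = [sum(A[i][e] * x[e] for e in range(n_edges)) + b[i] for i in range(n_nodes)]
    lam_next = [max(0.0, lam[i] + mu * residual[i]) for i in range(n_nodes)]
    return x, lam_next


# Toy instance with one mapping node and one data center (made-up numbers).
A = incidence_matrix(1, 1)
x, lam_next = mosp_step(x_prev=[0.0, 0.0], lam=[2.0, 1.0], grad_prev=[0.0, 0.0],
                        A=A, b=[5.0, 0.0], x_max=[10.0, 10.0], alpha=1.0, mu=0.5)
```

In this toy instance the mapping node holds the larger multiplier, so the primal step pushes flow onto the link, and the unserved arrival at the mapping node raises its multiplier in the dual step. This mirrors the interpretation of $\lambda_t/\mu$ as a relaxed queue length.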
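For the specific quadratic cost (42), the per-slot subproblem (41) used by SDG and ODG separates across edges, because both the cost and the box set $\mathcal{X}$ are separable. Each scalar problem is then a convex quadratic over an interval, whose minimizer is the unconstrained minimizer clipped to that interval. The sketch below works this out in plain Python under the assumption of strictly positive $p$ and $c^{jk}$; the helper names `odg_edge_flow` and `odg_service` are hypothetical, and this closed form is specific to (42) rather than a statement about general costs, for which (41) still requires a convex solver.

```python
def clip(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return min(high, max(low, value))


def odg_edge_flow(lam_j, lam_k, c_jk, x_max):
    """Minimizer over [0, x_max] of c_jk * x**2 + (lam_k - lam_j) * x.

    This is the part of (41) that involves link (j, k): the quadratic bandwidth
    cost from (42) plus the linear term contributed by lambda^T A x.
    Assumes c_jk > 0.
    """
    return clip(-(lam_k - lam_j) / (2.0 * c_jk), 0.0, x_max)


def odg_service(lam_k, p_k, y_max):
    """Minimizer over [0, y_max] of p_k * y**2 - lam_k * y.

    This is the part of (41) that involves the virtual edge (k, *). In ODG,
    p_k would be the price observed at the previous slot. Assumes p_k > 0.
    """
    return clip(lam_k / (2.0 * p_k), 0.0, y_max)


# Made-up multipliers and parameters, for illustration only.
flow = odg_edge_flow(lam_j=3.0, lam_k=1.0, c_jk=0.5, x_max=10.0)
service = odg_service(lam_k=1.0, p_k=2.0, y_max=5.0)
```

A link carries flow only when the multiplier at its mapping node exceeds the one at the receiving data center, and a data center serves more when its own multiplier (its scaled backlog) is large relative to the current price.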
