1 Time Complexity Analysis of a Distributed Stochastic Optimization in a Non-Stationary Environment B. N. Bharath and P. Vaishali Dept. of ECE, PESIT Bangalore South Campus, Bangalore 560100, INDIA E-mail: [email protected], [email protected] 7 1 0 2 Abstract—In this paper, we consider a distributed stochastic The DPP algorithm mentioned above assumes that the n optimization problem where the goal is to minimize the time controlactionistakenatacentralizedunitwherethecomplete a average of a cost function subject to a set of constraints on the state information is available. However, wireless network and J timeaverages ofrelatedstochasticprocessescalledpenalties.We crowdsensingapplicationsrequirea distributedcontrolaction assume that the state of the system is evolving in an indepen- 0 dent and non-stationary fashion and the “common information” thatusesonlythedelayedstateinformationateachnode[10], 1 availableateachnodeisdistributedanddelayed.Suchstochastic [17]. Thiscalls fora distributedversionof the DPP algorithm ] optimization is an integral part of many important problems in with theoretical guarantees. The author in [4] considers a T wirelessnetworkssuchasscheduling,routing,resourceallocation relaxed version of the above problem. In particular, assuming I andcrowdsensing.WeproposeanapproximatedistributedDrift- i.i.d. states with correlated “common information,” the author . Plus-Penalty (DPP) algorithm, and show that it achieves a time cs average cost (and penalties) that is within ǫ> 0 of the optimal in [4] proposes a distributed DPP algorithm, and proves that [ cost (and constraints) with high probability. Also, we provide a the approximate distributed DPP algorithm is close to being condition on the convergence time t for this result to hold. In optimal. Several authors use the above results in various con- 1 particular, for any delay D≥0 in the common information, we texts such as crowd sensing [17], energy efficient scheduling v use a coupling argument to prove that the proposed algorithm 0 converges almost surely to the optimal solution. We use an in MIMO systems [18], to name a few. However, in many 6 application from wireless sensor network to corroborate our practical applications, the states evolve in a dependent and 5 theoretical findings through simulation results. non-stationary fashion [13]. Thus, the following assumptions 2 Index terms: Drift-plus-penalty, Lyapunov function, wireless aboutthestatemadein[4]needtoberelaxed:(i)independent 0 networks, online learning, distributed stochastic optimization. and (ii) identically distributed. Further, from practical and . 1 theoreticalstandpoints,itisimportanttoinvestigatetherateof 0 I. INTRODUCTION convergenceofthedistributedalgorithmtotheoptimal.Inthis 7 Stochastic optimization is ubiquitous in various domains paper, we relax the assumption (ii) above, and unlike [4], we 1 : such as communications, signal processing, power grids, in- provide a Probably Approximately Correct (PAC) bound on v ventory control for product assembly systems and dynamic the performance. Also, we prove an almost sure convergence i X wireless networks [1]–[9]. A typical stochastic optimization of the proposed distributed algorithm to a constant within r problem involvesdesigning controlaction for a given state of the optimal. We would like to emphasize that extending the a the system that minimizes the time average of a cost function analysis in [4] to non-stationary states is non-trivial. The subjecttoasetofconstraintsonthetimeaveragepenalties[1], only work that provides a “PAC type” result for the DPP [2]. Both cost and penaltiesdependon the state of the system algorithm is [19]. However, the authors consider i.i.d. states, and the control actions taken by the users. For example, in and the decision is centralized. Moreover, the method used a typical wireless application, the cost function refers to the in [19] cannot be directly extended to a problem with non- instantaneous rate, and the penalty refers to the instantaneous stationary states since their proof requires the control action power consumed. Further, the state here refers to the channel to be stationary, and this assumption in general is not true. conditions. An algorithm known as Drift-Plus-Penalty (DPP) Now, we highlight the contribution of our work. (see [10]–[14]) is known to provide a solution for these problems with theoretical guarantees. At each time slot, the A. Main Contribution of the Paper DPPmethod,anextensionoftheback-pressurealgorithm[15], [16],findsacontrolactionthatminimizesalinearcombination In thispaper,we considera distributedstochastic optimiza- of the cost and the drift. In the problem that we consider, tion problem when the states evolve in an independent and the drift is a measure of the deviation (of the penalties) from non-stationaryfashion. In particular,we assume that the state the constraints, and the penalty corresponds to the cost. The is asymptotically stationary, i.e., the probability measure π t DPP algorithm is shown to achieve an approximatelyoptimal of the state ω(t) Ω converges to a probability measure π ∈ solution even when the system evolves in a non-stationary as t in the -norm sense. This assumption makes the 1 →∞ L fashion,andisrobusttonon-ergodicchangesinthestate[10]. extension of the method in [4] non-trivial. When π = π for t 2 all t N, the author in [4] proves theoretical guarantees by for some c< . ∈ ∞ makinguseoftheequivalencebetweenaLinearProgram(LP) thatisa functionof π and theoriginalstochasticoptimization II. MOTIVATION AND PROBLEMSTATEMENT problem. However, when the probabilities are changing, this equivalence is difficult to establish. Instead, we show that the Towards motivating the system model studied in the paper, originalproblemisequivalenttoa“perturbed”LP,whichisa we consider a network of 3 sensors, where the sensor i ob- functionof the limitingdistributionπ. Undermildconditions, servesthestateωi(t) 0,1,2,3 ,i=1,2,3,andreportsthe ∈{ } we prove that the solution to the perturbed LP is approx- observationtoacentralunit[4].Thereportingincursapenalty imately equal to that of the original problem. We use this in terms of the power consumed by the sensors to transmit result to provetheoreticalguaranteesfor an approximateDPP the state information. The state ω(t) , ω1(t),ω2(t),ω3(t) , algorithm that we propose in the paper. Moreover, unlike the t N in generalis a stochastic process t{hat evolvesin a non}- ∈ previous works, we are more interested in providing sample stationaryfashion.Assumethatthecentralunittrustssensor1 complexity bounds rather than just dealing with the averages. morethantheothers.Theproblemistomaximizetheaverage The following are the main contributions of our work of the following utility function subject to the constraint that the average power consumed by each sensor is less than P¯: 1) For the above model, we show that with high proba- bility, the average cost and penalties obtained by using α (t)ω (t) α (t)ω (t)+α (t)ω (t) the proposed approximate distributed DPP are within u0(t),min 1 1 + 2 2 3 3 ,1 , 3 6 constants of the optimal solution and the constraints, (cid:26) (cid:27)(1) respectively, provided the waiting time t > a threshold where α (t) 0,1 , i = 1,2,3 are the decision variables. i ∈ { } (seeTheorem3).Thethresholdandtheconstantscapture Note that if ω (t) = 3 for i = 1,2,3, and α (t) = 1 for i i the degree of non-stationarity (i.e., πt π 1), and the i = 2,3, then there is no increase in the utility if sensor 1 k − k number of samples used to compute an estimate of the alsodecidesto transmit,i.e.,α (t)=1.However,noneofthe 1 state distribution. sensors know the entire state of the system. In this case, the 2) Using the high probabilityresult, we show that the cost sensor 1 may also choose to transmit, thus wasting its power corresponding to the proposed algorithm almost surely leading to a suboptimal operation compared to a centralized converges to a constant within ǫ0 > 0 of the optimal scheme. In order to resolve this issue in a distributed setting, cost. We also show that the penalties induced by the we assume that a delayed “common information” is available proposedalgorithmarewithinconstantsoftheconstraint (see Sec. II of [4] for more details) using which each sensor valuesalmostsurely.Itturnsoutthatalthoughthestates picks one of the “pure strategies”. For example, each sensor are independent,the proposedalgorithminducesdepen- can acquire the information about the state ω(t) with a fixed dencies across time in the cost and penalties. To prove delay D > 0. In this case, the “common information” can thePACandthealmostsureconvergenceresults,weuse be some function of ω(t D). Thus, the problem is to find − a coupling argument where the dependent sequence of the set of optimal decision variables in a distributed fashion the cost (also, penalties) is replaced by an independent with “common information” that maximizes the average of sequence which results in an error expressed in terms the above utility subject to the constraints on the average of the β1-mixing coefficient; a term that captures the power.Next,wedescribethesystemmodelthatgeneralizesthe stochasticdependencyacrosstime(seeSec.II).Theβ1- aboveexample,andlaterprovideanalgorithmwiththeoretical mixing coefficient is bounded using information theo- guarantees. retic techniques to complete the proof. Consider a system comprising of N users making deci- 3) We show that due to non-stationarity of the states, sions in a distributed fashion at discrete time steps t the performance gap goes down slowly compared to 0,1,2,... . Each user i observes a random state ω (t) ∈ i it.eir.dm. tshtaattesd.epTehnidssisoncatphteurmedeatshurroeuogfhtkhπetc−omπpkle1xaitnydoaf Ωc{oin,traonlddaec}i“scioonmmαo(nt)informa,tioin”=Yc1(,t2),.∈..,YN.toHemrea,kefo∈ar i i the probability space averaged with respect to πt (see each user i, Ωi, and∈ Ai denote the state space, com- Theorem 3). Finally, we provide simulation results of mon information spYace andAaction/control space, respectively. a sensor network application, which is a particular use Let ω(t) , ω (t),ω (t),...,ω (t) Ω and α(t) , 1 2 N case scenario of the problem considered. α (t),α (t),.{..,α (t) , where}Ω ,∈ Ω Ω ... 1 2 N 1 2 The paperis organizedas follows. The problemstatement, an Ω{ , and , }.∈..A . Also, let u×s assu×me th×at N 1 2 N A A ×A × ×A approximate DPP Algorithm with related theoretical guaran- the number of possible values that p (t) takes is finite and k tees and simulation results are provided in Sec. II, Sec. III equal to µ N, k = 1,...,K. The decision is said to be k ∈ and Sec. V, respectively. A bound on the mixing coefficient distributed if (see [4]) is provided in Sec. IV. Sec. VI concludes the paper. There exists a function f :Ω , such that i i i Notation: We use the following notations in the paper. We • ×Y →A write f(x) =. g(x), f(x) g(x), f(x) g(x), f(x) g(x), αi(t),fi(ωi(t),Yc(t)), i=1,2,...,N, (2) and f(x) g(x) to mean(cid:22)lim f(x)≺=1, lim (cid:23)f(x) 11,, rliemspxe→ct∞i≻vefgl((yxx.))W<e1u,selimf(xx→)∞=x→fg((∞xx())gg(≥(xx))1)aifndlimlimxx→→∞∞f(fgxg((()xxx)))=>≤c • wfTohhreeervceeoYrmycm(tto)nbNeiln.ofnogrmsatotiothneYcco(mt)miosninindfeopremndateinotnosfetωY(t.) O x→∞ g(x) ∈ 3 At each time slot t, the decision α(t) and the state ω(t) in general is a stochastically dependent sequence. The “de- result in a cost p (t) , p (α(t),ω(t)) and penalties p (t) , gree” of correlation depends on the algorithm used. For k = 0 0 k p (α(t),ω(t)), k = 1,2,...,K. The central goal of the 0,1,2...,K and s N, let PALG,k( ) and PALG,k( ) k ∈ t,t+s ∗|E t ∗|E paper is to analyze an approximate distributed solution to the denotethejointandmarginaldistributionsof(p (t),p (t+s)) k k following problem when ω(t), t N is independentand non- and p (t) conditioned on the event , respectively, induced k ∈ E stationary, P0 : by any algorithm ALG.1 Note that if pk(t) and pk(t + s) are independent for each t N conditioned on some event t 1 minα(τ)∈A:τ∈N limt→s∞1upt−1t1τX−=0Ep0(τ) iTEsh,uathse,nnatht(cid:12)(cid:12)(cid:12)ue(cid:12)(cid:12)(cid:12)rPadAtl,iLtf+Gwf,eskar(ye∗nc|oefE)amb−eoavPseAtu,Lr∈Gimn,kga(x∗tihm|eEiz)ce⊗odrProeAtv+LlaeGstr,ikoa(n∗ll|bsEelot)wt(cid:12)(cid:12)(cid:12)s(cid:12)(cid:12)(cid:12)eTetVn∈=th0Ne. subject to limsup Ep (τ) c , k =1,2,...,K, k k sequencesthatare s time slots away.More precisely,we have t ≤ t →∞ τX=0 the following definition (see [21] for a related definition). α (τ) satisfies (2), i=1,2,...,N. i Definition 2: The β mixing coefficient of the process p (t), 1 k In the above, the expectation is jointly with respect to the k =0,1,2,...,K conditioned on some event is given by E distribution of the state ω(t) and a possible randomness in β (s,α ), sup M ( ) , (3) cthoerredsepcoisnidoinngαt(ot)t,hetpr∈obNle.mLPet0.pN(ooptte) tbheattthhee fiorpsttimeqaulactioosnt ALG,k |E t∈N,t≥αk t,s,k E kTV in P0 represents the time average cost while the second and where Mt,s,k(E) , PAt,LtG+,sk(∗|E) − PAtLG,k(∗|E) ⊗ the third equations represent constraints on the penalties and PALG,k( ), s 0, α 0, PALG,k PALG,k denotes the t+s ∗|E ≥ ≥ t ⊗ t+s the decisions, respectively. Informally, we are interested in product distribution, and TV is the total variational norm. k∗k proving a Probably Approximately Correct (PAC) type result Note thatin the definitionofβALG,k(s,α ), we haveused |E of the following form [19] t α, whichisrequiredlaterintheproofofourmainresults. ≥ For every ǫ > 0, with a probability of at least 1 δ , Further, if s is large, and the process is sufficiently mixing, k k • w1thPertτe−=p100(p≈k()≈(τ)()τa)nd≤pck(≈k)+(τ)ǫ,kkp=rov1i,d2e,d..t.,>Kaartehrthese−hcooldst, tbshoeemnuesweodef tethoxepdelaecrctgoteuhpdatleevβiaAaLtiGdo,eknp(ebsn,odαuen|ndEts)stth=oacth0aa.rseTtihvciaslpirddoefcfioenrsisitniosdonepwtehinal-lt and penalties, respectively,of an approximatedistributed scheme at τ N. Here c , p(opt) is the optimal cost, dent sequences can be applied. The details of this approach 0 ∈ will be clear in the proof of our main results. For notational and ck, k =1,2,...,K are as defined in P0. convenience,letusdenotethemaximumandminimumvalues First, unlike the model in [4], we assume that the state ω(t) ofp (t), k=0,1,2,...,K byp andp , respectively. evolves in an independent and non-stationary fashion across k max,k min,k Further, let (∆p) , p p . In the following time t. In particular, the distribution of ω(t) denoted πt(ω), section, we propomseax,akn Apprmoaxx,ikma−te DmiPn,Pk (ADPP) algorithm ω Ω satisfies the followingasymptotic stationarity property. ∈Assumption 1: Assume that there exists a probability with the associated theoretical guarantees. The β1 coefficient for the ADPP algorithm will be β (s,α ). measure π(ω) on Ω such that ADPP,k |E lim πt π 1 =0. III. ALGORITHMAND MAIN RESULTS t k − k →∞ Note that the efficacy of the distributed algorithm depends In the following subsection, we prove that the optimal on how accurately each node computes an estimate of πt, solution to P0 is close to a LP. t N. Naturally, we expect the bounds that we derive to ∈ be a function of the complexity of the probability measure A. Approximately Optimal LP space from which the “nature” chooses π (ω). Let us assume t that for each t N, π is chosen from a set . Assuming Since the number of possible values that pk(t), k = t ∈ P 0,1,2,...,K take is finite, the number of possible strate- that is a closed set with respect to the -norm, we have P L1 gies is also finite.2 The approximate algorithm that we are π . One way of measuring the complexity is through the ∈ P going to propose chooses one of the pure strategy S(ω) , covering number, and the metric entropy of the set , which P s (ω ),s (ω ),...,s (ω ) basedonthecommoninforma- are defined as follows. { 1 1 2 2 N N } Definition 1: (see [20]) A δ-covering o′f P is a set Pc , tFioorneYxca(mt)p,lwe,hser(eωs)i(ωcai)n∈beAai,saimndplωeith∈reΩshi,oild=ru1l,e2w,.i.th.,tNhe. , ,..., such that for all π , there exists i i 1 2 M {P P P } ⊆P ∈P′ thresholds coming from a finite set. The control action α (t) a forsomei=1,2,...,M suchthat π <δ. i i c i 1 P ∈P k −P k at the user i is chosen as a deterministic function of ω(t), The smallestM denotedM iscalled the coveringnumberof P. Further, H(P,δ),logMδ δ is called the metric entropy. ti.e.,Nα.i(Lt)et,thesi(toωtia(lt)n)umfobrearllofis∈uch{1p,u2r,e..s.t,raNte}gieasndbefoFr a,ll eacNhottiemtehatt inNmiasnydeplaraycetdic,aalnsdceandaraitoas,ofthseizaevawilta,btle dNatadea-t ∈Ni=1|Ai||Ωi|. Enumerating the F strategies, we get Sm(ω), ∈ ∈ m 1,2,...,F and ω Ω. Each ω Ω and the strategy layedbyDslotswillbeusedforestimation/inferencepurposes Q∈ { } ∈ ∈ [4],[17].Thereasonformakingthesamplesizew dependon t 1In this paper, we propose a distributed Approximate DPP (ADPP) algo- t becomes apparent later. Since p (t), k = 0,1,2,...,K de- k rithm,andhenceALGwillbeADPP. pendonYc(t)forallt(see(2)),wehavethattheprocesspk(t) 2Duetothis,thequestionofwhetherpk(t)isconvexornotdoesnotmatter. 4 Sm(ω)resultinacostp (Sm(ω),ω),k =0,1,2,...,K.Note Also, we assume that the solution to LP exists and the k i∗ P that it is possible to reduce F if the problem has a specific optimal cost is absolutely bounded. Further, define structure [4]. For each strategy m 1,2,...,F , define the ∈{ } F average cost/penalty as G(x),inf θ r(m) :Θ , (7) ( m 0,Pi∗ ∈Cx,Θ) r(m) , π′(ω)p (Sm(ω),ω), (4) mX=1 k,π′ k where Θ,(θ ,θ ,...,θ ), and for any x 0, 1 2 F ω Ω ≥ X∈ , where k = 0,1,2,...,K and the underlying distribution of Cx,Θ ′ F ω Ω is π . As in [4], we consider a randomized alg∈orithm wher∈e tPhce strategy m ∈ {1,2,...,F} is picked (Θ:m=1θmrk(m,P)i∗ ≤ck+x, k =1,2,...,K,Θ1T =1), with probability θ (t) in an independent fashion across time X m t.Here,θ (t)isafunctionofthecommoninformationY (t). where 1 , 1,1,...,1 RF. Note that G(0) corresponds m c { } ∈ The corresponding average cost/penalty at time t becomes to LP i∗. We make the following important smoothness assumpPtion about the function G(x). F Assumption 2: The function G(x) is c-Lipschitz continuous Epk(t) = θm(t)Eλpk(Sm(ω(t)),ω(t)) around the origin, i.e., for some c>0, we have m=1 X F G(x) G(y) c x y , for all x,y 0. (8) = θ (t)r(m), | − |≤ | − | ≥ m k,λ In the theoremto follow,giventhatAssumption 2 is valid, m=1 X we prove that the optimal cost of the linear optimization where λ π ,π, , i=1,2,...,M . In [4], it was shown problem in (6) is “close” to the optimal cost of P0. t i δ tihsaetqtuhievap∈lreon{btlteomthPeP0fo}wllhoewninπgt =LPπ:for all t∈N (ω(t)is i.i.d.) toTthheeopreromble1m: sLePt0p(aopnt)daLndPpi(P∗oi,p∗t)rebspeetchtieveolyp.timThaelns,oluuntidoenr Assumption 2, we have p(opt) <P p(opt)+(c+1)∆ , where minθ1,θ2,...,θF F θmr0(m,π) νfo)r, aannyd bν > 0,, m∆aπ,xPi∗pP,i∗m,axpk=0,1,2.,...,Kbmaxπ,k,P(id∗π,Pi∗ + max,k max,k min,k mX=1 Proof: See Appendix B{|. (cid:4) | | |} F subject to θ r(m) c , k =1,2,...,K m k,π ≤ k B. Approximate DPP (ADPP) Algorithm m=1 X F In this subsection, we present an online distributed algo- θ =1. (5) m rithm that approximately solves the problem P0. We assume mX=1 that at time t N, all nodes receive feedback specify- ∈ ing the values of all the penalties and the states, namely, In this paper, from Assumption 1, we have π π 0, t 1 as t . With dense covering of the spacke −, wekex→pect p1(t D),p2(t D),...,pK(t D)andω(t D).Recallthat that t→he∞limiting distribution is well approximaPted by for D −0 is the d−elay in the feed−back. Using−this information, i ≥ P we construct the following set of queues some i=1,2,...,M in the covering set. More preciesely, δ Q (t+1)=max Q (t)+p (t D) c ,0 , (9) i∗ ,arg min π 1, k { k k − − k } P Q∈{P1,...,PMδ}k −Qk k = 1,2,...,K, and t N. These queues act as the common information, i.e.,∈Y (t) = Q , where Q , aSnindcethtehecodrirsetsripbountidoinngofdiωst(atn)ciescbheadnπg,iPnig∗a,croksπs−timPei,∗kd1ire<ctlδy. (Q1(t),Q2(t),...,QK(t)). Fucrther, the patst wt samplets of ω(t) given by ω(t i), i=D,D+1,...,D+w 1 will t applying Theorem 1 of [4] is not possible. However, from { − − } be used to find an estimate of the state probabilities which Assumption 1, we know that the distribution approaches a is required for the algorithm that we propose. For all k = fixed measure π . Hence, we expect that the algorithm c 1,2,...,K, we let p (t) = 0 when t 1, 2,..., D . ∈ P k designedforπ c oranapproximationofπ,i.e., i∗ should The Lyapunov function is defined as ∈ {− − − } ∈P P eventually be close to the optimal algorithm. Therefore, we K consider the following LP denoted LP : 1 1 Pi∗ L(t), 2kQtk22 = 2 Q2i(t), (10) F Xi=1 minθ1,θ2,...,θF mX=1θmr0(m,P)i∗ faonrdathlletcorreNs.poAndhiingghderrifvtailsuegivoefnthbey∆dr(iftt)i,ndLic(att+es1t)h−atLt(hte) F constraint∈shavebeenviolatedfrequentlyinthepast.Thus,the subject to θ r(m) c , k=1,2,...,K m k,Pi∗ ≤ k control action should be taken that simultaneously minimizes mX=1 the drift and the penalty (cost). The DPP algorithm tries to F findtheoptimalcontrolactionthatminimizesanupperbound θ =1. (6) m on the DPP term E[∆(t+D)+Vp (t) Q ], V 0, which 0 t m=1 | ≥ X 5 istheessenceofthefollowinglemma.Theproofofthelemma 1 2 v t followsdirectlyfrom the proofof Lemma5 of [4], and hence omitted. Lemma 1: For a fixed constant V 0, we have ≥ E[∆(t+D)+Vp (t) Q ] B (1+2D)+ 0 | t ≤ t 12 ut12 ut 12 ut F K V β (t)r(m) + Q (t) , (11) 0 αt t m 0,πt k Ci,k,t m=1 k=1 X X Fig.1. Thefigureshowsthetimeslott αt splitintovt blocksofsizeut where , F β (t)r(m) c , r(m) , each, i.e., t αt = utvt. By choosing−αt = (√t), (√t) samples are Ci,k,t m=1 m k,πt − k k,πt available atτ−=αt. O O π (ω)p (Sm(ω),ω), k =0,1,2,...,K, ω∈Ω t k P PB , max 1 K π (ω) p (Sm(ω),ω) c 2, Theorem 2: For the ADPP algorithm, for any ǫk > t m∈{1,2,...,F}2Xk=1ωX∈Ω t | k − k|(12) α1t tτ−=N10,Eupk(τ)N−ancdk v+ αt(Npmatxs−,ukα−cthpmtinh,ka)t,vaund =fort coαnst,awntes t t t t t t P∈ ∈ ∈ − and, with a slight abuse of notation, β (t) is the probability have m with which the strategy m is used at time t. obNtaointeedthbaytraesptla→cin∞gπ,B(ωt )→byBπ.(Tωh)eienxtphreesesxiopnrefsosiroBnfcoarnBbe. Pr(1t t−1pk(τ)−ck >ǫk) ≤ utexp(((∆−2pǫ¯)2tm,kaxv,kt2)2)+ t t τ=0 X The algorithm to follow requires an estimate of πt(ω), which t can be computed using the past wt samples by means of any Pr{Eδ,τ}+(t−αt)βADPP,k(ut,αt|E[cαt:t]), (15) estimatesuchasthesampleaverage.However,whenthespace τX=αt is“simple”,onecanexpecttocomputeanestimateofπ (ω) amPofirneiteeffisceitenotflyd.iFsotrribeuxtaiomnpsle(,Mifδth<e naturfeorchaollosδes>ω(0t)),fttrhoemn w1thertτe−=10ǫ¯tE,kpk(,τ). Htǫte,rke−,αct0(tp−=mαaxt,kp−(oppmti)n,,k)a,ndǫtc,kk, k,=ǫ1k,2+,..c.k,K− estimating the distribution correspond∞sto a hypothesistesting are the constraint variables in P0. PrPoof: See Appendix C. (cid:4) problem. Hence, by approximating the measure space by P The first term in the bound in Theorem 2 corresponds to a finite set of measures gives us the flexibility to run a c P the large deviation bound when p (t)’s are independent. The hypothesis testing to find an approximate distribution based k secondtermcorrespondstoanupperboundontheprobability on the available w samples througha likelihoodratio test. In t oferrorinthetimeslotsα totfordecodingthecorrectindex the following, we provide the algorithm. t i ; equivalently, this corresponds to an “incorrect” estimate Algorithm: Given the delayed feedback of size w at ∗ • time slot t N, i.e., ω(t i D), and p (t D),ti = of the distribution of the states in these slots. The last term ∈ − − k − capturesthestochasticdependencyofp (t)acrosstimet N. 0,1,...,wt − 1 and for k = 1,2,...,K, perform the In order to provea high probabilityreskult, we need to fin∈d an following steps expressionforeachofthetermsinthebound.Next,we upper – Step 1: Find the probabilitymeasurein thatbest Pc boundtheerrortermPr δ,τ usingthefollowingassumption fits the data, i.e., pick Pjt∗ ∈Pc such that about the probability sp{aEce }c. This will come handy in the P t D proof of Lemma 2 below. jt∗ ,argj∈{1m,2,a..x.,Mδ}w1t τ=t−XD−−wt+1log(Pj(ω(τ))()1.3) PjA(ωss)u6=mp0ti,otnhe3re: eAxsisstumcoensthtaanttsfoαrδa>ll jβδ=>10,,2,s.u.c.h,Mthaδt, α > (ω)>β >0 for all ω Ω. δ j δ – Step 2: Choose m 1,2,...,F (breaking ties P ∈ t We use the above assumption in the proof of the following ∈ { } arbitrarily) that minimizes the following: lemma to bound the probability of error term in (15). K Lemma 2: An upper bound on the probability of error is Vr(mt) + Q (t)r(mt) . (14) given by 0,Pjt∗ k k,Pjt∗ k=1 X q(τ) if τ >D+w 1, – Suptedpat3e:thSeetqtu→euets+us1i,nrgec(9ei)v,eanthdegdoeltaoyeSdtefpee1d.back, Pr{Eδ,τ}≤Pe(,τu)p ,( Me1,uδp otherwiseτ, − (16) We say that there is an error in the outcome of step 1 of where q(τ) , exp 2ζ 2w + ( ,δ) , ζ , the algorithm if j∗ = i∗. Recall that i∗ corresponds to the e,u2p − δDτ τ H P δ index of the proPbatbi6lityPmeasure in the covering set that is log αδ , (cid:8) (cid:9) βδ close to π in the 1 norm sense. The error event δ,t, t N h (cid:16) (cid:17)i L E ∈ τ D oEisn[τed:τeo+fifsnt]eh,detaismtτ=te+hτsoslsEoeδt,toinutottchdoeemnineotsteerfvotharalwtτthhtioecrhτe+jist∗sa6=.nIneir∗tr.hoeFruifnortlhalotewrl,eialnesgtt Dτ,j , w1τ s=τ−XD−−wτ+1Eπτ log(cid:18)PPij∗((ωω((ss))))(cid:19), theorem, wSe state and prove our first result that will be used τ , minj=i∗ τ,j, and ( ,δ) = logMδ is the metric to prove the PAC type bound for the ADPP algorithm. eDntropy. Fur6therD, when u H=P (√t), v = (√t), and t t O O 6 αt = O(√t), we have tτ−=1αtPr{Eδ,τ} (cid:22) (t − αt)St,δ, The above result can be used to provide almost sure con- whereS ,exp φ + ( ,δ) .Intheabove,φ , vergence as well as finite sample complexity result provided 2Pζrδoo[mf:iSnteα,δet≤Aτ≤ppteDnτ{d]−i2xNDτ[,.αt,t(cid:4)δ:Pt], NH[αPt:t] ,}minαt≤τ≤twτ.τ,t,δ wfaest.shTohwis rtheqatuitrhees uβs1tompixrionvgecaobefofiucnidenotndβecAaDyPPs,ks.uFffiircsite,nwtlye From the abovelemma, we have that the errorgoes to zero considera specialcase ofthe centralizedscheme,i.e., D =0. (etxponαen)tSiallyfas0taesxτpo→ne∞nti.aTllyheffaascttatshatt tτ−=1αtwPilrl{bEeδ,τu}se(cid:22)d Tarheenp,rowveideexdtennedxt.the proof to any D ≥ 0. The details of this t t,δ − → P→ ∞ later in the paper to prove the almost sure convergenceof the algorithm to the optimal. Now, it remains to find an upper IV. BOUND ON THEMIXING COEFFICIENT bound on the first and the last term in (15). The following By using the Pinsker’s inequality that relates the total theorem uses the Assumption 3, (15) and (16) to provide variational norm and the mutual information, we have the a PAC result for the above algorithm in terms of the β 1 following bound [22] coefficient. Theorem 3: Under Assumptions 1-3, for the proposed Al- I(X ;X c ) gǫkor=ithQmuwp(itt)h+ǫ0ǫ=, k(c=+11),∆2π,.,P.i.∗,+Kψ,ta(nδd)+soαmt(epmfiatx−,nkαi−ttepmpin,oks)i+tivǫe, βADPP,k(s,αt|E[cαt:t])≤ts≥uαpts k,t k,2t−s|E[αt:t](2,0) constants V, C and c, the following holds. where X , p (t), I(X ;X c ) is the mutual 1) For every ǫ>0, with a probability of at least 1−γ0, informatiok,ntbetwekenrandomk,vtariakb,tl−essp|kE([tα)t:at]ndpk(t s),k = 1t tτ−=10p0(τ)≤ p(opt)+(c+1)∆π,Pi∗ +ψt(δ) w0,e1,u2se,.s..=,Ku ,coasndrietqiounireedd.oTnhEu[scα,tp:tr]o,vainndgaannyupspe∈rb−No.uLndatoern, α (p p ) t P + t max,k− min,k +ǫ (17) β (s,α c ) amounts to finding an upper bound on t−αt thAeDPcPo,knditionta|lEm[αut:ttu]alinformation.To presentourresults, we provided t . Here, γ > β , β , (t use the followingnotations.LetX ,(X ,X ,...,X ), α ) β (u∈,αTt,0 )+S0 ,wh0∗ereS0∗ isasde−- X , (X ,X ...,X ,tX ,0,.t..,X1,t ), anKd,tas finted iAnDPePa,r0liert. t|E[αt:t] t,δ t,δ be6=fokr,et, Q ,1(,Qt (t2),,tQ (t),k.−..1,,tQ k(+t)1),.tWe firsKt,ntote that t 1 2 K (cid:2) (cid:3) 2) For every ǫ>0, with a probability of at least 1 γ , − 1 I(X ;X c )= I(X ;X c )+ 1 t−1pk(τ) ck+Qup(t)+ αt(pmax,k−pmin,k) +ǫ, t t−s|E[αt:t] I(Xk6=,kt,t;Xt−t−s|sE|[Xαtk:t,t],E[cαt:t]) t Xτ=0 ≤ t−αt (18) = I(Xk,t;Xk,t−s|E[cαt:t])+ I(X ;X X , c )+ k =1,2,...,K, k,t 6=k,t−s| k,t−s E[αt:t] I(X ;X X , c ) provided t∈Tt,1. Here γ1 >β1∗, where I(X6=k;,tX t−s| ck,t E),[αt:t] (21) ≥ k,t k,t−s|E[αt:t] β ,(t α ) maxβ (u ,α )+S . 1∗ − t k=0 ADPP,k t t|E[αt:t] t,δ wherethelast inequalityfollowsfromthe factthatthemutual (cid:20) 6 (cid:21) information is non-negative. Thus, we have In the above, I(X ;X c ) Tt,i ,(t:(t−αt)> (∆p√)m2aǫx,0utslog(cid:18)γi−utβi∗(cid:19)), βADPP,k(s,αt|E[cαt:t])≤ts≥uαpts t t−2s|E[αt:t] . (22) Let be the set of all vectors that Q takes at time t. i∈{0,1}, ∆π,Pi∗ =maxk=0,1,2,...,Kbmax,k(dπ,Pi∗ +ν), and AlsoQ, ltet t : t 1,2,...,F be thte rule induced by M Q → { } ψt(δ), V(c+1)J¯tV+H¯t+C/t + 1+tV2D t−1BτPe(,τu)p qthueeuAeDaPtPtimalgeotr.ithInmotrhdaetrdtoeteorbmtaininesantheupsptreartebgoyungdivoenn tthhee τ=0 mutual information, we state the following assumption about X +pmax,0 t−1P(τ) + ρ t−1τP(τ), (19) the conditional distribution of the process ω(t). t e,up Vt e,up Assumption 4: For some κ > 0, Qt t, and for all τX=0 Xτ=0 t N, we assume that the following bound∈isQsatisfied ∈ mw1+hat2exDr0e≤k≤tτρ−K=10pBmaτx.,,k(cid:16)Fu1trPthPetτr−=,Kk10=Dk1π(τpτ,jm,−ax,πDkkτ1,−+ζδδ,(cid:17)ck,a)2n,dH¯PtJ¯e(t,τu)p a,,re Notex,smtuh,pmat′ PaPrr{l{oXXwtte=r=vxxa|l|uMMe tto((fQQtκt))==sigmmni′,fi,EEe[sc[αcαttt::tht]]e}} f≤acetκ.that(2th3e) as dePfined in Lemma 2. Also, Qup(t) , VtF + Γt2t channel is noisy. For example, when κ = 0, we have a2nDd) Γttτ−=10,BτPVe((,τuc)p++1p)m(∆ax,π0,Pi∗tτ−=+10PJ¯e(t,τ)u)p++H¯tρ+Cqtτ−=+10τ(P1e(,τu+)p uanciofomrmplecteolnydnitoioisnyalchdaisntnriebluftrioomn fQoτr atollXmτ.aNnedxtx, wleeadpirnegsetnot and p , k =1,2,...,K is as defined earlier. an upper bound on β (s,α c ) for the D =0 case ProoPf:maSxe,ke the Appendix E. (cid:4)P P (centralized scheme).ADPP,k t|E[αt:t] 7 A. Bound on β (s,α c ) when D =0 where µ , F Ω (K +1) is the number of possible values In order to gAeDtPPi,nksightst|oEn[αtth:te] proof of bounding the β1 that Xt can tak|e,|t∈N, and θ ,max (eκ2−1),12 <1. coefficientforthegeneralscenarioofD 0,wefirstconsider A finite time bound can easily be obntained byosubstituting ≥ the centralized scheme, i.e., D = 0, and later we provide the upper bound of Theorem 4 in Theorem 3. However, in proofs and results for the D >0 case. For D =0, the queue order to get more insights into the main result of this paper, update in the vector form becomes we will look at the asymptotic in the following subsection. 1) Asymptotics: Note that when D = 0, the authors in Q =max Q +X C,0 (24) t+1 t =0,t [19] prove the convergence of the algorithm to the optimal { 6 − } where C , (c ,c ,...,c ). Recall that Step 2 of the in probability.Here, we use a differentapproachcomparedto 1 2 K AlgorithmusesQ andthe outputfromStep 1 to finda pure [19] to show an almost sure as well as a high probability t strategy in a deterministic fashion that maximizes an upper convergence of the the proposed ADPP algorithm to the bound on the drift-plus-penaltyexpression. Thus, the strategy optimal,whenD =0.Byusingtheinsightsobtainedhere,we isa deterministicfunctionofthequeue.Notethatconditioned generalize the result to an arbitrary D 0 in the subsequent ≥ on the event c , the output of Step 1 is i for all time subsections. First, in the following lemma, we provide a high slots τ α ,E.[.α.t,:tt] . Conditioned on c , th∗is leads to the probability guarantees of the ADPP algorithm when D = 0 ∈{ t } E[αt:t] and t . following Markov chain model →∞ Lemma3:UnderAssumptions1-4,fortheproposedAlgo- (Qαt,Xαt)−→(Qαt+1,Xαt+1)−→...−→(Qt,Xt). rithm with D = 0, αt = O(√t), wt = O(√t), V = O(√t), κ < log3, and some finite positive constant c, the following Fig.2depictsthegraphicalmodelrepresentationoftheabove. holds. In order to prove an upper bound on the mutual information, For every ǫ>0, we have we use the Strong Data Processing Inequality (SDPI) for the • graphical model shown in Fig. 2. t 1 1 − lim Pr p (τ) p(opt)+(c+1)∆ t (t 0 ≤ π,Pi∗ Qt−s Qt−s+1 Qt−s+2 Qt−1 Qt →∞ τX=0 +(c+1)J¯+ǫ =1 (27) and limt→∞Pr 1t tτ−=10p(cid:9)k(τ)≤ck+ǫ = 1, k = 1,2,...,K. n o P In the above, ∆ =max b (d +ν), π,Pi∗ k=0,1,2,...,K max,k π,Pi∗ t 1 Xt−s Xt−s+1 Xt−s+2 Xt−1 Xt J¯, max p lim 1 − π π +δ , max,k τ 1 0 k K t t k − k ! Fig. 2. Figure shows the graphical model corresponding to the ADPP ≤ ≤ →∞ τX=0 algorithm withD=0andtimeslotsfromt−s≥αt tot. and pmax,k, k =1,2,...,K is as defined earlier. Proof: See Appendix G. (cid:4) Note that the Assumption 4 facilitates the analysis of the Theinterpretationsoftheaboveresultwillbeprovidedlater. β mixing coefficient, and is also related to the differential 1 Next, we use Lemma 2 along with the Borel-Cantelli Lemma privacyconstraintin [23]. The followingtheorem providesan toprovideanalmostsureconvergenceoftheADPPalgorithm. upper bound on the mixing coefficient. Theorem 4: Given Assumption 4, for D = 0, κ < log3, Theorem 5: Under Assumptions 1-4, for the proposed and for any t s αt, an upper bound on the β1 mixing Algorithm with D = 0, α = (√t), w = (√t), ≥ ≥ t t coefficient is given by V = (√t), κ < log3, and someOfinite positive cOonstant O θ(s 1)/2 c, the following holds. β (s,α c ) − [logµ],k =0,1,2,...,K ADPP,k t|E[αt:t] ≤ √2 For every ǫ>0, almost surely, we have • (25) t 1 where µ , F Ω (K +1) is the number of possible values lim 1 − p (τ) p(opt)+(c+1)∆ +(c+1)J¯+ǫ PthraotoXf:tSceaenthtaek|Ae,p|tpe∈ndNi,xaFn.d(cid:4)θ ,maxn(eκ2−1),21o<1. t→∞ t τX=0 0t 1 ≤ π,Pi∗ (28) Note that s = ut, and suppose ut grows with t, then and lim 1 − pk(τ) ck+ǫ, k =1,2,...,K. (29) the Theorem says that the mixing coefficient goes down to t t ≤ →∞ τ=0 X zero exponentially fast with t. Thus, we have the following In the above, ∆ , J¯, and p , k = 1,2,...,K are as important corollary. π,Pi∗ max,k defined earlier. Corollary1:GivenAssumption4,forD =0,u = (√t), t O Proof: See Appendix H. (cid:4) κ<log3, and for any t s α , an upper bound on the β ≥ ≥ t 1 From Theorems3 and 5, it is easy to see that the error can mixing coefficient is given by be reduced by reducing ∆ , which amounts to reducing π,Pi∗ β (u ,α c ) (θO(√t)/2)[logµ], (26) dπ,Pi∗ andν.Notethatdπ,Pi∗ <δcanbereducedbyreducing ADPP,k t t|E[αt:t] (cid:22) √2 the error in the covering of the probability space Pc. This 8 comesatacostofincreasedmetricentropysinceδ needstobe k = 0,1,2,...,K, where µ , DF Ω (K + 1) is the D reduced. However, as t , increased metric entropy does number of possible values that X can|ta|ke, t N, and t noteffect the overallres→ult.∞Further,a lower value of J¯signi- θ ,max (eκD−1),1 <1. ∈ 2 2 fies lesser error. This is possible only when the rate at which Proof: Fonr the ease ofonotation, let X ,X the probability measure π converges to π is “sufficiently” i,t t (i+1)D+1:t iD high.Inparticular,thisistrutewhen tτ−=10kπτ−πk1 =O(tζ), apnrodveQtih,tat,I(QXt−;(Xi+1)D+1:ct−iD). FirsIt(,Xin t;hXe−followincg,−w)e. whereζ <1.Inthenextsubsection,Pweprovideanalmostsure Consider t t−s|E[αt:t] ≤ 0,t ls−1,t|E[αt:t] as well as high probability result for any D 0. ≥ (a) B. Bound on β (s,α c ) when D 0 I(X ;X c ) I(X ;X c ) ADPP,k t|E[αt:t] ≥ t t−s|E[αt:t] ≤ 0,t t−s|E[αt:t] As in the previous subsection, we use s = ut. For D ≥ 0, (b) I(X ;X c ),(32) the queue update in the vector form is given by ≤ 0,t ls−1,t|E[αt:t] Q =max Q +X C,0 (30) t+1 t =0,t D where (a) and (b) follow from the definitions of X { 6 − − } t ∈ iwnhethreisCse,cti(ocn1,.cA2,s..i.n,ctKhe)aDnd=X6=00,tc−aDsei,swasedceofinndeidtieoanrlioenr tXua0l,tinafnodrmXatito−ns is∈noXn-lns−eg1a,tt,ivea.ndWethneeefdacttothuaptpetrhebomunud- sfthoheroreathvllaenntditmnEeo[cαtats:lttoi]o,tsnasnτQd ∈the{r,eαfto,Qr.e.,.,t,Qhte}.,3o.u.Dt.pe,ufiQtnoef,tShaetnedpfoXl1lowisini,g∗ QiIl(a0Xr,tt0o,t;thXXel0sD,−t1f,=otr|m0E[scαcaat:steM]),,aarwskhoeivxcphclhaisaininoe,bdtwanieenxeotdb.tSianiinnactehmeXanflonsl−elor1w,stiim→ng- 1:n 1 2 n 1:m → { } bound from the SDPI X ,X ,...X . Unlike the D = 0 case, conditioning on 1 2 m { } c leads to the following Markov chain model E[αt:t] I(X ;X c ) η I(X ;Q c ), (Qt−lsD+1:t−(ls−1)D,Xt−lsD+1:t−(ls−1)D)→... ls−1,t 0,t|E[αt:t] ≤ ch1 ls−1,t 0,t|E[αt:t](33) →(Qt−2D+1:t−D,Xt−2D+1:t−D)→(Qt−D+1:t,Xt−D+1:t), where ηch1 is the Dobrushin’s contraction coefficient for the where l , s+1 . Fig. 3 depicts the graphical model channel from Q0,t to X0,t defined as s ⌈ D ⌉ representation of the above. Note that the ith pair in the η , sup Pr X Q =γ, c Q(ls−1);t Q1;t Q0;t ch1 γ6=γ′k n 0,t| 0,t E[αt:t]o Pr X Q =γ′, c . (34) Qt−lsD+1 Qt−2D+1 Qt−D−1 Qt−D Qt−D+1 Qt−1 Qt − 0,t| 0,t E[αt:t] kTV n o ′ In the above, γ and γ represent the vector values taken by Q .Itwillbeshownlaterthatη <1.Notethatbysimple 0,t ch1 data processing inequality, we have from (33) that I(X ;X c ) η I(X ;Q c ) ls−1,t 0,t|E[αt:t] ≤ ch1 ls−1,t 0,t|E[αt:t] Xt−lsD+1 Xt−2D+1 Xt−D−1 Xt−D Xt−D+1 Xt−1 Xt ≤ηch1I(Xls−1,t;Q1,t,X1,t|E[cαt:t]).(35) X(ls−1);t X1;t X0;t The first inequality above follows from the fact that Q is 0,t a deterministic function of X and Q . Since X aFliggo.ri3th.mFwigituhreDshow0santdhetimgreapslhoitcsalfrommodtel csorreαsptotnoditn.g to the ADPP (Q2,t,X2,t) (Q1,t,X1,t) fo1r,mt sa Ma1rk,tovchain,tlhse−1a,bto→ve ≥ − ≥ → can be further bounded as follows (see Fig. 3) Markovchain is (Q ,X ), i= t (i+1)D+1:t iD t (i+1)D+1:t iD 0,1,...,l 1.Inord−ertoprovea−nupper−boundont−hemutual informatiosn−, we use the Strong Data Processing Inequality I(Xls−1,t;X0,t|E[cαt:t])≤ηch1I(Xls−1,t;Q1,t,X1,t|E[cαt:t]) η η I(X ;Q ,X c ),(36) (SDPI) for the graphical model shown in Fig. 3. Using the ≤ ch1 ch2 ls−1,t 2,t 2,t|E[αt:t] abovementionedMarkovproperty,weneedtoboundtheterm Ipr(eXsetn;Xt at−bso|unEd[cαto:nt])thferoβm1(m22ix)i.nIgnctoheeffifoclileonwtifnogrtDheore0mc,awsee. w(Qhe2,rte,Xηc2h,t2) tios (tQhe1,tD,oXb1r,uts)hdine’fisnecdoeafsficient for the channel Theorem 6: Given Assumption 4, for D 0, D≥< log3, ≥ κ and for any t s max αt,2D+1 , an upper bound on η , sup Pr X ,Q X =p,Q =q, c the β mixing≥coeffi≥cient is{given by } ch2 ′ ′ 1,t 1,t| 2,t 2,t E[αt:t] 1 (p,q)=(p ,q ) 6 (cid:12)(cid:12) n o βADPP,k(s,αt|E[cαt:t])≤ θ(s−D√+21)/2D [logµD], (31) −PrnX1,t,(cid:12)(cid:12)Q(cid:12)(cid:12)1,t|X2,t =p′,Q2,t =q′,E[cαt:t]o(cid:12)(cid:12)TV. (37) (cid:12)(cid:12) 3Recallthati∗ istheindexcorrespondingtoπi∗,whichisthedistribution Note that Xls−1,t → (Qj,t,Xj,t) → (Qj−1,t,Xj−1(cid:12),(cid:12)t) forms “close” toπ. aMarkovchainforallj =2,3,...,ls 2.Thecorresponding − 9 Dobrushin coefficient is given by Lemma 6: Under Assumptions 1-3, for the proposed Al- gorithm with D 0, α = (√t), w = (√t), κ < log3, η , ≥ t O t O D chj and some finite positive constant c, the following holds. sup′ ′ Pr Xj−1,t,Qj−1,t|Xj,t =p,Qj,t =q,E[cαt:t] • For every ǫ>0, we have (p,q)=(p ,q ) 6 (cid:12)(cid:12) n o t 1 −Pr (cid:12)(cid:12)X(cid:12)(cid:12) j−1,t,Qj−1,t|Xj,t =p′,Qj,t =q′,E[cαt:t] TV tlim Pr(1t − p0(τ)≤p(opt)+(c+1)∆π,Pi∗ = sup′n′ Pr Qj−1,t|Xj,t =p,Qj,t =q,E[cαt:t]o(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) →∞ +τX(=c0+1)J¯+ǫ =1 (43) (p,q)=(p ,q ) 6 (cid:12)(cid:12)(cid:16) n o (cid:12)(cid:12)(cid:12)(cid:12)×Pr Xj−1,t|Qj−1,t =q,E[cαt:t] and lim Pr 1 t−1p (cid:9)(τ) c +ǫ =1, (44) − Pr Qjn−1,t|Xj,t =p′,Qj,t =q′,E[ocα(cid:17)t:t] t→∞ (t τX=0 k ≤ k ) (cid:16) nPr X Q =q′, c o .(38) k =1,2,...,K. × j−1,t| j−1,t E[αt:t] TV In the above, ∆ =max b (d +ν), Using these in (36), nand applying the bound repeoat(cid:17)ed(cid:12)(cid:12)(cid:12)(cid:12)ly, we π,Pi∗ k=0,1,2,...,K max,k π,Pi∗ (cid:12)(cid:12) t 1 get J¯, max p lim 1 − π π +δ , max,k τ 1 I(X ;X c ) 0≤k≤K t→∞ t τ=0k − k ! ls−1,t 0,t|E[αt:t] X ≤ηch1ηch2I(Xls−1,t;Q2,t,X2,t|E[cαt:t]) aPnrdoopf:maTx,hk,ekp=roo1f,2is,.s.i.m,KilaristoasthdeefipnreodofeaorflieLr.emma 3, and ls−2 hence omitted. (cid:4) η I(X ;Q ,X c ).(39) ≤ chj ls−1,t ls−2,t ls−2,t|E[αt:t] NotethattheeffectofDiscapturedthroughtherequirement jY=1 of κ, i.e., κ< log3. In particular, large values of the delay D D But, I(X ;Q ,X c ) H(X ) requirestringentconstrainton the “noisyness”of the channel, logN , wlsh−e1r,et Nls−2,,t DlFs−2Ω,t(|KE[α+t:t]1)≤is the nlus−m2b,ter o≤f i.e., a lower value of κ. However, in many cases, κ need D D wpoessgiebtlevaluesthatXls−2,t ca|n|take. Usingthisin the above, tnootzebreoleoxwp,onanendtihaellnyc.eThthues,βth1ecAoeDfPfiPcieanlgtomriathymnomtacyonnvoetrbgee asymptotically optimal. In the next subsection, we consider ls−2 the generalcase of D >1. We show thatthe ADPP algorithm I(X ;X c ) η logN . (40) 0,t ls−1,t|E[αt:t] ≤ chj D convergesto the optimal in the almost sure sense. jY=1 Theorem 7: Under Assumptions 1-4, for the proposed Next, in Lemmas 4 and 5, we prove an upper bound on ηchj Algorithm with D≥1, αt =O(√t), wt =O(√t), κ≤ loDg3, for every j =1,2,...,l 2. and some finite positive constant c, the following holds. s − Lemma 4: Under Assumption 4, for every D 0, and For every ǫ>0, almost surely, we have D < log3, we have the following upper bound on η≥ • κ ηch1 ≤max (eκD2−1),21 <1. ch1 (41) tl→im∞1t τXt−=10p0(τ)≤p(opt)+(c+1)∆π,Pi∗+(c+1)J¯(4+5ǫ) (cid:26) (cid:27) Proof: See Appendix I. (cid:4) 1 t−1 and lim p (τ) c +ǫ, k =1,2,...,K. (46) k k Lemma 5: Under Assumption 4, for every D 0, and t t ≤ D < log3, we have the following upper bound ≥ →∞ Xτ=0 κ In the above, ∆ , J¯, and p , k = 1,2,...,K are as (eκD 1) 1 defined earlier. π,Pi∗ max,k ηchj ≤max 2− ,2 <1, j =1,2,...,ls−2. (42) Proof: The proof is similar to the proof of Theorem 5, and (cid:26) (cid:27) hence omitted. (cid:4) Proof: See Appendix J. (cid:4) In the next section, we present the simulation results to Since there are l 1 terms in the overall product, we need s − corroborate some of the observations made in the paper. l 1 1 s 2D+1. Using the bounds in Lemmas 4 s − ≥ ⇒ ≥ and 5 in (40), and substituting the result in (22), we get the desired result, which completes the proof of Theorem 6. (cid:4) V. SIMULATIONRESULTS Theaboveresultsaysthatforagivenκ,theADPPalgorithm Forthesimulationsetup,weconsiderthe3sensorsexample convergestotheoptimalprovidedthedelayD intheavailable of Sec. II. The problem is to maximize the average of the samples at each node is bounded by a constant log3. utility in (1) subject to an average power constraint of 1/3. κ 1) Almost sure convergence: Using Lemmas 4 and 5, and Here, the utility is the negative of the cost. The probability the main result in Theorem 6, the following result can be measure π is chosen from a set of 8 distributions, and t obtained in a fashion similar to the D = 0 case. In the convergesto 0.1,0.7,0.1,0.1 .Duetolackofspace,weskip { } following lemma, we provide a high probability guarantees thedetailsofthedistributionthatisusedinthetransienttime. of the ADPP algorithm for a general D 1 and as t . The optimal value of this is p(opt) =0.394. When α (t)=1, i ≥ →∞ 10 APPENDIX A 1 MCDIARMID’S INEQUALITY 0.9 Theorem 8: (See [24]) Let Z ,Z ,...,Z be independent 1 2 n 0.8 V = 50 randomvariablesalltakingvaluesinthe set .Letf : n utility0.7 V = 40 R be a function that satisfies the following Z Z → ge V = 20 Avera0.6 optimal f(z1,...,zi,...,zn)−f(z1,...,zi′,...,zn) ≤ci, (47) 0.5 for(cid:12)(cid:12)all i=1,2,...,n. Then, for all ǫ>0 (cid:12)(cid:12) 0.4 (cid:12) (cid:12) 2ǫ2 Pr f Ef >ǫ exp − . (48) 0 1000 2000Time t3000 4000 5000 { − }≤ n c2 (cid:26) i=1 i(cid:27) Fig.4. Figureshowstheplotoftheaverageutility versustime. APPENDIX B P PROOF OF THEOREM1 Inthisproof,weusethefactthatbydecreasingtheobjective 1 function and increasing the constraints c , k = 1,2,...,K k 0.9 in P0 will result in a decreased optimal value. Consider the er0.8 V = 50 cost/penalties of the problem P0 w ge po0.7 V = 40 limsup1 t−1 π (ω)p (α(τ),ω) Avera00..56 V = 20 t→∞ t τX=0ωX∈Ω τ k constraint (a) 1 t−1 0.4 = limsup i∗(ω)pk(τ) 0 1000 2000 3000 4000 5000 t→∞,t>t′"t Xτ=0ωX∈ΩP Time t t 1 1 − + (π (ω) π(ω))p (τ) τ k Fig.5. Figureshowstheplotoftheaveragepowerversustime. t − τ=0ω Ω XX∈ t 1 1 − i = 1,2,3, a power of 1 watt each is consumed. Figures 4 + t (π(ω)−Pi∗(ω))pk(τ)# (49) τ=0ω Ω and 5 show the plots of utility and penalty averaged over XX∈ ′ 1000 instantiations, versus time t for different values of V, for k = 0,1,2,...,K and any t > 0. In the above, D = 00 and wt = 40 for all t, demonstrating the tradeoff pk(τ) , pk(α(τ),ω(τ)) and (a) follows by adding and in terms of V. For large values of t, the utility achieved subtracting i∗(ω) as mentioned earlier and π(ω). Since by the algorithm with V = 20 is close to optimum while limt πtPπ 1 =0, for every ν >0, there exists a t′ N →∞k − k ′ ′ ∈ satisfying the constraintstherebyconfirmingthe optimalityof such that for all t>t , πt π 1 <ν. Using this t , and the k − k thealgorithm.Itisimportanttonotethatthemixingcoefficient fact that can be easily estimated, and hence mixing condition can be (π (ω) π(ω))p (t) (π (ω) π(ω)) p (t) verified through simulation. τ k t k (cid:12) − (cid:12)≤ | − || | (cid:12)(cid:12)ωX∈Ω (cid:12)(cid:12) ωmX∈aΩx p , p ν (50) (cid:12) (cid:12) max,k min,k VI. CONCLUDING REMARKS (cid:12) ′ (cid:12) ≤ {| | | |} for every k and t>t , we have In this paper, we considered a distributed stochastic opti- t 1 mization problem with independent and asymptotically sta- b ν 1 − (π (ω) π(ω))p (τ) b ν, max,k τ k max,k tionary states. We showed that this stochastic optimization − ≤ t − ≤ τ=0ω Ω problem is approximately equal to a LP that is a function XX∈ (51) of the limiting distribution of the state. For the proposed where bmax,k ,max pmax,k , pmin,k . Similarly, we have {| | | |} approximate DPP algorithm, we showed that with certain t 1 1 − wpriothbianbicliotinesstaγn0tsaonfdthγe1,opthtiemaavlesroalguetiocnosatnadndthepecnoanlstiterasinatrse, −bmax,kdπ,Pi∗ ≤ t τ=0ω Ω(π(ω)−Pi∗(ω))pk(τ) XX∈ respectively, provided the waiting time t > a threshold. The b d , (52) ≤ max,k π,Pi∗ threshold is in terms of the mixing coefficient that indicates whered isasdefinedearlier.Using(51)and(52)in(49), the non-stationarity of the cost/penalties. The approximation π,Pi∗ we get the following lower bound for all k =1,2,...,K. errors capture the degree of non-stationarity (i.e., π π ), t 1 k − k t 1 thenumberofsamplesusedtocomputeanestimateofthestate 1 − limsup Ep (τ) distribution.Also,wehaveprovedanalmostsureconvergence t k t of the proposed algorithm to a constant close to the optimal. →∞ τX=0 t 1 Finally, we presented simulations results to corroborate our 1 − theoretical findings. ≥limt sup t Pi∗(ω)pk(τ)−∆π,Pi∗, (53) →∞ τX=0ωX∈Ω