1 Charging Scheduling of Electric Vehicles with Local Renewable Energy under Uncertain Electric Vehicle Arrival and Grid Power Price Tian Zhang, Wei Chen, Member, IEEE, Zhu Han, Senior Member, IEEE, and Zhigang Cao, Senior Member, IEEE Abstract—In the paper, we consider delay-optimal charging be charged periodically. Then the EV charging becomes an scheduling of the electric vehicles (EVs) at a charging station important topic [4], [5]. In particular, there are some works with multiple charge points. The charging station is equipped on the scheduling of EV charging in literature [6]-[20]. 3 with renewable energy generation devices and can also buy In[6],theEVbatterychargingbehaviorwasoptimizedwith 1 energy from power grid. The uncertainty of the EV arrival, 0 the intermittence of the renewable energy, and the variation of the objective to minimize charging costs, achieve satisfactory 2 the grid power price are taken into account and described as state-of-energylevels,andoptimalpowerbalancing.In[7],the independentMarkov processes. Meanwhile, the charging energy problemofoptimizingplug-inhybridelectricvehicle(PHEV) n for each EV is random. The goal is to minimize the mean charge trajectory (i.e. timing and rate of the charging) was a waiting time of EVs under the long term constraint on the J studied to reduce the energy cost and battery degradation. cost. We propose queue mapping to convert the EV queue to 1 the charge demand queue and prove the equivalence between For the purpose of improving the satisfiability of EVs, a 1 the minimization of the two queues’ average length. Then we reservation-basedschedulingalgorithmforthechargingstation focus on the minimization for the average length of the charge to decide the service order of multiple requests was proposed ] demand queue under long term cost constraint. We propose a in [8]. In [9], a joint optimalpowerflow (OPF)-charging(dy- C framework ofMarkovdecisionprocess(MDP)toinvestigatethis namic) optimizationproblemwas formulatedwith the goal of O schedulingproblem.Thesystemstateincludesthechargedemand queuelength, thecharge demand arrival, theenergy level in the minimizingthegenerationandchargingcostswhilesatisfying . h storage battery of the renewable energy, the renewable energy the network, physical and inelastic-load constraints. In [10], t arrival, and the grid power price. Additionally the number of utilizingtheparticleswarmoptimization,aproposedalgorithm a charging demands and the allocated energy from the storage m optimally manages a large number of PHEVs charging at battery compose the two-dimensional policy. We derive two a municipal parking station. In [11], the minimization of [ necessary conditions of the optimal policy. Moreover, we discuss the reduction of the two-dimensional policy to be the number the waiting time for EV charging via scheduling charging 1 of charging demands only. We give the sets of system states activitiesspatiallyandtemporallyinalarge-scaleroadnetwork v for which charging no demand and charging as many demands was investigated. By modeling an EV charging system as 7 as possible are optimal, respectively. Finally we investigate the 5 a cyber-physical system, a decentralised online EV charging proposed radical policy and conservative policy numerically. 4 schedulingschemewasdevelopedin[12].In[13],theauthors 2 IndexTerms—Electricvehicle,chargingscheduling,renewable formulated the EV charging scheduling problem to fill the 1. energy, Markov decision process. electric load valley as an optimal control problem, and a 0 decentralized algorithm was derived. In [14], a strategy to 3 I. INTRODUCTION coordinate the charging of plug-in EVs (PEVs) was proposed 1 As an important method of operation to mitigate the short- by using the non-cooperative games [15]. Flexible charging : v age of the fossil fuel and severe environmentalproblems, the optimizationforEVsconsideringdistributiongridconstraints, i electric vehicle (EV) technology has attracted much interest both voltage and power, was investigated in [16]. In [17], X in recentyears.Comparedto conventionalvehicles, EVshave the trade off between distribution system load with quality r a advantages in the following aspects: energy efficiency, eco- of charging service was considered, and the centralized algo- effect, performance benefits, and energy independence [1]. rithms to schedule the chargingof vehicles were designed. In However, a fuel driven vehicle can produce less CO2 than [18] and [19], real-time scheduling policies of EV charging an EV if the charging energy is entirely produced by coal- were considered when both the renewable energy and energy fired powerplants [2]. Thus, the renewable energy(e.g., solar from the grid are available. In [20], the PEV charging and or wind energy [3]) should be the energy source of the EVs wind power scheduling were integrated, and the synergistic fully or at least partially to achieve the real environmental controlalgorithmofplug-onvehiclechargingandwindpower advantages. scheduling was proposed. Since EVs are propelled by an electric motor (or motors) In the paper, we focus on the scheduling approach of that is powered by rechargeable battery packs, EVs need to EV charging at a charging station. The charging station has multiplechargepointsand isequippedwith renewableenergy T.ZhangiswiththeSchoolofInformationScienceandEngineering,Shan- dong University, Jinan 250100, China. He is also with Tsinghua University. generationdevices and storage battery. The chargedenergy at E-mail:[email protected] a charge point during a period is constant and is called an W. Chen and Z. Cao are with the State Key Laboratory on Mi- energy block. We model the arrival of the renewable energy crowave and Digital Communications, Department of Electronic Engineer- ing, Tsinghua National Laboratory for Information Science and Technology asaMarkovchain.Thechargingenergycanalsobepurchased (TNList),TsinghuaUniversity,Beijing100084,China.E-mail:{wchen,czg- from power grid, and the price changes also according to dee}@tsinghua.edu.cn anotherMarkovchain.The arrivalofthe EVsis assumed asa Z. Han is with the Department of Electrical and Computer Engineering, University ofHouston,Houston,TX77004,USA.E-mail:[email protected] Markovprocess.OnceanEVarrivesatthechargingstation,it 2 waits ina queuebeforecharging.Ineachperiod,thecharging Energy from the station chooses some EVs from the head of the queue for Renewable Storage power grid charging. Meanwhile, the station also determines how much energy battery energy is supplied from the storage battery (the rest of the requiredenergyissuppliedfromthepowergrid).Theobjective is minimizing the mean waiting time of EVs under the long Electric vehicle term cost constraint. Charge point arrival Since the amount of charging energy (i.e., the number Queue of energy blocks to charge) for each EV is random, the scheduling problem is very challenging. We propose queue Charge point mappingmethodtosolvethedifficulty.WemaptheEVqueue to a charge demand queue. In the charge demand queue, each demand means an energy block that need to charge and some consecutive demands correspond to an EV’s required Charge point charingenergy.We provethattheminimizationoftheaverage EV queue length is equivalent to the minimization of the A charging station with Mcharge average charge demand queue length. Then we focus on the points (charge capacity =M) charge demand queue minimization under the cost constraint. The scheduling problem can be equivalently reconstructed as Fig.1. Systemmodel follows.Thedemandarrivesaccordingtoadiscrete-timebatch Markovianarrivalprocess (D-BMAP) and waits in the charge charging no demand is optimal. We also obtain the system demand queue before service (charging). In each period, the state conditions when charging as many demands as possible charging station chooses some demands from the head of the is optimal. charge demand queue for charging. Meanwhile, the station The rest of the paper is structured as follows. In Section alsodetermineshowmuchenergyissuppliedfromthestorage battery (the rest of the required energy is supplied from the II, the system model is described and we formulate an op- timization problem that can be studied under the framework power grid). The objective is minimizing the mean length of of MDP. Section III presents a spacial case of the formulated the chargedemandqueue underthe long term cost constraint. optimization problem as a constrained MDP to demonstrate Next, we find that the reconstructed optimization problem the solving process of the general case. Next, we analyze can be studied under a Markov decision process (MDP) the optimal policy of the constrained MDP in Section IV. In framework. The system state contains the charge demand SectionV,thenumericalresultsareperformed.Finally,Section queue length, the demand arrival, the energy level in the VI concludes the paper. storage battery of the renewable energy,the renewable energy arrival, and the grid power price. Meanwhile, the number of charging demands and the allocated energy from the storage II. SYSTEMMODEL AND PROBLEM FORMULATION battery constitute the two-dimensional policy. We find that Time is divided into periods of length τ each. The EVs the general case of the reconstructed optimization problem arriveatthechargingstationaccordingtoafinite-stateergodic can be analyzed similarly as the analysis of a special case. Markov chain {A[n]}. The EVs wait in a queue before Then we focus on the analysis of the special case that charging as illustrated in Fig. 1. The charging station has M is formulated as a constrained MDP [21]. We analyze the charge points, i.e., at most M EVs can be charged in each optimal two-dimensional policy of the constrained MDP by period. The charging station has renewable energy generation transforming to an average cost MDP and its corresponding devices, and it can also gets power from the power grid. The discount cost MDP thereafter. First, the constrained MDP renewable energy is modeled as another finite-state ergodic is converted to an unconstrained MDP by using Lagrangian Markov process {E [n]}. The renewable energy is viewed a relaxation. Moreover, we derive that the optimal solution of as free, and the price for the grid power during the n-th the unconstrained MDP with a certain Lagrangian multiplier periodisdenotedasP[n].Thegridpowerpriceremainsstatic is the optimal for the original constrained MDP. Next, the duringeachperiodandchangesbetweendifferentperiods.The unconstrained MDP can be analyzed by transforming to its sequenceoftheprice,{P[n]},isafinite-stateergodicMarkov corresponding discount cost MDP. We obtain two necessary chain. We assume that the charged energy at one charging conditions for the optimal solution. Third, we analyze the point during a period is constant, and is denoted as E.2 In relations between the two elements of the two-dimensional the n-th period, k[n] EVs from the head of the EV queue policy, and find that the number of charging demands1 is are allowed to charge. During the n-th period, the charging dominant. Thus, we propose a conjecture that the constrained station allocates w[n] power from the storage battery, and the MDP problem can be reduced to a MDP problem with the rest power will be supplied by the power grid. Assume that policy to be the number of charging demands only. We then therequiredchargingenergyoftheEV, E , isindependenton c derive the conditionsof the system state when the policy that 2ItisassumedthatifanEVutilizes mchargepoints duringaperiod, the 1 Inthespecialcase, wecanuse“EV”and“demand” interchangeably. amountofchargedenergyismE. 3 EV queueQ (π′,π′,···)withπ′ generatinganaction(k′[n],w[n])witha 0 1 n EV arrival EV 2 EV 1 probability [21], [24] at the n-th period. We denote the set of ′ ′ ′ all policies as Π . Let x [n]=(q [n],a [n],e [n],e [n],p[n]) e b a ′ be a (fixed) system state. The feasible (k [n],w[n]) in state ′ ′ ′ x [n] belongs to K (x [n]) = 0,1,··· ,min{q [n],M} × e W(x′[n])={0, 1,··· ,eb[n]}.5Theoptimizationproblemthat τ τ (cid:8) (cid:9) minimizes the mean charge demand queue length under the An energy block long term cost constraint, B, can be expressed as n−1 ′ 1 ′ Demand arrival Charge demand queue Q Requoirfe EdV e 2nergy Requoirfe EdV e 1nergy πm∈iΠn′Dxπ :=linm→s∞upnEπx′ "Xi=0Qe[i]# (2) e Fig.2. Queuemapping n−1 Bxπ′′ :=limsupn1Eπx′′ C′[i] ≤B, (3a) each other, and Ec = LE with L being uniformly distributed n→∞ "i=0 # inD[1i,re2c,t..a.,nCal]y,3siis.eo.f, Lthe∼EUV[1q,u.e..u,eCl]e.ngthunderthe longterm s.t. K′[i]≤min{Qe[i],M}X, (3b) E [i] b cost constraint is difficult due to the randomness of L. We W[i]≤ , (3c) pErVopionstehethEeVquqeuueeuemcaoprpreinsgpomndetshtoodseavsesrhaolwconnsinecFuitgiv.e2c.hEaargche with initialstate x′ =(qτe,a′,eb,ea,p). demands (the number of the demands denotes the amount of Since D-BMAP can be represented by a two-dimensional required energy) in the charge demand queue.4 The number discrete-time Markov chain (DTMC) [22], the optimization of EVs at the beginning of the n-th period is Q[n] and the problem in (2) can be analyzed in the framework of MDP. length of the charge demand queue is denoted as Q [n]. We Moreover,the following lemma proves the equivalence of the e convert the average EV queue length minimization to the meanenergydemandqueuelengthminimizationandthemean average chargedemand queue minimization.Furthermore,we EV queue length minimization. will prove that they are equivalent. Lemma 1. The minimization of the mean charge demand The demand arrival can be given by A′[n] = A[n]L , i=1 i queue length is equivalent to the minimization of the mean where L ∼U[1,...,C]. i EV queue length. P Remark: As {A[n]} is a Markov chain, we can derive that {A′[n]} is a D-BMAP. Proof: See Appendix A. ′ In the n-th period, k [n] demands from the head of the charge demand queue are allowed to charge. During the n- III. SIMPLIFIED PROBLEM th period, the charging station allocates w[n] power from the Forconciseness,wegiveaspecialcaseofproblem(2)inthe storage battery, and the rest power will be supplied by the sectionandinvestigatethisrelativelysimplifiedprobleminthe powergrid. Denote the numberof chargeddemandsin the n- ′ following of the paper to show the solving process. General thperiodasK [n].Theevolutionofthechargedemandqueue ′ ′ cases can be analyzed through similar solving process. length, Q [n], is Q [n+1]=Q [n]−K [n]+A [n]. Denote e e e When C =1, we have L =1. Then the queue mapping is thecapacityofthe renewableenergystoragebatteryas E . i max anidentitytransformand“EV”and“demand”areinterchange- The stored battery energy at the beginning of the n-th period able. Thus, we can directly analyze the EV queue using the is E [n]. The battery energy evolution can be expressed as b ′ ′ MDP framework. We have K[n]= K [n], A[n] = A [n] and Eb[n+1]=min Eb[n]−W[n]τ +Ea[n],Emax Q[n]=Qe[n]. The queue length evolution is − (cid:8):= Eb[n]−W[n]τ +Ea[n] (cid:9). (1) Q[n+1]=Q[n]−K[n]+A[n]. (4) Thecostinthen-thperiod(cid:0)isC′[n]= K′[n]E−W[(cid:1)n] +P[n]. The battery energy evolution is the same as (1). The cost at τ ′ the n-th period is given by Denote the state space as X and(cid:16)denote the actio(cid:17)n space ′ as A . Let the (random) system state and action in the n-th K[n]E + ′ ′ ′ C[n]= −W[n] P[n], (5) period be X [n] = (Qe[n],A [n],Eb[n],Ea[n],P[n]) ∈ X τ ′ ′ ′ and K [n],W[n] ∈ A , respectively. Define a policy π = (cid:16) (cid:17) where (·)+ := max{·,0}. The system state becomes 3C(cid:0)isagivenconsta(cid:1)nt. X[n] = (Q[n],A[n],Eb[n],Ea[n],P[n]) with state space 4AdemandmeansE energy(i.e.,anenergyblock)needtobecharged.In X and the action is (K[n],W[n]) with action space A. Fig. 2, the first EV (EV 1) in the EV queue wants to charge 3×E, then {X[n],(K[n],W[n])} is a controlled Markovprocess. Define it corresponds to the first three consecutive charge demands in the charge a policy π = (π ,π ,···) that π generates an action demandqueue.ThesecondEV(EV2)charges2×E,thenitcorrespondsto 0 1 n thetwoconsecutivechargedemandsafterthefirstEV’scorrespondingcharge demands. 5Theenergyhasbeendiscretized. 4 (k[n],w[n]) with a probability at the beginning of the n- The optimal solution for the discounted problem is called a th period. We denote the set of all policies as Π. The discount optimal policy. feasible (k[n],w[n]) in state x[n] belongs to K(x[n]) = The following lemma reveals the existence of the optimal 0,1,··· ,min{q[n],M} × W(x[n]) = {0,τ1,··· ,ebτ[n]}. stationary deterministic policy of UPβ, and furthermore, how A stationary deterministic policy is π = (g,g,···), where to derive the average cost optimal policy. (cid:8) (cid:9) g is a measurable mapping from X to K(x[n])×W(x[n]). Lemma 3. There exists a stationary deterministic policy Our objective is to find a policy that minimizes the mean (k,w) that solves UP , which can be obtained as a limit of β queue delay under the long run constraint on the cost. The discount optimal policies as α→1. optimization problem (i.e., the constrained MDP) is given by Proof: See Appendix C. n−1 minDπ :=limsup 1Eπ Q[i] (6) Based on the above analysis, we find that the constrained π∈Π x n→∞ n x" # MDP can be analyzed through the defined average cost MDP i=0 X and its corresponding discount cost MDP thereafter. Hence, 1 n−1 we first investigate the solution of the discount cost MDP in Bπ :=limsup Eπ C[i] ≤B, (7a) x n x the following subsection. n→∞ " # i=0 s.t. K[i]≤mEin[{i]Q[i],M}X, (7b) B. The discount optimal policy b W[i]≤ , (7c) For state-action pair (x=(q,a,e ,e ,p),(k,w)), let u = where x=(q,a,eb,ea,p)τ∈X is the initial system state. aq −staktioannadryηd=eteerbm−iniwstτic. Tpohleincy(.uT(xhbe),naη,(txh)e)dciasncoaulnsoteddecfionset Remark: (6) is the special case of (2) with C = 1. C = 1 optimality equation [23], [24] is given by means that EVs charge the same amounts of energy, E (e.g., an EV production company). Vα(q,a,eb,ea,p)= min u∈{0,1,···,min{q,M}} η∈{0,1,···,eb} IV. ANALYSIS OF THEOPTIMALPOLICY (q−u)E e −η + b β − p+q In this section,we performtheoreticalstudy onthe optimal τ τ policy. First, we prove that the constrained MDP can be (cid:26) (cid:16) (cid:17) + αE V (u+A,A,(η+E )−,E ,P) ,(10) analyzed through an unconstrained MDP. Then, we focus on a,ea,p α a a theanalysisoftheunconstrainedMDP.Weanalyzetheuncon- (cid:27) (cid:2) (cid:3) andthecorrespondingvalueiterationalgorithm(orsuccessive strainedMDPbyusingitscorrespondingdiscountMDP.Next, approximation method) is we consider the dimension reduction of the two-dimensional policy. Finally, we propose two stationary deterministic poli- V (q,a,e ,e ,p)= min α,n b a cies based on the theoretical results. u∈(cid:8)0,1,···,min{q,M}(cid:9) η∈{0,1,···,eb} (q−u)E e −η + b A. Transformation to the unconstrained MDP and discount β − p+q+α× τ τ MDP (cid:26) (cid:16) (cid:17) Define fβ(x,k,w) := β kτE − w +p + q. We have the Ea,ea,p Vα,n−1(u+A,A,(η+Ea)−,Ea,P) (11) following unconstrained MDP (i.e., UP ). (cid:27) (cid:0) (cid:1)β with V (q,a,e(cid:2),e ,p)=0. (cid:3) α,0 b a minJπ(x):=limsup 1Eπ n−1f (X[i],K[i],W[i]) . (8) First, regarding Vα(q,h,a,eb,e), we have the following π β n→∞ n x" β # properties (Property 1 - Property 3). i=0 X Property 1. V (q,h,a,e ,e) is an increasing function of q. Remark: UP is an average cost MDP. Its optimal solution α b β is referred to as the average cost optimal policy. Proof: See Appendix D. The following lemma reveals that the constrained problem Property 2. V (q,a,e ,e ,p) is a non-increasingfunctionof has the same solution as UP with a certain β. α b a β e . b Lemma 2. There exists β >0 forwhich the optimalsolution Proof: See Appendix E. of the unconstrained MDP in (8) (i.e., UP ) is also optimal β In practice, the allocated renewable energywill not surpass for the constrained MDP in (6). the required charging energy. Thus, kE ≥wτ, i.e., Proof: See Appendix B. (q−u)E e −η b Next, we define a discount cost MDP with discount factor − ≥0. (12) τ τ α corresponding to UP for each initial system state x = β Property 3. V (q,a,e ,e ,p) is convex in (q,e ). (q,a,e ,e ,p), with value function α b a b b a Proof: See Appendix F. ∞ V (x)=minEπ αif (X[i],K[i],W[i]) . (9) Next, the following two lemmas reveal two necessary con- α x β π " # ditions for the optimality, respectively. i=0 X 5 Lemma 4. In state x = (q,a,e ,e ,p), (u(x),η(x)) is not Lemma 6. Given state x = (q,a,e ,e ,p), the average cost b a b a the discount optimal solution if u(x) > q−min{q,M} and optimal policy (u∗(x),η∗(x)) should satisfy the following η(x)+e >E . inequality array a max Remark: Lemma 4 reveals the sufficient condition for the Z˜ (u∗,a,η∗,e ,p)≤βEp≤Z˜ (u∗+1,a,η∗,e ,p), (20) non-optimality, and it can also be viewed as the necessary 1 a τ 1 a condition for the optimality. That is to say, any optimal −p solutions should not satisfy the condition. Z˜ (u∗,a,η∗,e ,p)≤β ≤Z˜ (u,a,η∗+1,e ,p), (21) 2 a τ 2 a Lemma 5. Denote the discount optimal policy in state x = (q,a,eb,ea,p) as (u∗(x),η∗(x)). Then, (u∗(x),η∗(x)) Z˜ (u∗,a,η∗,e ,p)≤βp(E −1) satisfies the following inequality array6 3 a τ ≤ Z˜ (u∗+1,a,η∗+1,e ,p), (22) E 3 a Z (u∗,a,η∗,e ,p)≤β p≤Z (u∗+1,a,η∗,e ,p), (13) 1 a τ 1 a where Z˜ (u,a,η,e ,p)= lim Z (u,a,η,e ,p), −p 1 a 1 a Z (u∗,a,η∗,e ,p)≤β ≤Z (u,a,η∗+1,e ,p), (14) α→1 2 a τ 2 a Z˜ (u,a,η,e ,p)= lim Z (u,a,η,e ,p), 2 a 2 a α→1 p Z (u∗,a,η∗,e ,p)≤β (E −1) and 3 a τ Z˜ (u,a,η,e ,p)= lim Z (u,a,η,e ,p). ≤ Z3(u∗+1,a,η∗+1,ea,p), (15) 3 a α→1 3 a where D. Reducing the policy’s dimension Z (u,a,η,e ,p) The number of charging EVs k and the power allocation 1 a = αE G (u+A,A,(η+E )−,E ,P) (16) from the battery w are coupled together, they affect each a,ea,p 1 a a other. However, if we assume that k has been chosen, then with (cid:2) (cid:3) the required total power has been fixed. In this case, we will allocate as much power as possible from the battery to meet G1(q,a,eb,ea,p)= the requiredtotalpower,i.e., the greedypolicyforthe battery V (q,a,e ,e ,p)−V (q−1,a,e ,e ,p), (17) power allocation. This is because the power from the battery α b a α b a is free (please refer to (5)). We can guess that the greedy allocation strategy of battery power is the optimal policy. Z (u,a,η,e ,p) 2 a However, it is difficult to prove. The difficulty lies in the fact = αE V (u+A,A,(η+E )−,E ,P) a,ea,p α a a that the remaining battery energy will affect the future action − Vα(u+A(cid:2),A,(η−1+Ea)−,Ea,P) , (18) andcost(e.g.,(10)).Ontheotherhand,oncewhasbeenfixed, the powerallocation fromthe powergrid can also affect k. In and (cid:3) summary, when k is chosen, the optimal w∗ is the greedy policy. By contrast, if w is fixed, the optimal k is not fixed, Z (u,a,η,e ,p)= 3 a we need to solve the power allocation from the power grid αE V (u+A,A,(η+E )−,E ,P) a,ea,p α a a to find the optimal k∗. Thus, we can reduce the policy from − Vα(u−1(cid:2)+A,A,(η−1+Ea)−,Ea,P) . (19) (k,w) to k. We have the following conjecture. Proof: See Appendix G. (cid:3) Conjecture 1. Let πk = (k[0],k[1],···), and (6) can be Remark: Lemma 5 gives the necessary condition of the converted to discountoptimality,i.e.,theoptimalpolicy(orpolicies)should n−1 1 be the solution(s) of the inequality array. Specially, if the minBπk :=limsup Eπk Q[i] (23) inequality array has a single solution, the corresponding πk x n→∞ n x "i=0 # X single solution is the optimalpolicy since the existence of the n−1 1 K[i]E optimal policy. Bπk :=limsup Eπk x n x τ n→∞ C. The average cost optimal policy s.t. −τ1 min{K[i]E,EhbXi[=i0]}(cid:16)+P[i] ≤B, (24a) First, Lemma 4 still holds for the average cost MDP. Next, K[i]≤min{Q[i],M}, (cid:17) i (24b) bleamsemda.on Lemma 3 and Lemma 5, we have the following where theevolution of energy in the battery becomes E [i+1]=(E [i]−min{K[i]E,E [i]}+E [i])−. (25) b b b a 6Using Property 3, we can derive that Z1(u,a,η,ea,p) ≤ Remark:Thepolicycanbereducedindimension((k,w)→ Z1(u + 1,a,η,ea,p), Z2(u,a,η,ea,p) ≤ Z2(u,a,η + 1,ea,p), andZ3(u,a,η,ea,p)≤Z3(u+1,a,η+1,ea,p). k). If the stated β in Lemma 2 satisfying β ≫ 1, Conjecture 6 1 can be proved based on (10) in addition with Lemma 3 and E. Two stationary deterministic policies Lemma 2. Based on all above theoretical analysis, we propose the Inthefollowing,wediscusstheoptimalpolicyafterdimen- following two specific stationary deterministic policies. For sion reduction. For state-action pair (x=(q,a,eb,ea,p),k), state x = (q,a,eb,ea,p), we define the radical policy as let u = q − k, and u(x) can also define a stationary k =min{q,M},w= min{eb,kE} .Thatistosay,wecharge deterministic policy. We have the following lemmas to reveal τ the properties of the optimal policy. a(cid:16)s many EVs as possible, and use(cid:17)the greedy policy for the battery energy allocation, i.e., if the required energy is not Lemma 7. Denote the discount optimal policy in state x = greater than the battery energy, then all the energy will be (q,a,e ,e ,p) as u∗(x). Then, u∗(x) satisfies supplied from the storage battery and no grid power will be b a used. Otherwise, all the storage battery energy is allocated, E Z(u∗)≤β p≤Z(u∗+1), (26) and the rest will be supplied from the power grid. τ In the radical policy, the average cost constraint is not where considered. Then we propose another policy (i.e., the con- Z(u)=αE V (u+A,A,(η(u)+E )−,E ,P) servative policy) that guarantees the average cost constraint a,ea,p α a a throughsatisfying the cost constraintsin each period.We call − Vα(u−1+h A,A,(η(u−1)+Ea)−,Ea,P) the policy k = min{q,M,eb+Bp¯τ},w = min{eb,kE} the E τ η(u)−η(u−1) i conservative(cid:16)policy. That is to say, we first guarantee th(cid:17)at the + β p (27) τ cost of charging in each period is less than the average cost constraint, then charge as many EVs as possible and utilize with the greedy policy for the battery energy allocation. η(u):=max{0,e −(q−u)E}. (28) In the whole paper, we assume that the power from power b grid and renewable energy generator is sufficient to stabilize Furthermore, the average cost optimal policy u∗ satisfies the queue length. The stability issue such as the bounds on E average generation rate of renewable energy or average EV Z˜(u∗)≤β p≤Z˜(u∗+1) (29) τ arrival rate will be studied in future work. with V. NUMERICAL RESULTS Z˜(u)= lim Z(u). (30) In this section, we perform simulations to demonstrate the α→1 relationsamong the mean EV arrival, mean renewable energy Proof: See Appendix H. arrival, upper bound of the average cost, average cost, and Lemma 8. For x=(q,a,eb,ea,p) satisfying average EV queue length. Meanwhile, we consider different charge point numbers and capacities of the renewable energy E Z q−min{q,M} >β p, (31) storage battery.In the simulations, the period length is τ =1, τ and the size of the “energy block” is E =10. u=q−min{q,M(cid:0) }isthediscoun(cid:1)toptimalpolicy.Inaddition, Fig. 3 shows the average cost performance with respect to for (q,a,eb,ea,p) satisfying the mean EV arrival, A¯. In the simulations, we utilize the E radical policy. We consider the i.i.d. cases of A, Ea, and P. Z(q)<βτp, (32) A takes 0 and 2A¯ with equalprobability 0.5. Ea takes values {0,50,100} with probabilities {0.1,0.4,0.5}. P takes values u=q is the discount optimal policy. {5,10,20}withprobabilities{0.2,0.3,0.5}.Theperformance Proof: See Appendix I. is averaged over 105 periods. We set the number of charge Remark: u = q−min{q,M}, i.e., k = min{q,M} means points M = 50 and M = 8 in Fig. 3(a) and Fig. 3(b), chargingasmanyEVsaspossible.IfthenumberofEVsinthe respectively. Furthermore, we plot the curves for different queue is less than the charge pointnumberM, charge all the storage battery capacities: E = 100, E = 300 and max max EVs. Otherwise, charge M EVs from the head of the queue. infinite capacity, respectively. u=q, i.e., k =0 denotes charging no EV. In Fig. 3(a), we can see that when A¯ is small, the cost Based on Lemma 8 and Lemma 3, we have is nearly zero. However, when A¯ is large (e.g., A¯ ≥ 10), the cost increases rapidly with increase of A¯ according to Lemma 9. For x=(q,a,e ,e ,p) satisfying b a roughly a linear function. It is because when A¯ is small, Z˜ q−min{q,M} >βEp, (33) the required energy is small and the battery can supply the τ energy. Thus, no grid power will be consumed and the cost u = q − min{q(cid:0),M} is the ave(cid:1)rage cost optimal policy. In is zero. Once A¯ is larger than a certain value, the required addition, for (q,a,e ,e ,p) satisfying energy is larger than the battery energy, then the grid power b a will be utilized. As M is large (compared to the considered E Z˜(q)<β p, (34) A¯),i.e.,therestrictiononthenumberofchargepointswillnot τ influencetheperformance,we have k =min{q,M}=q with u=q is the average cost optimal policy. a high probability. The grid power consumption will increase 7 withincreaseofA¯.Moreover,whenA¯islarge,thegridpower 2500 becomesthemainenergysource.Basedon(5),wederivethat the cost varies with A¯ roughly according to a linear relation. Emax=(cid:102) 2000 E =300 From Fig. 3(b), we can find that the average cost is zero max when A¯ is small, and with increase of A¯, the average cost Emax=100 increases.ButonceA¯islargerthanacertainvalue,theaverage st costremainsconstant.Itcanbe explainedasfollows:when A¯ co1500 e is small, the required energy can be supplied by the battery g a with a very high probability and no grid power is needed. er v1000 Thenthe averagecostis zero.When A¯increases,the required A energy increases. Once the battery energy is not enough, the grid power will be consumed to fulfill the gap between the 500 required energy and battery energy. With increase of A¯, the grid power consumption increases since the average battery energyisconstant.Thus,theaveragecostincreases.However, 0 0 5 10 15 20 when A¯ is large enough, we get k = min{q,M} = M with Mean EV arrival a high probability because M is not large in this simulations. (a) M =50 Then, the required energy k×E =M ×E, i.e., it becomes a 250 constant.Thatmeansthegridpowerconsumptionisaconstant also. Thus, the cost remains static. E =(cid:102) Fig. 4 depicts the average cost performance with respect max 200 E =300 to the mean renewable energy arrival, E¯ . The radical policy max a E =100 is applied in the simulations. A takes values 0 and 10 with max equal probability 0.5. Ea take values {0,57E¯a,170E¯a} with ost150 probabilities {0.1,0.4,0.5}, respectively. P is the same as in c e Fig. 3 and M = 50. E = 100, E = 300 and infinite g max max a capacity are also respectively considered in the simulations. er v100 From the figure, we can find that the cost decreases with A increase of E¯ . But once E¯ is large enough, the cost almost a a remains static. First, in the range of small E¯a, when E¯a 50 increases,morefreerenewableenergywillarriveandbestored in the battery. And then, the cost will decrease. If the battery capacity is largeenough,all the arrivedrenewableenergycan 0 be stored in the battery. With the increase of E¯ , the battery 0 5 10 15 20 a Mean EV arrival energy will increase all the time. Once the battery energy is (b) M =8 larger than the required energy for charging, no grid power is needed then, and the cost becomes zero since that time. Fig.3. Theaveragecostperformancev.s.A¯underdifferentvaluesofEmax. If the battery capacity is not large (e.g., E = 100 in the max figure), the overflow occurs when E¯ is large. That is to say, a thebatteryenergywill remain E eventhoughwe increase max E¯ . On the other hand, E is smaller than the required a max charge energy, so grid power is still needed. Consequently, the cost is non-zero and remains static. conservative policy is applied. In the simulations, A chooses From Fig. 3 and Fig. 4, we can observe that the larger values {0,12} with equal probability 0.5. E and P have the battery capacity, the lower the cost. That is because when a the same settings as in Fig. 3. In the plotting, we consider E is larger,the probabilityof overflowwillbe lower(itis max different values of the battery capacity and charge point zeroforinfinitecapacity).Then,less freerenewableenergyis number. We can observe that the average length performance wastedandthecostwillbelower.Furthermore,wecanderive that if A¯ is less a certain value or E¯ is larger than a certain improves with increase of B¯, and when B¯ is larger than a a certain value, the average length performance become almost value, the average cost can be less than a certain value, Then we claim that when A¯ is less a certain value or E¯ is larger constant. The reason is as follows: when B¯ is small, k = than a certain value, the radical policy is also optaimal even min{q,M,eb+Bp¯τ}=min{q,eb+Bp¯τ}withahighprobability E E when considering the constraint.7 and it increases with increase of B¯. Thus, the average EV Fig. 5 illustrates the average EV queue length performance queue length performanceincreases. Once B¯ is large enough, withrespectto theupperboundsoftheaveragecostwhenthe we get k =min{q,M},andthe averagelengthremainsstatic withrespecttoB¯.Additionally,bycomparingthefourcurves, we can derive that the larger the capacity or the charge point 7Notice that the radical policy is optimal for the mean EV queue delay minimization withouttheaverage costconstraint. number, the better the length performance. 8 600 Meanwhile, if an EV comes earlier than another EV, it will leave earlier in the EV queue serving. Using the queue 500 Emax=(cid:102) mappingmechanism,theearlierarrivedEVwillleavenolater E =300 also in the energy demand queue serving.8 That is to say, max E =100 the queue mapping is an isotonic mapping. Then, we claim 400 max st that a policyminimizingthe mean EV queue lengthresults in co minimal mean demand queue length, and vice versa. e g300 a er APPENDIXB Av PROOF OFLEMMA2 200 The proofis basedon the resultsof [25]. We provethatfor some β, the optimal policy π∗ of the unconstrained MDP (8) 100 (i.e., UP ) satisfies 1) π∗ yields Bπ∗ and Dπ∗ as limits for β all x∈X; 2) Bπ∗ =B¯. Observe that limsup and liminf are 0 equal for each β > 0 (since the controlled chain is ergodic 0 50 100 150 200 and the policy is stationary [24]). Mean renewable energy arrival Fig.4. Theaveragecostunderdifferent E¯a andEmax. APPENDIXC PROOF OFLEMMA3 45 First,wederivethattheconditionsofProposition2.1in[26] are satisfied. Then a discountoptimalstationary policy exists. 40 M=50, E =300 Next,weprovethatforsomex0,Vα(x)−Vα(x0)<∞.Third, max ength35 MM==580, ,E Emmaxa=x=301000 tthhaetreJβeπx<its∞a pinolitchye πpra∈ctiAcalanpdrobanlemin.itOiatlhestrawteisex, t∈heXcossutcihs e l30 M=8, E =100 infiniteforallpoliciesandanypolicyisoptimal.Accordingly, eu max we can prove the lemma by applying Theorem 3.8 in [26]. u q25 V E APPENDIXD ge 20 PROOF OFPROPERTY 1 a M=8 er We verify the increasing property by induction. According Av15 to(11),V =0andV = β (q−min{q,M})E−eb +p+q.The M=50 α,0 α,1 τ 10 increasing property in q holds.(cid:0)Assume V (q(cid:1),a,e ,e ,p) α,n−1 b a is increasing in q. Depending on the values of M, we have 5 0 200 400 600 800 1000 the following two cases. Upper bound of the average cost Case 1: M ≥ q + 1. Fix (a,e ,e ,p), in the state (q + b a 1,a,e ,e ,p),thesetoffeasibleuis{0,1,··· ,q+1}whereas b a Fig.5. TheaverageEVlengthperformance v.s.B¯ . itis{0,1,··· ,q}forstate (q,a,e ,e ,p).Considerstate (q+ b a 1,a,e ,e ,p), let the optimal action be (u∗,η∗) with u∗ ∈ b a {0,1,··· ,q}, hence VI. CONCLUSION We consider the scheduling of the EVs’ charging at a Vα,n(q+1,a,eb,ea,p)= chargingstationwhoseenergyisprovidedfromboththepower (q+1−u∗)E e −η∗ + b β − p+(q+1)+α× grid and local renewable energy.Under the uncertainty of the τ τ EVarrival,therenewableenergy,thegridpowerprice,andthe E(cid:16) V (u∗+A,A,(η(cid:17)∗+E )−,E ,P) .(35) a,ea,p α,n−1 a a chargingenergyofeachEV,westudythemeandelayoptimal scheduling with the average cost constraint. We analyze the As (u∗,η∗) is f(cid:2)easible in state (q,a,eb,ea,p), (cid:3) optimal policy of the formulated MDP problem. In addition, (q−u∗)E e −η∗ + b V (q,a,e ,e ,p)≤β − p+q twospecificstationarypolicies(radicalpolicyandconservative α,n b a τ τ policy) are applied in the simulations to revealthe impacts of + αE V (cid:16)(u∗+A,A,(η∗+E )−(cid:17),E ,P) a,ea,p α,n−1 a a relevant parameters on the performance. ≤ V (q+1,a,e ,e ,p). (36) α,n (cid:2) b a (cid:3) APPENDIXA If (u∗,η∗) with u∗ =q+1, PROOF OFLEMMA1 V (q+1,a,e ,e ,p)=q+1 α,n b a First, the energy demand queue length and the EV queue + αE V (q+1+A,A,(η∗+E )−,E ,P) . lengthhavethefollowingrelation.Q [n]= Q[n]L withL a,ea,p α,n−1 a a e i=1 i i (37) being irrelevant to the queue state. Thus the average energy (cid:2) (cid:3) demand queue length is n1 nj=1Qe[j] = n1P nj=1 Qi=[j1]Lji. 8Leaveatthesametimeispossible. P P P 9 Meanwhile, since (q,η∗) is feasible in state (q,a,e ,e ,p), optimal policy for (q ,e ) and (q ,e ). Then, we get b a 1 b1 2 b2 φV (q ,a,e ,e ,p)+(1−φ)V (q ,a,e ,e ,p) V (q,a,e ,e ,p)≤q α,n 1 b1 a α,n 2 b2 a α,n b a (q −u )E e −η + αEa,ea,p Vα,n−1(q+A,A,(η∗+Ea)−,Ea,P) = φ β 1 τ 1 − b1τ 1 p+q1 (a) h (cid:16) (q −u )E e (cid:17)−η i ≤ V (q+(cid:2)1,a,e ,e ,p), ((cid:3)38) + (1−φ) β 2 2 − b2 2 p+q α,n b a τ τ 2 where (a) holds since the induction hypothesis. + αEa,ea,ph φ(cid:16)Vα,n−1(u1+A,A,(η1+(cid:17)Ea)−,iEa,P) Case 2: M ≤ q. The set of feasible u is {0,1,··· ,M} in + (1−φ)Vh (u +A,A,(η +E )−,E ,P) α,n−1 2 2 a a boththestate(q+1,a,e ,e ,p)andstate(q,a,e ,e ,p).Then b a b a wecanprovetheincreasingpropertyofV (q,a,e ,e ,p)by (b) i α,n b a ≥ β φ(q −u )+(1−φ)(q −u ) E − φ(e −η ) using (35) and (36). 1 1 2 2 b1 1 h(cid:16) p (cid:17) (cid:16) + (1−φ)(e −η ) +[φq b2 2 τ 1 APPENDIXE + (1−φ)q2]+αEa,(cid:17)eai,p Vα,n−1(φu1+(1−φ)u2 PROOF OFPROPERTY 2 h + A,A,φ(η +E )−+(1−φ)(η +E )−,E ,P) 1 a 2 a a Based on (11), the property can be proved through (c) i induction. First, we have Vα,0 = 0 and Vα,1 = ≥ β φ(q1−u1)+(1−φ)(q2−u2) E − φ(eb1−η1) β (q−min{q,M})E−eb +p+q.Thusthenon-increasingproperty + (1h(cid:16)−φ)(e −η ) p (cid:17) (cid:16) τ b2 2 τ in(cid:0)e holds for n =(cid:1)0,1. Next, assume V (q,a,e ,e ,p) is abnon-increasing function of eb. Fix (qα,,na−,e1a,p), fbor satate + [φq1+(1−φ)q2](cid:17)+iαEa,ea,p Vα,n−1(φu1 (q,a,eb,ea,p), let (u∗,η∗) be the optimal policy. We get + (1−φ)u +A,A,(φη +(1h−φ)η +E )−,E ,P) 2 1 2 a a (q−u∗)E eb−η∗ + (d) i Vα,n(q,a,eb,ea,p)=β τ − τ p+ ≥ Vα,n(φq1+(1−φ)q2,a,φeb1+(1−φ)eb2,ea,p), q+αEa,ea,p Vα,n−(cid:16)1(u∗+A,A,(η∗+Ea(cid:17))−,Ea,P) . where (b) holds because of the convexity of (cid:2) (3(cid:3)9) Vα,n−1(q,h,a,eb,e), (c) holds because of Proposition 1 as well as Property 2, and (d) holds since Since(u∗,η∗)isfeasibleinstate(q,a,eb+1,ea,p),wederive (φu1 + (1 − φ)u2,φη1 + (1 − φ)η2) is feasible for φ(q ,a,e ,e ,p)+(1−φ)(q ,a,e ,e ,p). 1 b1 a 2 b2 a (q−u∗)E e +1−η∗ + b Vα,n(q,a,eb+1,ea,p)≤β − p+ APPENDIXG τ τ q+αE V (u(cid:16)∗+A,A,(η∗+E )−,E ,(cid:17)P) . PROOF OFLEMMA5 a,ea,p α,n−1 a a Let (40) (cid:2) (cid:3) (q−u)E e −η b S(u,η)=β − p+q Combing (39) and (40), we get τ τ + αE (cid:16) V (u+A,A,(η(cid:17)+E )−,E ,P) . (41) a,ea,p α a a Vα,n(q,a,eb,ea,p)≤Vα,n(q,a,eb+1,ea,p). First, we have (cid:2) (cid:3) E Then we complete the proof of the property. S(u+1,η)−S(u,η)=−β p τ + αE V (u+1+A,A,(η+E )−,E ,P) a,ea,p α a a − V (u+A,A,(η+E )−,E ,P) (42) APPENDIXF α (cid:2) a a PROOF OFPROPERTY 3 and (cid:3) E S(u−1,η)−S(u,η)=β p First, we prove the following proposition. τ + αE V (u−1+A,A,(η+E )−,E ,P) Proposition 1. For φ ∈ (0,1) and ∀x1,x2,y, we have a,ea,p α a a φmin{x1,y}+(1−φ)min{x2,y}≤min{φx1+(1−φ)x2,y}. − Vα(u+A(cid:2),A,(η+Ea)−,Ea,P) . (43) Proof: The proposition can be verified by considering Then applying S(u∗ +1,η∗)−S(u∗,η∗) (cid:3)≥ 0 and S(u∗ − min{x ,x }>y, max{x ,x }<y, and min{x ,x }≤y ≤ 1,η∗)−S(u∗,η∗)≥0, we obtain (13). Similarly, as 1 2 1 2 1 2 max{x ,x }, respectively. p 1 2 S(u,η+1)−S(u,η)=β The convexity is proved by induction. For n = 0, V = τ 0 and is convex. Assume Vα,n−1(q,h,a,eb,e) is convαe,x0 in + αEa,ea,p Vα(u+A,A,(η+1+Ea)−,Ea,P) (q,e ). Fix (q,a,e ,e ,p), let (u ,η ) and (u ,η ) be the − V (u+A,A,(η+E )−,E ,P) (44) b b a 1 1 2 2 α (cid:2) a a (cid:3) 10 and [2] J.Keiser,M.Lutzenberger,andS.Albayrak,“Windpower-awarevehicle- to-grid algorithms for sustainable EV energy management systems,” −p S(u,η−1)−S(u,η)=β Proc.the1stIEEEInternational ElectricVehicleConference(IEVC’12), τ Greenville, SouthCarolina, Mar.2012. + αE V (u+A,A,(η−1+E )−,E ,P) [3] L.Xie,P.M.S.Carvalho,L.A.F.M.Ferreira,J.Liu,B.Krogh,N.Popli, a,ea,p α a a andM. D.Ilic, “Wind energy integration inpower systems:operational − Vα(u+A(cid:2),A,(η+Ea)−,Ea,P) , (45) challenges and possible solutions,” Proc.IEEE,vol. 99, no. 1,pp. 214- 232,Jan.2011. we can reach (14) from S(u∗,η∗ +1)−S(cid:3)(u∗,η∗) ≥ 0 and [4] E. Hossain, Z. Han, and V. Poor, Smart Grid Communications and S(u∗,η∗−1)−S(u∗,η∗)≥0. In addition, Networking, edited, UK:CambridgeUniversity Press,2012. [5] G.B. Shrestha, S.G. Ang, and S.G. Ang, “A study of electric vehicle p battery charging demand inthe context ofSingapore,” Proc. Int. Power S(u+1,η+1)−S(u,η)=β (1−E) Eng.Conf.,Meritus Mandarin, Singapore, Dec.2007,pp.64-69. τ [6] O. Sundstrom and C. Binding, “Optimization methods to plan the + αEa,ea,p Vα(u+1+A,A,(η+1+Ea)−,Ea,P) charging of electric vehicle fleets,” Proc. Int. Conf. Control, Commun., − V (u+A,A,(η+E )−,E ,P) (46) PowerEng.(CCPE),Chennai,TamilNadu,India,Jul.2010,pp.323-328. α (cid:2) a a [7] S.Bashash,S.J.Moura,andH.K.Fathy,“Chargetrajectoryoptimization ofplug-in hybrid electric vehicles forenergy cost reduction andbattery and (cid:3) health enhancement,” Proc. Amer. Control Conf., Baltimore, Maryland, p Jun.2010,pp.5824-5831. S(u−1,η−1)−S(u,η)=β (E −1) [8] H.-J. Kim, J. Lee, G.-L. Park, M.-J. Kang, and M. Kang, “Anefficient τ schedulingschemeonchargingstationforsmarttransportation,”Interna- + αEa,ea,p Vα(u−1+A,A,(η−1+Ea)−,Ea,P) tionalConference onSuComs(Security-Enriched UrbanComputingand SmartGrid),Daejon, Korea,Sep.2010. − V (u+A,A,(η+E )−,E ,P) . (47) α (cid:2) a a [9] S.Sojoudi and S.H. Low,“Optimal charging ofplug-in hybridelectric vehicles in smart grids,” Proc. IEEE PES General Meeting, Detroit, Then, (15) can be obtained by applying S(cid:3)(u∗−1,η∗−1)− Michigan, Jul.2011. S(u∗,η∗)≥0 and S(u∗+1,η∗+1)−S(u∗,η∗)≥0. [10] W. Su and M. Y. Chow, “Performance evaluation of a PHEV parking station using particle swarm optimization,” Proc. IEEE PES General Meeting, Detroit, Michigan, Jul.2011. APPENDIXH [11] H.QinandW.Zhang,“Chargingschedulingwithminimalwaitingina networkofelectricvehiclesandchargingstations,”ACMVANET’11,Las PROOF OFLEMMA7 Vegas,Nevada, USA,Sep.2011. [12] R. Jin, B. Wang, P. Zhang, and P. B. Luh, “Decentralised online First, based on Conjecture 1, we only need to consider charging scheduling for large populations of electric vehicles: a cyber- the policy set {(u,η) : (u,η = η(u)) ∩ (u,η) ≥ (0,0)}. physical system approach,” International Journal of Parallel, Emergent Consequently, andDistributedSystems,accepted forpublication, 2012. [13] L. Gan, U. Topcu, and S. H. Low, “Optimal decentralized protocol (q−u)E e −η(u) for electric vehicle charging,” IEEE Trans. Power Syst., accepted for b S(u,η(u))=β − p+q publication, 2012. τ τ [14] Z. Ma, D. S. Callaway, and I. A. Hiskens, “Decentralized charging + αEa,ea,p(cid:16)Vα(u+A,A,(η(u)+E(cid:17)a)−,Ea,P) . control of large populations of plug-in electric vehicles,” IEEE Trans. ControlSyst.Technol.,accept forpublication, 2012. (48) (cid:2) (cid:3) [15] W. Saad, Z. Han, H. V. Poor, and T. Basar, “Game-theoretic methods forthesmartgrid,”IEEESignalProcess.Mag.,vol.29,no.5,pp.86-105, Then applying S(u∗+1,η(u∗+1))−S(u∗,η(u∗)) ≥0 and Sep.2012. S(u∗ −1,η(u∗ −1))−S(u∗,η(u∗)) ≥ 0, we get Z(u∗) ≤ [16] O. Sundstrom and C. Binding, “Flexible charging optimization for βEp≤Z(u∗+1).Next,usingLemma3,wereachthesecond electric vehicles considering distribution grid constraints,” IEEE Trans. τ SmartGrid,vol.3,no.1,pp.26-37,Mar.2012. half of the lemma. [17] J.Huang,V.Gupta,andY.-F.Huang,“SchedulingalgorithmsforPHEV charging in shared parking lots,” Proc. Amer. Control Conf., Fairmont QueenElizabeth, Montral, Canada, Jun.2012. APPENDIXI [18] A. Subramanian, M. Garcia, A. Dominguez-Garcia, D. Callaway, K. PROOF OFLEMMA8 Poolla,andP.Varaiya,“Real-timeschedulingofdeferrableelectricloads,” Proc.Amer.ControlConf.,FairmontQueenElizabeth, Montral,Canada, Following the proof of Lemma 7, we can prove the first Jun.2012. half of the lemma by contradiction. Specifically, suppose [19] S.ChenandL.Tong,“iEMSforlargescalechargingofelectricvehicles: architecture and optimal online scheduling,” Proc. IEEE SmartGrid- u = q − min{q,M} is not the optimal solution, then Comm’12,TainanCity,Taiwan,Nov.2012. S(u∗−1,η(u∗−1))−S(u∗,η(u∗))≥0shouldhold.Wehave [20] C.-T.Li,C.Ahn,H.Peng,andJ.S.Sun,“Synergisticcontrolofplug-in Z q −min{q,M} ≤ Z(u∗) ≤ βEp and the contradiction vehicle chargingandwindpowerscheduling,” IEEETrans.PowerSyst., τ accepted forpublication, 2012. occurs. We can verify the second half of the lemma similarly [21] E. Altman, Constrained Markov decision processes, London/Boca Ra- (cid:0) (cid:1) by using contradiction. Assume u = q is not the optimal ton:Chapman&Hall/CRC Press,1999. solution,thenS(u∗+1,η(u∗+1))−S(u∗,η(u∗))≥0should [22] J.D.CordeiroandJ.P.Kharoufeh,“Batchmarkovianarrivalprocesses (BMAP),”WileyEncyclopedia ofOperationsResearchandManagement be satisfied. Consequently,we get Z(q)≥Z(u∗+1)≥βEp. τ Science, edited byJamesJ.Cochran, JohnWiley &Sons,2010. The contradiction occurs then. [23] O. H. Lerma and J. B. Lassere, Discrete-Time Markov Control Pro- cesses:BasicOptimality Criteria, NewYork:SpringerVerlag,1996. [24] E. A. Feinberg and A. Shwartz, Handbook of Markov Decision Pro- REFERENCES cesses: Methods and Applications, edited, Boston: Kluwer Academic Publishers,2002. [1] Y. Li, R. Kaewpuang, P. Wang, D. Niyato, and Z. Han, “An energy [25] D. J. Ma, A. M. Makowski, and A. Shwartz, “Estimation and optimal efficientsolution:integratingplug-inhybridelectricvehicleinsmartgrid control forconstrained Markovchains,” Proc.IEEEConf.Decision and with renewable energy,” Proc. the 1st IEEE INFOCOM Workshop on Control,Athens,Greece, Dec.1986. Communications andControl forSustainable EnergySystems(CCSES): [26] M. Schal, “Average optimality in dynamic programming with general GreenNetworking andSmartGrids,Orlando,Florida, Mar.2012. statespace,”Math.Oper.Res.,vol.18,no.1,pp.163-172,Feb.1993.