ebook img

Ensemble Control of Cycling Energy Loads: Markov Decision Approach PDF

0.65 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Ensemble Control of Cycling Energy Loads: Markov Decision Approach

Chapter 1 Ensemble Control of Cycling Energy Loads: Markov Decision Approach MichaelChertkovandVladimirY.Chernyak 7 1 0 2 n a Abstract Markov Decision Process (MDP) framework is adopted to represent en- J semblecontrolofdevices,whoseenergyconsumptionpatternisofacyclingtype, 8 e.g.thermostaticallycontrolledloads.Specificallyweutilizeanddeveloptheclass 1 ofMDPmodels,coinedpreviouslylinearly-solvableMDP,thatdescribeoptimaldy- namicsoftheprobabilitydistributionofanensembleofmanycyclingdevices.Two ] Y principally different settings are discussed. First, we consider optimal strategy of S theensembleaggregatoraimedatbalancingbetweenminimizingthecommonob- . jectivethatconsistsofthecostoftheensembleconsumption,undervaryingpriceof s c electricity,andthewelfaredeviationfromnormal,evaluatedastheKL-divergence [ between the optimal and nominal transition probabilities between different states of the ensemble’s participants. Second, we shift to the Demand Response setting, 1 v where the aggregator aims to minimize the welfare deviation under the condition 1 that the aggregated consumption of the ensemble matches the perfectly varying in 4 timetargetconsumption,requestedbythesystemoperator.Wediscussamodifica- 9 tionofbothsettings,aimedatencouragingorconstrainingthetransitionsbetween 4 differentstates.ThedynamicprogrammingfeatureoftheresultingmodifiedMDPs 0 . isalwayspreserved,howeverlin(ear)-solvabilityislostfullyorpartially,depending 1 on the type of modification. We also conducted some (limited in scope) numeri- 0 cal experimentation using the formulations of the first setting. We conclude with 7 1 discussingfuturegeneralizationsandapplications. : v i X r a MichaelChertkov Theoretical Division, T-4 & CNLS, Los Alamos National Laboratory Los Alamos, NM 87545, USAandEnergySystemCenter,Skoltech,Moscow,143026,Russia,e-mail:chertkov@lanl. gov VladimirY.Chernyak DepartmentofChemistry,WayneStateUniversity,5101CassAve,Detroit,MI48202,USA,e- mail:[email protected] 1 2 MichaelChertkovandVladimirChernyak 1.1 GeneralMotivation/Introduction Power systems, as well as other energy systems, undergo a transition from tradi- tionaldevice-orientedanddeterministicapproach,toavarietyofnovelapproaches accountingfor • stochasticity and uncertainty in how the devices, especially new devices like wind-farms,areoperated; • networkaspects,e.g.,intermsofoptimization,controlanddesign/planning; • utilizing massive amount of newly available measurements/data for the afore- mentionedoptimization,control,anddesignsettings. DemandResponse(DR)becameoneimportantcomponentofthesmartgridde- velopment,seee.g.[6,33]andreferencestherein.NovelDRarchitecturebrakesthe traditionalparadigm,whereonlygeneratorsareflexible,suggestingthatparticipa- tionofloadsincontrolcanbebeneficialforthegridatlarge,andalsosufficiently comfortableforconsumers/loads. DRassumesthattheloadsfollowoperationalcommandsoftheSystemOperator (SO).UsingDRtocontrolthefrequencyintherangefromtensofsecondstomin- utesisonesuchpotentiallyveryattractivenichefortheuseoftheDR[32,13,3,38]. Frequencycontrolisatooltobalanceproductionandconsumption.Traditionallythe frequency control is achieved by adjusting the generators, while DR (when devel- oped)helpstoachievethebalancebyalsoadjustingtheloads.Thetaskoffrequency controlissetbytheSOforloadsasatemporalconsumptionrequest,whichcanbe formulated ahead of time (say, for the next 10 min) or in real time, expecting fre- quentupdates(say,everytenseconds). Thistypeoffrequencycontrol,namelyDRservices,arealreadyprovidedbybig consumers like aluminium smelters [35] and desalination plants [2]. However, the potential effect would be an order of magnitude larger if small loads are available to provide DR services. However, direct involvement of small-scale consumers is expensiveduetoassociatedcommunicationandcontrolcosts.Apotentiallyviable way out of this problem is to control many small-scale consumers indirectly – via an intermediate entity, also called the aggregator of an ensemble, including many (saythousandsortensofthousands)small-scaleconsumers[11,22,12,27,28,25, 26,7,6,4].Therefore,thisnovelDRarchitectureassumes,action-wise,separation intothefollowingthreelevels: (a)SO sends control request/signal to an aggregator. The SO request is stated as a temporal profile (possibly binned into intervals, of say 15s) expected from the ensembleforanupcomingintervaldurationof,say,10min. (b)Aggregator processes the SO signal by solving an optimization/control prob- lem and communicating the same A-signal, which is the output/solution of the optimization/control problem, to all members of the ensemble (end-point con- sumers). The A-signal is broadcasted to the end-point devices/consumers, thus makingthecommunicationpartcheap. 1 EnsembleControlofLoads 3 (c)End-point devices receive and implement the A-signal. Implementation is as- sumed to be straightforward, or at least requiring only light computations at a deviceforturningtheA-signalintocontrollableactions. There is a number of challenges, associated with this novel aggregator-based DRarchitecture.Thechallengesareofbothconceptual/formulationandofthesolu- tion/implementationtype.Letusmentionacoupleofthesechallenges,mostrelevant tolevel(b)ofthecontrolarchitecturewhichisinthefocusofthismanuscript. In posing the A-optimization one is looking for a simple/minimal but also a sufficiently accurate way of modeling individual devices. In particular, one needs to decide on how to model the devices in terms of their achievable states and spatial-temporal resolution. To address the conceptual challenge we will state the aggregator-level problems utilizing the language of the Markov Decision Process (MDP),thereforecontrollingthetransitionprobabilitiesbetweenstates.Moreaccu- rately,wewillassumethateachdevicecanfinditselfinall(orsubsetofall)ofthe finitenumberofstates.Thenwedescribetheprobabilisticstateoftheensemblein termsofthevectorofnon-negativenumberssummedtounity,eachrepresentingthe fraction of the consumers, observed in a given state at any given moment of time. The optimization/control degree of freedom for the A-optimization is represented bythesetofstochastictransitionprobabilitymatricesbetweenthestates,definedat eachtimeslotoverthediscretizedandfinitetimehorizon. In addition to the frequency control formulation we will also discuss in this manuscriptanothersettingwhichisofinterestbyitself,butwhichisalsolesschal- lengingintermsoffindingefficientcomputationalsolution.Inthisalgorithmically lesscomplexcasewewillbelookingforanoptimalbalancebetweenthecostofthe ensemble operations, when the price of electricity changes in time, and the devia- tionoftheprobabilisticstateoftheensemblefromwhatwouldbeanatural/normal behavioroftheensembleinthecasewithoutthepricebias. Thediscrete-time,discrete-spacemodelinghasitscontinuous-time,mixed-space counterpart known under the name of Thermostatically-Controlled-Loads (TCL), orevenmoregenerallycyclingloads,characteristicofperiodic(orquasi-periodic) evolutioninthephasespace.See[6]andourownrecentpaper[9]fordetaileddis- cussions of the TCL modeling and many relevant references. In this manuscript we choose to work with the discrete-time, discrete-space MP models for the sake of universality and flexibility of the latter. Indeed, an MP model can be viewed as a macro-model following from its TCL, micro-model counterpart in the result of a model reduction/coarse-graining in space-time, see e.g. [31]. However, the MPs represent much broader class of models, compared to those obtained via coarse- grainingoftheTCLcounterparts,orevenwhencomparedtoacombination(aver- age)ofdifferentTCLmodels.ThismostgeneralclassofMDPmodelsisespecially usefulinthecontextofMachineLearning(ML)whereanMPisreconstructedfrom actualmeasurements/samplngoftheunderlyingensemble. FromMP,whichrepresentsstochasticdynamics,wetransitiontoMDPthatrep- resents stochastic optimization with the MP/ensemble state constrained by the re- quirement of being within the class of state dynamics, described by the chosen class of MPs. Many choices of MDP would be reasonable from the prospectives 4 MichaelChertkovandVladimirChernyak ofpracticalengineering.Inthismanuscriptweadoptanddevelopanapproach,pi- oneered in [19], and further developed in [23, 24, 36, 16, 17, 30]. This relatively unexploredformulationofthestochasticoptimalcontrolisknownunderthenames ofthe“pathintegralcontrol”inthecaseofcontinuous-time,continuous-spacefor- mulation [23], and also as the “Linearly Solvable MDP” (LS-MDP) in the case of discrete-time, discrete-space setting [36, 16, 17]. Linear solvability of the sim- plest version of the scheme is advantageous, as it allows to carry out analytical expressions for the optimal solution one step further than done otherwise through theDynamicProgramming(DP)approach.(WeremindthatDPinvolvesrecursive solution the Bellman-Hamilton-Jacobi (BHJ) equations.) One extra advantage of the approach, which holds even in the case when reduction of the BHJ-equation to a linear equation is not possible, is related to the fact that the penalty term as- sociatedwithdeviationoftheoptimized/actualtransitionprobabilityfromitsideal shape (not perturbed by the SO signal) has a very natural form of the generalized Kublack-Leibler(KL)comparisonofthetwoprobabilitymeasures. switch on switch off 0.2 0.5 0.1 0.03 0.02 0.03 0.1 0.02 0.02 0.2 0.5 0.1 0.03 0.02 0.03 0.1 4 3 2 1 0.1 0.02 0.2 0.5 0.1 0.03 0.02 0.03 0.03 0.1 0.02 0.2 0.5 0.1 0.03 0.02 0.02 0.03 0.1 0.02 0.2 0.5 0.1 0.03 0.03 0.02 0.03 0.1 0.02 0.2 0.5 0.1 5 6 7 8 0.1 0.03 0.02 0.03 0.1 0.02 0.2 0.5 0.5 0.1 0.03 0.02 0.03 0.1 0.02 0.2 Fig.1.1 AnexemplaryMDPresultingfromspace-binning/discretizationandtime-discretization ofatwo-levelmixed-stateTCLmodel,e.g.ofthetypediscussedin[9].ThismodelisusedinSec- tion1.4forillustrativeexperiments.Thedirectedgraphofthe“natural”transitionsandrespective stochasticp¯matrixareshownintheleftandrightSubfiguresrespectively. WhendiscussingMPsandMDPswewillutilizethefollowing(ratherstandard) notationsandterminology: • Dynamicstateoftheensembleisdescribedintermsofavector,ρ(t)=(ρ (t)|∀α), α whereρ (t)≥0istheprobabilitytofindadevice(fromtheensemble)inthestate α αattimet.Thevectorisnormalized(noprobabilityloss)atalltimesconsidered, ∑αρα(t)=1, ∀t, ∀α. 1 EnsembleControlofLoads 5 • p(t)=(p (t)|∀t,∀α,β)isthetransitionprobabilitymatrix,describinganMP αβ and set to be an optimization variable within the MDP framework. The matrix element, p (t), represents the transition probability, i.e. the probability for a αβ devicewhichwasatthestateα atthemomentoftimet transitiontothestateβ att+1.Thematrixisstochastic,i.e. ∑p (t)=1, ∀t=0,···,T−1, ∀β, (1.1) αβ α • The evolution of the MP/ensemble state is described by the Master Equation (ME) ρ (t+1)=∑p (t)ρ (t), ∀t ∀α. (1.2) α αβ β β TheMEshouldbesupplementedbytheinitialconditionforthestateoftheen- semble: ρ (0)=ρ , ∀α (1.3) α in;α Giventheinitialcondition(1.3)oneneedstorunME(1.2)forwardintimetofind thestatesofMPensembleatallfuturetimes.Thedirectionoftimeisprincipal, asitreflectsphysicalcausalityofthesetting. • MDPformulationsdiscussedinthemanuscriptarestatedasoptimizationsover thematrix p.Alloftheformulationsdependonthetarget p¯,correspondentto p whichwouldbeoptimaliftheensembleis“leftalone”forsufficientlylongtime. In the case of the aforementioned frequency control the formulation discussed in Section 1.5, p¯ corresponds to the steady state of the ensemble when the fre- quencytrackingguidanceisignored.IntheformulationsofSections1.2,1.3, p¯ corresponds to the steady state of the ensemble, operating for sufficiently long timeinthecaseofaflat,i.e.timeandstateindependent,cost.AnexemplaryMP andrespective“natural” p¯,introducedintheresultofcoarse-grainingofaTCL, isshowninFig.(1.1). The material in the remainder of the manuscript is organized as follows. MDP aimedatfindingtheprofitoptimaltransitionprobabilityvectorforanensembleof devicesisdefinedandanalyzedinSection1.2.MDPdiscussedinthisSectionmea- surethedeviationofthe”optimal” p,fromits“normal”counterpart, p¯,intermsof the standard KL divergence. This formulation is thus an LS-MDP, where we have placed the material related to a detailed technical discussion of the MDP’s linear solvabilityandrelatedfeaturesandpropertiesinAppendix1.9.Wefurtherdiscuss in Section 1.3 a modified MDP that differentiates between the transitions, i.e. dis- countssometransitionsandencouragesothers,incomparisonwithdifferentiation- neutralformulationoftheprecedingSection.InSection1.4weexperimentwiththe optimizationobjectivesintroducedanddiscussedinSections1.2,1.3.InSection1.5 we describe and discuss the solution of an MDP, seeking to minimize the welfare deviation, stated in terms of the KL distance (or weighted KL-distance) between 6 MichaelChertkovandVladimirChernyak the “optimal” and “normal” transition probabilities, while also matching the SO- objectiveexactly.PreliminarydiscussionoftheMDPapproachintegrationintothe OptimumPowerFlow,e.g.,seekingtocontrolthevoltageandminimizethepower losses,isgiveninSection1.6.Section1.7isreservedfortheconclusionsanddis- cussionsofthepathforward. 1.2 Profit(balancedwithwelfare)Optimal OurmainMDPformulationofinterest,whichwecoinprofitvswelfare,isstatedas thefollowingoptimization       T−1  p (t)  minE ∑ ∑ U (t+1) + ∑log αβ  (1.,4) p,ρ ρt=0 α  (cid:124)α (cid:123)(cid:122) (cid:125) β p¯αβ   CostofElectricity  (cid:124) (cid:123)(cid:122) (cid:125) welfarepenalty Eqs.(1.1,1.2,1.3) (cid:32)(cid:32) (cid:33) (cid:33) T−1 p (t) =min ∑ ∑ U (t+1)+∑log αβ ρ (t) , (1.5) p,ρ t=0 α α β p¯αβ α Eqs.(1.1,1.2,1.3) wherethematrix pistheoptimization/controlvariable,whichisstochastic,accord- ing to Eq. (1.1). Here in Eq. (1.4) p¯ is an exogenously known stochastic matrix describing the transition probabilities, correspondent to normal dynamics/mixing withintheensemble,i.e.explainingthedynamicswhichtheensemblewouldshow inthecaseofnormal(welfarecomfortable),U =0,objective.Thestochasticityof p¯meansthat p¯satisfiesEqs.(1.1),where pisreplacedby p¯. Weassumethatwhenfollowing p¯theensemblereachesastatisticalsteadystate, ρ(st),i.e.mixes,sufficientlyfast, p¯ρ(st)=ρ(st). α The essence of the optimization (1.4) is a compromise an aggregator aims to achieveinbalancingbetweensavingmostfortheensemblewhilealsokeepingthe levelofdiscomfort(welfarepenalty)toitsminimum.Thetwoconflictingobjectives arerepresentedbythetwotermsinEq.(1.4). Optimization(1.4)israthergeneral,whilefortheexampleofaspecificdiscrete- time-discrete-spaceTCL,onesetsU tonon-zeroonlyforthestatesαthatrepresent α the“switch-on”statesoftheTCL. Eq.(1.4)isoftheso-calledlinearly-solvableclassofMDPs,introducedin[36], discussed in [16],[17], [29], and also briefly described, as a special case, in Ap- pendix1.9.SolutionofEqs.(1.4)isfullydescribedbyEqs.(1.24,1.27,1.28). According to the general description part of the Appendix 1.9, the problem is solvedintwoDPsteps 1 EnsembleControlofLoads 7 • Backwardintimestep.Compute precursivelyadvancingbackwardsintimeac- cordingtoEqs.(1.24,1.27)whereγ(t)issubstitutedby1,andstartingfromthe finalcondition(1.28). • Forwardintimestep.Reconstructρ runningEqs.(1.2)forwardintimewiththe initialcondition(1.3). 1.3 Differentiatingstatesthroughapenalty/encouragment The penalized Kublack-Leibler welfare penalty term in Eq. (1.4) is restrictive in termsofhowthetransitionsbetweendifferentstates,andalsoobservedatdifferent moments of time, compare to each other. To encourage/discourage, or generally differentiate the transitions one may weight the terms in the KL-sum differently, thusintroducingtheγ (t)factors: αβ       T−1  p (t) minE ∑ ∑ U (t+1) +∑γ (t)log αβ  (1..6) p,ρ ρt=0 α  (cid:124)α (cid:123)(cid:122) (cid:125) β αβ p¯αβ   CostofElectricity  (cid:124) (cid:123)(cid:122) (cid:125) welfarepenalty Eqs.(1.1,1.2,1.3) ThisgeneralizationofEq.(1.4)aimstoeasetheimplementationoftheoptimalde- cision,e.g.emphasizingordownplayingthecontrollabilityofatransitionsbetween thestatesandatdifferentmomentsoftime. Solution of the optimization (1.6) is described in Appendix 1.9. Even though linearsolvabilityofthestate-uniformformulation,discussedintheprecedingSec- tion1.2,doesnotextendtothenonuniformformulationofEq.(1.6),thebasicDP approachstillholdsandtheproblemissolvedviathefollowingbackward-forward algorithm • Backwardintimestep.Startingwiththefinalconditions(1.17),onesolves Eqs.(1.18)forϕ recursivelybackwardsintime.SolvingEqs.(1.18)requires,at eachtimestep,toexecuteaninnerloopiterations(tillconvergence),accordingto Eqs.(1.22),tofindtheLagrangianmultiplierλ.Ateachstepbackwardsintime pisreconstructedaccordingtoEqs.(1.19). • Forwardintimestep.Reconstructρ runningEqs.(1.2)forwardintimewiththe initialcondition(1.3). 1.4 ComputationalExperiments In this Section we describe some (preliminary) computational experiments, con- ducted for MDP settings discussed in the two preceding Sections. We choose for 8 MichaelChertkovandVladimirChernyak Fig.1.2 ExemplarysolutionoftheMDPproblemforthecasewithoutpenalty,i.e.withtheobjec- tiverepresentedbyEq.(1.4).Theinitialprobabilitydistribution,correspondstothesteadystateof the“target”MCwiththetransitionprobabilitiesshowninFig.(1.1).Eightcurvesshowoptimal dynamicsoftherespective,ρα(t), α=1,···,8. Fig.1.3 ExemplarysolutionoftheMDPproblemforthecasewithpenalty,i.e.withtheobjective represented by Eq. (1.6). The initial probability distribution, corresponds to the steady state of the“target”MCwiththetransitionprobabilitiesshowninFig.(1.1).Eightcurvesshowoptimal dynamics of the respective, ρα(t), α =1,···,8 for the setting, equivalent to the one used in Fig.(1.2). 1 EnsembleControlofLoads 9 theexperimentstheexemplarycaseof8statesandthetargettransitionprobability, p¯,intheformshowninFig.(1.1).Forthesametarget, p¯,weshowthesolutionof theoptimizationproblemthatcorrespondstothesettingofEq.(1.4)andEq.(1.6), respectively, where in the latter case we choose a time-independent penalty fac- tor γ =10 for all transition except for those that correspond to advancing one αβ (counter-clock-wise)stepalongthecycle.Thespecial“alongthecycletransitions” are not penalized, i.e. the respective transition factors are chosen equal to unity, γ =1, α =1,···,7. We used the Algorithms described at the end of mod[α+1],α Section1.2andattheendofSection1.3respectively. TheresultsareshowninFigs.(1.2,1.3).ComparingtheFigures,corresponding tothetworegimes(withandwithoutpenalty),oneobservesthatimposingpenalty leadstoamorehomogeneousdistributionρ overthestates,α. α 1.5 Service(frequencycontrol)Optimal Anotherviablebusinessmodelforanaggregatorofanensembleofcyclingdevices is to provide ancillary frequency control services to a regional SO. The auxiliary servicesconsistinadjustingtheenergyconsumptionoftheensembletoanexoge- noussignal.Trackingthesignalmayrequirefromsome(orall)participantsofthe ensembletosacrificetheirnaturalcyclingbehavior.InthisSectionweaimtostudy, whetheraperfecttrackingofapre-defined(exogenous)isfeasibleandthen,inthe case of feasibility, we would like to find the optimal solution causing least dis- comfort to the ensemble. Putting it formally, the aggregator solves the following optimizationproblem     T−1 p (t) min E ∑ ∑γ (t)log αβ  (1.7) p ρ αβ p¯  t=0α,β αβ    (cid:124) (cid:123)(cid:122) (cid:125) welfarepenalty s.t. Eqs.(1.1) s(τ)=∑ε ρ (τ), ∀α, ∀τ =1,···,T (1.8) α α α (cid:124) (cid:123)(cid:122) (cid:125) energytrackingconstraint where p¯ is the ’target’ distribution represented by a stochastic matrix, e.g. leading to mixing into the standard steady state whenU =0; p is the stochastic, i.e., con- strainedbyEqs.(1.1),matrix;s(t)istheamountofenergy,requestedbythesystem operator to balance the (transmission level) grid; ε is the amount of energy con- α sumedbyadevicewhenitstaysinthestateα foraunittimeslot.Asstatedabove, we consider the most challenging setting which assumes that the tracking is per- 10 MichaelChertkovandVladimirChernyak fect, that is the total consumption of the ensemble is exactly equal to the amount requestedbytheSO. IntroducingtheLagrangianmultiplierfortheenergytrackingconstraint,onere- statesEq.(1.7)asthefollowingmin-maxoptimization (cid:32) (cid:34) (cid:35) (cid:33) T−1 p (t) T T minmax E ∑ ∑γ (t)log αβ +∑∑ξ(t)ε −∑ξ(t)s(t) (1.,9) p,ρ ξ ρ t=0α,β αβ p¯αβ t=1 α α t=1 Eqs.(1.1) ReversingtheorderofsummationinEq.(1.9),weobservethatoptimizationover p andρ intheresultingexpressionbecomesequivalent,underasubstitution, U (τ)=ξ(τ)ε , ∀α, τ =1,···,T. (1.10) α α to the penalized Kublack-Leibler welfare penalty optimization (1.6), discussed in theprecedingSection1.3. Inspiteofthecloserelationbetweentheenergytrackingproblem(1.8)andthe profit optimality problem (1.6) the former is more difficult to implement. Indeed, whileonecansolveEq.(1.6)inonlyonebackward-forwardrun,wearenotaware of existence of such an efficient algorithm for solving Eq. (1.8). The difficulty is relatedtothefactthatξ(t),itself,shouldbederivedintheresultofaKKTcondition, reenforcingtheenergytrackingconstraint(1.8).Anaturalresolutionofthisproblem isthroughanouterloopiterationincludingthefollowingtwosub-steps: • Run the backward-forward penalty optimization algorithm described at the end ofSection1.3usingthecurrentξ(t)(outer-stepspecific)profile. • Updatecurrentξ(t)accordingto(1.8),utilizingthecurrentρ(t)derivedonthe previoussub-step. Theouter-loopiterativeschemeisinitiatedwithξ(t)derivedfrom(1.8)withρ(t) correspondenttothe“normal”MP,p→p¯.Onerunsiterationstillapre-settolerance is achieved. We plan to experiment with and analyze convergence of the iterative schemeinthefuture. 1.6 HybridModeling:TowardsVoltageAware&Hierarchical EnsembleControl In this Section we describe one possible scheme of an MDP approach integration intocontrolofpowerdistribution.Thedescriptionhereissketchyandpreliminary, meanttosetupaformulationforfurtherexploration. Theaggregationofmanyloadsdiscussedinthemanuscriptsofarhasignoredthe details of power flows as well as related voltage and Power Flow (PF) constrains. WehaveassumedthatanyofthediscussedsolutionsisPF-feasible,inthesensethat solution of PF equations are realizable for any of the consumption configuration and,moreover,thevoltagesandpowerflowsalongthepowerlinesstaywithinthe

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.