ebook img

Load curve data cleansing and imputation via sparsity and low rank PDF

0.35 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Load curve data cleansing and imputation via sparsity and low rank

IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 1 Load Curve Data Cleansing and Imputation via Sparsity and Low Rank Gonzalo Mateos, Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE Abstract—Thesmartgridvisionistobuildanintelligentpower network operations under strict reliability requirements, also 3 network with an unprecedented level of situational awareness facing malicious cyber-attacks. 1 and controllability over its services and infrastructure. This Statistical inference techniques are expected to play an in- 0 paper advocates statistical inferencemethodsto robustifypower creasinglyinstrumentalrolein powersystem monitoring[10], 2 monitoring tasks against the outlier effects owing to faulty readings and malicious attacks, as well as against missing data not only to meet the anticipated “big data” deluge as the n due to privacy concerns and communication errors. In this installed base of phasor measurement units (PMUs) reaches a J context,anovelloadcleansingandimputationschemeisdeveloped out throughout the grid, but also to robustify the monitoring 1 leveragingthelowintrinsic-dimensionalityofspatiotemporalload tasks against the “outlier” effects owing to faulty readings, profilesandthesparsenatureof “bad data.”A robustestimator 3 malicious attacks, and communication errors, as well as based on principal components pursuit (PCP) is adopted, which effects a twofold sparsity-promoting regularization through an against missing data due to privacy concerns and technical ] C ℓ1-norm of the outliers, and the nuclear norm of the nominal anomalies[8].Inthiscontext,aloadcleansingandimputation load profiles. Upon recasting the non-separable nuclear norm schemeisdevelopedinthispaper,buildingonrecentadvances O intoaformamenabletodecentralizedoptimization,adistributed in sparsity-cognizant information processing [12], low-rank . (D-) PCP algorithm is developed to carry out the imputation h matrix completion [9], and large-scale distributed optimiza- and cleansing tasks using networked devices comprising the so- at termed advanced metering infrastructure. If D-PCP converges tion [4]. m and a qualification inequality is satisfied, the novel distributed Load curve data refers to the electric energy consumption estimatorprovablyattainstheperformanceofitscentralizedPCP periodicallyrecordedbymetersatpointsofinterestacrossthe [ counterpart,whichhasaccesstoallnetworkwidedata.Computer power grid, e.g., end-user premises, buses, and substations. 1 simulations and tests with real load curve data corroborate the Accurate load profiles are critical assets aiding operational v convergence and effectiveness of the novel D-PCP algorithm. decisions in the envisioned SG system [7], and are essential 7 Index Terms—Advanced metering infrastructure, distributed 2 for load forecasting [24]. However, in the process of acquir- algorithms, load curve cleansing and imputation,principalcom- 6 ing and transmitting such massive volumes of information ponents pursuit, smart grid. 7 for centralized processing, data are oftentimes corrupted or 1. lost altogether. In a smart monitoring context for instance, 0 I. INTRODUCTION incomplete load profiles emerge due to three reasons: (r1) 3 PMU-instrumented buses are few; (r2) SCADA data become The US power grid has been recognized as the most 1 available at a considerably smaller time scale than PMU : important engineering achievement of the 20th century [26], v data; and (r3) regional operators are not willing to share all yet it presently faces major challenges related to efficiency, i their variables [23]. Moreover, a major requirement for grid X reliability, security, environmental impact, sustainability, and monitoring is robustness to outliers, i.e., data not adhering r market diversity issues [25]. The crystallizing vision of the a smartgrid(SG) aspires to builda cyber-physicalnetworkthat to nominal models [1], [22]. Sources of so-termed “bad data” include meter failures, as well as strikes, unscheduled canaddressthesechallengesbycapitalizingonstate-of-the-art generator shutdowns, and extreme weather conditions [7], information technologies in sensing, control, communication, [11]. Inconsistent data can also be due to malicious (cyber- optimization, and machine learning. Significant effort and ) attacksthatinduceabruptloadchanges,orcounterfeitmeter investment are being committed to architect the necessary readings [18]. infrastructure by installing advanced metering systems, and In light of the aforementioned observations, the first con- establishingdatacommunicationnetworksthroughoutthegrid. tribution of this paper is on modeling spatiotemporal load Accordingly, algorithms that optimally exploit the pervasive profiles, accounting for the structure present to effectively sensing and control capabilities of the envisioned advanced impute missing data and devise robust load curve estima- metering infrastructure (AMI) are needed to make the neces- tors stemming from convex optimization criteria (Section II). sarybreakthroughsinthekeyproblemsinpowergridmonitor- Existing approaches to load curve cleansing have relied on ingandenergymanagement.Thisisnoeasyendeavorthough, separate processing per time series [7], [11], [21], and have in view of the challenges posed by increasingly distributed notcapitalizedonspatialcorrelationstoimproveperformance. PartofthispaperwillbepresentedattheProc.of3rdIntl.Conf.onSmart The aim is for minimal-rank cleansed load data, while also GridCommunications, Tainan,Taiwan,Nov.5-8,2012. exploitingoutliersparsityacrossbusesandtime.Anestimator The authors are with the Dept. of ECE and the Digital Technology tailoredtothesespecificationsisprincipalcomponentspursuit Center, University of Minnesota, Minneapolis, MN, 55455 USA (e-mail: [email protected]; [email protected]). (PCP) [5], [6], [28], which is outlined in Section III. PCP IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 2 minimizes a tradeoff between the least-squares (LS) model where X, O, andE denotethe nominalloadprofiles,the out- fitting error and a twofold sparsity-promoting regularization, liers,andsmallmeasurementerrors,respectively.Fornominal implemented through an ℓ1-norm of the outliers and the observations yn,t =xn,t+en,t, one has on,t =0. nuclearnormofthenominalloadprofiles.WhilePCPhasbeen Remark 1 (Model under-determinacy): The model is in- widelyadoptedincomputervision[5], forvoiceseparation in herentlyunder-determined,sinceevenforthe(mostfavorable) music [13], andunveilingnetworkanomalies[20], its benefits case of fulldata, i.e., Ω≡{1,...,N}×{1,...,T},there are to power systems engineering and monitoring remains so twice as many unknowns in X and O as there is data in Y. far largely unexplored. The second contribution pertains to Estimating X and O becomes even more challenging when developing a distributed (D-) PCP algorithm, to carry out the data are missing, since the number of unknowns remains the imputation and cleansing task using a network of intercon- same, but the amount of data is reduced. necteddevicesaspartoftheAMI(SectionIV).Thisispossible In any case, estimation of {X,O} from PΩ(Y) is an ill- by leveraging a general algorithmic framework for sparsity- posed problem unless one introduces extra structural assump- regularizedrankminimizationputforthin[20].Uponrecasting tionsonthemodelcomponentstoreducetheeffectivedegrees the (non-separable) nuclear norm present in the PCP cost of freedom. To this end, two cardinal properties of X and into a form amenable to decentralized optimization (Section O will prove instrumental. First, common temporal patterns IV-A), the D-PCP iterations are obtained in Section IV-C via among the energy consumption of a few broad classes of the multi-block alternating-directions method of multipliers loads(e.g.,industrial,residential,seasonal)inadditiontotheir (ADMM)solver[3],[4].Inanutshell,periterationeachsmart (almost) periodic behaviors render most rows and columns meterexchangessimplemessageswithits(directlyconnected) of X linearly dependent, and thus X typically has low-rank. neighborsinthenetwork,andthensolvesitsownoptimization Second, outliers (or attacks) only occur sporadically in time problem to refine its current estimate of the cleansed load and affect only a few buses, yielding a sparse matrix O. profile. In the context of power systems, the ADMM has Smoothness of the nominal load curves is related to the low- been recently adopted to carry out dynamic network energy rank property of X, which was adopted in [7] to motivate management[16],anddistributedrobuststate estimation[14]. a smoothing splines-based algorithm for cleansing. However, Computer simulations corroborate the convergenceand op- while in [7] smoothness was enforced per load profile time- timality of the novel D-PCP algorithm, and demonstrate its series, i.e., per row (x )′ of X; the low-rank property of n effectiveness in cleansing and imputing real load curve data X also captures the spatial dependencies introduced by the (Section V). Concluding remarks and directions for future network. Approaches capitalizing on outlier- and “bad data-” research are outlined in Section VI, while a few algorithmic sparsity can be found in e.g., [14], [15] and [21]. details are deferred to the Appendix. B. Communication network model II. MODELINGANDPROBLEM STATEMENT Suppose that on top of the energymeasurementfunctional- Thissectionintroducesthemodelfor(possibly)incomplete ity, the N networked smart meters are capable of performing and grossly corrupted load curve measurements, acquired by simple local computations, as well as exchanging messages geographically-distributed metering devices monitoring the among directly connected neighbors. Single-hop communica- power grid. The communication network model needed to tion models are appealing due to their simplicity, since one account for exchanges of information among smart meters is doesnothavetoincurtheroutingoverhead.TheAMInetwork describedaswell. Lastly, the task ofload curvecleansingand isnaturallyabstractedtoanundirectedgraph G(N,L), where imputation is formally stated. the vertex set N := {1,...,N} corresponds to the network nodes,andthe edges(links)inL representpairsof nodesthat A. Spatiotemporal load curve data model are connected via a physical communication channel. Node Let the N × 1 vector y(t) := [y1,t,...,yN,t]′ (′ stands n ∈ N communicates with its single-hop neighboring peers fortransposition)collectthe spatial loadprofilesmeasuredby in J , and the size of the neighborhood will be henceforth n smart meters monitoring N network nodes (buses, residential denotedby|J |.ThegraphGisassumedconnected,i.e.,there n premises),at a givendiscrete-timeinstant t∈[1,T].Consider exists a (possibly multihop) path that joins any pair of nodes theN×T matrixofobservationsY :=[y(1),...,y(T)].The in the network. This requirement ensures that the network n-th row(y )′ ofY is the time series ofenergyconsumption is devoid of multiple isolated (connected) components, and n (load curve) measurements at node n, while the t-th column allows for the data collected by e.g., smart meter n, namely y(t) of Y represents a snapshot of the networkwide loads then-throw(y )′ ofY, toeventuallyreacheveryothernode n taken at time t. To model missing data, consider the set in the network. This way, even when only local interactions Ω⊆{1,...,N}×{1,...,T} of index pairs (n,t) defining a areallowed,theflowofinformationcanpercolatethenetwork. samplingoftheentriesofY. Introducingthematrixsampling Theimportanceofthenetworkmodelwillbecomeapparent operator PΩ(·), which sets the entries of its matrix argument in Section IV. not indexed by Ω to zero and leaves the rest unchanged, the (possibly) incomplete spatiotemporal load curve data in the C. Load curve cleansing and imputation presence of outliers can be modeled as The load curve cleansing and imputation problem studied PΩ(Y)=PΩ(X+O+E) (1) here entails identification and removal of outliers (or “bad IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 3 data”), in addition to completion of the missing entries from to a central monitoring and data analytics station, which the nominal load matrix, and denoising of the observed uses their aggregation in PΩ(Y) to reject outliers and im- ones. To some extent, it is a joint estimation-interpolation pute missing data. While for the most part this is the pre- (prediction)-detection problem. With reference to (1), given vailing operational paradigm nowadays, there are limitations generally incomplete, noisy and outlier-contaminated spa- associated with this architecture. For instance, collecting all tiotemporal load data PΩ(Y), the cleansing and imputation this information centrally may lead to excessive overhead tasks amount to estimating the nominal load profiles X and in the communication network, especially when the rate of the outliers O, by leveraging the low-rank property of X and data acquisition is high at the meters. Moreover, minimizing the sparsity in O. Collaboration between metering devices (or avoiding altogether) the exchanges of raw measurements (collecting networkwide data) is considered here, rather than may be desirable for privacy and cyber-security reasons, as local processing of load curves per bus. well as to reduce unavoidable communication errors that Note that load cleansing and imputation are different from translate to missing data. Performing the optimization in a loadforecasting[24],whichamountstopredictingfutureload centralized fashion raises robustness concerns as well, since demand based on historical data of energy consumption and the central data analytics station represents an isolated point theweatherconditions.Actually,cleansingandimputationare of failure. These reasons motivate devising fully-distributed critical preprocessing tasks utilized to enhance the quality of iterative algorithms for PCP, embedding the load cleansing loaddata,thatwouldbesubsequentlyusedforloadforecasting and imputation functionality to the AMI. This is the subject and optimum power flow [2]. of the next section. III. PRINICIPAL COMPONENTSPURSUIT IV. DISTRIBUTEDCLEANSING ANDIMPUTATION An estimator matching nicely the specifications of the load A distributed (D-)PCP algorithm to solve (P1) using a curve cleansing and imputation problem stated in Section network of smart meters (modeled as in Section II-B) should II-C, is the so-termed (stable) principal components pursuit be understood as an iterative method, whereby each node (PCP) [5], [6], [28], that will be outlined here for complete- carries out simple local (optimization) tasks per iteration ness. PCP seeks estimates {Xˆ,Oˆ} as the minimizers of k = 1,2,..., and exchanges messages only with its directly 1 2 connected neighbors. The ultimate goal is for each node to (P1) {mX,iOn}2kPΩ(Y−X−O)kF +λ∗kXk∗+λ1kOk1 formlocalestimatesxn[k]andon[k]thatcoincidewiththen- throwsof Xˆ andOˆ ask →∞,where {Xˆ,Oˆ}isthesolution wherethe ℓ1-norm kOk1 := n,t|on,t| andthe nuclearnorm of (P1) obtainedwhen all dataPΩ(Y) are centrallyavailable. kXk := σ (X) (σ (X) denotes the i-th singular value of ∗ i i i P Attaining the centralized performance with distributed data is X) are utilized to promote sparsity in the number of outliers impossible if the network is disconnected. P (nonzero entries) in O, and the low rank of X, respectively. To facilitate reducing the computational complexity and The nuclear and ℓ1-norms are the closest convex surrogates memorystoragerequirementsoftheD-PCPalgorithmsought, to the rank and cardinality functions, which albeit the most it is henceforth assumed that an upper bound rank(Xˆ) ≤ ρ natural criteria they are in general NP-hard to optimize. The is a priori available [recall Xˆ is the estimated low-rank tuning parameters λ1,λ∗ ≥ 0 control the tradeoff between cleansed load profile obtained via (P1)]. As argued next, fitting error, rank, and sparsity level of the solution. When the smaller the value of ρ, the more efficient the algorithm an estimate σˆv2 of the observation noise variance is available, becomes.Smallvaluesof ρ are well motivateddue to the low guidelinesforselectingλ∗ andλ1 havebeenproposedin[28]. intrinsicdimensionalityofthespatiotemporalloadprofiles(cf. The nonzeroentries in Oˆ reveal “bad data” across both buses Section II-A). Because rank(Xˆ) ≤ ρ, (P1)’s search space is and time. Clearly, it does not make sense to flag outliers in effectivelyreducedandonecanfactorizethedecisionvariable data that has not been observed, namely for (n,t) ∈/ Ω. In as X=PQ′, where P and Q are N×ρ and T ×ρ matrices, those cases (P1) yields oˆn,t = 0 since both the Frobenious respectively.Adoptingthisreparametrizationof Xin(P1)and and ℓ1-norms are separable across the entries of their matrix making explicit the distributed nature of the data (cf. Section arguments. II-B), one arrives at an equivalent optimization problem Being convex (P1) is computationally appealing, and it N has been shown to attain good performance in theory and 1 practice. Forinstance, in the absence of noise and when there (P2) {Pm,Qin,O} 2kPΩn(yn−Qpn−on)k22 n=1(cid:20) isnomissingdata,identifiabilityandexactrecoveryconditions X λ were reported in [5] and [6]. Even when data are missing, + ∗kPQ′k∗+λ1konk1 N it is possible to recover the low-rank component under some (cid:21) technicalassumptions[5].Theoreticalperformanceguarantees whichis non-convexdueto thebilinearterm PQ′, andwhere in the presence of noise are also available [28]. P:=[p1,...,pN]′. The numberof variablesis reducedfrom Regardingalgorithms,aPCPsolverbasedontheaccelerated 2NT in (P1), to ρ(N +T)+NT in (P2). The savings can proximal gradient method was put forth in [17], while the be significant when ρ is small, and both N and T are large. ADMM was employed in [27]. Implementing these central- NotethatthedominantNT-termin thevariablecountof(P2) ized algorithms presumes that networked metering devices is due to O, which is sparse and can be efficiently handled continuouslycommunicatetheirlocalload measurementsy even when both N and T are large. n,t IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 4 Remark 2 (Challenges facing distributed implementation): Notice that (P3) and (P4) are equivalent optimization prob- Problem (P2) is still not amenable for distributed lems, since the network graph G(N,L) is connected by implementation due to: (c1) the non-separable nuclear assumption.Eventhoughconsensusis a fortioriimposedonly normpresentin thecostfunction;and(c2)theglobalvariable within neighborhoods, it extends to the whole (connected) Q coupling the per-node summands. network and local estimates agree on the global solution of Challenges (c1)-(c2) are dealt with in the ensuing sections. (P3).ToarriveatthedesiredD-PCPalgorithm,itisconvenient to reparametrize the consensus constraints in (P4) as A. A separable low-rank regularization Q =F¯m, Q =F˜m, and F¯m =F˜m, m∈J , n∈N To address (c1), consider the following alternative charac- n n m n n n n (3) terization of the nuclear norm (see e.g. [20]) kXk := min 1 kPk2 +kQk2 , s. to X=PQ′. where {F¯mn,F˜mn}n∈N, are auxiliary optimization variables ∗ {P,Q} 2 F F that will be eventually eliminated (cf. Remark 3). (cid:0) (cid:1) (2) Theoptimization(2)isoverallpossiblebilinearfactorizations of X, so that the number of columns ρ of P and Q is also a C. The D-PCP algorithm variable. Leveraging (2), the following reformulation of (P2) Totackle(P4),associateLagrangemultipliersM¯mandM˜m n n provides an important first step towards obtaining the D-PCP withthefirstpairofconsensusconstraintsin(3).Introducethe algorithm: quadratically augmented Lagrangian function [3] N (P3) {Pm,Qin,O}n=1(cid:20)21kPΩn(yn−Qpn−on)k22+λ1konk1 Lc(V1,V2,V3,M) = N 12kPΩn(yn−Qnpn−on)k22 X n=1(cid:20) λ X +2N∗ Nkpnk22+kQk2F (cid:21). +λ1konk1+ 2λN∗ Nkpnk22+kQnk2F As asserted in [20, Lem(cid:0)ma 1], adopting (cid:1)the separable N (cid:0) (cid:1)(cid:21) Frobenius-norm regularization in (P3) comes with no loss of + hM¯m,Q −F¯mi+hM˜m,Q −F˜mi optimality relative to (P1), providedrank(Xˆ)≤ρ. By finding nX=1mX∈Jn(cid:16) n n n n m n (cid:17) the global minimum of (P3) [which could have considerably N c less variablesthan(P1)],one can recoverthe optimalsolution + kQ −F¯mk2 +kQ −F˜mk2k2 (4) 2 n n F m n F F of(P1).Thiscouldbechallenginghowever,since(P3)isnon- nX=1mX∈Jn(cid:16) (cid:17) convex and it may have stationary points which need not be where c > 0 is a penalty parameter, and the primal variables globally optimum. Interestingly, it is possible to certify global optimality of a are split into three groups V1 := {Qn}Nn=1, V2 := {pn}Nn=1 stationarypoint{P¯,Q¯,O¯}of(P3).Specifically,onecanestab- andV3 :={on,F¯mn ,F˜mn}nm∈∈NJn.Fornotationalbrevity,collect Ol¯is}hitshattheifgklPobΩa(lYly−oPp¯tQi¯m′a−lOs¯o)lkut<ionλo∗,ft(hPe1n){[X2ˆ0,:=PrPo¯pQ¯. 1′,]O.ˆT:h=e tahllatLtahgerarnemgeaimniunlgtipcloienrsstrainintMs in:=(3){,Mn¯amnm,eMl˜ymnCF}nm:∈=∈NJn{.F¯Nmno=te qualification condition kPΩ(Y−P¯Q¯′−O¯)k < λ∗ captures F˜mn, m∈Jn, n∈N}, have not been dualized. To minimize (P4) in a distributed fashion, (a multi-block tacitly the role of ρ. In particular, for sufficiently small ρ the residualkPΩ(Y−P¯Q¯′−O¯)kbecomeslargeandconsequently variant of) the ADMM will be adopted here. The ADMM is an iterative augmented Lagrangian method especially well the condition is violated [unless λ is large enough, in which ∗ suited for parallel processing [3], [4], which has been proven case a sufficiently low-ranksolution to (P1) is expected].The conditionontheresidualalsoimplicitlyenforcesrank(Xˆ)≤ρ, successfultotackletheoptimizationtasksstemmingfromgen- eraldistributedestimatorsofdeterministicand(non-)stationary whichisnecessaryfortheequivalencebetween(P1)and(P3). randomsignals;seee.g.,[14],[20]andreferencestherein.The proposedsolverentailsan iterativeprocedurecomprising four B. Local variables and consensus constraints steps per iteration k =1,2,..., [n∈N, m∈J in (5)-(6)] n To decompose the cost in (P3), in which summands inside [S1] Update dual variables: the square brackets are coupled through the global variable Q [cf. (c2) under Remark 2], introduce auxiliary variables M¯m[k]=M¯ m[k−1]+c(Q [k]−F¯m[k]) (5) {Q }N representing local estimates of Q per smart meter n n n n n. Tnono=bt1aina separablePCP formulation,use these estimates M˜mn[k]=M˜ mn[k−1]+c(Qm[k]−F˜mn[k]). (6) along with consensus constraints [S2] Update first group of primal variables: N 1 (P4) {Pnm,Qinn,O}n=1(cid:20)2kPΩn(yn−Qnpn−on)k22 V1[k+1]=arg mVi1nLc(V1,V2[k],V3[k],M[k]). (7) X +λ1konk1+ 2λN∗ Nkpnk22+kQnk2F [S3] Update second group of primal variables: s. to Q =Q , m∈(cid:0) J , n∈N. (cid:1)(cid:21) V2[k+1]=arg minLc(V1[k+1],V2,V3[k],M[k]). (8) n m n V2 IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 5 Algorithm 1 : D-PCP at smart meter n∈N Itis then apparentthatthe Hadamardproductcan be replaced inputyn,Ωn,λ∗,λ1, and c. withtheusualmatrix-vectorproductasPΩn(z)=Ωnz,where initializeS[0]=0T×ρ, and Qn[1], pn[1] at random. Ω := diag(ω ). Operators ⊗ and vec[·] denote Kronecker n n for k=1,2,... do product and matrix vectorization, respectively. Finally, the Receive {Qm[k]} from neighbors m∈Jn. [S1] Update local dual variables: soft-thresholdingoperatorisSλ1(·):=sign(·)max(|·|−λ1,0). [SSn2[]k]U=pdSante[kfi−rst1]g+roucpPomf∈loJcna(lQpnr[ikm]a−l vQamria[kb]l)e.s: Rinesmpeacrtikon3o(fEAlimlgionraitthimon1orfevreedalusntdhaatntthvearreidaubnledsa)n:tCauaxreilfiualry An[k+1]:={(pn[k]p′n[k])⊗Ωn+(λ∗/N +2c|Jn|)IρT}−1 variables {F¯m,F˜m,M˜m}m∈Jn have been eliminated. Each n n n n∈N smart meter, say the n-th, does not need to separately keep Qn[k+1]=unvec(cid:16)An[k+1]n(pn[k]⊗Ωn)(yn−on[k]) track of all its non-redundant multipliers {M¯ m} , but n m∈Jn −vec(Sn[k])+vec(cPm∈Jn(Qn[k]+Qm[k]))o(cid:17). onlyupdatetheirrespectivesumsSn[k]:=2 m∈Jn M¯ mn[k]. [S3] Update second group of local primal variables: Whenemployedtosolvenon-convexproblemssuchas(P4), pn[k+1]={Q′n[k+1]ΩnQn[k+1]+Iρ}−1 ADMM so far offers no convergence guaraPntees. However, ×Q′n[k+1]Ωn(yn−on[k]). there is ample experimental evidence in the literature which [S4] Update third group of local primal variables: on[k+1]=Sλ1(Ωn(yn−Qn[k+1]pn[k+1])). supports convergence of ADMM, especially when the non- Transmit Qn[k+1] to neighbors m∈Jn. convexproblemathandexhibits“favorable”structure[4]. For end for instance,(P4)isbi-convexandgivesrisetothestrictlyconvex return Qn[∞],pn[∞],on[∞]. optimizationsubproblemseach time L is minimizedwith re- c specttooneofthegroupvariables,whichadmituniqueclosed- form solutions per iteration [cf. (7)-(9)]. This observation [S4] Update third group of primal variables: and the linearity of the constraints suggest good convergence propertiesforthe D-PCP algorithm.Extensivenumericaltests V3[k+1]=arg min Lc(V1[k+1],V2[k+1],V3,M[k]) including those presented in Section V demonstrate that this V3∈CF (9) is indeed the case. While a formal convergence proof is the which amount to a block-coordinate descent method cycling subject of ongoing investigation, the following proposition over V1 → V2 → V3 to minimize Lc, and dual variable asserts that upon convergence, the D-PCP algorithm attains updates [3]. At each step while minimizing the augmented consensus and global optimality. For a proof (omitted here Lagrangian,the variable groups not being updated are treated due to space limitations), see [20, Appendix C]. asfixed,and aresubstitutedwith theirmostupto datevalues. Proposition 1: Suppose iterates {Q [k],p [k],o [k]} n n n n∈N Different from the standard two-block ADMM [3], [4], the generated by Algorithm 1 converge to {Q¯ ,p¯ ,o¯ } . If n n n n∈N multi-block variant here cycles over three groups of primal {Xˆ,Oˆ} is the optimal solution of (P1), then Q¯1 = Q¯2 = variables [19]. ... = Q¯N. Also, if kPΩ(Y − P¯Q¯′1 − O¯)k < λ∗, then Reformulatingtheestimator(P1)toitsequivalentform(P4) {Xˆ =P¯Q¯′,Oˆ =O¯}. 1 renderstheaugmentedLagrangianin(4)highlydecomposable. Theseparabilitycomesintwoflavors,bothwithrespecttothe V. NUMERICAL TESTS variable groups V1-V3, as well as across the network nodes This section corroborates convergence and gauges perfor- n∈N.Thisleadstohighlyparallelized,simplifiedrecursions mance of the D-PCP algorithm, when tested using synthetic to be run by the networked smart meters. Specifically, it is and real load curve data. shownin the Appendixthatthe aforementionedADMM steps [S1]-[S4] give rise to the D-PCP iterations tabulated under Algorithm 1. Per iteration, each device updates: [S1] a local A. Synthetic data tests matrix of dual prices S [k]; [S2]-[S3] local cleansed load A network of N = 25 smart meters is generated as a n estimates Q [k + 1] and p [k + 1] obtained as solutions realization of the random geometric graph model, meaning n n to respective unconstrained quadratic problems (QPs); and nodes are randomly placed on the unit square and two nodes [S4]itslocaloutliervector,throughasparsity-promotingsoft- communicate with each other if their Euclidean distance is thresholding operation. The (k+1)-st iteration is concluded less than a prescribed communication range of d = 0.4; c after smart meter n transmits Q [k + 1] to its single-hop see Fig. 1. The time horizon is T = 600. Entries of E n neighborsin J .Regardingcommunicationcost,Q [k+1]is are independentand identically distributed (i.i.d.), zero-mean, n n a T ×ρ matrix and its transmission does not incur significant Gaussian with variance σ2 = 10−3; i.e., e ∼ N(0,σ2). l,t overhead for small ρ. Observe also that PΩ(yn) need not be Low-rank spatiotemporal load profiles with rank r = 3 are exchangedwhichisdesirabletopreservedatasecrecy,andthe generated from the bilinear factorization model X = WZ′, communication cost is independent of N. where W and Z are N ×r and T ×r matrices with i.i.d. Before moving on, a clarification on the notation used in entries drawn from Gaussian distributions N(0,100/N) and Algorithm1isdue.TodefinematrixΩ in[S2]-[S4],observe N(0,100/T), respectively. Every entry of O is randomly n first that the local sampling operator can be expressed as drawn from the set {−1,0,1} with Pr(o = −1) = n,t PΩn(z) = ωn⊙z, where ⊙ denotes Hadamard product, and Pr(on,t = 1) = 5 × 10−2. To simulate missing data, a the binary masking vector ω ∈ {0,1}T has entries equal to sampling matrix Ω ∈ {0,1}N×T is generated with i.i.d. n 1ifthecorrespondingentryof zisobserved,and0otherwise. Bernoulli distributed entries o ∼ Ber(0.7) (30% missing n,t IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 6 101 Smart meter n=10 1 Smart meter n=20 100 Smart meter n=30 0.8 10−1 00..46 Consensus error1100−−32 10−4 0.2 10−5 0 0 0.2 0.4 0.6 0.8 1 10−60 100 200 300 400 500 600 700 800 Iteration index (k) Fig.1. Asimulated networkgraphwith N =25nodes. Fig.3. Evolutionoftheconsensuserror. School # 2 102 O, N=15 DC−enPtCraPlized solver [28] mption (kW)122505 OEOsruitgtilmiienarastle ldo aloda cdu crvuerve 101 O, N=25 O, N=35 wer consu150 mation error100 Po 00 50 100 GovTe15irmn0me e(hnot ubrusi)ld2i0n0g 250 300 Global esti10−1 X, N=25 mption (kW)456000 10−2 X, N=15 X, N=35 wer consu2300 10−30 100 200 300 400 500 600 Po100 50 100 T15im0e (hours)200 250 300 Iteration index (k) Fig.4. Schoolandgovernmentbuilding loadcurvedatacleansing. Fig. 2. Convergence of the D-PCP algorithm for different network sizes. D-PCPattains thesameestimation errorasthecentralized solver. B. Real load curve data test Here,theD-PCPalgorithmistestedonrealloadcurvedata. The dataset consists of power consumption measurements (in data on average). Finally, measurements are generated as kW) for a government building, a grocery store, and three PΩ(Y) = Ω⊙(X+O+V) [cf. (1)], and smart meter n schools (N = 5) collected every fifteen minutes during a has available the n-th row of PΩ(Y). period of more than five years, ranging from July 2005 to Toexperimentallycorroboratetheconvergenceandoptimal- October 2010. Data is downsampled by a factor of four, to ity (as per Proposition 1) of the D-PCP algorithm, Algorithm yield one measurement per hour. For the present experiment, 1 is run with c = 1 and compared with the centralized only a subset of the whole data is utilized for concreteness, benchmark(P1),obtainedusingthesolverin [27]. Parameters whereT =336waschosencorrespondingto336hourperiods. λ1 = 0.0141 and λ∗ = 0.346 are chosen as suggested For the governmentbuilding case, a snapshot of the available in [28]. For both schemes, Fig. 2 shows the evolution of loadcurvedataspanningthestudiedtwo-weekperiodisshown the global estimation errors e [k] := kX[k]−Xk /kXk in blue e.g., in Fig. 4 (bottom).Weekday activity patternscan X F F and e [k]:=kO[k]−Ok /kOk . It is apparent that the D- be clearly discerned from those corresponding to weekends, O F F PCP algorithm converges to the centralized estimator, and as asexpectedformostgovernmentbuildings;butdifferent,e.g., expectedconvergenceslows down due to the delay associated for the load profile of the grocery store in Fig. 5 (bottom). with the information flow throughoutthe network. The test is To runthe D-PCP algorithm,an underlyingcommunication also repeated for network sizes of N = 15 and 35 devices, graph was generated as in Section V-A. A randomly chosen to illustrate that the time till convergence scales gracefully subset of 30% of the measurements was removed to model as the network size increases. Finally, for N = 35 and with missing data. For one of the schools and the government Q¯[k] := Q [k]/N, Fig. 3 depicts the consensus error building data, Fig. 2 depicts the cleansed load curves that n n e [k]:=kQ [k]−Q¯[k]k /kQ¯[k]k forthreerepresentative closely follow the measurements, but are smooth enough to c,n n F F P smart metering devices. In all cases the error decays rapidly avoid overfitting the abnormal energy peaks on the so-termed to zero, showing that networkwide agreement is attained on “building operational shoulders.” Indeed, these peaks are in the estimates Q [k] mostcasesidentifiedasoutliers.The effectivenessin termsof n IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 7 Government building which admitthe closed-formsolutions shown in Algorithm 1. W)50 Original load curve Movingon to [S4], from the decomposablestructure of the Power consumption (k123400000 50 100 150 200 250EMsistisminagte dda 3ltoa0a 0pdo cinutrsve oascunag,tlma[kre+Lntae1sds]o=Lsaugabrratganmsgkoiisann(n[o21ctfe.yt(hn4a,)tt]−Q(9qn)′n:d,=te[ck[oq+unp,11le,]ps..ni.[nk,tqo+np,1Te]r]′-−)noode2 Time (hours) (cid:26) (cid:0) (cid:1) Grocery store W)90 +λ1|o|1 , t=1,...,T mption (k7800 =Sλ1(yn,t−q′n,t[k+(cid:27)1]pn[k+1]), t=1,...,T wer consu5600 and Nn=1|Jn| additional unconstrained QPs Po400 50 100 150 200 250 300 F¯m[Pk+1]=F˜m[k+1]=arg min −hM¯ m[k]+M˜ m[k],F¯mi Time (hours) n n F¯mn n n n c n + kQ [k+1]−F¯mk2 +kQ [k+1]−F¯mk2 (10) Fig.5. Governmentbuildingandgrocerystoreloadcurveimputation,when 2 n n F m n F 30%ofthedataaremissing. which(cid:2)admit the closed-form solutions (cid:3)o imputation of missing data is illustrated in Fig. 3 (identified 1 outliersarenotshownhere);notehowthecleansed(gray)load F¯m[k+1]=F˜m[k+1]= (M¯m[k]+M˜m[k]) n n 2c n n curve goes throughthe (red) missing data points. The relative 1 error in predicting missing data is around 6%, and degrades + (Qn[k+1]+Qm[k+1]). (11) 2 to 8% when the amount of missing data increases to 50%. Note that in formulating (10), F˜m was eliminated using the n VI. CONCLUSION constraints F¯mn = F˜mn defining CF. Using (11) to eliminate F¯m[k] and F˜m[k] from (5) and (6) respectively, a simple Anovelrobustloadcurvecleansingandimputationmethod n n induction argument establishes that if the initial Lagrange isdevelopedinthispaper,rootedatthecrossroadsofsparsity- multipliers obey M¯m[0] = −M˜ m[0] = 0, then M¯ m[k] = cognizant statistical inference, low-rank matrix completion, n n n −M˜ m[k] for all k ≥ 0, where n ∈ N and m ∈ J . The and large-scale distributed optimization. The adopted PCP n n set {M˜ m} of multipliers has been shown redundant,and (11) estimator jointly leverages the low-intrinsic dimensionality of n readily simplifies to spatiotemporalload profiles, and the sparse (that is, sporadic) natureofoutlyingmeasurements.Aseparablereformulationof 1 F¯m[k+1]=F˜m[k+1]= (Q [k+1]+Q [k+1]). (12) PCP is shown to be efficiently minimized using the ADMM, n n 2 n m and gives rise to fully-decentralized iterations which can be It then follows that F¯m[k]=F¯n[k] for all k ≥0, an identity run by a network of smart-metering devices. Comprehensive n m that will be used later on. By plugging (12) in (5), the (non- tests with synthetic and real load curve data demonstrate the redundant) multiplier updates become effectiveness of the novel load cleansing and imputation ap- c proach,andcorroboratetheconvergenceandglobaloptimality M¯ m[k]=M¯m[k−1]+ [Q [k]−Q [k]], n∈N, m∈J . n n 2 n m n of the D-PCP algorithm. (13) Aninterestingfuturedirectionisto devisereal-timecleans- If M¯ m[0] = −M¯n[0]= 0, then the structure of (13) reveals n m ing and imputation algorithms capable of processing load that M¯m[k] = −M¯ n[k] for all k ≥ 0, where n ∈ N and n m curve data acquired sequentially in time. Online adaptive m∈J . n algorithms enable tracking of “bad data” in nonstationary The minimization (9) in [S4] also decouples in N simpler environments, typically arising due to to e.g., network topol- sub-problems, namely ogy changes and missing data. In addition, it is of interest to rigorously establish convergence of the D-PCP algorithm. Q [k+1]=argmin 1kΩ (y −Qp [k]−o [k])k2 n n n n n 2 Such results could significantly broaden the applicability of Q 2 (cid:26) ADMM for large-scale optimization over networks, even in + λ∗ kQk2 + hM¯m[k]+M˜n[k],Qi thepresenceofnon-convexbuthighlystructuredandseparable 2N F n m cost functions. mX∈Jn c + kQ−F¯m[k]k2 +kQ−F˜n[k]k2 APPENDIX 2 n F m F ) ALGORITHMICCONSTRUCTION mX∈Jn(cid:16) (cid:17) 1 2 =argmin kΩ (y −Qp [k]−o [k])k The goal is to show that [S1]-[S4] can be simplified to the n n n n 2 Q 2 iterationstabulatedunderAlgorithm1. Focusingfirst on[S3], (cid:26) 2 λ Q [k]+Q [k] (8) decomposes into N ridge-regression sub-problems + ∗ kQk2 +hS [k],Qi+c Q− n m 2N F n 2 pn[k+1]=argmpin kyn−Qn[k+1]p−on[k]k22+λ∗kpk22 mX∈Jn(cid:13)(cid:13)(cid:13) (14(cid:13)(cid:13)(cid:13))F) (cid:13) (cid:13) n o IEEETRANSACTIONSONSMARTGRID,VOL.X,NO.X,XXXXXX2012 8 where in deriving (14) it was used that: i) M¯m[k] = M˜n[k] [20] M. Mardani, G. Mateos, and G. B. Giannakis, “In-network sparsity- n m which follows from the identities M¯m[k] = −M˜ m[k] and regularizedrankminimization:Applicationsandalgorithms,”IEEETrans. M¯m[k] = −M¯ n[k] established earnlier; ii) the dnefinition SignalProcess.,2012,seealsoarXiv:1203.1507v1[cs.MA]. n m [21] G.Mateos andG.B.Giannakis, “Robustnonparametric regression via Sn[k] := 2 m∈JnM¯ mn[k]; and iii) the identity F¯mn[k] = sparsity control with application to load curve data cleansing,” IEEE F˜n[k] which allows one to merge the identical quadratic Trans.SignalProcess.vol.60,no.4,pp.1571–1584, April2012. m P [22] L.Mili, M.G.Cheniae, andP.J.Rousseeuw, “Robust stateestimation penaltytermsandeliminatebothF¯mn[k]andF˜nm[k]using(12). of electric power systems,” IEEE Trans. Circuits Syst. I, vol. 41, no. 5, Problem (14) is again an unconstrained QP, which is readily pp.349–358,May1994. solved in closed form by e.g., vectorizing Q and examining [23] K.M.Rogers,R.D.Spadoni,andT.J.Overbye,“Identificationofpower system topology from synchrophasor data,” in Proc. of Power Systems the first-order condition for optimality. Conference andExposition, Phoenix,AZ,Mar.2011. Finally, note that upon scaling by two the recursions (13) [24] M.Shahidehpour, H.Yamin, and Z.Li,Market Operations in Electric and summing them over m ∈ J , the update recursion for Power Systems: Forecasting, Scheduling, and Risk Management. New n York,NY:Wiley-IEEEPress,2002. S [k] in Algorithm 1 follows readily. (cid:4) n [25] U.S. Department of Energy, “The smart grid: An introduction,” 2008, [Online.] Available: ACKNOWLEDGMENT http://www.oe.energy.gov/SmartGridIntroduction.htm. [26] W.Wulf,“Greatachievementsandgrandchallenges,”TheBrattleGroup, The authors would like to thank NorthWrite Energy Group Freeman,SullivanandCo.,andGlobalEnergyPartners,LLC,Tech.Rep. and Prof. Vladimir Cherkassky (Dept. of ECE, University of 3/4,Fall2010,[Online.]Available: http://www.greatachievements.org/. [27] X. M. Yuan, J. Yang, “Sparse and low-rank matrix decomposition via Minnesota) for providing the data analysed in Section V-B. alternating direction methods,”PacificJournalofOptimization,2012(to appear). REFERENCES [28] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma, “Stable principal componentpursuit,”inProc.ofIntl.Symp.onInformationTheory,Austin, [1] A.AburandA.Gomez-Exposito,PowerSystemStateEstimation:Theory TX,Jun.2010,pp.1518–1522. andImplementation. NewYork,NY:MarcelDekker, 2004. [2] A.R.BergenandV.Vittal,PowerSystemAnalysis.UpperSaddleRiver, NJ:Prentice Hall, 2000. [3] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods.Athena-Scientific, 1999. Gonzalo Mateos (M’12) received his B.Sc.degree [4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed in Electrical Engineering from Universidad de la optimization andstatistical learning viathealternating direction method Repu´blica (UdelaR), Montevideo, Uruguay in2005 ofmultipliers,” Found.TrendsMachLearning, vol.3,pp.1–122,2010. and the M.Sc. and Ph.D. degrees in Electrical and [5] E.J.Candes,X.Li,Y.Ma,andJ.Wright,“Robustprincipal component ComputerEngineeringfromtheUniv.ofMinnesota, analysis?” JournaloftheACM,vol.58,no.1,pp.1–37,2011. Minneapolis, in2009and2011.Since2012,hehas [6] V.Chandrasekaran,S.Sanghavi,P.R.Parrilo,andA.S.Willsky,“Rank- been a post doctoral associate with the Dept. of sparsityincoherenceformatrixdecomposition,” SIAMJ.Optim.,vol.21, ElectricalandComputerEngineeringandtheDigital no.2,pp.572–596,2011. Technology Center, Univ.ofMinnesota. [7] J. Chen, W. Li, A. Lau, J. Cao, and K. Wang, “Automated load curve From 2003 to 2006, he was a teaching assistant data cleansing in power systems,” IEEE Trans. Smart Grid, vol. 1, pp. with the Dept. of Electrical Engineering, UdelaR. 213–221,Sep.2010. From2004to2006,heworkedasaSystemsEngineeratAseaBrownBoveri [8] D.Duan,L.Yang,andL.L.Scharf,“PhasorstateestimationfromPMU (ABB), Uruguay. His research interests lie in the areas of communication measurements with bad data,” in Proc. IEEE Workshop on Comp. Adv. theory, signal processing and networking. His current research focuses on inMulti-Sensor AdaptiveProc.,SanJuan,PuertoRico,Dec.2011. distributed optimization, sparsity-cognizant signal processing, and statistical [9] M.Fazel, “Matrix rank minimization with applications,” Ph.D.disserta- learningforcartographyofcognitivenetworksincludingthesmartpowergrid. tion,Electrical Eng.Dept.,StanfordUniversity, 2002. [10] H. Gharavi, A. Scaglione, M. Dohler, and X. Guan, “Technical chal- lenges of the smart grid: From a signal processing perspective,” IEEE SignalProcess.Mag.vol.29,no.5,pp.12-13,Sep.2012. Georgios B. Giannakis (Fellow’97) received his [11] Z. Guo, W. Li, A. Lau, T. Inga-Rojas, and K. Wang, “Detecting X- DiplomainElectricalEngr.fromtheNtl.Tech.Univ. outliers inloadcurve datainpowersystems,” IEEETrans.PowerSyst., of Athens, Greece, 1981. From 1982 to 1986 he vol.27,pp.875–884,May2012. was with the Univ. of Southern California (USC), [12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical where he received his MSc. in Electrical Engineer- Learning,2nded.Springer, 2009. ing, 1983, MSc. in Mathematics, 1986, and Ph.D. [13] P.-S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, inElectrical Engr.,1986.Since1999hehasbeena “Singing-voice separation from monaural recordings using robust prin- professorwiththeUniv.ofMinnesota,wherehenow cipal component analysis,” in Proc. of Intl. Conf. on Acoustics, Speech. holds an ADC Chair in Wireless Telecommunica- andSignalProcess.,Kyoto,Japan,Mar.2012,pp.57–60. tionsintheECEDepartment,andservesasdirector [14] V.KekatosandG.B.Giannakis,“Distributedrobustpowersystemstate oftheDigital TechnologyCenter. estimation,”IEEETrans.PowerSyst.,vol.28,Feb.2013(toappear);see His general interests span the areas of communications, networking and alsoarXiv:1204.0991v2[stat.ML]. statistical signal processing -subjects onwhich he has published more than [15] O. Kosut, L. Jia, J. Thomas, and L. Tong, “Malicious data attacks on 340 journal papers, 560 conference papers, 20 book chapters, two edited the smart grid,” IEEE Trans. Smart Grid vol. 2, no. 4, pp. 645 - 658, booksandtworesearchmonographs.Currentresearchfocusesoncompressive Dec.2011. sensing, cognitive radios, cross-layer designs, wireless sensors, social and [16] M. Kraning, E. Chu, J. Lavaei, and S. Boyd, “Message passing for power grid networks. He is the (co-) inventor of 21 patents issued, and the dynamicnetworkenergymanagement,” Tech.Report, Apr.2012. (co-)recipient of8bestpaperawardsfromtheIEEESignalProcessing(SP) [17] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma, “Fast andCommunications Societies, includingtheG.MarconiPrizePaperAward convex optimization algorithms for exact recovery of a corrupted low- inWirelessCommunications.HealsoreceivedTechnicalAchievementAwards rankmatrix,”UIUCTech.ReportUILU-ENG-09-2214,July2009. fromtheSPSociety(2000),fromEURASIP(2005),aYoungFacultyTeaching [18] Y.Liu,P.Ning,andM.K.Reiter, “Falsedatainjection attacks against Award, and the G. W. Taylor Award for Distinguished Research from the state estimation inelectric power grids,”in Proc.ofConf. onComputer University of Minnesota. He is a Fellow of EURASIP, and has served the andCommunications Security, Chicago, IL,Nov.2009,pp.9–13. IEEEinanumberofposts,includingthatofaDistinguishedLecturerforthe [19] Z.-Q. Luo, “On the linear convergence of the alternating direc- IEEE-SPSociety. tion method of multipliers,” Tech. Report, Aug. 2012; see also arXiv:1208.3922v1[math.OC].

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.