ebook img

A Bayesian Approach for Parameter Estimation with Uncertainty for Dynamic Power Systems PDF

0.72 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Bayesian Approach for Parameter Estimation with Uncertainty for Dynamic Power Systems

1 A Bayesian Approach for Parameter Estimation with Uncertainty for Dynamic Power Systems Noe´mi Petra, Cosmin G. Petra, Zheng Zhang, Emil M. Constantinescu, and Mihai Anitescu Abstract—We address the problem of estimating the uncer- the parameters of their generator equivalents that need to tainty in the solution of power grid inverse problems within the be used for balancing the load and assessing the dynamical 6 frameworkofBayesianinference.Weinvestigatetwoapproaches, stability will need to be determined from measurements. The 1 anadjoint-basedmethodandastochasticspectralmethod.These dynamical parameters—such as the equivalent inertia of a 0 methods are used to estimate the maximum a posteriori point 2 of the parameters and their variance, which quantifies their windfarm—pose particular challenges because they are not uncertainty. Within this framework we estimate several param- observableinsteadystate[1].Thereforetheyarelikelytoneed b eters of the dynamic power system, such as generator inertias, more frequent data to capture even fast perturbations from e which are not quantifiable in steady-state models. We illustrate F whose transients they can be inverted. The rapid deployment the performance of these approaches on a 9-bus power grid of PMU means that such data streams will become rapidly 1 exampleandanalyzethedependenceonmeasurementfrequency, 1 estimation horizon, perturbation size, and measurement noise. available, and thus such parameters can be obtained, provided We assess the computational efficiency, and discuss the expected that dynamic parameter estimation can be carried out. ] performance when these methods are applied to large systems. In this paper, we focus on dynamical parameter estimation C Index Terms—Power systems, uncertainty, parameter estima- for energy systems. Given the increasing dynamic ranges of O tion, inverse problems, Bayesian analysis. the energy systems and the uncertainty due to evolving user . behavior and the increased use of distributed generation, we h t find it important to provide uncertainty estimates for these a I. INTRODUCTION parameters. In this way the operator can assess the realistic m Estimating the parameters of a system given noisy mea- stability range for next-generation energy systems. [ surements is a critical problem in the operation of energy In prior work, parameter estimation in power grid models systems. Decisions about the best and safe usage of resources 2 typicallyhasbeenputinthecontextofaggregatedloadmodels v depend critically on knowing the current parameters or states; [2].Mostoftentheparametersareobtainedasaresultofleast- 8 typically, not all these quantities are instrumented. Therefore, squaresapproaches[3].Generally,derivative-freemethodsare 4 their values are obtained indirectly by reconciliation between preferred, which typically lead to minimizations based on 4 the mathematical model of the system and existing measure- 7 geneticalgorithms[4];however,derivative-basedleast-squares mentsbyaninverseestimationprocedure,suchasstateestima- 0 have been introduced by Hiskens et al. [5], [6], [7]. . tion. Before the advent of phasor measurement units (PMUs) Since, in an operational environment, one needs to provide 1 the phase angle differences in an electrical network were 0 an answer in all circumstances, in this work we embrace a determined primarily indirectly by estimation from SCADA 6 Bayesian point of view. In this case, even with very little 1 data. While PMU instrumentation can be rapidly installed on information we can produce an estimate that at least will : many parts of the power grid, thus resulting in their phasor v encapsulate prior information about the possible ranges of angleswithrespecttoauniversaltimereferencebeingdirectly i parameters. With more informative data the estimation will X sensed, the ones without such measurements will still need to approach the real value of the parameters, without changing r be inferred indirectly from model and measurements. a the inference framework. In this sense the spread of the pos- Moreover, the advent of renewable and distributed energy terior probability density function (pdf), namely the solution generation systems creates additional challenges that need of the Bayesian inverse problem, will quantify how much mathematicalinversion.Theamount,type,andsettingofgen- information from the data can be used for identifying the eration may not be known a priori by the operator. Therefore, parameters. The challenge in solving this Bayesian inverse problemisincomputingstatisticsofthepdf,whichisasurface Cosmin G. Petra, Zheng Zhang, Emil M. Constantinescu, and Mihai An- itescu(e-mail:{petra,zhengzhang,emconsta,anitescu}@mcs.anl.gov)arewith in high dimensions. This is extremely difficult for problems theMathematicsandComputerScienceDivision,ArgonneNationalLabora- governed by expensive forward models (as is the power grid tory,Lemont,IL60439. model) and high-dimensional parameter spaces (as is the case Noemi Petra (e-mail: [email protected]) is with the Department of Applied Mathematics, School of Natural Sciences, University of California, foralarge-scalepowergrid).Thedifficultystemsfromthefact Merced,CA95343. that evaluation of the probability of each point in parameter This material was based upon work supported in part by the Office of space requires solution of the forward problem, and many Science, U.S. Dept. of Energy, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. N. P. also acknowledges such evaluations may be required to adequately sample the partial funding through the U. S. Dept. of Energy, Office of Workforce posteriordensityinhighdimensionsbyconventionalMarkov- DevelopmentforTeachersandScientists,2015VisitingFacultyProgram.We chain Monte Carlo (MCMC) methods. Hence, quantifying the thank Hong Zhang and Shrirang Abhyankar for their help in setting up the IEEEexampleanditsadjointimplementation. uncertaintiesinparametersbecomesintractableasweincrease 2 the grid dimension. Therefore, the approach we take is based onalocalGaussianapproximationoftheposterioraroundthe maximum a posteriori (MAP) point. This approximation will be accurate when the parameter-to-observable map behaves nearly linearly over the support of the posterior [8]. WepresenttwomethodsforcomputingMAPandestimating the parametric uncertainty: (1) an adjoint-based method and (2)asurrogatemodelingapproachbasedonpolynomialchaos expansions. These methods solve the same problem but have Fig.1. IEEE9-bustestcasesystem.Herethebuses1,2and3aregenerator different properties and computational cost. We will use these busesand5,6,and8loadbuses. techniques to estimate the inertias of three generators in an IEEE 9-bus model. The situation models the circumstance conditions x are known. Nevertheless, initial conditions can 0 where the actual bulk or distributed inertia is not known to also be considered uncertain; and the framework introduced thegridoperator(aswouldbethecaseofawindfarmorother herein naturally extends to such cases. energy resources). The Bayesian formulation poses the parameter estimation We carry out extensive validation experiments to demon- problem as a problem of statistical inference over parameter stratetheconsistencyandaccuracyofthemethods.Moreover, space. The solution of the resulting Bayesian inverse prob- weuseourapproachtoinvestigatetheeffectofimportantdata lem is a posterior probability density function (pdf). Bayes’ features on the precision of MAP. These features include the Theorem states the posterior pdf explicitly as frequency of the measurements and the size of the perturba- tion.Inaddition,wecomparethebehaviorofthetwomethods π (m)∝π (d|m)π (m). post like prior and discuss their computational efficiency and the expected For the likelihood model, π (d|m), we assume that the like complexities when applied to larger systems. measured quantities are the bus voltages from a disturbance. We note that here we measure the voltage at all buses in II. PROBLEMFORMULATION the IEEE 9-bus power grid; however, our framework can be Assume that we have measurements of a dynamical system used to experiment with various measurement scenarios (e.g., that can be modeled by an additive Gaussian noise model measurements at a subset of buses) at various time intervals and measurements of different quantities. Furthermore, since d=f(m)+η, η ∼N(0,Γ ), (1) noise thenoiseηisindependentofm,thusd|m∼N(f(m),Γ ), noise where Γ ∈ Rq×q is the measurement noise covariance the likelihood is given by noise (cid:16) (cid:17) matrix and f(·) a (generally nonlinear) operator mapping π (d|m)∝exp −(cid:107)f(m)−d)(cid:107)2 . (3) model parameters m to observations d. Here, evaluation of like Γn−ois1e thisparameter-to-observablemapf(m)requiressolutionofa The noise covariance, Γ , can be obtained by offline studies noise differential-algebraic system (DAE) that models the dynamics of the measurement setup. If the measurements are from of a power grid (followed by extraction of the DAE solution PMUs,onecanreasonablyassumethatthemeasurementnoise at observation points): is independent between sensors and white noise in time for one of them (on the time scale of interest, which is between x˙ =h(t,x,y,m), x(0)=x , (2a) 0 0.03 and 30 s). The variance then can be computed from the 0=g(t,x,y), y(0)=y . (2b) precision rating of the instrument. 0 The Bayesian prior, π (m), on the other hand, requires Herexrepresentsthedynamicstatevariables(e.g.,rotorangle, prior quantificationoftheexistinginformationabouttheparameters. generator speed), y the static algebraic variables (e.g., bus Considerable literature exists in the area of eliciting priors, voltages and line currents), x and y are the initial states, 0 0 but it certainly requires an intimate analysis of the system at t represents time, and m the model parameters. The right- hand[10].Forexample,forawindfarm,onecanusehistorical hand side h in (2a) is in general a nonlinear function that logs or a simulation-based model to create a statistical model models the dynamics of the system, and g in (2b) is a set of the active inertia at a given time of the year, conditional of algebraic equations modeling the passive network of the on ambient conditions, or use information from similar wind- powersystem.FortheIEEE9-buspowergridmodelproblem, farms. Here we use a Gaussian prior, a common choice for as illustrated in Figure 1, for each generator we have seven Bayesian inverse problems [11]. We use a prior with large differential (i.e., x ∈ R21) and two algebraic equations, and variance and diagonal covariance because of the lack of a for each network node two additional algebraic equations priori information about the parameters. (i.e.,y ∈R24)[9].Theinferenceparametermweconsiderin Restating Bayes’ theorem with Gaussian noise and prior, this paper is the inertia of each generator, and thus m∈R3. the posteriori density function of m is described as [8], [11] Inrealisticapplications,theinitialstatex maynotbeknown 0 (cid:16) 1 1 (cid:17) either, and it would have to be inferred from data. However, π (m)∝exp − (cid:107)f(m)−d)(cid:107)2 − (cid:107)m−m (cid:107)2 , (4) since our focus is on understanding the reconstructability post 2 Γn−ois1e 2 pr Γp−r1 of parameters that cannot be determined from steady-state wherem andΓ ∈Rn×narethemeanandcovariancematrix pr pr measurements, such as inertias, we assume that the initial of the prior distribution π (m), respectively. prior 3 Despite the choice of Gaussian prior and noise probability The iterative procedures of a gradient computation are as distributions, the posterior probability distribution need not follows. First, given a parameter sample m, DAE (6) is be Gaussian, because of the nonlinearity of f(m) [8], [11]. solved for the forward solution u. The solution u is stored Here we make a quadratic approximation of the negative log or checkpointed and further used to evaluate the data misfit of the posterior (4), resulting in a Gaussian approximation termrin(7).Theadjointequationisthensolved(backwardin N(m ,Γ ) of the posterior. The mean of this posterior ap- time) to obtain the adjoint solution (λ,µ). Both the forward MAP post proximation, m , is the MAP point obtained by minimizing and adjoint solutions, along with the current parameter m, MAP the negative log posterior: are used to evaluate ∇ J(m). Thus, a gradient computation m requires two (forward and adjoint) DAE solves. m =argminJ(m):=−logπ (m). (5) MAP post 3) Numerical solution to posterior minimization: The op- m timization problem (5) is solved with the bounded limited- The posterior covariance matrix Γ is then obtained by post memory variable-metric quasi-Newton method for nonlinear computing the inverse of the Hessian of J at m [8], [11]. minimization implemented in TAO [15]. The method main- tains a secant approximation to the Hessian from a limited III. SOLUTIONMETHODS number of previous evaluations of J(m) and ∇ J(m) m We present two methods for solving the inverse problem: and uses this approximation to compute the quasi-Newton the adjoint-based method and the stochastic spectral method. search direction. This approach achieves asymptotic superlin- In Section IV we will illustrate the circumstances in which ear convergence characteristic of Newton method, but with- one approach will be favored over the other. out evaluating second-order derivatives [16]. The numerical estimation procedure starts with an initial guess for m and A. Adjoint-based method iteratively updates this parameter by performing a More´- Thuentesearch[17]alongthequasi-Newtondirection.During We first introduce a numerical discretization of the forward this search a couple of evaluations of J may be needed in problem. Then we detail the adjoint method in Section III-A2 order to ensure sufficient decrease. The process stops when for computing the gradients required when solving (5). (cid:107)∇ J(m)(cid:107) is small, which indicates that m is a local 1) The forward problem: We represent (2) compactly by m minimizer. Mu˙ =F(t,u;m), u(0)=[x(0),y(0)]T , (6) B. Stochastic spectral method where u := (x,y) denotes the state variables, F = [h(·),g(·)]T, and M is the DAE mass matrix, which is Forthesecondmethod,theDAE(6)issimulatedatasmall block identity for x variables and zero in the rest. Note number of samples to build a surrogate model. Then, the that M should not be confused with the parametric inertia obtained surrogate model (instead of the forward solver) is m. Equation (6) is discretized by using a time-stepping usedinthesubsequentoptimizationtoestimatetheparameters. method.Forinstance,atrapezoidal-rulediscretizationleadsto This method is particularly useful when the dimension of Mu = Mu + ∆t(F(t ,u ;m)+F(t ,u ;m)), the parameter space is small and the forward solver has a k+1 k 2 k k k+1 k+1 where ∆t = t −t . With fixed u(0), each choice of the largestate-spacedimension,becauseitsavesonthenumberof k+1 k parameters m will generate a new trajectory. forward dynamic simulations. Our example has 3 parameters 2) The adjoint problem and gradient computation: To and 21 state variables, so it belongs to this category. facilitate the gradient computation needed to solve (5), we Given the prior density function of m, a set of polynomial use a Lagrangian approach that augments J with additional chaos basis functions {Ψα(m)}p|α|=0 are specified. Here terms consisting of the forward DAE problem (2). Using the α ∈ Nn is an index vector, |α| denotes the (cid:96)1 norm, and discrete adjoint approach [12], [13] in PETSc [14], we obtain positive integer p is the highest order of the basis func- the following discrete adjoint equations: tions. These basis functions are orthornormal to each other, (cid:82) namely Ψ (m)Ψ (m)π (m)dm = δ . Then, f(m) MTλ∗ =λ + ∆t(cid:0)FT(u )λ∗+rT(t ,u )(cid:1), Rn α β prior α,β k+1 2 u k+1 u k+1 k+1 is approximated by a truncated generalized polynomial chaos λ =MTλ∗+ ∆t(cid:0)FT(u )λ∗+rT(t ,u )(cid:1) (7) expansionf(m)≈fˆ(m)= (cid:80) cαΨα(m)withcαdefined k 2 u k u k k (cid:82) |α|≤p µk =µk+1+ ∆2t(cid:0)FmT(uk+1)+FmT(uk)(cid:1)λ∗+ absascisαfu=ncRtinonΨsαis(mK)f=(m(p)+πprnior)(!m/()pd!mn!)..The total number of ∆t(cid:0)rT (t ,u )+rT (t ,u )(cid:1), In our implementations, cα are computed in two ways. 2 m k+1 k+1 m k k The first choice is to employ projection-based stochastic with k = N − 1,...,0, λ = 0, µ = 0, where r = collocation [18], [19], [20]. Let {m ,w }N be a set of N N i i i=1 −log(π (d|m)), and the gradients are defined by F = ∂F, quadrature points and weights corresponding to a numerical like u ∂u F = ∂F , r = ∂r, and r = ∂r . integration rule in the parameter space. Then we have c ≈ mThe ∂gmradieunt of∂uJ withmrespe∂cmt to m can be found by (cid:80)N w Ψ (m )f(m ). Popular methods for choosingαthe i=1 i α i i enforcing that the derivative of the Lagrangian with respect quadrature points include tensor-product rules and sparse-grid to u and the adjoint variables (λ,µ) vanish. This is given by methods [21]. The former needs (p+1)n samples to simulate ∇ J(m)=µ −Γ−1(m−m ). the dynamic power systems, whereas the latter needs fewer m 0 pr pr 4 1.1 4 A. Computational Setup 3 1.05 The IEEE 9-bus example is implemented by using PETSc mplitiude 1 Phase 12 andisavailableasapartofthePETScdistribution.Forfuture, Voltage A0.95 Voltage −−021 lsaicrgepra,reaxllaeml pclaepsa,btihleitiseestu[p14h]a,s[t2h6e].adTvhaentafogrewoafrdhaavnindgaidnjtoriinn-t 0.9 gbus1 gbus1 problems needed by TAO for the numerical minimization of gbus2 −3 gbus2 lbus5 lbus5 the posterior, as described in Section III-A, are set up and 0.850 1 2 3 4 5 −40 1 2 3 4 5 t t solved by using the PETSc time-stepping library for DAEs. Both methods used in this paper will produce MAP es- Fig.2. Busvoltageamplitude(left)andphase(right)fortheloadbuses1and timates and their variance for the inertia parameters m (a 2andgeneratorbus5(red,yellowandgreensolidlines,respectively)based on the “truth” inertia parameter m = [23.64,6.40,3.01]. The squares common notation for them in power engineering literature is true showthecorrespondingsyntheticobservationswith1%noise. H [9]). We compare the MAP estimate to the truth using a relative error metric: (cid:114) 1(cid:88) samples by using nested grid samples. The second way is to Err = n (m(i)−m (i))2/m (i)2. (8) n i=1 true true use an interpolation method such as stochastic testing [22]. Specifically,K samplesareselected,andthec ’sareobtained To facilitate interpretation we use a similar relative metric α by solving a linear equation. In [22], a set of samples are for the standard deviation. Specifically, we use the following generatedbyaquadraturerule(suchasatensorproductGauss- formulatonormalizethesquarerootofthetraceoftheHessian quadrature method); then K dominant samples {m }K are inverse (τ): j j=1 subselected such that the matrix V (with its jth row being (cid:113)(cid:88) τ = n Γ (i,i)/m2 (i). (9) made of Ψα(mj), |α|≤p) is well conditioned. i=1 post true With a pth-order polynomial chaos expansion for f(m), Another statistic of interest is the positioning of the real the negative log posterior now becomes a non-negative 2pth- parameterinrelationtothedistribution.Thisisnotcompletely order polynomial function. We first write it as a com- captured by the variance, because there could be significant bination of polynomial chaos basis function by stochastic biasintheestimation.Tothisend,wecomputethecumulative collocation, then convert it to the summation of mono- normal scores (CNS) p for the actual values in relation to the mials: −logπ (m) ≈ (cid:80)2p q mα, with mα = Gaussian approximation of the distribution: post |α|=0 α mα11mα22...mαnn. With this surrogate model, (5) is simplified (cid:104) (cid:112) (cid:105) to mMAP = argmin(cid:80)2|αp|=0qαmα. This nonconvex opti- pi =erf (m(i)−mtrue(i))/ Γpost(i,i) . (10) m mization can be solved locally with gradient-based methods Here erf is the error function, the cumulative density of the as in Section III-A or globally with specialized polynomial standard normal. CNS are between 0 and 1 and indicate how optimizationsolverssuchasGloptiPoly[23],[24],[25],when likelyisthattherealparametersaredrawnfromtheaposteriori the parameter dimension is low. distribution,withthedistinctionthatvaluesveryclosetoeither 0 or 1 are considered unlikely. Todeterminewhetherouranalysishadagoodoutcome,we IV. NUMERICALRESULTS use the following considerations. If the estimation procedure is successful, the error Err should be small by engineering InthissectionweevaluatehowwelldoestheMAPestimate standards (a few percentages or less). If the stochastic model approach the parameters used to generate the data and how is a good depiction of reality, then τ should be mostly good of an error indicator is the posterior variance. larger than Err but comparable. This reflects the fact that A load disturbance at t = 0.1s, constant for 0.2s, is we are uncertain about the parameter used in the estimation inserted in the 9-bus model to provoke a transient. Its value (as opposed to the deterministic case); but when the data during the switching action, L, is what characterizes this dis- is informative, the standard deviations should be comparable turbance.WeuseasΓ adiagonalmatrixwithdiagonalentries pr to the error (though exact relational statements are difficult). [5.76,0.36,0.09].Thepriormeanandthe“truth”inertiavalues A measure of successful representation of the uncertainty are m = [24.00,6.00,3.10] and m = [23.64,6.4,3.01], pr true analysis and validation of the statistical approach is that the respectively. We carry out the forward simulation of the DAE standardconfidencevaluescontaintherealparameter.Thatis, (2) and we create synthetic voltage measurements at all 9 the CNS of the real parameters should be away from 0 and 1 buses.Here,weconsiderthecaseofindependentobservations; (for example, in the [0.1,0.9] range) but not clustered at 0.5, hence Γ is a diagonal matrix, with diagonal entries for all noise which would indicate an excessively conservative variance. computations, unless otherwise specified, 10−4. The resulting voltage amplitudes, phases, and synthetic measurements are B. Results depicted in Figure 2. This synthetic data is then used in the Bayesian framework encapsulated in (5). We aim to quantify 1) Dependence on experimental design parameters: Our the estimation error as a function of L and the frequency of approach has two experimental design parameters: the length the observations, which we assume consist of time series of of the time horizon over which the estimation is carried out the voltages at all 9 buses, mimicking PMU data streams. and the frequency of the data. We now present the behavior 5 TABLEII of Err, τ, and the CNS values as a function of the various COMPUTATIONALCOSTFORCOMPUTINGTHEMAPPOINTMEASUREDIN choices of these parameters. NUMBEROFFORWARDANDADJOINTSOLVES. Table I shows the estimation results for different estimation horizons t and data frequencies ∆obs, shown by the first Load Noise #Iter #fwd/adjSolves Time(s) f t 4.25 0.01 10 13 28 column in (a) and (b), respectively. The second, third, and 4.25 0.1 9 13 31 forth columns (m , i = 1, 2, 3) indicate the inverse solution, 7.00 0.01 9 12 30 i 7.00 0.1 10 14 34 thatis,theMAPpointobtainedwiththeadjoint-basedandthe surrogate-based methods, separated by “/”; the fifth column (#iter) indicates the number of iterations taken by the adjoint- vations or longer estimation intervals, the latter appears to be based method to converge. The sixth and seventh columns more beneficial to the quality of the estimation once we are (τ and Err) show the standard deviation normalized by the in range of 10 measurements per second or better. “truth”inertiaparameter(asgivenin(9))forthetwomethods 2) Dependence on the nature of the perturbation: Having and the deterministic error computed with the adjoint-based established in Section IV-B1 that the Bayesian posterior stan- method), respectively. The last three columns show the p- dard deviation is a good indicator of the parameter error, we valuescomputedwiththeadjoint-basedmethodbyusing(10). estimate its behavior with the size of the load perturbation L. For these simulations the forward problem time step was We note that if there were no perturbation, the system would ∆ = 0.01, the load parameter (at load bus 5) was 5.5, and t beinsteadystate,anditsinertiaswouldthusnotbeobservable. the iterations were terminated when the norm of the gradient fell below 10−6. We thus anticipate that a larger perturbation would result in betterestimationpropertiesandthuslowerposteriorvariances. As we see in Table I(b), for data frequency of 10 measure- ments per second or better, the deterministic error Err never In Figure 3 we show a surface plot of the trace of the gets above 2%. In that range, the scaled standard deviation τ Gaussianized posterior covariance (the sum of the parameter is5%orbetterandtheratioofτ/Err isalwayslessthan4.5. variances) for several noise and load values (left) and the The CNS values are comfortably within [0.1,0.9]. We also “whiskers boxplot” of the prior and posterior mean and note that the error, Err, does not significantly improve with variances for L and σm values of (4.25, 0.01) and (7,0.1), re- finer measurements, and it oscillates with the decrease of the spectively.Thesefiguresshowthat,asanticipated,thevariance data frequency. On the other hand, the scaled standard devia- increasesasthenoiseincreasesandtheperturbationdecreases, tiondoesimproveastheadditionaldatareducestheimpactof whichindicatesthatthedeterministicerrorwillhaveasimilar the prior. We conclude that in the range of 10 measurements behavior. The computational cost for computing MAP points per second or better, by the standards indicated above, our (measured in number of forward and adjoint solves) is shown statistical approach is successful. Moreover, when providing in Table II, which indicates that the optimization effort is confidence intervals based on our Bayesian framework, the unaffected by the values of the perturbation parameters. [0.1,0.9] confidence interval always contains the real value. We also note that 10 measurements per second is comfortably C. Computational analysis within the capabilities of typical PMU data streams of 30 measurements per second. Wenowdiscussthecomputationalcostforthetwomethods InTableI(a),welisttheeffectofthelengthoftheestimation presentedinthisstudy.Theadjoint-basedmethodrequiresthe interval on the estimation. We do this at a data frequency value of the full nonlinear model and its gradient for each of 20 Hz (∆obs = 0.05), which is within the sampling rates iteration. Additional iterations may be required in the line- t (30 to 0.033 Hz) supported by PMUs. We observe that for search procedure. The number of forward and adjoint solves estimation horizons of 1s or longer, our statistical approach forselectedcasesislistedinTableII.Forthesesimulationswe is also successful. In that range, both the deterministic error usedt =2s,∆ =0.01s,and∆obs =0.1s.Theiterationsfor f t t and the statistical errors are less than 4%, and their ratio is these simulations were terminated when the norm of the gra- never more than 4. Also, both error indicators are decreasing dient fell below 10−6. To compare the adjoint-based method withlongertimehorizons,whereasthedeterministicerrorwas cost with the stochastic spectral method, we need to account relatively insensitive to measurement frequency. for the cost of computing the adjoint, which is roughly the The CNS values for the larger inertia, m , fit comfortably same as in the forward run. In addition to the computational 1 within the [0.1,0.9] confidence interval. The smaller inertias, time, however, the stochastic spectral element method has the however, are contained only in the [0.01,0.99] range for the advantage of working without sensitivity information. Given verylongestimationhorizons.Thisinterval,whichfornormal theconsiderableamountoflegacysoftwareforwhichadjoints distributions is about 3 standard deviations left and right would be labor-intensive to implement, this could confer it a of the mean, is not abnormally wide by statistical analysis practical advantage. Moreover, the variance can be naturally standards. But it does suggest that smaller inertias are harder estimated with no additional cost, whereas the adjoint-based to estimate accurately relative to larger ones, which is not approach would need either finite differences or second-order altogether surprising. However, when seen in the light of the adjoints to compute the covariance. smallrelativestandarddeviation(about1%),thoseconfidence In Table III we show the costs of constructing surrogate intervals will be tight from an operational perspective. models using different approaches. In Table IV we show Therefore, when having the choice of more frequent obser- the MAP results using different orders of surrogate models 6 TABLEI ASTUDYOFTHEEFFECTOFTHETIMEHORIZONANDMEASUREMENTFREQUENCYONTHEABILITYTORECOVERTHEINERTIAPARAMETERFORTHE POWERGRIDINVERSEPROBLEM. t m1 m2 m3 #iter τ Err p1 p2 p3 f (a)∆t =0.01,∆otbs =0.05 5.0 23.60/23.60 6.35/6.37 3.02/3.00 15 1.59e-02/1.79e-02 5.17e-03 0.1679 0.1484 0.5804 3.0 23.79/23.81 6.39/6.41 3.06/3.05 9 1.85e-02/2.04e-02 1.01e-02 0.9659 0.4163 0.8470 1.0 23.56/23.55 6.32/6.32 3.06/3.06 14 3.60e-02/3.66e-02 1.30e-02 0.4019 0.2384 0.7423 0.8 23.67/23.63 6.54/6.53 2.95/2.95 11 5.81e-02/5.80e-02 1.76e-02 0.5123 0.7314 0.2295 0.6 22.45/22.45 6.14/6.13 3.01/3.00 10 9.43e-02/9.29e-02 3.74e-02 0.1924 0.2337 0.4892 ∆otbs (b)tf =1,∆t =0.01 0.01 23.25/23.23 6.29/6.28 2.97/2.97 12 1.65e-02/1.67e-02 1.56e-02 0.0078 0.0195 0.1641 0.02 23.81/23.76 6.50/6.49 3.00/2.99 12 2.34e-02/2.36e-02 9.79e-03 0.7635 0.8897 0.4218 0.05 23.56/23.55 6.32/6.32 3.06/3.06 14 3.60e-02/3.66e-02 1.30e-02 0.4019 0.2384 0.7423 0.10 22.91/23.87 6.42/6.43 3.06/3.04 13 4.82e-02/4.89e-02 1.11e-02 0.7332 0.5607 0.6522 0.35 23.53/23.52 6.23/6.21 2.98/2.98 11 9.04e-02/9.08e-02 1.69e-02 0.4297 0.2452 0.4412 30 pproiostrerior (7.0,0.1) 8 4 posterior (4.25,0.01) 7.5 3.8 3 28 3.6 7 2.5 26 3.4 2 6.5 3.2 1.5 24 6 3 1 0.5 22 5.5 2.8 0 0.1 20 5 2.6 6 4 0.040.060.08 18 4.5 2.4 load 2 0.02 noise 4 2.2 m1 m2 m3 Fig.3. Left:SurfaceplotofthetraceoftheGaussianizedposteriorcovarianceasafunctionofnoiseandloadfortf =2s.Right:A“whiskersboxplot”of thepriorandposteriormeanandvariancesforloadandnoisevalues(7,0.1)and(4.25,0.01)forthethreeinertiaparameters.Thecentralmarkisthemedian, theedgesoftheboxarethe25thand75thpercentilesandthe“whiskers”extendtothemostextremedatapoints. constructedbydifferentmethods. Clearly,theaccuracyissig- n is large, the number of simulation samples required can nificantlyimprovedwhenweincreasetheorderofpolynomial be very large, leading to an extremely high computational chaosexpansionfrom1to2,buttheimprovementismarginal cost. Arguably one can obtain efficiently a high-dimensional when we use third-order polynomial chaos expansions. From surrogate model by using some advanced techniques, such these tables we see that we can obtain good-quality estimates as compressed sensing [27], tensor recovery [28], and proper of the parameters and their variance using only 10 forward generalizeddecomposition[29];butthesetechniquesmaystill runs (degree 2). This is less intensive than the adjoint-based beinefficientforextremelyhigh-dimensionalcases(e.g.,when method by a factor of about 2. n>1000). 1) Challenges as we increase the number of parameters While a definitive comparison between the two approaches and the complexity of the problem: The adjoint-based method is difficultto make in generalbecause of themultiple features has two major requirements: (1) code differentiation, that is, of the target problems, for a small number of parameters and the computation and implementation of derivatives such as lack of sensitivity information, the stochastic spectral element the ones in (7), and (2) storing the forward trajectory through approach would be a strong candidate for a solution. In our checkpoints.Becauseonlyafewthousandsofstatesneedtobe case, it did produce good estimates a factor of 2 faster than storedifthetimescalesremainthesame,evenforinterconnect the adjoint-based approach for the proper choice of degree size examples this is unlikely to become a problem even and construction method, which may be difficult to guarantee on a desktop. On the one hand point (1) is a significant a priori. undertaking, although HPC tools such as PETSc increasingly 2) Considerationsaboutdeployment: Whilethisisonlyan provide support for it natively. On the other hand, the cost initial study, a practical implementation is worth considering. of the adjoint-based method is independent of the number of In such cases the initial state and the load would need to be parameters, and parallel implementations are also possible. inverted as well. Because these are classical analyses, a tiered The stochastic spectral method proved to be robust in approach is possible, where they are estimated separately. our experimental setting, requiring few model evaluations to One can, of course, create a unified estimation approach with construct a viable surrogate. In addition, all calculations can hybrid data sources; a mix of PMU and other data, such as be trivially parallelized, and a variance estimator is intrinsic. SCADA, may need to be considered. While the performance Moreover, once a surrogate is obtained, one need not to ofthemethodwouldneeditselftobere-evaluated,thiscanbe regenerate it if the model and setting do not change. As done in the Bayesian framework described in Section II. As discussed above, however, when the parameter dimensionality described, the method assumes that we have a way to identify 7 TABLEIII atimethatisaconstantfactoroftheoneoftheforwardsimula- TOTALNUMBEROFFORWARDSIMULATIONSTOCONSTRUCTTHE SURROGATEMODELS. tionirrespectiveofthenumberofparametersconsidered.This method was implemented in PETSc. The latter has the benefit Totalnumberofforwardsimulations. polyn.order of needing no sensitivity capabilities, of employing only stoch.testing tensorprod. sparsegrid 1 4 8 7 forward simulations, and of providing an intrinsic estimate of 2 10 27 19 the variance. It is suitable for the case of a limited number 3 20 64 39 of parameters. We have demonstrated these methods on a 9- bus example case that is available for download [31]. The TABLEIV MAPRESULTSUSINGDIFFERENTORDEROFGPCEXPANSIONS. three parameters to be estimated were the generator inertias. Forthisexamplewehavegeneratedsyntheticdataoftransient surrogatemodel gPCorder m1 m2 m3 behaviorbyperturbingtheloadandaddingmeasurementnoise 1 22.818 6.745 2.248 that we have used to assess the behavior of our approaches. stoch.testing 2 23.600 6.372 3.000 3 23.611 6.351 3.021 When applying our method we have found that estimation 1 23.751 6.420 2.973 time horizons of 1s or more and data frequency of at least SCw/tensorprod. 2 23.585 6.374 2.991 10 samples per second were sufficient for the error to be 3 23.618 6.347 3.026 1 23.962 6.322 3.156 less than 2%, the posterior variance to be a good estimate SCw/sparsegrid 2 23.584 6.375 2.990 of the error, and some of the standard confidence intervals 3 23.617 6.361 3.016 to cover the real parameter (with the 3 standard deviations ones always containing the real parameters). We have also observed that, as expected, the error and posterior variance “micro-transients” suitable to trigger dynamical estimation. decrease with increased system perturbation and decreased This can be done for PMU data. The method can also be measurementerror.Thecomputationaleffortwasontheorder modified to support any type of perturbation, as well as in a of 10 forward simulations for the stochastic spectral method “rolling horizon” approach, where it is not triggered but used and 30 forward simulations for the adjoint-based method. For continuously. This can be done, for example, by restarting usage in larger systems under real time constraints, and under the estimation with the prior covariance being the posterior realistic data streams and use cases, further work may be one from the previous estimation interval. We anticipate that necessary. Nevertheless, for the small parameter case the state as long as the perturbations show enough dynamic range so oftechnologyissuchthat,withtheuseofparallelcomputing, that the method can excite transients that are informative the stochastic spectral method may already provide sufficient about the inertias, similar behaviors and performance can be capabilities. expected.Amoresignificantconcernistheabilitytocompute the estimate in real time. We note that forward simulations for power grid transients using PETSc on interconnect-sized REFERENCES networkshavebeenrunfasterthanrealtimewithlessthan16 [1] Y. Zhang, J. Bank, Y.-H. Wan, E. Muljadi, and D. Corbus, “Syn- cores [30]. Therefore, for a few dynamic parameters to invert chrophasormeasurement-basedwindplantinertiaestimation,”inGreen withuncertainty,thestochasticspectralelementmethodcould TechnologiesConference,2013IEEE. IEEE,2013,pp.494–499. in principle work “out of the box”. For a large number of [2] V. Knyazkin, C. Canizares, L. H. So¨der et al., “On the parameter estimation and modeling of aggregate power system loads,” IEEE dynamic parameters to invert, the issue is whether the opti- TransactionsonPowerSystems,vol.19,no.2,pp.1023–1031,2004. mization can be fast enough. Certainly a promising direction [3] B.-K.Choi,H.-D.Chiang,Y.Li,H.Li,Y.-T.Chen,D.-H.Huang,and is the usage of a rolling horizon approach in conjunction with M. G. Lauby, “Measurement-based dynamic load models: derivation, comparison, and validation,” IEEE Transactions on Power Systems, inexact optimization. vol.21,no.3,pp.1276–1283,2006. [4] H. Bai, P. Zhang, and V. Ajjarapu, “A novel parameter identification approach via hybrid learning for aggregate load modeling,” Power V. CONCLUSIONS Systems,IEEETransactionson,vol.24,no.3,pp.1145–1154,2009. We have presented a Bayesian framework for parameter [5] I. A. Hiskens and A. Koeman, “Power system parameter estimation,” Journal of Electrical and Electronics Engineering Australia, vol. 19, estimation with uncertainty focused on the estimation of pp.1–8,1999. dynamic parameters of energy systems. This investigation is [6] I. Hiskens, “Inverse problems in power systems,” Bulk Power System prompted by the rapid expansion of PMU sensors and the DynamicsandControlV,pp.180–195,2001. [7] I.Hiskensetal.,“Powersystemmodelingforinverseproblems,”IEEE increasedusageofrenewablegenerationwhoseinertiafeatures TransactionsonCircuitsandSystemsI:RegularPapers,vol.51,no.3, may change in time and may not be known to the stakeholder pp.539–551,2004. that must ensure transient stability operation of the system. [8] A.Tarantola,InverseProblemTheoryandMethodsforModelParameter Estimation. Philadelphia,PA:SIAM,2005. For such systems, inertia cannot be assumed known and must [9] P. W. Sauer and M. A. Pai, Power system dynamics and stability. thusbeestimatedtogetherwithitsuncertainty.Becauseinertia PrenticeHall,1998. has no impact on steady-state features of the system, it needs [10] J.E.OakleyandA.O’hagan,“Uncertaintyinpriorelicitations:anon– parametricapproach,”Biometrika,vol.94,pp.427–441,2007. transient scenarios under which to be estimated. [11] J. Kaipio and E. Somersalo, Statistical and Computational Inverse We have proposed two methods to compute the MAP Problems, ser. Applied Mathematical Sciences. New York: Springer- estimates and their variances: an adjoint-based method and Verlag,2005,vol.160. [12] W.Hager,“Runge-Kuttamethodsinoptimalcontrolandthetransformed a stochastic spectral method. The former has the advantage adjoint system,” Numerische Mathematik, vol. 87, no. 2, pp. 247–282, that it can compute gradients of the log-likelihood function in 2000. 8 [13] H.Zhang,S.Abhyankar,E.Constantinescu,andM.Anitescu,“Discrete Cosmin G. Petra is an assistant computational sensitivityanalysisofpowersystemdynamics,”draft,2016. mathematician in the Mathematics and Computer [14] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, Science Division at Argonne National Laboratory. K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, He received the B.Sc. degree in mathematics and M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, S. Zampini, computer science from “Babes¸-Bolyai” University, and H. Zhang, “PETSc Web page,” 2015. [Online]. Available: Romania, and the M.S. and Ph.D. degrees in ap- http://www.mcs.anl.gov/petsc pliedmathematicsfromtheUniversityofMaryland, [15] T.Munson,J.Sarich,S.Wild,S.Benson,andL.C.McInnes,“TAO2.0 Baltimore County. His research interests lie at the users manual,” Mathematics and Computer Science Division, Argonne intersection of numerical optimization, operations NationalLaboratory,Tech.Rep.ANL/MCS-TM-322,2012. researchandhigh-performancescientificcomputing [16] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New withafocusontheoptimizationofcomplexenergy York:Springer,2006. systemsunderuncertainty. [17] J.J.More´ andD.J.Thuente,“Linesearchalgorithmswithguaranteed sufficientdecrease,”ACMTrans.Math.Softw.,vol.20,no.3,pp.286– 307,Sep.1994. [18] D. Xiu and J. S. Hesthaven, “High-order collocation methods for differentialequationswithrandominputs,”SIAMJ.Sci.Comp.,vol.27, Zheng Zhang received his Ph.D. degree in elec- no.3,pp.1118–1139,Mar2005. trical engineering and computer science from the [19] I.Babusˇka,F.Nobile,andR.Tempone,“Astochasticcollocationmethod MassachusettsInstituteofTechnology(MIT),Cam- forellipticpartialdifferentialequationswithrandominputdata,”SIAM bridge,MA.Heiscurrentlyapostdocassociatewith J.Numer.Anal.,vol.45,no.3,pp.1005–1034,Mar2007. theMathematicsandComputerScienceDivisionat [20] F. Nobile, R. Tempone, and C. G. Webster, “A sparse grid stochastic ArgonneNationalLaboratory.Hisresearchinterests collocationmethodforpartialdifferentialequationswithrandominput include high-dimensional uncertainty quantification data,”SIAMJ.Numer.Anal.,vol.46,no.5,pp.2309–2345,May2008. anddataanalysisfornanoscaledevicesandsystems, [21] T.GerstnerandM.Griebel,“Numericalintegrationusingsparsegrids,” energy systems, and biomedical applications. He Numer.Algor.,vol.18,pp.209–232,Mar.1998. received the 2015 Doctoral Dissertation Seminar [22] Z. Zhang, T. A. El-Moselhy, I. M. Elfadel, and L. Daniel, “Stochastic Award from the Microsystems Technology Labora- testing method for transistor-level uncertainty quantification based on tory of MIT, and the 2014 Best Paper Award from IEEE Transactions on generalized polynomial chaos,” IEEE Trans. CAD of Integr. Circuits CADofIntegratedCircuitsandSystems.Hisindustrialresearchexperiences andSyst.,vol.32,no.10,pp.1533–1545,Oct2013. includeCoventorInc.andMaxim-IC. [23] J.B.Lasserre,“Globaloptimizationwithpolynomialsandtheproblem ofmoments,”SIAMJ.Optimization,vol.11,no.3,p.796817,2001. [24] ——, “An explicit equivalent positive semidefinite program for 0-1 nonlinear programs,” SIAM J. Optimization, vol. 12, no. 3, pp. 756– 769,2001. EmilM.ConstantinescureceivedhisPh.D.degree [25] D. Henries and J. B. Lasserre, “Gloptipoly: Global optimization over incomputersciencefromVirginiaTech,Blacksburg, polynomials with Matlab and SeDuMi,” ACM Trans. Mathematical in 2008. He is currently a computational mathe- Software,vol.29,no.2,p.165194,2003. maticianintheMathematicsandComputerScience [26] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, Division atArgonne National Laboratoryand he is K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, ontheeditorialboardofSIAMJournalonScientific M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, S. Zampini, Computing.Hisresearchinterestsincludenumerical and H. Zhang, “PETSc users manual,” Argonne National Laboratory, analysisoftime-steppingalgorithmsandtheirappli- Tech. Rep. ANL-95/11 - Revision 3.6, 2015. [Online]. Available: cationstoenergysystems. http://www.mcs.anl.gov/petsc [27] X. Yang and G. E. Karniadakis, “Reweighted l1 minimization method for stochastic elliptic differential equations,” J. Comp. Phys., vol. 248, no.1,pp.87–108,Sept.2013. [28] Z. Zhang, H. D. Nguyen, K. Turitsyn, and L. Daniel, “Probabilistic powerflowcomputationvialow-rankandsparsetensorrecovery,”arXiv Mihai Anitescu is a senior computational mathe- preprint:arXiv:1508.02489,pp.1–8,Aug2015. maticianintheMathematicsandComputerScience [29] A.Nouy,“Propergeneralizeddecompositionandseparatedrepresenta- DivisionatArgonneNationalLaboratory,aprofes- tions for the numerical solution of high dimensional stochastic prob- sorintheDepartmentofStatisticsattheUniversity lems,”Arch.Comp.Meth.Eng.,vol.27,no.4,pp.403–434,Dec2010. ofChicago,andaSeniorFellowoftheComputation [30] S.Abhyankar,B.Smith,andE.Constantinescu,“Evaluationofoverlap- Institute at the University of Chicago. He obtained pingrestrictedadditiveSchwarzpreconditioningforparallelsolutionof his engineer diploma (electrical engineering) from verylargepowerflowproblems,”inProceedingsofthe3rdInternational thePolytechnicUniversityofBucharestin1992and WorkshoponHighPerformanceComputing,NetworkingandAnalytics hisPh.D.inappliedmathematicalandcomputational forthePowerGrid,ser.HiPCNA-PG’13. ACM,2013,pp.5:1–5:8. sciences from the University of Iowa in 1997. He [31] [Online]. Available: https://github.com/Argonne-National-Laboratory/ specializes in the areas of numerical optimization, PowerSystemsEstimation.git computationalscience,numericalanalysisanduncertaintyquantification.He co-authoredmorethan100peer-reviewedpapersinscholarlyjournals,book chapters, and conference proceedings, and he is on the editorial board of MathematicalProgrammingAandB,SIAMJournalonOptimization,SIAM Journal on Scientific Computing, and SIAM/ASA Journal in Uncertainty Quantification; and he is a senior editor for Optimization Methods and NoemiPetraisanassistantprofessorintheApplied Software. Mathematics department in the School of Natural Sciences at the University of California, Merced. SheearnedherPh.D.degreeinappliedmathematics (To be removed before publication) Government License: The submitted fromtheUniversityofMaryland,BaltimoreCounty manuscripthasbeencreatedbyUChicagoArgonne,LLC,OperatorofArgonne in2010.PriortojoiningUCMerced,Noemiwasthe NationalLaboratory(“Argonne”).Argonne,aU.S.DepartmentofEnergyOffice recipientoftheICESPostdoctoralFellowshipinthe ofSciencelaboratory,isoperatedunderContractNo.DE-AC02-06CH11357.The InstituteforComputationalEngineering&Sciences U.S. Government retains for itself, and others acting on its behalf, a paid-up at the University of Texas at Austin. Her research nonexclusive,irrevocableworldwidelicenseinsaidarticletoreproduce,prepare interestsincludeinverseproblems,uncertaintyquan- derivativeworks,distributecopiestothepublic,andperformpubliclyanddisplay tificationandoptimalexperimentaldesign. publicly,byoronbehalfoftheGovernment.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.