1 A Random Matrix Theoretical Approach to Early Event Detection in Smart Grids Xing He, Robert C. Qiu, Fellow, IEEE, Qian Ai, Member, IEEE, Yinshuang Cao, Jie Gu, Zhijian Jin Abstract—Power systems are developing very fast nowadays, patternsofanomaliesinpowersystemstomakecorresponding both in size and in complexity; this situation is a challenge emergency responses [4]. In other words, EED aims to tell for Early Event Detection (EED). This paper proposes a data- signals from noises—we treat receivable sample errors and 5 driven unsupervised learning method to handle this challenge. irregular fluctuations (from distributed generators and small 1 Specifically, the random matrix theories (RMTs) are introduced 0 asthestatisticalfoundationsforrandommatrixmodels(RMMs); loads) as noises; whereas we treat system faults, network 2 based on the RMMs, linear eigenvalue statistics (LESs) are reconfigurations, and dramatic changes from loads/generators defined via the test functions as the system indicators. By (oftenunintended)assignals.Asmartgrid,especiallytheone p comparing the values of the LES between the experimental e large in scale, is a complex big data system essentially [5, 6]. and the theoretical ones, the anomaly detection is conducted. S Forsuchasystem,itisabigchallengetoconductEEDwithin Furthermore, we develop 3D power-map to visualize the LES; 5 it providesa robustauxiliary decision-makingmechanism tothe a tolerable elapsed time and hardware resources. 1 operators. In this sense, the proposed method conducts EED The integration of big data analytics and unsupervised with a pure statistical procedure, requiring no knowledge of learningmechanismisaneffectiveapproachtothischallenge. ] system topologies, unit operation/control models, etc. The LES, For the former, big data analytics is a scientific trend dealing E as a key ingredient during this procedure, is a high dimensional with complex data [7, 8]. It is a data-driven tool and aims to M indictor derived directly from raw data. As an unsupervised learning indicator, the LES is much more sensitive than the low work out statistical characteristics (especially correlations) in- t. dimensional indictors obtained from supervised learning. With dicated by linear eigenvalue statistics (LESs) [9]. That means, a the statistical procedure, the proposed method is universal and it conducts data processing in high dimensions, rather than st fast; moreover, it is robust against traditional EED challenges builds and analyzes individual models based on assumptions [ (such as error accumulations, spurious correlations, and even bad data in core area). Case studies, with both simulated data and simplifications, to help understand and gain insight into 2 and real ones, validate the proposed method. To manage large- the systems [10]. Big data analytics has already been suc- v scale distributed systems, data fusion is mentioned as another cessfully applied in numerous phenomena, such as quantum 0 data processing ingredient. systems [11], financial systems [12], biological systems [13], 6 0 Index Terms—big data, early event detection, data-driven, wireless communication networks [10, 14, 15]; we believe 0 unsupervised learning, smart grid, random matrix model, linear that it will also have a wide applied scope in power systems 0 eigenvalue statistics, 3D power-map, data fusion [9, 16, 17]. For the latter, the supervised learning methods 2. are prevailing in data processing. The key parts are the in- 0 I. INTRODUCTION ferredfunctionsandempiricalmodels;thesefunctions/models, 5 produced via a subjective training procedure (often in low 1 DATAhavebecomeastrategicresourceforpowersystems. dimensions), lead to a determinate parameter as the system : Data are readily accessible caused by developments of v indicator [18]. However, for a complex system, it is hard to various technologies and devices, such as Information Com- i findaconvincingtrainingwaytoensurethevalidityofthelast X munication Technology (ICT), Advanced Metering Infrastruc- indicator; besides, it is impossible to train all the scenarios to r ture (AMI), Phasor Measurement Units (PMUs), Intelligent a inferaneventidentificationframeworkwhichisrobustenough Electronic Devices (IEDs), Supervisory Control and Data to manage all the 4Vs data. In contrast, the unsupervised Acquisition (SCADA) [1]. As a result, data with features of method proposed in this paper utilizes the data in the form volume, velocity, variety, and veracity (i.e. 4Vs data) [2], as of random matrix models (RMMs), which are derived from well as curses of dimensionality [3], are inevitably generated the raw data in a statistical manner. Hence, the unsupervised and daily aggregated in power systems. methods is more suitable for EED in smart grids. Early event detection (EED), by continuously monitoring and processing 4Vs data, detects and identifies emerging A. Contribution This work was partly supported by National Natural Science Foundation ofChina(No.51577115). Thispaperproposesanoveldata-drivenunsupervisedlearn- XingHe,RobertC.Qiu,YinshuangCao,QianAi,JieGu,andZhijianJin ing method to conduct EED in smart grids, and a comparison are with the Department of Electrical Engineering, Research Center for is made with supervised ones (e.g. a dimensionality reduction Big Data Engineering Technology, State Energy Smart Grid R&D Cen- ter, Shanghai Jiaotong University, Shanghai 200240, China. (e-mail: hex- method based on Principal Component Analysis (PCA) [19]). ing [email protected]) First, random matrix theories (RMTs) are briefly introduced Robert C. Qiu is also with the Department of Electrical and Computer as the solid mathematic foundations for RMMs. Built on Engineering, Tennessee Technological University, Cookeville, TN 38505, USA.(e-mail:[email protected]) RMMs, two major data processing ingredients—LES designs 2 and data fusion—are systematically studied. 1) LESs are high based on principal component analysis (PCA) [19]. However, dimensionalstatisticalindicators;withdifferenttestfunctions, itisasupervisedlearningmethod;thebadsubspace(i.e.,linear LESs gain insight into the systems from different perspec- combinations of the labelled data from several empirical cho- tives. Moreover, some theoretical values related to LESs are senPMUsnamedaspilotPMUs),causedbyimpropertraining predictable as the reference points via the latest theorems. procedure, will lead to bad results. Although many researches Additionally,3Dpower-mapisdevelopedtovisualizetheLES (especially those methods based on specific physical models) forthedecision-makingauxiliaryfunctions.Thisvisualization were done in various fields, little attention has been paid issensitivetoevents—itisabletodetectserioussystemfaults, to the data-driven unsupervised learning methods, which are aswellassomesmallfluctuations;moreover,thevisualization based on solid mathematical foundations and with universal isrobustagainstbaddata—evenwithdatalossinthecorearea, statisticalprocedures.Additionally,relatedmathematicalwork we can still achieve the proper judgements. 2) Besides, data is introduced in Sec II. fusion, by putting together diverse data sources, provides us a comprehensive view towards systems. It is a deep research; II. RANDOMMATRIXTHEORIES we just give a brief mention. In general, based on RMTs and The nomenclature is given as Table I. the big data applying architecture [6], this paper presents a series of work associated with EED: the theorems of LESs, TABLE I: Some Frequently Used Notations in the Theories a briefly discussion about data fusion, the indicators derived Notations Meanings from LESs, and the visualization of the results. Case studies, X,x,x x amatrix,avector,asinglevalue,anentryofamatrix with both simulated data and real ones, validate the proposed Xˆ,xˆ,xˆ, i,j hat:rawdata method, and related theories and theorems. X˜,x˜,x˜,Z˜ tilde:intermediatevariables,formedbynormalization N,T,c thenumbersofrowsandcolumns;c=N/T CN×T N×T dimensionalcomplexspace B. Related Work Xu thesingularvalueequivalentofX˜ S covariancematrixofX:S= 1XXH ∈CN×N 1) Our Previous Work: Paper [6] is the first one; it is N M anotherformofcovariancematrixofX:M=cS also the first attempt to apply big data analytics systemati- Z,L Lindependentmatricesproduct:Z=(cid:81)Li=1Xu,i cally to power systems. It provided a feasible architecture as λS,λZ˜,λM theeigenvalueofmatrixS,Z˜,M the approach with two independent procedures—engineering λS,i thei-theigenvalueofmatrixS r thecircleradiusonthecomplexplaneofeigenvalues procedure and mathematical one. Random matrix theories τ lineareigenvaluestatistics (RMTs), such as Ring Law and M-P Law, were elaborated τMSR meanvalueofradiusforalleigenvaluesofZ˜:µ(rλZ˜) as the solid foundations; Mean spectral radius (MSR) was ϕ,ϕ thetestfunctionanditsFouriertransformation (cid:98) proposed to indicate the statistical correlations (now we know X randomvariable E(X),D(X) expectation,varianceforX that MSR is a specific LES). Specifically, moving split- µ(x),σ2(x) mean,varianceforx window (MSW) technology was introduced to deal with data X◦ X−E(X) in temporal dimensions; in spatial dimensions, block calcu- κi i-thcumulantofarandomvariableX lation was presented. Simulated data in various fields were [ζ(θ)]θθ==θθ21 ζ(θ1)−ζ(θ2) studiedascases.Thenwemovedforwardtothesecondstage. Paper [20], based on the first one (specially, Ring Law, M-P Law, the architecture, MSR, MSW, the simulated 118-buses A. From Physical System to Random Matrix system), talked about the correlation analysis approaches. An For a system, we assume t times observation for n- augmentedmatrixwasstudiedasthekeyingredient.Actually, dimensionalvectorsxˆ ,xˆ ,··· ,xˆ (xˆ ∈Cn×1,j=1,···,t),and augmented matrix is a form of data fusion—it consists of adatasource,denoted1as2Ω(instizejofn×t),isobtained.Ωis two data sources with totally different sizes and meanings: in a high-dimensional space but not an infinite one (Or more the status one for the basic part, and the factor one for the explicitly, we are interested in the practical regime in which augmentedpart.Asthesamestagework,paper[21],usingthe n=100–10000, and t is sufficient large); this disables most 70 nodes network testbed, tested some data fusion ingredients classical tools. In contrast, RMTs enable us to select arbitrary based on RMTs: the product of non-hermitian random matri- data—both in temporal dimensions (e.g. T from t) and in ces,thegeometricmean,thearithmeticmean,andtheproduct spatial dimensions (e.g. N from n)—to form Xˆ ∈CN×T of random Ginibre matrices. With the experimental data, the naturally; Xˆ is a random matrix due to the presence of effectiveness of these data fusion ingredients, is validated for ubiquitous noises. Furthermore, we can convert Xˆ into a signal detection in Massive MIMO systems. normalizedmatrixX˜ row-by-row(see(18));thus,therandom 2) Others’ Work: Data are a core resource and data man- matrix model (RMM) is built to map the system. agement is the keypoint for EED in smart grids. There have Inourpreviouswork[6,20,21],wehavealreadyelaborated been numerous discussions about utilizing PMUs to improve RMTs, Ring Law, M-P Law, and MSR; here we just give wide-areamonitoring,protectionandcontrol(WAMPAC)[22– some key results1. Whereas, the data processing ingredients, 24].Xuinitiatedpowerdisturbancedataanalytics,andshowed a wide scope in the future [1]. The mathematical foundations, 1AlthoughtheasymptoticconvergenceinRMTsisconsideredunderinfinite dimensions, the asymptotic results are remarkably accurate for relatively system frameworks, and applying architectures are missing moderate matrix sizes such as tens. This is the very reason why RMTs can yet.Recently,Xieproposedanearlyeventdetectionalgorithm handlepracticalmassivesystems. 3 including linear eigenvalue statistic (LES) designs and data 2) Law of Large Numbers: fusion, are the major parts; they are elaborated in Sec III. The law of Large Numbers is the first step in studies of eigenvaluedistributionsforacertainrandommatrixensemble. Theresult,fortheWignerensemble,obtainedinitiallyin[30], B. Random Matrix Theories (RMTs) was improved in [31], where the Stieltjes transformation was RMTs are effective to power systems, which has already introducedandthefamoussemicirclelawwasshownunderthe been validated [20]. First, we assume that: for the rectangular minimal conditions on the distribution of W (the Lindeberg N×T random matrix X, the entries are independent identi- type conditions) [29]. The law of Large Numbers tells us that cally distributed (i.i.d.) variables, satisfying the conditions: N−1N [ϕ] converges in probability to the limit N E(X )=0, E(X X )=δ δ σ2 (1) (cid:90) i,j i,j m,n i,m j,n lim 1N [ϕ]= ϕ(λ)ρ(λ)dλ (7) where σ is the variance, and δ is the Kronecker Delta Function: N→∞N N α,β (cid:40) where ρ(λ) is the probability density function (PDF) of the eigen- 1 α=β δ = values. In particular, for the Gaussian orthogonal ensemble α,β 0 α(cid:54)=β (GOE) [28] (see (21),(22), and (23) in the appendix), ρ(λ) is according to the semicircle law: 1) Ring Law: ConsideraLindependentmatricesproductZ =(cid:81)Li=1Xu,i, (cid:40) 1 √4ω2−λ2 λ2 <4ω2 where Xu ∈CN×N is the singular value equivalent [25] of ρsc(λ)= 02πω2 λ2(cid:62)4ω2 (8) X˜ (see (19)); X˜ is obtained directly from raw data Xˆ (see (18)). Furthermore, the matrices product Z is converted into The covariance matrix ensemble is another classical type ; Z˜ (see (20)). Thus, the empirical spectrum density (ESD) of M-P Law, as describe in Sec II, is adapted to this ensemble. Z˜ converges almost surely to the same limit given by The covariance matrix is widely used in engineering due to therectangularform—wecanalsostudytheRMMX ∈CN×T (cid:40) 1 |λ|(2/L−2) , (1−c)L/2 (cid:54)|λ|(cid:54)1 with N(cid:54)=T. We will further discuss this ensemble below. ρ (λ)= πcL (2) 3) Central Limit Theorems (CLTs): ring 0 , otherwise Central Limit Theorems (CLTs), as the natural second step, as N,T →∞ with the ratio c=N/T ∈(0,1]. aim to study the LES fluctuations. Lots of papers devote 2) Marchenko-Pastur Law (M-P Law): to proofs of CLTs for different random matrix ensembles (see [28, 29, 32–36]). CLTs for LESs with polynomial test M-P Law describes the asymptotic behavior of the the functions of some generalizations for the Wigner and covari- covariance matrix: ances matrices were proved in [35] via moment methods. In S= 1XXH ∈CN×N (3) contrast, CLTs for LES with real analytic test functions of T the Wigner and covariances matrices were established in [34] where X is the rectangular N×T non-Hermitian random matrix under additional assumptions that satisfying condition (1). (cid:12) Then, the ESD of S converges to the distribution of M-P (cid:12)E(W 2)=2,E(W 4)=3E2(W 2)=3 Wigner (cid:12) i,i i,j i,j Law [26, 27] with density function: (cid:12)(cid:12)E(Xi,j4)=3E2(Xi,j2) Covariance (cid:40) 1 (cid:112)(a −λ)(λ−a ) , a (cid:54)λ(cid:54)a ρ (λ)= 2πλcσ2 + − − + In the recent paper [28], CLTs for LESs of the Wigner mp 0 , otherwise and covariances matrices were proved under assumptions that (4) E(W 2)=2,thethirdandtheforthmomentsofallentriesare √ i,i where a±=σ2(1± c)2. the same, but E(Wi,j4) is not necessary 3. Moreover, the test 3) Mean Spectral Radius (MSR): functions are not supposed to be real analytic. It was assumed MSR is a high-dimensional indicator. For a specific matrix that the Fourier transformation ϕ satisfies the inequality (cid:98) Z˜ (asdescribedinRingLawpartabove),wecancalculatethe (cid:90) eigenvalues λ on the complex plane. The mean value of all (1+|k|5)|ϕ(k)|dk <∞ Z˜ (cid:98) these eigenvalues’ radii length is denoted as τ : MSR which means that ϕ has more than 5 bounded derivatives. τ =(cid:80)N 1|λ | (5) MSR i=1N Z˜,i B. CLT for Covariance Matrices III. DATAPROCESSINGINGREDIENTS A. Linear Eigenvalue Statistics and Related Research Parallel to (3), we study another typical covariance matrix: 1) Definition: M= 1XXH = 1S∈CN×N (9) N c Thelineareigenvaluestatistic(LES)ofamatrixX ∈CN×N its PDF is also according to M-P Law: isdefinedviathecontinuoustestfunctionϕ:R→C[28,29] (cid:40) 1 (cid:112)(a −λ)(λ−a ) , a(cid:54)λ(cid:54)b NN[ϕ]=(cid:80)Ni=1ϕ(λi) (6) ρmp2(λ)= 02πλσ2 + − , otherwise (10) 4 √ where a =σ2(1±1/ c)2, and σ is the variance. Similarly, we can design other test functions to obtain ± The CLT for M is given as follows [29]: diverse LESs as the indicators, and some theoretical values; here we list some classical test functions: Theorem III.1 (M. Sheherbina, 2009). Consider a rectan- gular N×T non-Hermitian random matrix X, with entries 2. Chebyshev Polynomials: T3 =4x3−3x Xi,j satisfying the condition (1); M is the covariance matrix 3. Chebyshev Polynomials: T4 =8x4−8x2+1 (see (9)). Let the real valued test function ϕ satisfy condition 4. Determinant: DET=ln(x) (cid:107)ϕ(cid:107)3/2+ε < ∞(ε>0). Then NN◦[ϕ], in the limit N,T → 5. Likelihood-ratio function: LRF=x−ln(x)−1 ∞,c=N/T ≤1,convergesinthedistributiontotheGaussian In Sec V, we will show their applications in smart grids. random variable with zero mean and the variance: 2 (cid:90)(cid:90) D. Universality Principle V [ϕ]= ψ2(θ ,θ )(1−sinθ sinθ )dθ dθ SC cπ2 1 2 1 2 1 2 Akin to CLTs, universality [10] refers to the phenomenon −π2<θ1,θ2<π2 thattheasymptoticdistributionsofvariouscovariancematrices + κ4 (cid:32)(cid:90) π2 ϕ(ζ(θ))sinθdθ(cid:33)2 (such of eigenvalues and eigenvectors) are identical to those π2 −π ofGaussiancovariancematrices.Theseresultsletuscalculate 2 (11) the exact asymptotic distributions of various test statistics whereψ(θ ,θ )= [ϕ(ζ(θ))]θθ==θθ12,andζ(θ)=1+1/c+2/√csinθ; withoutrestrictivedistributionalassumptionsofmatrixentries. κ =E(cid:0)X14(cid:1)−2 3 is t[hζ(eθ)4]θθ-t==hθθ12cumulant of entries of X. The presence of the universality property suggests that high- 4 dimensionalphenomenonisrobusttotheprecisedetailsofthe modelingredients[37].Forexample,onecanperformvarious C. LES Designs and Theoretical Values hypothesis tests under the assumption that the matrix entries ForGaussvariableX,E(X)=0,E(X2)=1,andE(X4)=3 not Gaussian distributed but use the same test statistic as in the Gaussian case. (see (24)). A typical scenario is assumed: N=118 and The data of real systems can be viewed as a spatial and T=240, thus c=N/T =0.4917; temporalsamplingoftherandomgraph.Randomnessisintro- 1) LES for Ring Law: duced by the uncertainty of spatial locations and the system MSR(seeSecII)isaspecialLES2;itisdefinedasfollows: uncertainty. Under real-life applications, we cannot expect (cid:88)N 1 the matrix entries follow i.i.d. distribution. Numerous studies τMSR = N|λZ(cid:101),i| (12) based on both simulations [6] and experiments, however, i=1 demonstrate that the Ring Law and M-P Law are universally followed. In such cases, universality properties provide a where λ (i=1,2,...,N) are the eigenvalues of Z(cid:101), and |λ | is Z(cid:101),i Z(cid:101),i crucial tool to reduce the proofs of general results to those the radius of λ on the complex plane. AccordingtZ(cid:101)o,i(7),thetheoreticalexpectationofrwhenN → in a tractable special case—the i.i.d. case in our paper. ∞ (E(τ )), are calculated as follows: MSR E. Data Fusion E(τ )=E(r)=(cid:82)(cid:82) P(r)×r·rdrdθ MSR Area (13) =(cid:82)2π(cid:82)√1 1 r·rdr dθ =0.8645 Data fusion (including the augmentation, the blocking, the 0 1−c cπ sum, and the product of matrices) is another data processing whereP(r)isgiveninformula(2),andc=0.4917forthisscenario. ingredient.ComparingtotheLESdesigns,whichaimtodefine Also, we can calculate the theoretical variances (D(τ )): MSR the LES τ via the test functions ϕ(λ) for a determinate E(r2)=(cid:82)(cid:82) P(r)×r2·rdrdθ =0.7542 X, data fusion manages to handle multiple data sources Area (14) (i.e. X ,X ,···), even with diverse features (e.g. in totally D(τ )=D(r)=E(r2)−E(r)2 =0.0068 1 2 MSR different size). The theories about data fusion are deep and 2) LESs for Covariance Matrices: novel: Go¨tze, Ko¨sters, and et al, in [38–40], have already 1. Chebyshev Polynomials ϕ(λ)=T =2x2−1 studied the performance of the matrices in the form of : 2 N F(1)+···+F(m) (cid:88) n n τ = (2λ 2−1) (15) T2 i whereF(i) =X(i,1)···X(i,k),andX(0,0),··· ,X(i,j),··· ,X(m,k) n n n n n n i=1 are independent n×n matrices with independent entries. For the scenario, according to (7) and (10), we get E(τT2): We will not go so far and just show some typical ap- (cid:90) plications; data fusion is very common and meaningful in E(τT2)=N ϕ(λ)ρmp2(λ)dλ=6600 (16) engineering. Our previous work [20] conducted data fusion as follows: the status data source (a matrix B in size of N×T), and according to (11), we get D(τT2): and the factor data source (a vector c in size of 1×T), in a certainmanner,areputtogethertoformtheaugmentedmatrix D(τ )=1080 (17) T2 A(A=[B;C]);thisA,asanewrandommatrixmodel,isused toconductcorrelationanalysis.Besides,Zhang[21],usingthe 2Since λ are highly correlated random variables (each one is a com- Z(cid:101),i datafroma70nodesnetworktestbed,validatedthedatafusion plicated function of the random matrices X(cid:101)i (i=1,2,...,L)), τMSR is a randomvariable. in the field of signal detection for Massive MIMO systems. 5 IV. UNSUPERVISEDLEARNINGMETHOD Step 6) and step 7) are the executive parts based on data processing procedure—steps 1)-5). The key steps are step 4) Thissectionmakesacomparisonbetweentheunsupervised andstep5);theyconstitutethetrainingprocedure.Bychoosing learningmethods(e.g.RingLawAnalysis)andthesupervised m(cid:48) PMUs(aspilotPMUs)fromtotalN PMUs,theprocedure ones (e.g. PCA) from diverse perspectives. tags the system in a reduced subspace (N→m(cid:48)) ; in this way, the function v is inferred. ci A. Procedure of Ring Law Analysis TheRingLawAnalysisconductsEEDwithfollowingsteps: C. Comparison of Supervised and Unsupervised Methods The supervised learning methods are prevailing among StepsofRingLawAnalysis massive systems. The key idea for these methods is to tag 1)Selectarbitraryrawdata(orallavailabledata)asthedatasourceΩ 2)FormingrandommatrixmodelXˆ atacertaintimeti; the systems—by labelled parameters, principal eigenvectors, 3)ObtainZ˜ byvariabletransformations(Xˆ →X˜ →Xu→Z→Z˜); inferred functions, etc; the detailed steps, by dealing with 4)CalculateeigenvaluesλZ˜ andplottheRingonthecomplexplane; priormodelsanddatainalowdimensionalspace,makeupan 5)Conducthigh-dimensionalanalysis; empirical training procedure. In a word, supervised learning 4a)Observetheexperimentalringandcompareitwiththereferenceone; 4b)Calculateτ astheexperimentalvalue; methods are of ’labelling’ steps based on assumptions and MSR 4c)Compareτ withthetheoreticalvalueNE(r); simplifications. For supervised learning methods, there are MSR 6)Repeat2)-5)atthenexttimepoint(ti=ti+1); some problems, essentially, hard to be solve: 7)Visualizeτ onthetimeseries; MSR 8)Makeengineeringexplanations. a). The error accumulations, spurious correlations, incidental corre- lationsareunavoidableassystemsgrowlarge;theredoesnotexista solid mechanism to eliminate or to relieve them. Steps 2)–7) conduct high dimensional analysis only us- b). The training procedure is not strict. Taking paper [19] for an ing raw data, and then visualize the indicator τ; they are example, it chooses the pilot PMUs in an unconvinced way: the unsupervised statistical proceedings without assumptions and most unrelated ones, rather than the best suitable ones, are chosen. simplifications.Instep2),fordifferentpurposes,arbitraryraw Some additional devices, especially the empirical ones, are essential data, even ones from distributed nodes or intermittent time as mentioned in the paper—e.g., the PMUs of some topologically series, are able to be focused on to form the RMM Xˆ. It andphysicallysignificantbusesshouldbepilotones,whilethePMUs is also an online data-driven method requiring no knowledge whicharehistoricallyeventfulshouldnot.Additionally,theimproper of the physical models/topologies. In addition, the size of setting of the pilot PMUs total number m(cid:48), or the pre-specified Xˆ is controllable during step 2); it relieves the curse of variance threshold ς, will make the result worse. dimensionality in some ways. c).Thebaddatainthecorearea(e.g.theincomplete,theinaccurate, andtheunavailabledata),ortheworstsituation—lossofallthedata B. Procedure of Principal Component Analysis (PCA) of the event area, will almost disable the methods. d).Moreover,itisimpossibletotrainoveralltheeventsorscenarios. PCAisaprevailingdata-drivenmethod[24].Xieproposeda Theremustbesomethingunexpectedinthelargescalesystems,even dimensionalityreductionmethodbasedonPCAasaapproach ofwhichwecannotgiveaproperdescriptioninlowdimensions.Just to EED in smart grids [19]; the steps are listed as follows: a simple intuitive example: we can easily deduce−y =−Y v ci B ci from y =Y v . Whereas, it is hard to make a verdict that the ci B ci StepsofPCA simultaneous reverse of the pilot PMUs data Y and the non-pilot B 21))SCelec=tdYaHtaYY∈=C[yN1,×yN2,,·c·a·lc,yuNlat]e∈λC(Cn×N),;yi=[y1,i,y2,i,···,yn,i]T; PMUs data yci is an anomaly or not. Y Y In general, traditional model-based methods or data-driven 43))RFoeramrramngediamndensseiloencatlthperintocpipaml ceoigmepnovnaelunetss:uλb1sp→acpe1L,·(·p·,,pλm,·→··p,pm;) supervised learning methods are in low dimensions; they and project the original N variables onto it: Select m(cid:48)1(cid:54)2m vectmor- highly depend on only a few parameters related to physical based variables as the pilot PMUs from N PMUs to form the linear models, subjective hypotheses, empirical causality, training basis matrix YB=[yb1,yb2,···,ybm(cid:48)]∈Cn×m(cid:48). The selected m(cid:48) procedure, etc. variablesshouldbeasorthogonaltoeachotheraspossible,whichmeans min(cosθ)=(y ·y )/(|y ||y |) (i,j=1,2,···,m(cid:48); i(cid:54)=j) On the other hand, data-driven unsupervised ones are bi bj bi bi 5)Representnon-pilotPMUsyci (i=1,2,···,N−m(cid:48))fortraining:Let usually based on high-dimensional statistical models which vci=[v1,ci,v2,ci,···,vm(cid:48),ci]T bethevectorofregressioncoefficients: are built on solid mathematical foundations. In other words, yci=(cid:104)(yb1,yb2,···,ybm(cid:48))·(v1,ci,v2,ci,···,vm(cid:48),ci)(cid:105)=YBvci the related theories, algorithms, and statistics (e.g. RMTs (cid:124) (cid:123)(cid:122) (cid:125) and LESs) are always based on probability and designed in Basis a multi-dimensional space. Thus, the unsupervised methods ⇒vci=(YBHYB)−1YBHyci; 6)Trainatts=s−1: tend to analyze the interrelations and interactions (seen as correlations) directly using the raw data without any label; vci(s−1)⇐f(YB(s−1),yci(s−1)); they are fast, sensitive, and universal. Moreover, they are pure statistical data processing in high-dimensions which will not 7)Judgeatt=s: (cid:107)ycis,YBsvci(s−1)(cid:107). bring in systematical errors. It comes to the conclusion that theunsupervisedlearningmethods,withadvantagesdescribed as above, perform better than the supervised ones. 6 V. CASESTUDIES NS=t1=1680, 0T||=V2e4r0= 2 |8|L =1 NS=t1=1680, 1T||=V2e4r0= 2 |8|L =1 1.5 eigenvalue 1.5 eigenvalue In this section, we use both simulated data and real ones 1 rκinMnSeRr==00..7816229975 1 rκinMnSeRr==00..7613269973 to validate the proposed approach. For the simulated case, we 0.5 0.5 adoptthestandardIEEE118-bussystem[41](shownasFigure 10). Detailed information about the simulation is referred to Imag(Z) 0 Imag(Z) 0 the case118.m in Matpowerpackage and Matpower4.1 Users −0.5 −0.5 Manual[42].Therearegenerallytwoscenariosinthesystem: −1 −1 1) only white noises, e.g., small random fluctuations of loads −1−.51.5 −1 −0.5 0 0.5 1 1.5 −1−.51.5 −1 −0.5 0 0.5 1 1.5 Real(Z) Real(Z) and Gaussian sample errors; 2) Signals plus noises, it means (a)RingLawatts=600s (b)RingLawatts=601s that there are also sudden changes, or even serious faults. For the real case, we use a 48-hours database of some power 0.5 T=2S4t0=, 6N0=01, 1h8= 0 a.4=008.12850||0V9e,r =b2=85 .8372 1.4 T=2S4t0=, 6N0=11, 1h8= 0 a.4=008.12850||0V9e,r =b2=85 .8372 ghyripdoitnheCsihsinteas.tAinbgo:vneoramll,althheyEpoEtDhemsisayHbe(mnoodseiglendalasprbeisneanrty) 00.4.45 HMKeiasrrtnoceeglnr aDkmoe−nPsiatys tEusr tLimawat:i ofcnx: fnx 1.2 HMKeiasrrtnoceeglnr aDkmoe−nPsiatys tEusr tLimawat:i ofcnx: fnx 0 0.35 1 and abnormal hypothesis H (signal present). 1 0.3 0.8 PDF0.25 PDF 0.2 0.6 0.15 0.4 A. Simulated 118-bus System 0.1 0.2 0.05 According to From Physical System to Random Matrix in 00 1 2 3 4 5 6 00 1 2 3 4 5 6 Sec II, the data source Ω = vˆ ∈R118×1500 (n=118, (c)M-PLawatts=600s (d)M-PLawatts=601s V i,j t=1500) is obtained to map the simulated system. Tlength=240 N=118 ||Ver=28 1) Conduct EED Based on Ring Law, M-P Law, MSR: 1 X: 600 Fordetailsaboutthiswork,seeourpreviouswork[6].Here, Y: 0.8665 X: 841 X: 1200 we assume the events as Table III; Figure 1 shows the results. Y: 0.8151 Y: 0.8122 X: 240 At sampling time t =600 s, the RMM Xˆ includes a time Y: 0.8646 period ATime=361:60s0 s; the noises play a dominant part XY:: 600.61308 XY:: 1430076 500 during the period. Figure 1a shows that the distribution of λZ˜ ↓XY:: 103.503619 MW) is more closely to the reference ring (L=1); and Figure 1c 0.5 d ( shows that the distribution of λ (in blue bars) fits the M- X: 720 an P distribution (in blue line) quiteMwell. Whereas, at sampling MSR XY:: 630000 Y: 0.4927 Dem time t =601 s, Figure 1b shows that the eigenvalue points X: 1307 er s Y: 0.3478 w collapse to the center point of the circle, which means that MSR of data V Po Load52 thecorrelationsofthedatahavebeenenhancedsomehow;and Figure 1d also shows the deviation between the experimental distribution and the theoretical one. 0 440000 660000 880000 11000000 11220000 11440000 0 Sample Time (s) From τ -t curve in Figure 1e, it is observed that τ MSR MSR (e)MSRonTimeSeries starts to decrease (0.8665,0.6308,··· ,0.4927) at t=600 s, just when the event (sudden change of P ) occurs as the Fig. 1: Ring Law, M-P Law and MSR for Simulated Data V Bus-52 signal.Theinfluencelastsforfulltimelength(T=240 s)and the decreasing lasts for half (120 s); thus, a ”U”-shaped curve TABLE II: LESs and their Values is observed. In this way, we conduct anomaly detection; the time for the beginning point of ”U” (t=600 s for this case) MSR T2 T3 T4 DET LRF is right the anomaly start time. E0:TheoreticalValue 2) LES Designs: E(τ) 0.8645 1.34E3 1.01E4 8.35E4 48.3 73.68 D(τ) 0.0068 6.65E2 9.35E4 1.30E7 1.32 1.42 Designing LESs is a major target in this paper; here, we cv 0.0954 0.0193 0.0304 0.0432 0.0238 0.0162 study diverse LESs with different test functions. Keeping S1:Onlysmallfluctuationsaround0MW T=240 s, we can divide the temporal space into 5 subspaces µ(τ) 0.8648 1.33E3 9.93E3 8.19E4 73.68 73.3 (stages) according to the status of P as follows: σ2(τ) 0.0080 6.53E1 2.20E4 4.67E6 0.406 0.322 Bus-52 S2:Astepsignal(PBus-52:0MW→300MW)isincluded µ(τ) 0.5149 1.29E4 1.92E6 3.04E8 −174 295 TimeAreas(timelength) Descripiton σ2(τ) 0.0788 3.30E6 1.51E11 5.66E15 890 893 S1 241s −− 600s (360s) fluctuationsaround0M S2 601s −− 840s (240s) astepsignal S3:Onlysmallfluctuationsaround300MW S3 841s −− 1200s (360s) fluctuationsaround300M µ(τ) 0.8141 1.54E3 1.73E4 2.89E5 27.9 93.1 S4 1201s −− 1306s (106s) arampsignal σ2(τ) 0.0250 1.81E2 3.30E5 4.20E8 0.507 0.419 S5 1307s −− 1500s (194s) staticvoltagecollapse S4:ARampsignalasthesystemincoming µ(τ) 0.6448 6.43E3 6.40E5 7.54E7 −61.4 182 σ2(τ) 0.0571 7.20E6 1.68E11 3.29E15 1.74E3 1.73E3 TheresultsoftheLESs,designedaccordingtoLESDesigns S5:Staticvoltagecollapseforthesystem and the Theoretical Values in Sec III, are shown as Table II: µ(τ) 0.4136 7.49E3 6.48E5 7.25E7 −598 719 σ2(τ) 0.1076 7.47E6 2.99E11 1.02E15 4.16E4 4.16E4 *c =(cid:112)D(τ)/E(τ) is the coefficient of variation v 7 The above results validate that it is feasible to analyse the T −2=238 sampling points (i.e. t =602:839 s) in the an- s power system using the random matrix model (RMM); the imation. Therefore, we conjecture that some event occurs in upper layer should be ontology and we do not go that far. A3; even we can go further that the event is influential to A1, Here, we just make some engineering statements/analyses: A2,A4,A5,andhaslittleimpactonA6.Theseconjectures,in 1. Independent of the system models/topologies, we can a reasoning way, coincide with the common sense that there design the LES τ. For a RMM X with determinate size is a sudden change in A3 at t=601 s. N ×T, if the test function ϕ(λ) is given, the LES τ is b) For Figure 3, with sustainable growth of power demand at obtained; some related theoretical values (the expectation some bus (P for this case), the whole system becomes Bus-52 E(τ), the variance D(τ), and the coefficient of variation c ) moreandmorevulnerable.Thevulnerabilitycanbeestimated, v are able to be calculated as well. before the system has a breakdown due to voltage collapse, 2.Amongϕ(λ)givenabove,LRFperformsbestintheview via the visualization of η. of c (Low c means high precision and repeatability of the c) Moreover, if the most related data (data of A3 for this v v assay [43]; here, we regard 12% as the upper bound). case) are lost somehow, hardly any information can be gotten 3.Wecanmakeasummaryabouttheperformanceoftheex- by V as Figure 6, whereas the proper judgements can still be perimental values (mean µ(τ) and variance σ2(τ)) comparing achieved by η as Figure 5. to the theoretical ones (expectation E(τ) and variance D(τ)): In general, the combination of high-dimensional indicators a).DuringS1,µ(τ)isclosetoE(τ);σ2(τ)ismuchlessthanD(τ); (e.g. η) and 3D power-map is really a novel and feasible b). During S3, µ(τ) has a little bias (i.e. |µ(τ)−D(τ)|), but more approach to EED in smart grids. (cid:112) than S1; σ2(τ) is acceptable ( σ2(τ)/µ(τ)<12%); c).ForS2,S4,andS5,µ(τ)hasmuchmorebias;σ2(τ)isalways C. Conduct EED Using Real Data too big to be accepted; d). variance σ2(τ) is much more sensitive than mean µ(τ). This case is based on the operation data of a certain Then, we can conjecture that: interconnected power grid in China; the data lists are shown a). The more stable the system is, the more effective the theoretical as Figure 11. The data source includes data of substations, values become (µ(τ) is close to E(τ); σ2(τ) is less than D(τ)); breakers, lines, buses, generators, frequencies, etc. These data b). Different test functions ϕ(λ) have different characteristics and are from 6 administrative regions—5 provincial ones and 1 functions.Inthissense,wecanbalancethereliabilityandsensitivity directly affiliated one. The sampling frequency is once per for anomaly detection in a special system; minute and the sampling lasts for 3 days (4320 minutes). c).Inaddition,atestfunctionisakintoafilterinsomesense;beyond 1) Model—Size 42×90, voltage data, τ : MSR event detection (distinguish signals from noises), it has the potential Weonlyfocusonaregionalbusvoltagedatawith42nodes; to trace a specific anomaly (distinguish the signal from others); thus, the data source Ω is of 42×4320=181440 sampling d).Foraspecialpurpose,e.g.thelowestcv orthelowestbias,there voltagedata.Here,wechooseτ astheindicator.TheRMM MSR should exist an optimal combination of the Chebyshev Polynomials Xˆ at each sampling point is modeled in size of 42×90; the as the test function. timelengthofthesplit-windowis90(T=90m,andT/2=45 m). Figure 7 shows the result. B. AdvantagesofLESandVisualizationUsing3DPower-Map Tlength=90 N=42 Block=3||Ver=165 ALESisanindicatorinhighdimensions;itsvaluedepends 1 on a wide sample data in the form of the entries of the RMM Inner Tlag=Tlength/2=45Min Tstart Tend MSR ≈45Min Xˆ (Sec IV). As a result, LES τ, as a statistical indicator, is 0.8 Mon Tue Wed sensitiveandrobustagainstbaddata;theseadvantageswillbe 10:52 6:56 10:51 6:51 10:47 validated via the visualization of τ using 3D power-map later. 6:55 0.6 For this case, we keep the hypothetical scenario as Table III, and take τ as the indicator. We obtain the regional τ LRF LRF 0.4 for each region3 in a similar way to that for the whole system (118 nodes); thus, the τ -t curves are plotted as Figure 2. X: 4687:48 X: 19007:40 XY:: 303.3255647:25 We denote η=τLRF/ELR(FτLRF) as the high-dimensional indi- 0Y.2: 0.308711:51XY:: 701.2132Y7: 0.329711:47XY:: 201.240799 11:47 X: 3580 cator. For each time point, with an interpolation method [45], Y: 0.1826 a 3D map is able to be plotted. Figure 3, 5 depict some key 0 500 1000 1500 2000 2500 3000 3500 4000 framesofthe3Dpower-mapanimationwithη,whereasFigure Sample Time (Min) 4, 6 with the raw data V: Fig. 7: τ -t Curve for Regional Bus Voltage Data MSR a) For Figure 3, at time t=601 s, η of the area around A3 changes relative dramatically; this trend last for the next In Figure 7, we can extract some half ”U”-shaped curves 3thisdivisiondependsonthespecificnetworkstructure;however,apoten- accordingtothelegends(theredforstartpointsandthepurple tial problem lies here: how small the partition (see Figure 10) can be that for end ones). As our analyses above, the start points corre- keepsthetheoriesstillwork.Tofindtheanswerweshouldstudyhowtoturn spondtotherighttimewhentheeventoccurs.Theyarealmost big data into tiny data [44]; it is another topic and we do not expand here. Theoretically,thebiasofτ iscloselyrelatedtothesizeofRMMsXˆ (more t=06:55 , maybe for the working start, and t=10:50 , specifically,N) maybe for the lunch break, respectively. 8 Tlength=240 N=11 ||Ver=65 Tlength=240 N=33 ||Ver=66 Tlength=240 N=19 ||Ver=67 300 400 300 A1 Theoretical Value A2 Theoretical Value A3 Theoretical Value Experimental Value Experimental Value Experimental Value Load52 600 Load52 600 Load52 600 2LRF of data V00 Power Demand(MW) 2LRF of data V00 Power Demand(MW) 2LRF of data V00 Power Demand(MW) 200 200 200 150 440000 660000 880000 11000000 11220000 11440000 0 100 440000 660000 880000 11000000 11220000 11440000 0 150 440000 660000 880000 11000000 11220000 11440000 0 Sample Time(s) Sample Time(s) Sample Time(s) Tlength=240 N=9 ||Ver=68 Tlength=240 N=25 ||Ver=69 Tlength=240 N=21 ||Ver=70 230 A4 Theoretical Value 400 A5 Theoretical Value 250 A6 Theoretical Value Experimental Value Experimental Value Experimental Value 22LRF of data V1205 Load52 26Power Demand(MW)0000 2LRF of data V00 Load52 26Power Demand(MW)0000 2LRF of data V00 Load52 5Power Demand(MW)00 205 100 200 440000 660000 880000 11000000 11220000 11440000 0 100 440000 660000 880000 11000000 11220000 11440000 0 150 440000 660000 880000 11000000 11220000 11440000 0 Sample Time(s) Sample Time(s) Sample Time(s) Fig. 2: τ -t Curve for Single Partitioning: A1–A6 LRF A1 A2 A5 A3 A6 A4 (a)ts=600s (b)ts=601s (c)ts=720s (d)ts=1198s (e)ts=1250s (f)ts=1350s Fig. 3: Visualization of the high-dimensional indictor η with Full Data Sets (a)ts=600s (b)ts=601s (c)ts=720s (d)ts=1198s (e)ts=1250s (f)ts=1350s Fig. 4: Visualization of the Voltage V with Full Data Sets A1 A2 A5 A6 A4 (a)ts=600s (b)ts=601s (c)ts=720s (d)ts=1198s (e)ts=1250s (f)ts=1350s Fig. 5: Visualization of the high-dimensional indictor η without Data Set of A3 (a)ts=600s (b)ts=601s (c)ts=720s (d)ts=1198s (e)ts=1250s (f)ts=1350s Fig. 6: Visualization of the Voltage V without Data Set of A3 9 2) Model— T=90 m, power flow data, τ : method for early event detection (EED). Based on random LRF Wefocusonthepowerflowdata(687lines);wechooseτ matrix theories (RMTs) as our mathematical foundations, the LRF and keep T=90 m. Figure 8 shows the results; some special linear eigenvalue statistics (LESs) are proposed as the key time points, such as t=4:44 and t=10:42 , are able to be indicators.Inaddition,wecomparetheproposedmethodwith observed.Foreachadministrativeregion,weconductasimilar a supervised one (a dimensionality reduction method based procedure and Figure 9 shows the results. on PCA). Case studies, with both simulated data and real ones, validate the effectiveness and the advantages of the η=τLRF/E(τLRF) Global Observation (A0): Tlength=90, N=687 proposed method. Especially, its robustness against bad data 1.6 is highlighted—the combination of 3D power-map and high- 1.55 dimensional indicators, even with data source loss in the core 1.5 area, is still able to conduct EED effectively. 1.45 This work is another attempt towards applying big data to 11.3.45 X: 654 10:54 XY:: 210.48023 10:42 XY:: 315.31592 p(RowMeMr ss)ysatreemas.pIrtopmearintolyolatrogureevsetahlapthryasnidcoaml symstaetmrixs imnohdieglhs Y: 1.371 15:05 1X0: :337963 14:43 dimensionalperspectives.Thisenablesustominethepotential 1.3 X1:6 9:6000 4:44 XY:: 213.34255 5:06X: 3Y1:8 16.34 19:1XY4:: 410.33241 values from the complex 4Vs data resources efficiently. Three 1.25 4:43 Y: 1.3 X: 1724 Y: 1.303 X: 283 Y: 1.277 mainstepsareessentialastheprocedure:1)tobuildtheRMM 1.2 Y: 1.253 Sample Time (Min) with raw data; 2) to conduct high-dimensional analyses with 0 500 1000 1500 2000 2500 3000 3500 4000 statistical transformations; 3) to interpret the results to human Fig. 8: η-t Curve for Global Power Flow Data beings. The proposed data-driven unsupervised approach is universal,asitmainlyrelyonrawdatawhichareindependent 25η=τLRF/E(τLRF) Distributed Observation: Tlength=90 of empirical models or labelled parameters; it is also sensitive since it uses the high dimensional statistics as the indicators. Moreover,forsomedataprocessingingredients,thetheoretical 20 values are predictable by the latest theorems. However, our findings only make up a tiny fraction of 15 A0:N=687 high-dimensional analytics. More problems need to be further A1:N=100 A2:N=69 A3:N=15 A4:N=212 A5:N=64 A6:N=227 studied: 1) how to design the test function as a filter to 10 detectpotential anomaliesin realsystems(SecIII, V);2) how to choose data source or make data fusion to meet certain 5 engineeringrequirements(SecIII);3)howtoexplainthehigh- dimensional findings to human beings (Sec IV, V); 4) how 0 0 500 1000 1500 Sample Time (Min) 3000 3500 4000 4500 to turn big data into tiny data (Sec V). One wonders if this Fig. 9: η-t Curves for Distributed Power Flow Data direction will be far-reaching in years to come toward the Big Data Age for smart grids. In Figure 8, we regard the bottom part (the purple) but not thetopasthestarttime.Thatisbecausethe”U”-shapedcurves APPENDIXA for the τMSR (Figure 1) and the τLRF (Figure 2) are reversed. FORMULASFORRMT 3) Summary: A. Convert Xˆ ∈CN×T to X˜ ∈CN×T: From both models above, some key time points, as well as the daily periodicity, are able to be observed. The bus voltage and the power flow have different engineering mean- x˜ =σ(x˜i)(xˆ −µ(xˆ ))+µ(x˜ ),1(cid:54)i(cid:54)N (18) i σ(xˆ ) i i i ings essentially: the former is closely related to local loads, i whereas the later is to power exchanges between connected where xˆ =(xˆ ,xˆ ,··· ,xˆ ) and µ(x˜ )=0, σ2(x˜ )=1. regions. This may be the reason why their key time points i i,1 i,2 i,T i i are different. Moreover, we believe that the information loss during the processing proceeding in high dimensions should B. Convert X˜ ∈CN×T to Xu ∈CN×N: be much less than in low dimensions, even negligible with (cid:113) a proper designed LES (the LES τ is almost arbitrary; it X = X˜X˜HU (19) is only related to the test function ϕ(λ)). In our opinions, u the integration of statistic analyses in high dimensions and where U∈CN×N is a Haar unitary matrix; X X H ≡X˜X˜H. u u engineeringexplanations(seeProcedureofRingLawAnalysis in Sec IV) are the two procedures to mine the potential values from the complex 4Vs data. More work (including more real C. Convert Z ∈CN×N to Z˜ ∈CN×N: data) are essential for further mining. √ z˜ =z /( Nσ(z )) ,1(cid:54)i(cid:54)N (20) VI. CONCLUSION i i i This paper proposes a data-driven unsupervised learning where z =(z ,z ,··· ,z ), Z =(cid:81)L X . i i,1 i,2 i,N i=1 u,i 10 APPENDIXB 2) (cid:82)∞e−t2dt, (cid:82)∞t2e−t2dt: 0 0 DEFINITIONOFTHEGAUSSIANORTHOGONALENSEMBLE (GOE) Suppose: (cid:90) ∞ This is a real symmetric n×n random matrix: a= e−t2dt 0 M=n−1/2W, W={W ∈R,W =W }n (21) j,k j,k k,j j,k=1 then: (cid:90) ∞ (cid:90) ∞ defined by the probability law: a2 = e−t2dt e−u2du Zn−11e−TrW2/4ω21(cid:54)j(cid:89)(cid:54)k(cid:54)n dWj,k (22) =(cid:90)0(cid:90) e−(t2+u20)dtdu=(cid:90) π2 dθ(cid:90) ∞e−r2rdr 0 0 where Z is the normalization constant. Since: t,u≥0 n1TrW2 = (cid:88) Wj,j2+2 (cid:88) Wj,k2 = π212(cid:90) ∞e−tdt= π212(cid:2)−e−t(cid:3)∞0 = π4 0 1(cid:54)j(cid:54)n 1(cid:54)j<k(cid:54)n so: √ theaboveimpliesthat{Wj,k}1(cid:54)j(cid:54)k(cid:54)n areindependentGaus- (cid:90) ∞e−t2dt= π (25) sian random variables such that: 2 0 E(W )=0 E(W 2)=ω2(1+δ ) (23) Because: j,k j,k j,k (cid:90) ∞ (cid:104) (cid:105)∞ (cid:90) ∞ a= e−t2dt= te−t2 − tde−t2 APPENDIXC 0 0 0 (cid:90) ∞ EXPECTATIONOFTHERANDOMVARIABLEWITH =− t(−2t)e−t2dt GAUSSIANDISTRIBUTION 0 (cid:90) ∞ X is a random variable with standard normal distribution; =2 t2e−t2dt it is described by the probability density function (PDF): 0 so: √ f(x)= √1 e−x22 (cid:90) ∞t2e−t2dt= a = π (26) 2π 2 4 0 1) E(X),E(X2),E(X4): APPENDIXD (cid:90) ∞ E(X)= xf(x)dx=0 11G 2 A1 A2 A340G 4142G 53 54G 56G 55 G 59 E(cid:0)X2(cid:1)===(cid:90)−2−−√√√∞∞∞122xπ2(cid:90)(cid:18)√∞(cid:104)12xπee−−e−xx222x2/2d2d√(cid:105)xx∞−∞=−(cid:90)−(cid:90)∞∞−∞∞x2e√−12x2π2d−x1x(cid:19)de−x22 5A3829GG42289G161G311761213G1G1G332113710G71145G18G221109 3254 3G3 7324GGG379G7313G6703387434G644474569G545281G62186G66547G590 558167656266G3G46601 AG4 = √2 2(cid:90)π ∞0e−t2dt 2 10G27G 114 11526 G25G 2223 74G75 118 G 767880 8719 G GG99 106A6 2π√0π A2 84A583 7872 G959976 94 98G100104G105G G107 E(cid:0)X4(cid:1)==(cid:90)√π∞ x24√=11e−x22dx=(cid:90) ∞ x4√1 1 de−x22 85G 8867G888990GG 9291GG93 10A21601 10G3G111110G11001891G2 2π 2π−x −∞ −∞ Fig. 10: Partitioning network for the IEEE 118-bus system. =−√1 (cid:18)(cid:104)x3e−x2/2(cid:105)∞ −(cid:90) ∞ e−x22dx3(cid:19) There are six partitions, i.e. A1, A2, A3, A4, A5, and A6. 2π −∞ −∞ =2√3 2√2(cid:90) ∞e−x22 x2d√x APPENDIXE 2π 2 2 0 12 (cid:90) ∞ = √ e−t2t2dt TABLE III: Series of Events π √0 = √12π 4π =3 PPABURS\-5A2T(IMMEW) [0001:600] [360001:1200] [t1−20910:01500] (24) *PBus-52 is the power demand of bus-52 (cid:51)(cid:39)(cid:41) (cid:3)(cid:5)(cid:83)(cid:71)(cid:73)(cid:41)(cid:68)(cid:70)(cid:87)(cid:82)(cid:85)(cid:92)(cid:3)(cid:51)(cid:85)(cid:82)(cid:5)(cid:3) (cid:90)(cid:90)(cid:90)(cid:17)(cid:73)(cid:76)(cid:81)(cid:72)(cid:83)(cid:85)(cid:76)(cid:81)(cid:87)(cid:17)(cid:70)(cid:82)(cid:80)(cid:17)(cid:70)(cid:81)