Nuclear Norm Subspace Identification (N2SID) for short data batches ⋆

Michel Verhaegen ∗  Anders Hansson ∗∗

∗ Delft Center for Systems and Control, Delft University, Delft, The Netherlands
∗∗ Division of Automatic Control, Linköping University, Linköping, Sweden

arXiv:1401.4273v1 [cs.SY] 17 Jan 2014

Abstract: Subspace identification is revisited in the scope of nuclear norm minimization methods. It is shown that essential structural knowledge about the unknown data matrices in the data equation that relates Hankel matrices constructed from input and output data can be used in the first step of the numerical solution presented. The structural knowledge comprises the low rank property of a matrix that is the product of the extended observability matrix and the state sequence, and the Toeplitz structure of the matrix of Markov parameters (of the system in innovation form). The new subspace identification method is referred to as the N2SID method (twice the N of Nuclear Norm, and SID for Subspace IDentification). In addition to including key structural knowledge in the solution, it integrates the subspace calculation with the minimization of a classical prediction error cost function. The nuclear norm relaxation enables us to perform such integration while preserving convexity. The advantages of N2SID are demonstrated in a numerical open- and closed-loop simulation study. Here a comparison is made with another widely used SID method, i.e. N4SID. The comparison focusses on identification with short data batches, i.e. where the number of measurements is a small multiple of the system order.

Keywords: Subspace system identification, Nuclear norm optimization, Rank constraint, Short data batches

1. INTRODUCTION

System identification is a key problem in a large number of scientific areas. Generally there are two families of system identification methods: (1) prediction error methods and (2) subspace methods, Ljung (1999); Verhaegen and Verdult (2007). Either of these approaches can be treated in the time or the frequency domain; for the sake of simplicity we restrict ourselves to the time domain in this paper.

The central theme in prediction error methods is to parametrize the predictor (observer) that generates an estimate of the output, and then to formulate an optimization problem that minimizes a (weighted) cost function defined on the difference between the measured output and the observer-predicted output. This cost function generally is a sample average (for the finite data length case) of the trace of the covariance matrix of the prediction error. Though the prediction error framework provides a vast amount of insight for studying and analyzing the estimated predictor, its main drawback is non-convexity for general multivariable state space models in innovation form, as considered in this paper. The lack of convexity can result in the optimization method getting stuck in a local minimum, thereby complicating the analysis of the numerical results: it becomes difficult, for example, to distinguish between a bad model estimate due to a local minimum and one due to a bad model parametrization. This parametrization needs to be given before starting the parameter optimization problem, and thus the use of the approach can be quite complex and labor intensive for the non-expert user. The latter fact has, however, been greatly relieved by computer aided software packages such as Ljung (2007).

Motivated by the drawbacks of prediction error methods, the goal of subspace identification methods is to derive approximate models rather than models that are "optimal" with respect to a chosen cost function. The approximation is based on linear algebra transformations and factorizations of structured Hankel matrices constructed from the input-output data. All existing subspace identification methods aim to derive a low rank matrix from which key subspaces (hence the name subspace identification) are derived. The low rank approximation is in general done using a singular value decomposition (SVD).
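This classical low rank step can be made concrete with a small sketch (plain NumPy, not code from the paper): an SVD is truncated at the numerical rank to recover the dominant part of a nearly low rank matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a rank-2 matrix corrupted by a small perturbation.
L = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
M = L + 1e-6 * rng.standard_normal((30, 20))

# Classical subspace step: truncate the SVD at the numerical rank.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
n = int(np.sum(s > 1e-3 * s[0]))          # numerical rank estimate
M_lowrank = U[:, :n] * s[:n] @ Vt[:n, :]  # best rank-n approximation

print(n)  # → 2
print(np.allclose(M_lowrank, L, atol=1e-4))  # → True
```

The columns of `U[:, :n]` span the estimated dominant subspace; in subspace identification this role is played by the column space of the extended observability matrix.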
Recently a new family of subspace identification methods was presented that uses the nuclear norm instead of an SVD in order to improve the low rank approximation, Liu and Vandenberghe (2009a,b); Mohan and Fazel (2010); Fazel et al. (2012); Hansson et al. (2012); Liu et al. (2013).

⋆ Part of the research was done while the first author was a Visiting Professor at the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Sweden. This work was partially supported by the European Research Council Advanced Grant Agreement No. 339681. Corresponding Author: [email protected].

The drawback of Subspace Identification (SID) is twofold. First, general subspace identification methods lack an optimization criterion in the calculation of the predictor. This drawback is relaxed in a number of recent nuclear norm based SID methods that regularize the low rank approximation problem with a sample average of the trace of the covariance matrix of the prediction error. The second drawback is that the low rank approximation is not performed in the first step of the algorithm. As a consequence, this low rank decomposition does not operate on the original input-output data but on approximate, processed data. The latter approximation destroys the low rank property, especially for short data batches, since most of the SID schemes only provide consistent estimates.
As it is known that exploiting structure (that is present in the model) is beneficial, especially when working with short data batches, we review the derivation of subspace identification in order to be able to deal with two key structural properties in the data equation used by SID methods. The data equation is a relationship between the Hankel matrices constructed from the input and output data, respectively. The key structural properties are the low rank property and the Toeplitz structure of unknown, model dependent matrices in the data equation. The key point in the derivation of the new scheme is that both these structural properties are invoked in the first step of the algorithm. The algorithm is abbreviated N2SID, standing for Nuclear Norm (two times N, i.e. N2) Subspace Identification.

The paper is organized as follows. In Section 2 the identification problem for identifying a multivariable state space model in a subspace context, while taking a prediction error cost function into consideration, is presented. The problem formulation does, however, not require a parametrization of the system matrices of the state space model, as is classically done in prediction error methods for parametric input-output models, Ljung (1999). The key data equation in the analysis and description of subspace identification methods is presented in Section 3. Here we also highlight the important structural properties of the submatrices in this equation: the low rank property of the matrix that is the product of the extended observability matrix and the state sequence of the underlying state space model (given in innovation form), and the (lower triangular) Toeplitz structure of the matrix of Markov parameters. A convex relaxation is presented in Section 4 to take these structural constraints into consideration. Some preliminary results on the performance are illustrated in Section 5 in a comparison study of N2SID with N4SID, for both open- and closed-loop identification problems with short data batches. Here we consider the identification of second order systems with just 50 data points. Finally, we end this paper with some concluding remarks.

2. THE SUBSPACE IDENTIFICATION PROBLEM

In system identification a challenging problem is to identify Linear Time Invariant (LTI) systems with multiple inputs and multiple outputs using short length data sequences. Taking process and measurement noise into consideration, the model of the LTI system can be written in the following innovation form:

    x(k+1) = A x(k) + B u(k) + K e(k)
    y(k)   = C x(k) + D u(k) + e(k)                    (1)

with x(k) ∈ R^n, y(k) ∈ R^p, u(k) ∈ R^m, and e(k) a zero-mean white noise sequence with covariance matrix R_e.

We will consider a possible open or closed loop scenario in which the input and output data sequences are generated. Also the case of output-only identification is considered, making the approach outlined in this paper a general and new framework to identify linear dynamical systems.

The problem considered can be formulated as follows: Given the input-output (i/o) data batches {u(k), y(k)}_{k=1}^{N} that are generated with a system belonging to the class of LTI systems represented by (1), the problem is, using short length i/o data (with N larger than but of the order of n), to estimate the system order, denoted by n̂, and to determine approximate system matrices (Â_T, B̂_T, Ĉ_T, D̂, K̂_T) that define the observer:

    x̂_T(k+1) = Â_T x̂_T(k) + B̂_T u_v(k) + K̂_T ( y_v(k) − Ĉ_T x̂_T(k) )
    ŷ_v(k)   = Ĉ_T x̂_T(k) + D̂ u_v(k)                 (2)

with x̂_T(k) ∈ R^{n̂}, such that the approximated output ŷ_v(k) is close to the measured output y_v(k) of the validation pair {u_v(k), y_v(k)}_{k=1}^{N_v}, as expressed by a small value of the cost function

    (1/N_v) Σ_{k=1}^{N_v} ‖ y_v(k) − ŷ_v(k) ‖_2^2     (3)

The quantitative notions like "small" and "approximate" will be made more precise in the new N2SID solution to this problem. It should be remarked that, contrary to many earlier Subspace Identification (SID) methods, the present problem formulation explicitly takes a prediction error cost function like (3) into consideration.

A key starting point in the formulation of subspace methods is the relation between (structured) Hankel matrices constructed from the i/o data. This relationship will, as defined in Verhaegen and Verdult (2007), be called the Data Equation. It will be presented in the next section. There we will also briefly highlight how existing subspace methods have failed to take important structural information about the matrices in this equation into account from the first steps of the solution. N2SID overcomes these shortcomings. Taking these structural properties into consideration makes N2SID attractive especially for identifying MIMO LTI models when only short data batches are available.
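To make the roles of (1), (2) and (3) concrete, the following sketch (NumPy; the system matrices are invented for illustration and are not taken from the paper) simulates the innovation model (1) and runs the observer (2) with the true matrices, in which case the prediction error in (3) reduces exactly to the innovation e(k):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical second-order SISO system in innovation form (1).
A = np.array([[0.0, 1.0], [-0.5, 1.2]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
D = 0.0
K = np.array([0.1, 0.2])

N = 200
u = np.sign(rng.standard_normal(N))   # +/-1 input
e = 0.1 * rng.standard_normal(N)      # white innovation sequence

# Simulate (1): x(k+1) = A x(k) + B u(k) + K e(k),  y(k) = C x(k) + D u(k) + e(k)
x = np.zeros(2)
y = np.zeros(N)
for k in range(N):
    y[k] = C @ x + D * u[k] + e[k]
    x = A @ x + B * u[k] + K * e[k]

# Observer (2) with the true matrices and the true initial state:
# its one-step-ahead prediction error is exactly e(k).
xh = np.zeros(2)
yh = np.zeros(N)
for k in range(N):
    yh[k] = C @ xh + D * u[k]
    xh = A @ xh + B * u[k] + K * (y[k] - yh[k])

# Cost (3): sample average of the squared prediction error.
cost = np.mean((y - yh) ** 2)
print(np.isclose(cost, np.mean(e ** 2)))  # → True
```

With mismatched observer matrices the cost exceeds the innovation variance, which is what the identification problem tries to avoid.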
3. THE DATA EQUATION AND ITS STRUCTURE

Let the LTI model (1) be represented in its so-called observer form:

    x(k+1) = (A − KC) x(k) + (B − KD) u(k) + K y(k)
    y(k)   = C x(k) + D u(k) + e(k)                   (4)

We will denote this model compactly as:

    x(k+1) = A x(k) + B u(k) + K y(k)
    y(k)   = C x(k) + D u(k) + e(k)                   (5)

with A the observer system matrix (A − KC) and B equal to (B − KD). Though this property will not be used in the sequel, the matrix A can be assumed to be asymptotically stable.

For the construction of the data equation, we store the measured i/o data in (block-) Hankel matrices. For fixed N, assumed to be larger than the order n of the underlying system, the definition of the number of block-rows fully defines the size of these Hankel matrices. Let this dimensioning parameter be denoted by s, and let s > n; then the Hankel matrix of the input (similarly for the output) is defined as:

    U_s = [ u(1)   u(2)    ···  u(N−s+1)
            u(2)   u(3)    ···
             ⋮                  ⋮
            u(s)   u(s+1)  ···  u(N)     ]           (6)

The Hankel matrices defined from the output y(k) and the innovation e(k) are denoted by Y_s and E_s, respectively.
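For readers who want to experiment, a block-Hankel constructor matching (6) might look as follows (NumPy sketch; the function name is ours, not the paper's, and it assumes more samples than channels):

```python
import numpy as np

def block_hankel(w, s):
    """Stack a signal w (N samples of dimension dim) into the block-Hankel
    matrix of (6): s block rows and N - s + 1 columns."""
    w = np.atleast_2d(np.asarray(w, dtype=float))
    if w.shape[0] < w.shape[1]:
        w = w.T                       # put the samples along axis 0
    N, dim = w.shape
    cols = N - s + 1
    H = np.empty((s * dim, cols))
    for i in range(s):
        # block row i holds w(i+1), ..., w(i+cols)
        H[i * dim:(i + 1) * dim, :] = w[i:i + cols, :].T
    return H

u = np.arange(1, 7)          # u(1), ..., u(6), scalar input (m = 1)
Us = block_hankel(u, 3)      # s = 3  ->  3 x 4 Hankel matrix
print(Us)
# [[1. 2. 3. 4.]
#  [2. 3. 4. 5.]
#  [3. 4. 5. 6.]]
```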
The relationship between these Hankel matrices, which readily follows from the linear model equations in (5), requires the definition of the following structured matrices. First we define the (extended) observability matrix O_s:

    O_s^T = [ C^T   A^T C^T   ···   (A^T)^{s−1} C^T ]     (7)

Second, we define a Toeplitz matrix from the quadruple of system matrices {A, B, C, D} as:

    T_{u,s} = [ D             0    ···   0
                CB            D          0
                 ⋮                  ⋱
                C A^{s−2} B   ···        D ]              (8)

and in the same way we define a Toeplitz matrix T_{y,s} from the quadruple {A, K, C, 0}. Finally, let the state sequence be stored as:

    X = [ x(1)  x(2)  ···  x(N−s+1) ]                     (9)

Then the data equation compactly reads:

    Y_s = O_s X + T_{u,s} U_s + T_{y,s} Y_s + E_s         (10)

This equation is a simple linear matrix equation that highlights the challenge in subspace identification, which is to approximate, from the given Hankel matrices Y_s and U_s, the column space of the observability matrix and/or that of the state sequence.

The equation is highly structured. For the sake of brevity, in this paper we focus on the following two key structural properties:

(1) The matrix product O_s X is low rank, since s > n.
(2) The matrices T_{u,s} and T_{y,s} are (block-) Toeplitz.

In all existing subspace identification methods, these two key structural matrix properties are not used in the first step of the algorithmic solution. Some pre-processing step on the data (Hankel) matrices is usually performed, followed by a low rank factorization. To further illustrate this point, consider the approach in Chiuso (2007) as an example method. There the first step consists in the estimation of a high order ARX model that is defined by the last (block-) row of the data equation (10), neglecting the term C A^{s−1} X. This is based on the fact that A is asymptotically stable and on the assumption that s is chosen large enough. Then, in a later step, the estimated ARX parameters are used to define a matrix that asymptotically (both in terms of s and N) is of low rank n. Other recent methods that make use of the nuclear norm try to enforce the low rank property on an already pre-processed matrix. For example, in Liu et al. (2013) only the case of measurement noise was considered, and in a pre-processing step the input Hankel matrix U_s is annihilated from the data equation by an orthogonal projection.

Though statistical consistency generally holds for the existing subspace identification schemes, they refrain from exploiting key structural matrix properties in the data equation in the first step of the algorithm. Such structural information may be key when dealing with short data batches. Therefore, in the next section we revise subspace identification and formulate the new N2SID approach.

4. N2SID

4.1 Pareto optimal Subspace Identification

From the data equation (10) it follows that the minimum variance prediction of the output equals ŷ(k) = y(k) − e(k). Let the Hankel matrix Ŷ_s be defined from this sequence ŷ(k) as we defined Y_s from y(k). Then the data equation becomes:

    Ŷ_s = O_s X + T_{u,s} U_s + T_{y,s} Y_s               (11)

Let T_{p,m} denote the class of lower triangular (block-) Toeplitz matrices with p × m block entries, and let H_p denote the class of (block-) Hankel matrices with block entries of p column vectors. Then the two key structural properties listed in Section 3 are taken into account in the following problem formulation:

    min_{Ŷ_s ∈ H_p, T_{u,s} ∈ T_{p,m}, T_{y,s} ∈ T_{p,p}}  rank( Ŷ_s − T_{u,s} U_s − T_{y,s} Y_s )      (12)

and min E[ (y(k) − ŷ(k)) (y(k) − ŷ(k))^T ], where E denotes the expectation operator. This optimization problem seeks the Pareto optimal solutions with respect to the two cost functions rank( Ŷ_s − T_{u,s} U_s − T_{y,s} Y_s ) and E[ (y(k) − ŷ(k)) (y(k) − ŷ(k))^T ]. This optimization is, however, not tractable. For that purpose we develop a convex relaxation in the next subsection. This will make it possible to obtain all Pareto optimal solutions using scalarization.
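The low rank property behind (11) and (12) can be checked numerically in the noise-free special case: with e(k) ≡ 0, the innovation model (1) yields the simpler relation Y_s = O_s X + T_{u,s} U_s, so the residual Y_s − T_{u,s} U_s = O_s X has rank n. The system matrices below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative second-order system, simulated without noise.
A = np.array([[0.8, 0.2], [0.0, 0.5]]); B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0]); D = 0.5
n, s, N = 2, 5, 40

u = rng.standard_normal(N)
x = np.zeros(2); y = np.zeros(N)
for k in range(N):
    y[k] = C @ x + D * u[k]
    x = A @ x + B * u[k]

hankel = lambda w: np.array([w[i:i + N - s + 1] for i in range(s)])
Us, Ys = hankel(u), hankel(y)

# Toeplitz matrix (8) of Markov parameters {D, CB, CAB, ...}.
markov = [D] + [C @ np.linalg.matrix_power(A, j) @ B for j in range(s - 1)]
Tus = np.zeros((s, s))
for i in range(s):
    for j in range(i + 1):
        Tus[i, j] = markov[i - j]

# The residual equals Os X and therefore has rank n, not full row rank s.
R = Ys - Tus @ Us
print(np.linalg.matrix_rank(R, tol=1e-8))  # → 2 (= n), although R is 5 x 36
```

With noise, or with only approximate Toeplitz matrices, this exact rank deficiency is lost, which is precisely what (12) tries to restore.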
4.2 A convex relaxation

A convex relaxation of the NP hard problem formulation in (12) will now be developed. The original problem is reformulated in two ways. First, the rank(·) operator is substituted by the nuclear norm. The nuclear norm of a matrix X, denoted by ‖X‖_⋆, is defined as the sum of the singular values of X. It is also known as the trace norm, the Ky Fan norm or the Schatten norm, Liu and Vandenberghe (2010). It is known to be a good approximation of the rank, Fazel et al. (2001); Fazel (2002). Second, the minimum variance criterion is substituted by the following sample average of the trace of the covariance matrix E[ (y(k) − ŷ(k)) (y(k) − ŷ(k))^T ]:

    (1/N) Σ_{k=1}^{N} ‖ y(k) − ŷ(k) ‖_2^2

By introducing a scalarization, or regularization, parameter λ ∈ [0, ∞), all Pareto optimal solutions of the convex reformulation of the N2SID problem can be formulated in one line:

    min_{Ŷ_s ∈ H_p, T_{u,s} ∈ T_{p,m}, T_{y,s} ∈ T_{p,p}}  ‖ Ŷ_s − T_{u,s} U_s − T_{y,s} Y_s ‖_⋆ + (λ/N) Σ_{k=1}^{N} ‖ y(k) − ŷ(k) ‖_2^2      (13)

It is well known that this problem can be recast as a semidefinite programming problem, Fazel et al. (2001); Fazel (2002), and hence it can be efficiently solved with standard solvers.

Fig. 1. Scatter plot showing the fit of N2SID versus N4SID for 100 randomly generated examples.
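The paper solves (13) as a semidefinite program. As a complementary numerical note (our addition, not the paper's method), first-order solvers handle the nuclear norm term through its proximal operator, singular value soft-thresholding, which takes only a few lines of NumPy:

```python
import numpy as np

def svt(M, tau):
    # Proximal operator of tau * ||.||_* :
    #   argmin_X  tau * ||X||_*  +  0.5 * ||X - M||_F^2
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U * np.maximum(s - tau, 0.0) @ Vt

rng = np.random.default_rng(3)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))  # rank 3

s_vals = np.linalg.svd(M, compute_uv=False)
tau = 0.5 * (s_vals[1] + s_vals[2])   # threshold between sigma_2 and sigma_3
X = svt(M, tau)

# Soft-thresholding shrinks every singular value and zeroes those below
# tau, which is how a nuclear norm term promotes low rank solutions.
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(X))
```

Placing tau between two singular values removes the smaller ones, so the prox output has strictly lower rank than its input.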
We have used the modeling language YALMIP, Löfberg (2001), to perform experiments, results of which we present next.

The method encompasses in a straightforward manner identification problems with output data only. In that case the convex relaxed problem formulation reads:

    min_{Ŷ_s ∈ H_p, T_{y,s} ∈ T_{p,p}}  ‖ Ŷ_s − T_{y,s} Y_s ‖_⋆ + (λ/N) Σ_{k=1}^{N} ‖ y(k) − ŷ(k) ‖_2^2      (14)

5. ILLUSTRATIVE EXAMPLES

5.1 Open-loop experiment

In this section we report results of numerical experiments in Matlab. In the first example we consider 100 second order single-input single-output systems as in (1), randomly generated with the command drss. The matrix K has been generated with the command randn, which gives a matrix of elements drawn from a standardized normal density function. Any model for which the absolute value of the largest eigenvalue of the system matrix A is larger than 0.99 has been discarded.

We choose the model order by looking at the singular values of the matrix that minimizes the nuclear norm. The model order is equal to the index of the singular value that is closest to the logarithmic mean of the largest and smallest singular values. However, in case that index is greater than 10, we choose 10 as the model order.

Since our approach gives as solution a whole family of models parameterized by the regularization parameter λ, we perform a second optimization over this parameter with respect to a fit criterion as implemented in the command FIT of the System Identification Toolbox in Matlab. We consider 20 logarithmically spaced values of λ/N in the interval [10^{−0.5}, 10^4].

We then compare our method with a standard subspace method as implemented in the command n4sid of the System Identification Toolbox of Matlab. This algorithm picks the model order in the interval 0 to 10 which gives the best fit on the identification data. We also make sure that n4sid uses the same value of s as we do, and we insist that the direct term in the model is estimated. The comparison between our model and the one of n4sid is done by computing the fit of the two models on a second validation data set.
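The order selection rule and the λ grid above can be sketched as follows. The paper does not spell out the exact indexing convention of the rule, so this reading (count the singular values lying above the logarithmic mean) is an assumption:

```python
import numpy as np

def select_order(singular_values, max_order=10):
    # One reading of the rule: the order is the number of singular values
    # above the logarithmic (geometric) mean of the largest and smallest
    # singular values, capped at max_order.
    s = np.asarray(singular_values, dtype=float)
    log_mean = 0.5 * (np.log(s[0]) + np.log(s[-1]))
    return min(int(np.sum(np.log(s) > log_mean)), max_order)

# A spectrum with a clear gap after two dominant singular values.
print(select_order([50.0, 20.0, 1e-4, 5e-5, 2e-5]))  # → 2

# The paper's grid of regularization parameters: 20 logarithmically
# spaced values of lambda / N in [10^-0.5, 10^4].
lam_over_N = np.logspace(-0.5, 4, 20)
print(len(lam_over_N))  # → 20
```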
Data for system identification has been generated for each model with time horizon N = 50. The noise e(k) is white and drawn from a normal density function with standard deviation 0.2. The input signal u(k) is a sequence of ±1, obtained by taking the sign of a vector of values drawn from a standardized normal density function. The initial state is obtained from a normal density function with standard deviation 5, where the components are uncorrelated. The parameter s has been set to 15. We do not consider the nuclear norm of the matrix as defined in (13), but instead we first multiply it from the right with a random matrix. This random matrix was obtained from a standardized normal density function, and the number of columns was 22. It is known that this type of randomization is a powerful tool in low rank matrix approximation, Halko et al. (2011). The reason for this modification is that it reduces the computational complexity without significantly affecting the quality of the results obtained.

In Figure 1 the fit of N2SID versus N4SID is plotted for the 100 random examples. In this plot one negative fit value has been removed. In more than 80% of the cases N2SID has a higher fit value than N4SID. The average fit values are 76.0 for N2SID and 71.1 for N4SID. The average model orders are 6.67 for N2SID and 5.01 for N4SID. This clearly demonstrates the advantage of using N2SID on short data sets. We have also performed experiments on longer data sets. Preliminary results show that there is no significant difference between the two methods, an indication that for large data length batches both methods deliver consistent results.
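The open-loop generation pipeline above can be approximated in NumPy. Since drss has no exact NumPy equivalent, the spectral radius rescaling below is our assumption; only the stability bound |λ_max| ≤ 0.99, the randn innovation gain and the ±1 input are from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def random_stable_system(order=2, rho_max=0.99):
    # Stand-in for Matlab's drss: draw a random A and rescale its spectral
    # radius to a random value below rho_max (the paper instead discards
    # models with |lambda_max| > 0.99).
    A = rng.standard_normal((order, order))
    radius = np.abs(np.linalg.eigvals(A)).max()
    A *= rng.uniform(0.3, rho_max) / max(radius, 1e-12)
    B = rng.standard_normal((order, 1))
    C = rng.standard_normal((1, order))
    D = rng.standard_normal((1, 1))
    K = rng.standard_normal((order, 1))  # innovation gain, via randn in the paper
    return A, B, C, D, K

A, B, C, D, K = random_stable_system()
u = np.sign(rng.standard_normal(50))     # +/-1 input of length N = 50

print(np.abs(np.linalg.eigvals(A)).max() < 0.99)  # → True
print(set(np.unique(u)) <= {-1.0, 1.0})           # → True
```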
5.2 Closed-loop experiment

In the second example we consider closed loop identification. For this purpose we consider a model of the form (1), where

    A = [0 1; 0 0.7],   B = [0; 1],   K = [−0.3; 0.04],   C = [1 0],   D = 0

We control the model with an observer-based state feedback, where K is the observer gain and where the state feedback matrix is L = [0.25 −0.3]. This design places all closed loop eigenvalues at 0.5. The controller also has a feed-forward from a reference value r(k), which is a sequence of ±1 obtained by taking the sign of a vector of values drawn from a standardized normal density function. This signal is multiplied with a gain before it is added to the input signal, such that the closed loop steady state gain from reference value to y(k) is equal to one. The noise e(k) is white and drawn from a normal density function with standard deviation 0.1. In this experiment we do not look for the best model order; instead we just compute models of order n = 2. All the other parameters are the same as in the previous experiment.

We show in Figure 2 the fit of N2SID versus N4SID for 100 different realizations of the reference value and the noise. The fit is better for N2SID in 59% of the cases. Moreover, as is seen in Figure 3, the spread of the eigenvalues of the A-matrix is smaller for N2SID as compared to N4SID.

Fig. 2. Scatter plot showing the fit of N2SID versus N4SID for 100 different realizations of the reference value and the noise.

Fig. 3. Scatter plot showing the eigenvalues of the A-matrix for N2SID and N4SID for 100 different realizations of the reference value and the noise.

6. CONCLUDING REMARKS

Subspace identification is revisited in this paper in the scope of nuclear norm optimization methods. A new way to impose structural matrix properties of the data equation in subspace identification on the measured data has been presented. The new subspace identification method is referred to as N2SID. It is shown that, especially for small data length batches where the number of samples is only a small multiple of the order of the underlying system, the incorporation of structural information, namely the low rank property of the matrix revealing the required subspace and the (block) Toeplitz structure of the matrix containing the unknown Markov parameters, improves on the results of widely used SID methods. In addition to the structural constraints, the N2SID method also makes a trade-off in the first step of the calculations between the subspace approximation and the prediction error cost function. As such it also overcomes the persistent drawback that SID did not consider a (classical prediction error) cost function.

The single integrative step that imposes the key structural matrix properties makes a trade-off between the prediction error cost function and the problem of retrieving the subspace of interest. This integrative approach may help to simplify the analysis of the optimality of SID methods and to further clarify their link with prediction error methods. This new way of looking upon SID will open up the possibility for new developments in the future.

REFERENCES

Chiuso, A. (2007). The role of vector autoregressive modeling in predictor-based subspace identification. Automatica, 43(6), 1034–1048.
Fazel, M. (2002). Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford University.
Fazel, M., Hindi, H., and Boyd, S. (2001). A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, 4734–4739.
Fazel, M., Pong, T.K., Sun, D., and Tseng, P. (2012). Hankel matrix rank minimization with applications to system identification and realization. Submitted for possible publication.
Halko, N., Martinsson, P.G., and Tropp, J.A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53, 217–288.
Hansson, A., Liu, Z., and Vandenberghe, L. (2012). Subspace system identification via weighted nuclear norm optimization. In Proceedings of the 51st IEEE Conference on Decision and Control. Submitted.
Liu, Z., Hansson, A., and Vandenberghe, L. (2013). Nuclear norm system identification with missing inputs and outputs. Systems & Control Letters, 62, 605–612.
Liu, Z. and Vandenberghe, L. (2009a). Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31(3), 1235–1256.
Liu, Z. and Vandenberghe, L. (2009b). Semidefinite programming methods for system realization and identification. In Proceedings of the Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 4676–4681.
Liu, Z. and Vandenberghe, L. (2010). Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31(3), 1235–1256.
Ljung, L. (1999). System Identification: Theory for the User. Prentice-Hall, Upper Saddle River, N.J., 2nd edition.
Ljung, L. (2007). System Identification Toolbox for use with Matlab. Version 7. The MathWorks, Inc., Natick, MA, 7th edition.
Löfberg, J. (2001). YALMIP: A Matlab interface to SP, MAXDET and SOCP. Technical Report LiTH-ISY-R-2328, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden.
Mohan, K. and Fazel, M. (2010). Reweighted nuclear norm minimization with application to system identification. In Proceedings of the American Control Conference, 2953–2959.
Verhaegen, M. and Verdult, V. (2007). Filtering and Identification: A Least Squares Approach. Cambridge University Press.
