N2SID: Nuclear Norm Subspace Identification

Michel Verhaegen (a), Anders Hansson (b)
(a) Delft Center for Systems and Control, Delft University, Delft, The Netherlands
(b) Division of Automatic Control, Linköping University, Linköping, Sweden

Abstract

The identification of multivariable state space models in innovation form is solved in a subspace identification framework using convex nuclear norm optimization. The convex optimization approach allows the inclusion of constraints on the unknown matrices in the data equation that characterizes subspace identification methods, such as the lower triangular block-Toeplitz structure of the weighting matrices constructed from the Markov parameters of the unknown observer. The classical use of instrumental variables to remove the influence of the innovation term on the data equation in subspace identification is avoided. Avoiding the instrumental-variable projection step has the potential to improve the accuracy of the estimated model predictions, especially for short data sequences. This is illustrated using a data set from the DaISy library. An efficient implementation in the framework of the Alternating Direction Method of Multipliers (ADMM) is presented and used in the validation study.

Keywords: Subspace system identification, Nuclear norm optimization, Rank constraint, Short data batches, Alternating Direction Method of Multipliers

(*) Part of the research was done while the first author was a Visiting Professor at the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Sweden. This work was partially supported by the European Research Council, Advanced Grant Agreement No. 339681. Corresponding author: [email protected].

Preprint submitted to Automatica, 26 May 2016.

1 Introduction

The generation of Subspace IDentification (SID) methods for the identification of Linear Time-Invariant (LTI) state space models, as developed originally in [20,10,21], derives approximate models rather than models that are "optimal" with respect to a goodness-of-fit criterion defined in terms of the weighted norm of the difference between the measured output and the model-predicted output. The approximation is based on linear algebra transformations and factorizations of structured Hankel matrices constructed from the input-output data, which are related via the so-called data equation [25]. All existing SID methods aim to derive a low rank matrix from which key subspaces (hence the name subspace identification) are derived. The low rank approximation is in general computed with a Singular Value Decomposition (SVD).

A number of recent developments integrate the low rank approximation step of SID with a goodness-of-fit criterion into a single multi-criteria convex optimization problem. These contributions were inspired by the work in [3] on approximating a rank constraint on a matrix by minimizing its nuclear norm. This has resulted in a number of improvements over the SVD classically used for the low rank approximation step in SID; see [12,13,16,5,7,11,18,19].

When identifying innovation state space models, a common approach is to make use of instrumental variables [7]. It is well known that the projection operation associated with the use of instrumental variables may degrade the accuracy of the estimated quantities.

In this paper we present a new SID method for identifying multivariable state space models in innovation form within the framework of nuclear norm optimization. The new SID method avoids the use of instrumental variables. The method is a convex relaxation of a Pareto optimization problem in which structural constraints are imposed on the unknowns in the data equation, such as their block-Toeplitz matrix structure. This Pareto formulation allows a trade-off between a prediction-error type of optimality criterion, that is, minimizing the (co-)variance of the one-step-ahead prediction of a linear Kalman-filter type observer, on one hand, and finding an observer of lowest complexity, i.e. of lowest model order, on the other hand. A key result is that the structural Toeplitz constraint is sufficient to find the minimal observer realization when the optimal one-step-ahead prediction of the output is known. The incentive to estimate a Kalman-filter type observer also justifies the constraint of minimizing the variance of the one-step-ahead prediction error.

It is interesting to note that this key result stipulates precise conditions on the persistency of excitation of the input (in open-loop experiments). For many instrumental-variable based SID methods it is still an open question what persistency of excitation condition on a generic input sequence guarantees that the algorithm works for finite data lengths, or that it is consistent [9].

The convex relaxation of the new SID approach is denoted Nuclear Norm Subspace IDentification (N2SID). Its advantages are demonstrated in a comparison study on real-life data sets from the DaISy database [2]. In this comparison study N2SID yielded models with improved predictions over two existing SID methods, N4SID and the recent nuclear norm based SID method presented in [11], and over the Prediction Error Method (PEM) [15]. For the sake of brevity, results for only one data set are reported; for a more extensive experimental analysis leading to similar conclusions we refer to [24].

The foundations of N2SID were presented in [23]. There the resulting optimization problem was solved with a Semi-Definite Programming (SDP) solver after a reformulation of the problem into an equivalent SDP problem. In addition, the problem formulation was approximated in order to obtain a problem of manageable size for current SDP solvers. In this paper we instead solve the problem with the Alternating Direction Method of Multipliers (ADMM). ADMM is known to be a good choice for regularized nuclear norm problems of the kind solved in this paper [11]. To obtain an efficient implementation we have customized the computation of the coefficient matrix of the normal equations associated with our formulation using Fast Fourier Transform (FFT) techniques. This is the key to an efficient implementation.

The paper is organized as follows. In Section 2 the problem of identifying a multivariable state space model in a subspace context, while taking a prediction-error cost function into consideration, is presented. The data equation, the necessary preliminaries on the assumptions made in the analysis, and the structure of the subspace identification method are presented in Section 3. The multi-criteria optimization problem, the analysis of the uniqueness of its solution and its convex relaxation are presented in Section 4. In Section 5 we explain how to obtain an efficient ADMM implementation. The performance is illustrated in Section 6 in a comparison study of N2SID with two other SID methods, N4SID and the recent nuclear norm based SID method of [11], and with PEM. Finally, we end the paper with some concluding remarks.

1.1 Notation

We use the Matlab-like convention that for a vector or matrix X in R^{M x N} (or C^{M x N}), X_{m:n,p:q} is the submatrix of X with rows m through n and columns p through q. If one of the dimensions of the matrix is n, then 1:n can be abbreviated as just ":". Moreover, X_{m:-1:n,p:q} denotes the submatrix of X obtained by taking rows m through n in reverse order, where m >= n. Similar notation is used for the columns.

2 The Subspace Identification Problem

A challenging problem in system identification is to identify Linear Time-Invariant (LTI) systems with multiple inputs and multiple outputs from short data sequences. Taking process and measurement noise into consideration, a general state space model for LTI systems can be given in so-called innovation form [25]:

    x(k+1) = A x(k) + B u(k) + K e(k)
    y(k)   = C x(k) + D u(k) + e(k)                       (1)

with x(k) in R^n, y(k) in R^p, u(k) in R^m and e(k) a zero-mean white noise sequence. Since we are interested in short data sets, no consistency requirement is included in the following problem formulation.
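The innovation-form model (1) is easy to simulate. The sketch below (Python/NumPy with arbitrary illustrative system matrices; the paper's own implementation is in Matlab) runs (1) forward in time. With the noise e set to zero and an impulse input, the output reproduces the Markov parameters D, CB, CAB, ...:

```python
import numpy as np

def simulate_innovation(A, B, C, D, K, u, e, x0):
    """Simulate x(k+1) = A x + B u + K e,  y(k) = C x + D u + e, i.e. model (1)."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for uk, ek in zip(u, e):
        ys.append(C @ x + D @ uk + ek)
        x = A @ x + B @ uk + K @ ek
    return np.array(ys)

# Second-order SISO example (illustrative values, not from the paper)
A = np.array([[0.6, 0.2], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2]])
K = np.array([[0.3], [0.1]])

u = np.zeros((5, 1)); u[0] = 1.0   # impulse input
e = np.zeros((5, 1))               # noise-free case
y = simulate_innovation(A, B, C, D, K, u, e, np.zeros(2))
```

In the noise-free impulse experiment y(0) = D and y(k) = C A^{k-1} B for k >= 1, which gives a quick sanity check of any implementation of (1).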
Problem Formulation: Given the input-output (i/o) data batches {u(k), y(k)}_{k=1}^{N}, with N > n, assumed to be retrieved from an identification experiment with a system belonging to the class of LTI systems represented by (1), the problem is to determine approximate system matrices (A_T, B_T, C_T, D, K_T) (estimates denoted with hats) that define an n-hat-th order observer of "low" complexity:

    xhat_T(k+1) = Ahat_T xhat_T(k) + Bhat_T u_v(k) + Khat_T ( y_v(k) - Chat_T xhat_T(k) )
    yhat_v(k)   = Chat_T xhat_T(k) + Dhat u_v(k)                                          (2)

such that the approximated output yhat_v(k) is "close" to the measured output y_v(k) of the validation pair {u_v(k), y_v(k)}_{k=1}^{N_v}, as expressed by a small value of the cost function

    (1/N_v) sum_{k=1}^{N_v} || y_v(k) - yhat_v(k) ||_2^2.                                 (3)

The quantitative notions of "low" complexity and "close" approximation will be made more precise in the new N2SID solution to this problem. The solution is provided under two assumptions. The first is listed here, the second at the end of Section 3.

Assumption A.1. The pair (A, C) is observable and the pair (A, [B K]) is reachable.

A key starting point in the formulation of subspace methods is the relation between structured Hankel matrices constructed from the i/o data. This relationship, defined in [25] as the data equation, is presented in the next section.

3 The Data Equation, its Structure and Preliminaries

Let the LTI model (1) be represented in its so-called observer form:

    x(k+1) = (A - KC) x(k) + (B - KD) u(k) + K y(k)
    y(k)   = C x(k) + D u(k) + e(k)                       (4)

We denote this model compactly as

    x(k+1) = A x(k) + B u(k) + K y(k)
    y(k)   = C x(k) + D u(k) + e(k)                       (5)

with A here the observer system matrix (A - KC) and B equal to (B - KD). Though this property will not be used in the sequel, the observer matrix A in (5) can be assumed to be asymptotically stable.

For the construction of the data equation, we store the measured i/o data in block-Hankel matrices. For fixed N, assumed to be larger than the order n of the underlying system, the number of block rows fully determines the size of these Hankel matrices. Let this dimensioning parameter be denoted by s; then the Hankel matrix of the input is defined as

    U_{s,N} = [ u(1)  u(2)    ...  u(N-s+1)
                u(2)  u(3)    ...
                ...
                u(s)  u(s+1)  ...  u(N)     ]             (6)

The Hankel matrices of the output y(k) and the innovation e(k) are defined similarly and denoted by Y_{s,N} and E_{s,N}, respectively. The relationship between these Hankel matrices, which readily follows from the linear model equations in (5), requires the definition of the following structured matrices. First we define the extended observability matrix O_s:

    O_s^T = [ C^T  A^T C^T  ...  (A^{s-1})^T C^T ].       (7)

Second, we define a Toeplitz matrix from the quadruple of system matrices {A, B, C, D} as

    T_{u,s} = [ D           0    ...  0
                CB          D         0
                ...              ...
                C A^{s-2} B ...  CB   D ]                 (8)

and in the same way we define a Toeplitz matrix T_{y,s} from the quadruple {A, K, C, 0}. Finally, let the state sequence be stored as

    X_N = [ x(1)  x(2)  ...  x(N-s+1) ].                  (9)

Then the data equation compactly reads

    Y_{s,N} = O_s X_N + T_{u,s} U_{s,N} + T_{y,s} Y_{s,N} + E_{s,N}.      (10)

This simple linear matrix equation highlights the challenge in subspace identification: to approximate, from the given Hankel matrices Y_{s,N} and U_{s,N}, the column space of the observability matrix and/or that of the state sequence of the observer (5). The equation is highly structured. In this paper we focus on the following key structural properties of the unknown matrices in (10):

(1) The matrix product O_s X_N is low rank when s > n.
(2) The matrices T_{u,s} and T_{y,s} are block-Toeplitz.
(3) The matrix E_{s,N} is block-Hankel.

The interesting observation is that these three structural properties can be added as constraints to the multi-criteria optimization problem considered, while preserving convexity. This is demonstrated in Section 4.

The analysis in Section 4 requires the following preliminaries.

Definition 1 [25]: A signal u(k) in R^m is persistently exciting of order s if and only if there exists an integer N such that the matrix U_{s,N} has full row rank.
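The data equation (10) and the low-rank property (1) above can be checked numerically. The sketch below (Python/NumPy, illustrative system matrices; the paper's implementation is in Matlab) simulates a small instance of (1), builds the structured matrices (6)-(9) from the observer form (5), and verifies (10) as well as rank(O_s X_N) = n for s > n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, N = 2, 4, 60          # state order, block rows (s > n), data length
cols = N - s + 1

# Innovation-form system (1); illustrative values
A = np.array([[0.6, 0.2], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2]])
K = np.array([[0.3], [0.1]])
Abar, Bbar = A - K @ C, B - K @ D     # observer form (4)/(5)

# Simulate (1); the observer form (4) is an algebraic rewrite with the same state x(k)
u = rng.standard_normal((N, 1)); e = 0.1 * rng.standard_normal((N, 1))
x = np.zeros((N + 1, n)); y = np.zeros((N, 1))
for k in range(N):
    y[k] = C @ x[k] + D @ u[k] + e[k]
    x[k + 1] = A @ x[k] + B @ u[k] + K @ e[k]

# Block-Hankel matrices (6) and the structured matrices (7)-(9), built from (Abar, Bbar, K)
hank = lambda z: np.array([[z[i + j, 0] for j in range(cols)] for i in range(s)])
U, Y, E = hank(u), hank(y), hank(e)
X = x[:cols].T                                                        # X_N, eq. (9)
O = np.vstack([C @ np.linalg.matrix_power(Abar, i) for i in range(s)])  # eq. (7)
markov = lambda M_, i: (C @ np.linalg.matrix_power(Abar, i - 1) @ M_)[0, 0]
Tu = np.zeros((s, s)); Ty = np.zeros((s, s))
for i in range(s):
    Tu[i, i] = D[0, 0]
    for j in range(i):
        Tu[i, j] = markov(Bbar, i - j)    # eq. (8)
        Ty[i, j] = markov(K, i - j)       # T_{y,s}, zero diagonal

# Data equation (10) and structural property (1)
ok = np.allclose(Y, O @ X + Tu @ U + Ty @ Y + E)
low_rank = np.linalg.matrix_rank(O @ X)
```

Here `ok` confirms (10) to machine precision, and `low_rank` equals n = 2 even though O_s X_N is a 4 x 57 matrix.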
(12) s,N s N u,s s,N y,s s,N E[x(k)e((cid:96))T] = 0 (with E denoting the expectation operator),for(cid:96)≥k.Thisincombinationwiththeinde- Let T denote the class of lower triangular block- p,m pendency between u(k) and e((cid:96)), the white noise prop- Toeplitzmatriceswithblockentriesp×mmatricesand erty of e(k) and the ergodicity or the quasi-stationarity let H denote the class of block-Hankel matrices with p ofthesignalsyields, block entries of p column vectors. Then the three key structural properties listed in Section 3 are taken into   account in an optimization problem seeking a trade-off X 1  N (cid:104) (cid:105) betweenthefollowingcostfunctions, lim U  XT UT ET >0 (11) N→∞N  s,N N s,N s,N (cid:16) (cid:17) Es,N rank Γs,N −ΘuUs,N −ΘyYs,N (13) (cid:16) (cid:17)(cid:16) (cid:17)T and TrE[ y(k)−γ(k) y(k)−γ(k) ] Considering model (1), let the block-Toeplitz matrices T(cid:48) andT bedefinedastheToeplitzmatrixT in(8) u,s e,s u,s Here E denotes the expectation operator. The matrix but from the quadruples (A,B,C,D) and (A,K,C,I), respectively. Let OsT = (cid:104)CT ATCT ··· ATs−1CT(cid:105). iΓnsg,Nth∈e HHapnikselthmea(tbrlioxckYˆ-) Haannkdelcomnasttrruixctaepdpfrrooxmimtahte- s,N Then we can state the following alternative data equa- approximation of the one-step ahead prediction of the tion: output denoted by γ(k) in the same way Yˆ was con- s,N structed from yˆ(k). Further, we have the following con- Ys,N =OsXN +Tu(cid:48),sUs,N +Te,sEs,N straintsonthematricesΘu ∈Tp,m andΘy ∈Tp,p. Bythisdataequation,wehavethat, An optimal trade-off between the above two cost func- tions is called a Pareto optimal solution. Moreover, the      Pareto optimization problem is not tractable. For that X I 0 0 X N N purposewewilldevelopinthenextsubsectionaconvex      U = 0 I 0 U  relaxation ofthecostfunctions.Thiswillmakeitpossi- s,N s,N      ble to obtain all Pareto optimal solutions using scalar- Y O T(cid:48) T E s,N s u,s e,s s,N ization. 
Theresultsfollowsfrom(11)andthefactthatthematrix Beforestatingthisconvexrelaxationananalysisismade Te,s issquareandinvertible. (cid:50) on the additional structure that can be imposed on the block-Toeplitz matrices Θ and Θ and/or under what u y Based on this result the following assumption is stipu- conditions their block-Toeplitz structure is sufficient to lated. findauniquesolution. 4 4.2 Additional structure in the block-Toeplitz matrices (13) only with Γ fixed to Yˆ , let s > n and let As- s,N s,N T andT sumptionsA.1andA.2besatisfied,Then, u,s y,.s (cid:16) (cid:17) In this section we analyse the additional structure min rank Yˆ −Θ U −Θ Y =n s,N u s,N y s,N present in the block-Toeplitz matrices Tu,s and Ty,.s as Θu∈Tp,m,Θy∈Tp,p well as the conditions under which the block-Toeplitz structure is sufficient to find the system matrices Furthertheargumentsoptimizingtheaboveoptimization (A ,B ,C ,D,K ). These conditions, not includ- problem,denotedasΘˆ ,Θˆ areuniqueandequalto, T T T T u y ing the additional structural constraint highlighted in Lemma4,issummarizedinTheorem1ofthispaper. Θˆ =T Θˆ =T u u,s y y,s Lemma4 Let s > n, then we can partition the block- withT ,T thetrueunderlyingblock-Toeplitzmatrices u,s y,s ToeplitzmatricesTu,sandTy,.s,definedinthedataequa- inthedataequation(10). tion (10)as, Proof:Letδ ∈T ,δ ∈T ,then,   u p,m y p,p T | 0 u,n Tu,s =  (14) Yˆ −Θ U −Θ Y =Yˆ −(T +δ )U − H |T s,N u s,N y s,N s,N u,s u s,N u,s−n u,s−n (T +δ )Y y,s y s,N =O X −δ U −δ Y and likewise for the matrix T . Here the matrices s N u s,N y s,N y,s H andH canbedecomposedas, u,s−n y,s−n Therefore,   C (cid:16) (cid:17) rank Yˆ −Θ U −Θ Y =  CA  s,N u s,N y s,N [Hu,s−n|Hy,s−n]= .. [An−1B ··· B|An−1K ··· K] . (cid:16) (cid:17) CAs−n−1 rank O X −δ U −δ Y s N u s,N y s,N (15) ApplicationofSylvester’sinequality[25]andunderAs- Proof:Followsbyconstruction. 
(cid:50) sumptionA.2,wefurtherhave, (cid:16) (cid:17) (cid:16)(cid:104) (cid:105)(cid:17) Remark5 Lemma4canbeusedtoimposeanadditional rank Yˆ −Θ U −Θ Y =rank O δ δ s,N u s,N y s,N s u y constraint on the block-Toeplitz matrices Θ and Θ . If u y (16) wepartitiontheseblock-Toeplitzmatricesconformaltheir First notice that under Assumption A.1 the rank of counterpartsT andT ashighlightedinLemma4,as u,s y,s this matrix is n for δ = 0 and δ = 0. Since the u y follows, (cid:16)(cid:104) (cid:105)(cid:17) (cid:16) (cid:17)   rank O δ δ ≥ rank O for all δ ,δ , we have s u y s u y Θ | 0 Θu,s = u,n  thatnistheminimalvalueoftherankin(16). HΘ |Θ u,s−n u,s−n Itwillnowbeshownthatthisminimalvalueoftherank, (likewise for Θ ), then for the case s ≥ 2n we can y,s canonlybereachedforbothδ andδ equaltozero. imposethefollowingadditionalconstraint, u y (cid:16)(cid:104) (cid:105)(cid:17) For that purpose, let t = {ti ∈ Rp×(m+p)}si=1 be a se- rank HΘ HΘ =n quenceofarbitrarymatricesthatdefinethelowertrian- u,s−n y,s−n gularblock-Toeplitzmatrix∆s(t)as:   The additional constraint highlighted in Remark 5 can t 0 ··· 0 1 bereformulated,asdonee.g.in[18,19],asarankmini-  . mstrizaaintitounscinognstthreainnutc,ltehaartncoarnmb.eHroewlaexveedrtwoeaseceoknvtoexavcoonid- ∆s(t)=t...2 t1 ... .....∈Rsp×s(m+p) imposingthisadditionalconstraintinordertominimize   thenumberofregularizationparameters.Thebasishere- t t ··· t s s−1 1 foreisprovidedinthenextTheorem. (cid:104) (cid:105) The columns of the compound matrix δ δ in (16) Theorem1 Consider the observer in (1) with x(k) ∈ u y Rn and consider the rank optimization problem in Eq. 
canalwaysbepermutedintoamatrixoftheform∆s(t) 5 and since column permutations do not change the rank allParetooptimalsolutionsoftheconvexreformulation ofamatrixwehavethat, oftheN2SIDproblemcanbeobtainedbysolving: min (cid:107)Γ −Θ U −Θ Y (cid:107) (cid:16)(cid:104) (cid:105)(cid:17) (cid:16)(cid:104) (cid:105)(cid:17) Γs,N ∈Hp,Θu,s∈Tp,m,Θy,s∈Tp,p s,N u,s s,N y,s s,N (cid:63) . rank O δ δ =rank O ∆s(t) s u y s +λ (cid:80)N (cid:107)y(k)−γ(k)(cid:107)2 N k=1 2 (18) Nowweshowthatthefollowingcondition forallvaluesofλ∈[0,∞). (cid:16)(cid:104) (cid:105)(cid:17) rank O ∆s(t) =n Remark4 Themethodcanbeextendedtootherrelated s identificationproblems.Forexampleonewaytoconsider the identification problem of an innovation model with impliesthat∆s(t)hastobezero.Inorderfortheabove absenceofameasurableinput,istoconsiderthefollowing rank constraint to hold we need ∆s(t) to be of the fol- convexrelaxedproblemformulation: lowingform: min (cid:107)Γ −Θ Y (cid:107) +   Γs,N∈Hp,Θy,s∈Tp,p s,N y,s s,N (cid:63) (19) t1 0 ··· 0 0 Nλ (cid:80)Nk=1(cid:107)y(k)−γ(k)(cid:107)22. t2 t1 0 0 (cid:104) (cid:105) ... ... =Os q1 q2 ··· qs−1 qs (17) Itiswell-knownthattheproblem(18)canberecastasa   t t ··· t t Semi-Definite Programming (SDP) problem, [4, 3], and s s−1 2 1 henceitcanbesolvedinpolynomialtimewithstandard (cid:16) (cid:17) SDPsolvers.Thereformulation,however,introducesad- Thefactthats>n,wehavethatrank O =nand s−1 ditionalmatrixvariablesofdimensionN×N,unlessthe therefore we can deduce from the first p(s−1) rows of problem is not further approximated using randomiza- thelastp+mcolumnsintheexpression(17)that, tiontechniquesasin[23].Insection5wewillpresentan alternative exact method using ADMM inspired by its qs =0⇒t1 =0 successfulapplicationin[11]. 
Usingthisresult,andtheToeplitzstructureof∆s(t),we 4.4 Calculationofthesystemmatrices caninthesamewayconcludefromthefirstp(s−1)rows andfromthecolumns(s−2)(m+p)+1to(s−1)(m+p) The convex-optimization problem (18) yields the esti- in(17)that, mates of the quantities Γ ,Θ and Θ . Since the s,N u,s y,s outcome depends on the regularization parameter λ, qs−1 =0⇒t2 =0 etc. let us denote these estimates as Γˆ (λ),Θˆ (λ) and s,N u,s Θˆ (λ) respectively. The determination of the system Hence there cannot be a ∆s(t) with the given Toeplitz y,s matrices starts with an SVD of the “low rank” approxi- structurethatisdifferentfromzerosuchthat (cid:16)(cid:104) (cid:105)(cid:17) matedmatrixasfollows: rank O ∆s(t) =n.Hencetheminimalvalueofthe s rankofthematrix(cid:104)Os δu δy(cid:105)in(16)w.r.t.δu,δy yields Γˆs,N(λ)−Θˆu,s(λ)Us,N −Θˆy,s(λ)Ys,N=  zerovalueofboth.Thisconcludestheproof. (cid:50) (cid:104) (cid:105) Σ (λ)|0 VT(λ) Unˆ(λ)|(cid:63)  nˆ  nˆ  (20) 0 |(cid:63) (cid:63) 4.3 Aconvexrelaxation where nˆ is an integer denoting the nˆ largest singular AconvexrelaxationoftheNPhardproblemformulation values and the notation (cid:63) denotes a compatible matrix in (13) will now be developed. The original problem is notofinteresthere.Theselectionofnˆ isoutlinedinthe reformulated in two ways. First, the rank operator is algorithmicdescriptiongivennext. substituted by the nuclear norm. The nuclear norm of a matrix X denoted by (cid:107)X(cid:107) is defined as the sum of (cid:63) the singular values of the matrix X. It is also known as The algorithm requires in addition to the input-output thetracenorm,theKyFannormortheSchattennorm, data sequences the user to specify the parameter s to [14]. This is known to be a good approximation of the fix the number of block rows in the block-Hankel ma- rankoperatorwhenitistobeminimized,[4,3].Second, trices U and Y and an interval for the parameter s,N s,N the minimum variance criterium is substituted by the λ denoted by Λ = [λ ,λ ]. 
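For reference, the nuclear norm used in (18) is computed directly from the singular values. The following NumPy check (illustrative only, not part of the algorithm) confirms the definition and the standard bound ||X||_* <= rank(X) * ||X||_2, which underlies the use of the nuclear norm as a convex surrogate for rank:

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-3 matrix of size 8 x 10
X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))
sv = np.linalg.svd(X, compute_uv=False)

nuc = sv.sum()    # nuclear norm = sum of singular values (Schatten 1-norm)
spec = sv[0]      # spectral norm = largest singular value
r = np.linalg.matrix_rank(X)
```

Because the nuclear norm is the tightest convex lower bound of the rank on the spectral-norm unit ball, minimizing it tends to drive small singular values to zero, which is exactly the behaviour exploited in (18).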
As for the implementation described in [11], which we will refer to as WNNopt, the identification data set can be partitioned in two parts. The first part is referred to as the ide-1 part of the identification data set, and the remaining part as the ide-2 part. This splitting of the data set was recommended in [11] to avoid overfitting. In the N2SID algorithm, three variants can be substituted in the algorithmic block "Compute M_j(lambda)" for j = 1, 2, 3. This block performs the actual calculation of the one-step-ahead predictor; the three variants are summarized after the description of the core part of N2SID.

N2SID algorithm:

Grid the interval Lambda = [lambda_min, lambda_max] in L points, e.g. using the Matlab notation Lambda = logspace(log(lambda_min), log(lambda_max), L).

for i = 1:L
    Solve (18) for lambda = Lambda(i) and data set ide-1.
    Compute the SVD (20) for lambda = Lambda(i).
    Select the model order nhat from the singular values in (20). This can be done manually by the user or automatically. Automatic selection can be done as in the N4SID implementation of [15], as highlighted in [11]: order the singular values in (20) in descending order, then select the index of the singular value whose logarithm is closest to the logarithmic mean of the maximum and minimum singular values in (20).
    Compute the system matrices {Ahat_T, Bhat_T, Chat_T, Dhat, Khat_T} according to the procedure "Compute M_j(lambda)" for j = 1, 2, 3 and lambda = Lambda(i).
    Using the estimated system matrices {Ahat_T, Bhat_T, Chat_T, Dhat} and the validation data in ide-2, compute the simulated output yhat(k, lambda) as

        xhat_T(k+1)     = Ahat_T xhat_T(k) + Bhat_T u(k)
        yhat(k, lambda) = Chat_T xhat_T(k) + Dhat u(k)                        (21)

    and evaluate the cost function

        J(lambda) = sum_k || y(k) - yhat(k, lambda) ||_2^2.
end

Select M_j(lambda_opt) with lambda_opt given as

    lambda_opt = argmin_{lambda in Lambda} J(lambda).

The three ways to compute the model are summarized as follows.

Compute M_1(lambda):

STEP 1: From the SVD in (20) and the selected model order nhat, the pair (Ahat_T, Chat_T) is derived from the matrix U_{nhat}, as done in classical SID methods, by considering U_{nhat} to be an approximation of the extended observability matrix O_s; see e.g. [25].

STEP 2: With U_{nhat} and the estimated matrix Thetahat_{y,s}, we exploit that the latter matrix approximates the block-Toeplitz matrix T_{y,s} to estimate the observer gain Khat_T via the solution of a standard linear least squares problem. This is seen as follows. Assume the block-Toeplitz matrix T_{y,s} given and denoted explicitly as

    T_{y,s} = [ 0           0   ...  0   0
                CK          0        0   0
                CAK         CK       0   0
                ...              ...
                C A^{s-2} K ...  CK      0 ].

If we know the matrix O_s, we can write the following set of equations:

    O_s(1:(s-1)p, :) K = T_{y,s}(p+1:ps, 1:p).

Let us denote the first (s-1)p rows of the matrix U_{nhat}(lambda_i) by Ohat_{s-1,T}, and denote the submatrix of Thetahat_{y,s}(lambda_i) from row p+1 to row ps and from column 1 to p by That_{y,s}(p+1:ps, 1:p). Then K_T can be estimated from

    min_{K_T} || Ohat_{s-1,T} K_T - That_{y,s}(p+1:ps, 1:p) ||_2^2.           (22)

This estimate of the observer gain is used to estimate the system matrix A_T as

    Ahat_T = Ahat_T + Khat_T Chat_T,                                          (23)

where on the right-hand side Ahat_T denotes the estimated observer system matrix from STEP 1.

STEP 3: Let the approximation of the observer be denoted as

    xhat_T(k+1) = Ahat_T xhat_T(k) + Bhat_T u(k) + Khat_T y(k)
    yhat(k)     = Chat_T xhat_T(k) + Dhat u(k)                                (24)

Then the estimation of the pair (Bhat_T, Dhat) and of the initial conditions of the above observer can again be done via a linear least squares problem, as outlined in [25], by minimizing the RMS value of the prediction error y(k) - yhat(k) on the identification data in ide-1. The estimated input matrix Bhat_T is then determined as

    Bhat_T = Bhat_T + Khat_T Dhat,                                            (25)

where on the right-hand side Bhat_T denotes the estimated observer input matrix.

Compute M_2(lambda):

STEP 1: as in Compute M_1(lambda).

STEP 2: Derive an estimate of the state sequence of the observer (24) from the SVD (20); for the sake of compactness the symbol xhat_T(k) is used again:

    [ xhat_T(1)  xhat_T(2)  ...  xhat_T(N-s+1) ] ~ V_{nhat}^T(lambda).

Using the singular values, this approximation can also be scaled as Sigma_{nhat}^{1/2}(lambda) V_{nhat}^T(lambda).

STEP 3: Knowledge of the estimated state sequence of the observer (24) turns the estimation of the system matrices Ahat_T, Bhat_T, Chat_T, Dhat, Khat_T and the observer initial conditions into a linear least squares problem. The estimated pair (Ahat_T, Bhat_T) can be computed from this quintuple as outlined in (23) and (25), respectively.

Compute M_3(lambda):

STEPS 1 and 2: as in Compute M_1(lambda).

STEP 3: With U_{nhat} and the estimated Markov parameters in Thetahat_{u,s} we can, similarly to the estimation of the Kalman gain, estimate the pair (Bhat_T, Dhat) via a linear least squares problem. The matrix Bhat_T is then obtained as outlined in (25).

In the experiments reported in Section 6, the N2SID algorithm uses the model computation block Compute M_1(lambda). It turned out that the N2SID algorithm is much less sensitive to overparametrization than WNNopt. For that reason we use the whole identification data set in all steps of the N2SID algorithm in the experiments of Section 6, i.e. ide-1 and ide-2 are identical and equal to the identification data set.

5 ADMM

The problem we would like to solve is exactly of the form (20) in [11], i.e.

    min_x || A(x) + A_0 ||_* + (1/2) (x - a)^T H (x - a)                      (26)

for some linear operator A(x) and some positive semidefinite matrix H. In the above-mentioned reference the linear operator is a Hankel matrix operator, and this structure is used to tailor the ADMM code to run efficiently. Essentially, the key is to be able to compute the coefficient matrix related to the normal equations of the linear operator in an efficient way using the FFT. This matrix M is defined via

    A_adj(A(x)) = M x,   for all x,

where A_adj(.) is the adjoint operator of A(.). Similar techniques have been used for Toeplitz operators in [17], and are closely related to techniques for exploiting Toeplitz structure in linear systems of equations [6]. For our problem the linear operator is a sum of Hankel and Toeplitz operators, and we will show how FFT techniques can be used for this operator as well. For more details on the ADMM algorithm, see the appendix.

5.1 Circulant, Toeplitz and Hankel Matrices

We define the circulant matrix operator C_n : R^n -> R^{n x n} of a vector x in R^n via

    C_n(x) = [ x_1      x_n      ...  x_3  x_2
               x_2      x_1      x_n       x_3
               ...      x_2      x_1  ...
               x_{n-1}  ...      ...       x_n
               x_n      x_{n-1}  ...  x_2  x_1 ].                             (27)

We also define the Hankel matrix operator H^{(m,n)} : R^{m+n-1} -> R^{m x n} of a vector x in R^{m+n-1} via

    H^{(m,n)}(x) = [ x_1  x_2  ...  x_n
                     x_2  ...
                     ...
                     x_m  ...  ...  x_{m+n-1} ].                              (28)

For a vector x in R^{m+n-1} it holds that H^{(m,n)}(x) = C^{m+n-1}_{n:n+m-1, n:-1:1}(x), i.e. the Hankel operator is the lower left corner of the circulant operator with the columns taken in reverse order. We also define the Toeplitz operator T_n : R^{2n-1} -> R^{n x n} of a vector x in R^{2n-1} via

    T_n(x) = [ x_n      x_{n-1}  ...  x_1
               x_{n+1}  x_n      ...
               ...               ...
               x_{2n-1} ...      ...  x_n ].                                  (29)

We see that T_n(x) = H^{(n,n)}_{:, n:-1:1}(x), i.e. a Toeplitz operator can be obtained from a square Hankel operator by taking the columns in reverse order. We are finally interested in upper triangular Toeplitz operators with and without zeros on the diagonal; these are easily obtained from the general Toeplitz operator by replacing x with [x^T 0]^T, which gives an upper triangular operator with a nonzero diagonal if x is in R^n and with a zero diagonal if x is in R^{n-1}.
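ADMM schemes for problems of the form (26) split the nuclear-norm term from the quadratic term; the nuclear-norm subproblem is then solved in closed form by singular value soft-thresholding, the proximal operator of tau*||.||_*. The sketch below (NumPy) shows this standard building block; it is an ingredient of such ADMM schemes, not the paper's complete algorithm, whose full derivation is deferred to the appendix:

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: minimizer of tau*||X||_* + 0.5*||X - Z||_F^2."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(2)
Z = rng.standard_normal((6, 5))
X = svt(Z, 1.0)   # singular values of X are those of Z shrunk by 1.0 (floored at 0)
```

Shrinking all singular values by the threshold is what allows the nuclear-norm term in (26) to produce low-rank iterates without any combinatorial rank search.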
5.2 The Fourier Transform and Hankel Matrices

It is well known [6] that if we let F_n in C^{n x n} be the discrete Fourier transform matrix of dimension n, then the circulant matrix can be expressed as

    C_n(x) = (1/n) F_n^H diag(F_n x) F_n.                                     (30)

From this we immediately obtain that the Hankel matrix can be expressed as

    H^{(m,n)}(x) = (1/N) H^H diag(F_N x) G                                    (31)

where N = n + m - 1, F = F_N, G = F_{:, n:-1:1} and H = F_{:, n:n+m-1}. This expression makes it easy to represent the adjoint of the Hankel operator. It is straightforward to verify that the adjoint H^{(m,n)}_adj(Z) : R^{m x n} -> R^{n+m-1} is given by

    H^{(m,n)}_adj(Z) = (1/N) F^H diag(H Z G^H).

Notice that we are abusing the operator diag(.): when the argument is a vector, the operator produces a diagonal matrix with the vector on the diagonal, and when the argument is a square matrix, the operator produces a vector with components equal to the diagonal of the matrix.

5.3 The Linear Operator A

We now present the linear operator of interest for the SISO case, A : R^N x R^s x R^{s-1} -> R^{s x n}, defined by

    A(x) = H^{(s,n)}(yhat) + T_s([v_{s:-1:1}; 0])^T V + T_s([w_{s-1:-1:1}; 0])^T W,

where n = N - s + 1, x = (yhat, v, w) with yhat in R^N, v in R^s, w in R^{s-1}, and V, W in R^{s x n}. By taking V = -U_{s,N} and W = -Y_{s,N} we obtain the linear operator for N2SID. Then v and w are the first columns of the Toeplitz matrices T_{u,s} and T_{y,s}, respectively. We can express A(x) in terms of Hankel operators as

    A(x) = H^{(s,n)}(yhat) + H^{(s,s)}_{:, s:-1:1}([v_{s:-1:1}; 0])^T V + H^{(s,s)}_{:, s-1:-1:1}([w_{s-1:-1:1}; 0])^T W.

The adjoint of this operator can be expressed in terms of the adjoint of the Hankel operator as

    A_adj(Z) = [ H^{(s,n)}_adj(Z) ;
                 H^{(s,s)}_{adj; :, s:-1:1}(V Z^T_{s:-1:1, :}) ;
                 H^{(s,s)}_{adj; :, s-1:-1:1}(W Z^T_{s:-1:1, :}) ].

5.4 Forming the Coefficient Matrix

A key matrix in the ADMM algorithm is the matrix M defined via

    A_adj(A(x)) = M x,   for all x.

We will now show how this matrix can be formed efficiently using the Fast Fourier Transform (FFT). We partition the matrix as

    M = [ M_11    M_12    M_13
          M_12^T  M_22    M_23
          M_13^T  M_23^T  M_33 ]

where the partition conforms with the partition x = (yhat, v, w). It is then clear that M_11 is defined via

    H^{(s,n)}_adj( H^{(s,n)}(yhat) ) = M_11 yhat,   for all yhat.

The left-hand side can be expressed as

    (1/N^2) F^H diag( H^H H diag(F yhat) G G^H ).

From the identity

    diag(A diag(x) B) = (A o B^T) x,

where o denotes the Hadamard (entrywise) product of matrices, it follows that

    M_11 = (1/N^2) F^H ( (H^H H) o (G G^H) ) F.

The efficient way to form M_11 is to first compute F, G and H using an FFT algorithm, and then to form the matrix

    X = (H^H H) o (G G^H).

After this one applies the inverse FFT algorithm to X^T, and then once more the inverse FFT algorithm to the transpose of the resulting matrix.

The expressions for the other blocks of the matrix M can be derived in a similar way, and they are given by:

    M_12 = (1/N^2) F^H ( (H_{:, s:-1:1} H^H_{:, 1:s}) o (G V^T G^H) ) F_{:, s:-1:1}
    M_13 = (1/N^2) F^H ( (H_{:, s:-1:1} H^H_{:, 1:s}) o (G W^T G^H) ) F_{:, s-1:-1:1}
    M_22 = (1/kappa^2) F^H_{:, s:-1:1} ( (H V V^H H^H) o (G G^H) ) F_{:, s:-1:1}
    M_23 = (1/kappa^2) F^H_{:, s:-1:1} ( (H V W^H H^H) o (G G^H) ) F_{:, s-1:-1:1}
    M_33 = (1/kappa^2) F^H_{:, s-1:-1:1} ( (H W W^H H^H) o (G G^H) ) F_{:, s-1:-1:1}.
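The two identities underlying this construction, the DFT factorization (30) of a circulant matrix and the Hadamard-product identity used for M_11, are easy to verify numerically. The NumPy sketch below (illustrative sizes) checks both:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
x = rng.standard_normal(n)

# DFT matrix F with F @ v = fft(v)
F = np.fft.fft(np.eye(n))

# Identity (30): the circulant C_n(x) equals (1/n) F^H diag(F x) F
C_fft = (F.conj().T @ np.diag(F @ x) @ F) / n
C_direct = np.array([[x[(i - j) % n] for j in range(n)] for i in range(n)])

# Hadamard identity used for M_11: diag(A diag(v) B) = (A o B^T) v
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 4))
v = rng.standard_normal(5)
lhs = np.diagonal(A @ np.diag(v) @ B)
rhs = (A * B.T) @ v
```

The circulant check is the matrix form of the familiar fact that circulant matrix-vector products are circular convolutions, computable in O(n log n) with the FFT, which is what makes forming M cheap.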
Below are formulas given for G, and H are defined via a discrete Fourier transform sub-blocksofeachoftheothermatrices matrixoforderκ=2s−1. M = 1 FH(cid:16)(cid:0)H HH(cid:1)(cid:12)(cid:16)GVTGH (cid:17)(cid:17)F 12j N2 :,s:−1:1 j :,1:s :,s:−1:1 5.5 MIMOSystems M = 1 FH(cid:16)(cid:0)H HH(cid:1)(cid:12)(cid:16)GWTGH (cid:17)(cid:17)F 13j N2 :,s:−1:1 j :,1:s :,s−1:−1:1 So far we have only discussed SISO systems. For a gen- eral p × m system we may write the linear operator M = 1 FH (cid:16)(cid:0)HV VHHH(cid:1)(cid:12)(cid:16)GGH(cid:17)(cid:17)F A:RpN ×Rpms×Rpp(s−1) →Rps×n as: 22jk κ2 :,s:−1:1 j k :,s:−1:1 M = 1 FH (cid:16)(cid:0)HV WHHH(cid:1)(cid:12)(cid:16)GGH(cid:17)(cid:17)F p 23jk κ2 :,s:−1:1 j k :,s−1:−1:1 A(x)=(cid:88)Ai(xi)⊗ei M = 1 FH (cid:16)(cid:0)HW WHHH(cid:1)(cid:12)(cid:16)GGH(cid:17)(cid:17)F . 33jk κ2 :,s−1:−1:1 j k :,s−1:−1:1 i=1 where It is interesting to notice that these formulas do not dependonindexi.ThismeansthatallM arethesame. i A (x )=H(s,n)(yˆ)+(cid:88)m H(s,s) (cid:32)(cid:34)vsi,:j−1:1(cid:35)(cid:33)T V i i i :,s:−1:1 j 6 ValidationStudy 0 j=1 +(cid:88)p H:(,ss,−s)1:−1:1(cid:32)(cid:34)wsi,−j1:−1:1(cid:35)(cid:33)T Wj Imnenthtsisussiencgtiorenalw-leiferedpaotartsertess.uWltsewonillnmumakeeriucsaeloefxspoemrie- 0 j=1 representative data sets from the DaISy collection, [2]. For preliminary test with the new N2SID mehod based whereV =−Uj andW =−Yj areHankelmatrices onacademicexamples,wereferto[23]. j s,N j s,N definedfromu andy ,i.e.fromthejthinputsandout- j j puts, respectively, and where e is the ith basis vector. The numerical results reported in Subsection 6.3 were i Hence we may interpret each term A (x ) as defining a performedwithMatlab.Theimplementationshavebeen i i MISOsysteminthesensethateachpredictedoutputcan carried out in MATLAB R2013b running on an Intel bewrittenasalinearcombinationofalltheintputsand Corei7CPUM2502GHzwith8GBofRAM. outputs. 
If we write the adjoint variable Z in a similar wayasZ =(cid:80)pi=1Zi⊗ei,itfollowsthattheadjointoper- 6.1 Dataselectionandpre-processing atorisgivenbyA (Z)=(A (Z ),...,A (Z )). adj adj,1 1 adj,p p Hence the matrix M will now be blockdiagonal with From the DaISy collection, [2], five representative data blocksdefinedfromtheidentity sets were selected. These sets contain SISO, SIMO, MISO and MIMO systems. Information about the se- A (A (x ))=M x , ∀x , i=1,...,p. adj,i i i i i i lected data sets is provided in Table 1. In order to evaluate the performances for small length data sets, ItisnotdifficulttorealizethattheoperatorsAadj,i(Zi) data sets of increasing length are considered. The data willbegivenby length is indicated by N in Table 2 for each data set ide of Table 1 . To test the sensitivity of the identification   H(s,n)(Z ) mehods with respect to the length of the identification adj i  Ha(sd,js,)s:−1:1(V1ZiT;s:−1:1,:)  dcoamtapasreetd, Ntoidteheistointacrlenausemdbefrroomf saamsmplaelslanvuamilabbelre,, ains  ..  a way as indicated in Table 2 From each identification  .    andvalidationdatasettheoffsetisremoved.Dataset2 Aadj,i(Zi)= Ha(sd,js,)s:−1:1(VmZiT;s:−1:1,:) . from the continuous stirred tank reactor is scaled in Ha(sd,js,)s−1:−1:1(W1ZiT;s:−1:1,:) smuecrhicaalwraayngteh.aTthbiostihsaocuhtipeuvtesdhbayvescaablionugttthheedseatmreendneud-  ..  versions of these outputs such that the maximum value  .    ofeachoutputequals1. H(s,s) (W ZT ) adj,s−1:−1:1 p i;s:−1:1,: Sincemanyofthedatasetscontainpoorlyexciteddata Hence each of the blocks M will have a similar struc- at their beginning, the first del samples are discarded i ture as the M matrix for the SISO system. However, fromeachdataset.Theactualvalueofdelforeachdata the sub-blocks M , M , M , M and M will have set is listed in Table 2. 
Finally each identified model is 12 13 22 23 33 sub-blocksthemselvesreflectingthatfactthatthereare validatedforeachtestcaseonthesamevalidationdata severalinputsandoutputs.M willbethesameasM set.ThesevalidationdatasetscontaintheN samples 11 1 val 10
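The factorization (31) is easy to check numerically. The following NumPy sketch (ours, not from the paper; the function name is illustrative, and the column selections translate the paper's 1-based indexing) builds an m × n Hankel matrix through the DFT factorization and compares it with the directly constructed Hankel matrix:

```python
import numpy as np

def hankel_via_fft(x, m, n):
    """Form the m-by-n Hankel matrix H^{(m,n)}(x) via eq. (31):
    H(x) = (1/N) H^H diag(F_N x) G, with N = n + m - 1."""
    N = n + m - 1
    # Unnormalized DFT matrix F_N; in the paper's 1-based notation
    # G = F[:, n:-1:1] and H = F[:, n:n+m-1].
    F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    G = F[:, n - 1::-1]
    H = F[:, n - 1:n + m - 1]
    return (H.conj().T @ np.diag(F @ x) @ G).real / N

# Direct construction: entry (i, j) of the Hankel matrix equals x[i + j]
x = np.arange(1.0, 8.0)  # N = 7, so e.g. m = 3, n = 5
direct = x[np.add.outer(np.arange(3), np.arange(5))]
assert np.allclose(hankel_via_fft(x, 3, 5), direct)
```

The point of (31) is that only diag(F_N x) depends on x, so inside an iterative solver the operator can be applied with FFTs instead of re-forming the Hankel matrix explicitly.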
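The adjoint expression can be verified the same way. A small sketch (again ours; the anti-diagonal sum is the elementary form of the Hankel adjoint, since ⟨H(x), Z⟩ = Σ_{i,j} x[i+j] Z[i,j]) checks both the FFT formula for H_adj and the defining identity ⟨H(x), Z⟩ = ⟨x, H_adj(Z)⟩:

```python
import numpy as np

def hankel_adjoint_via_fft(Z):
    """Adjoint H_adj^{(m,n)}(Z) = (1/N) F^H diag(H Z G^H), mapping an
    m-by-n matrix to R^{n+m-1}; here diag(.) of a square matrix
    extracts its diagonal (the abuse of notation used in the paper)."""
    m, n = Z.shape
    N = n + m - 1
    F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    G = F[:, n - 1::-1]
    H = F[:, n - 1:n + m - 1]
    return (F.conj().T @ np.diag(H @ Z @ G.conj().T)).real / N

m, n = 3, 4
N = n + m - 1
Z = np.arange(1.0, 13.0).reshape(m, n)
# The adjoint sums Z over its anti-diagonals i + j = k, k = 0, ..., N-1
ref = np.zeros(N)
for i in range(m):
    for j in range(n):
        ref[i + j] += Z[i, j]
assert np.allclose(hankel_adjoint_via_fft(Z), ref)

# Defining property <H(x), Z> = <x, H_adj(Z)> for a test vector x
x = np.arange(1.0, 7.0)
Hx = x[np.add.outer(np.arange(m), np.arange(n))]
assert np.isclose(np.sum(Hx * Z), x @ hankel_adjoint_via_fft(Z))
```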
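The structure of the coefficient matrix can likewise be tested on the leading block. The sketch below (our illustration, not the paper's MATLAB code) forms M_11 from the Hadamard-product formula and checks it against the direct composition H_adj(H(y)); note that applying the identity diag(A diag(x) B) = (A ⊙ B^T)x with B = G G^H is what puts the transpose on the G G^H factor:

```python
import numpy as np

def dft_parts(m, n):
    # F, G, H as defined below eq. (31), in 0-based indexing
    N = n + m - 1
    F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    return F, F[:, n - 1::-1], F[:, n - 1:n + m - 1]

def m11(m, n):
    """M_11 = (1/N^2) F^H ((H H^H) o (G G^H)^T) F, with o the
    Hadamard (elementwise) product; the result is a real N-by-N matrix."""
    N = n + m - 1
    F, G, H = dft_parts(m, n)
    X = (H @ H.conj().T) * (G @ G.conj().T).T  # Hadamard product
    return (F.conj().T @ X @ F).real / N**2

m, n = 3, 4
N = n + m - 1
y = np.arange(1.0, 7.0)
# Direct composition H_adj(H(y)): Hankel matrix, then anti-diagonal sums
Hy = y[np.add.outer(np.arange(m), np.arange(n))]
ref = np.zeros(N)
for i in range(m):
    for j in range(n):
        ref[i + j] += Hy[i, j]
assert np.allclose(m11(m, n) @ y, ref)
```

For clarity the sketch forms M_11 densely; the FFT recipe in the text (inverse FFT of X^T, transpose, inverse FFT again) avoids the explicit multiplications by F^H and F.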
