
Ivan Markovsky
Low-Rank Approximation
Algorithms, Implementation, Applications
Springer, September 2, 2014

Preface

Mathematical models are obtained from first principles (natural laws, interconnection, etc.) and experimental data. Modeling from first principles is common in natural sciences, while modeling from data is common in engineering. In engineering, often experimental data is available and a simple approximate model is preferred to a complicated detailed one. Indeed, although optimal prediction and control of a complex (high-order, nonlinear, time-varying) system is currently difficult to achieve, robust analysis and design methods, based on a simple (low-order, linear, time-invariant) approximate model, may achieve sufficiently high performance. This book addresses the problem of data approximation by low-complexity models.

A unifying theme of the book is low-rank approximation: a prototypical data modeling problem. The rank of a matrix constructed from the data corresponds to the complexity of a linear model that fits the data exactly. The data matrix being full rank implies that there is no exact low complexity linear model for that data. In this case, the aim is to find an approximate model. One approach for approximate modeling, considered in the book, is to find a small (in some specified sense) modification of the data that renders the modified data exact. The exact model for the modified data is an optimal (in the specified sense) approximate model for the original data. The corresponding computational problem is low-rank approximation. It allows the user to trade off accuracy vs complexity by varying the rank of the approximation.

The distance measure for the data modification is a user choice that specifies the desired approximation criterion or reflects prior knowledge about the accuracy of the data. In addition, the user may have prior knowledge about the system that generates the data. Such knowledge can be incorporated in the modeling problem by imposing constraints on the model. For example, if the model is known (or postulated) to be a linear time-invariant dynamical system, the data matrix has Hankel structure and the approximating matrix should have the same structure. This leads to a Hankel structured low-rank approximation problem.

A tenet of the book is: the estimation accuracy of the basic low-rank approximation method can be improved by exploiting prior knowledge, i.e., by adding constraints that are known to hold for the data generating system. This path of development leads to weighted, structured, and other constrained low-rank approximation problems. The theory and algorithms of these new classes of problems are interesting in their own right and, being application driven, are practically relevant.

Stochastic estimation and deterministic approximation are two complementary aspects of data modeling. The former aims to find from noisy data, generated by a low-complexity system, an estimate of that data generating system. The latter aims to find from exact data, generated by a high complexity system, a low-complexity approximation of the data generating system. In applications both the stochastic estimation and deterministic approximation aspects are likely to be present. The data is likely to be imprecise due to measurement errors and is likely to be generated by a complicated phenomenon that is not exactly representable by a model in the considered model class. The development of data modeling methods in system identification and signal processing, however, has been dominated by the stochastic estimation point of view. If considered at all, the approximation error is represented in the mainstream data modeling literature as a random process. This is not natural because the approximation error is by definition deterministic and, even if considered as a random process, it is not likely to satisfy standard stochastic regularity conditions such as zero mean, stationarity, ergodicity, and Gaussianity.

An exception to the stochastic paradigm in data modeling is the behavioral approach, initiated by J. C. Willems in the mid 80's. Although the behavioral approach is motivated by the deterministic approximation aspect of data modeling, it does not exclude the stochastic estimation approach. In this book, we use the behavioral approach as a language for defining different modeling problems and presenting their solutions. We emphasize the importance of deterministic approximation in data modeling; however, we formulate and solve stochastic estimation problems as low-rank approximation problems.

Many well known concepts and problems from systems and control, signal processing, and machine learning reduce to low-rank approximation. Generic examples in system theory are model reduction and system identification. The principal component analysis method in machine learning is equivalent to low-rank approximation, which suggests that related dimensionality reduction, classification, and information retrieval problems can be phrased as low-rank approximation problems. Sylvester structured low-rank approximation has applications in computations with polynomials and is related to methods from computer algebra.

The developed ideas lead to algorithms, which are implemented in software. The algorithms clarify the ideas and the software implementation clarifies the algorithms. Indeed, the software is the ultimate unambiguous description of how the ideas are put to work. In addition, the provided software allows the reader to reproduce the examples in the book and to modify them. The exposition reflects the sequence

    theory ↦ algorithms ↦ implementation.

Correspondingly, the text is interwoven with code that generates the numerical examples being discussed.
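The prototypical computation behind this program can be made concrete already here. By the Eckart-Young-Mirsky theorem, a Frobenius-norm optimal rank-r approximation of a matrix is obtained by truncating its singular value decomposition. The following MATLAB fragment is a minimal sketch added for illustration; it is not part of the book's accompanying software:

    D = randn(6, 20);                            % example data matrix
    r = 2;                                       % chosen complexity (rank)
    [U, S, V] = svd(D);
    Dh = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';   % optimal rank-r approximation
    err = norm(D - Dh, 'fro')                    % error decreases as r grows

Varying r trades off accuracy (the size of the approximation error) against complexity (the rank of the model).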
Prerequisites and practice problems

A common feature of the current research activity in all areas of science and engineering is the narrow specialization. In this book, we pick applications in the broad area of data modeling, posing and solving them as low-rank approximation problems. This unifies seemingly unrelated applications and solution techniques by emphasising their common aspects (e.g., complexity-accuracy trade-off) and abstracting from the application specific details, terminology, and implementation details. Despite the fact that applications in systems and control, signal processing, machine learning, and computer vision are used as examples, the only real prerequisite for following the presentation is knowledge of linear algebra.

The book is intended to be used for self study by researchers in the area of data modeling and by advanced undergraduate/graduate level students as a complementary text for a course on system identification or machine learning. In either case, the expected knowledge is undergraduate level linear algebra. In addition, MATLAB code is used, so that familiarity with the MATLAB programming language is required.

Passive reading of the book gives a broad perspective on the subject. Deeper understanding, however, requires active involvement, such as supplying missing justification of statements and specific examples of the general concepts, application and modification of presented ideas, and solution of the provided exercises and practice problems. There are two types of practice problems: analytical, asking for a proof of a statement clarifying or expanding the material in the book, and computational, asking for experiments with real or simulated data of specific applications. Most of the problems are of easy to medium difficulty. A few problems (marked with stars) can be used as small research projects.

The code in the book, available from

    http://extra.springer.com/

has been tested with MATLAB 7.9, running under Linux, and uses the Optimization Toolbox 4.3, Control System Toolbox 8.4, and Symbolic Math Toolbox 5.3. A version of the code that is compatible with Octave (a free alternative to MATLAB) is also available from the book's web page.
Software tools used for typesetting the book

The book is typeset using LATEX and a number of extension packages. Diagrams are created using the PSTricks and xy packages and the standalone xfig program. All editing is done in the GNU Emacs editor, using org-mode and latex-mode. MATLAB code, presented in a literate programming style, is an integral part of the text. This is achieved by Norman Ramsey's noweb system.

The process of compiling the book involves a number of steps automated by a GNU Makefile. First the LATEX source code and the MATLAB functions and scripts are extracted from noweb's source files. The obtained scripts are then executed in MATLAB in order to generate the figures and numerical results. Finally, LATEX, bibTEX, dvips, and ps2pdf are run on the files created in the previous steps to generate the final pdf file of the book. The described process guarantees that the code in the text is the actual code that has generated the results shown in the book.

A laptop and a desktop computer were used for working on the book. Keeping the files synchronized between the two computers was done using the Unison file synchronizer.

Acknowledgements

A number of individuals and the European Research Council contributed to and supported me during the preparation of the book. Oliver Jackson—Springer's editor (engineering)—encouraged me to embark on the project. My colleagues in ESAT/SISTA, K.U. Leuven and ECS/ISIS, Southampton, U.K. created the right environment for developing the ideas in the book. In particular, I am indebted to Jan C. Willems (SISTA) for his personal guidance and example of critical thinking. The behavioral approach that Jan initiated in the early 1980's is present in this book.

Maarten De Vos, Diana Sima, Konstantin Usevich, and Jan Willems proofread chapters of the book and suggested improvements. I gratefully acknowledge funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement number 258581 "Structured low-rank approximation: Theory, algorithms, and applications".

Southampton,                                                    Ivan Markovsky
September 2, 2014
Contents

Part I  Linear modeling problems

1  Introduction
   1.1  Classical and behavioral paradigms for data modeling
   1.2  Motivating example for low-rank approximation
   1.3  Overview of applications
   1.4  Overview of algorithms
   1.5  Literate programming
   1.6  Notes and references

2  From data to models
   2.1  Linear static model representations
   2.2  Linear time-invariant model representations
   2.3  Exact and approximate data modeling
   2.4  Unstructured low-rank approximation
   2.5  Structured low-rank approximation
   2.6  Notes and references

3  Algorithms
   3.1  Subspace methods
   3.2  Algorithms based on local optimization
   3.3  Data modeling using the nuclear norm heuristic
   3.4  Notes and references

4  Applications in system, control, and signal processing
   4.1  Introduction
   4.2  Model reduction
   4.3  System identification
   4.4  Analysis and synthesis
   4.5  Simulation examples
   4.6  Notes and references

Part II  Miscellaneous generalizations

5  Missing data, centering, and constraints
   5.1  Weighted low-rank approximation with missing data
   5.2  Affine data modeling
   5.3  Complex least squares problem with constrained phase
   5.4  Approximate low rank factorization with structured factors
   5.5  Notes and references

6  Nonlinear static data modeling
   6.1  A framework for nonlinear static data modeling
   6.2  Nonlinear low-rank approximation
   6.3  Algorithms
   6.4  Examples
   6.5  Notes and references

7  Fast measurements of slow processes
   7.1  Introduction
   7.2  Estimation with known measurement process dynamics
   7.3  Estimation with unknown measurement process dynamics
   7.4  Examples and real-life testing
   7.5  Notes and references

References
A  Approximate solution of an overdetermined system of equations
B  Proofs
P  Problems
S  Solutions
Notation
List of code chunks
Functions and scripts index
Index

Chapter 1
Introduction

    The very art of mathematics is to say the same thing another way.
                                                               Unknown

1.1 Classical and behavioral paradigms for data modeling

Fitting linear models to data can be achieved, both conceptually and algorithmically, by solving approximately a system of linear equations

    AX ≈ B,    (LSE)

where the matrices A and B are constructed from the given data and the matrix X parametrizes the model. In this classical paradigm, the main tools are the ordinary linear least squares method and its variations—regularized least squares, total least squares, robust least squares, etc. The least squares method and its variations are mainly motivated by their applications for data fitting, but they invariably consider solving approximately an overdetermined system of equations.

The underlying premise in the classical paradigm is that existence of an exact linear model for the data is equivalent to existence of a solution X to a system AX = B. Such a model is a linear map: the variables corresponding to the A matrix are inputs (or causes) and the variables corresponding to the B matrix are outputs (or consequences) in the sense that they are determined by the inputs and the model. Note that in the classical paradigm the input/output partition of the variables is postulated a priori. Unless the model is required to have the a priori specified input/output partition, imposing such structure in advance is ad hoc and leads to undesirable theoretical and numerical features of the modeling methods derived.

An alternative to the classical paradigm that does not impose an a priori fixed input/output partition is the behavioral paradigm. In the behavioral paradigm, fitting linear models to data is equivalent to the problem of approximating a matrix D, constructed from the data, by a matrix D̂ of lower rank. Indeed, existence of an exact linear model for D is equivalent to D being rank deficient. Moreover, the rank of D is related to the complexity of the model. This fact is the tenet of the book and is revisited in the following chapters in the context of applications from systems and control, signal processing, computer algebra, and machine learning.
To see that existence of a low complexity exact linear model is equivalent to rank deficiency of the data matrix, let the columns d_1, ..., d_N of D be the observations and the elements d_1j, ..., d_qj of d_j be the observed variables. We assume that there are at least as many observations as observed variables, i.e., q ≤ N. A linear model for D declares that there are linear relations among the variables, i.e., there are vectors r_k, such that

    r_k⊤ d_j = 0,    for j = 1, ..., N.

If there are p independent linear relations, then D has rank less than or equal to m := q − p and the observations belong to an at most m-dimensional subspace B of R^q. We identify the model for D, defined by the linear relations r_1, ..., r_p ∈ R^q, with the set B ⊂ R^q. Once a model B is obtained from the data, all possible input/output partitions can be enumerated, which is an analysis problem for the identified model. Therefore, the choice of an input/output partition in the behavioral paradigm to data modeling can be incorporated, if desired, in the modeling problem and thus need not be hypothesized, as is necessarily done in the classical paradigm.

The classical and behavioral paradigms for data modeling are related but not equivalent. Although existence of a solution of the system AX = B implies that the matrix [A B] is low rank, it is not true that [A B] having a sufficiently low rank implies that the system AX = B is solvable. This lack of equivalence causes ill-posed (or numerically ill-conditioned) data fitting problems in the classical paradigm, which have no solution (or are numerically difficult to solve). In terms of the data fitting problem, ill-conditioning of the problem (LSE) means that the a priori fixed input/output partition of the variables is not corroborated by the data. In the behavioral setting, without the a priori fixed input/output partition of the variables, ill-conditioning of the data matrix D implies that the data approximately satisfies linear relations, so that near rank deficiency is a good feature of the data.

The classical paradigm is included in the behavioral paradigm as a special case because approximate solution of an overdetermined system of equations (LSE) is a possible approach to achieve low-rank approximation. Alternatively, low-rank approximation can be achieved by approximating the data matrix with a matrix that has an at least p-dimensional null space, or an at most m-dimensional column space. Parametrizing the null space and the column space by sets of basis vectors, the alternative approaches are:

1. kernel representation: there is a full row rank matrix R ∈ R^{p×q}, such that

       RD̂ = 0,

2. image representation: there are matrices P ∈ R^{q×m} and L ∈ R^{m×N}, such that

       D̂ = PL.

The approaches using kernel and image representations are equivalent to the original low-rank approximation problem. Next, the use of AX = B, kernel, and image representations is illustrated on the most simple data fitting problem—line fitting. Also its implication to the development of numerical algorithms for data fitting is explored.
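As a numerical illustration of the two representations (a hypothetical sketch on randomly generated data, added here and not an excerpt from the book's software), both can be computed with standard linear algebra tools when the data is exact:

    % data in R^q with an exact linear model of complexity m = 2
    q = 4; N = 10; m = 2;
    D = randn(q, m) * randn(m, N);  % by construction rank(D) = m < q
    R = null(D')';                  % kernel representation: (q-m) x q, R * D = 0
    P = orth(D);                    % image representation: q x m basis of span(D)
    L = P' * D;                     % coordinates, so that D = P * L
    norm(R * D, 'fro'), norm(D - P * L, 'fro')   % both are numerically zero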
1.2 Motivating example for low-rank approximation

Given a (multi)set of points {d_1, ..., d_N} ⊂ R^2 in the plane, the aim of the line fitting problem is to find a line passing through the origin that "best" matches the given points. The classical approach for line fitting is to define

    col(a_j, b_j) := [a_j; b_j] := d_j

(":=" stands for "by definition"; see the Notation section) and solve approximately the overdetermined system

    col(a_1, ..., a_N) x = col(b_1, ..., b_N)    (lse)

by the least squares method. Let x_ls be the least squares solution to (lse). Then the least squares fitting line is

    B_ls := { d = col(a, b) ∈ R^2 | a x_ls = b }.

Geometrically, B_ls minimizes the sum of the squared vertical distances from the data points to the fitting line.

The left plot in Figure 1.1 shows a particular example with N = 10 data points. The data points d_1, ..., d_10 are the circles in the figure, the fit B_ls is the solid line, and the fitting errors e := a x_ls − b are the dashed lines. Visually one expects the best fit to be the vertical axis, so minimizing vertical distances does not seem appropriate in this example.

Note that by solving (lse), a (the first components of the d_j) is treated differently from b (the second components): b is assumed to be a function of a. This is an arbitrary choice; the data can be fitted also by solving approximately the system

    col(a_1, ..., a_N) = col(b_1, ..., b_N) x,    (lse′)

in which case a is assumed to be a function of b. Let x′_ls be the least squares solution to (lse′). It gives the fitting line

    B′_ls := { d = col(a, b) ∈ R^2 | a = b x′_ls },

which minimizes the sum of the squared horizontal distances (see the right plot in Figure 1.1). The line B′_ls happens to achieve the desired fit in the example.

Fig. 1.1 Least squares fits (solid lines) minimizing vertical (left plot, fit a x_ls = b) and horizontal (right plot, fit a = b x′_ls) distances.

In the classical approach for data fitting, i.e., solving approximately a system of linear equations in the least squares sense, the choice of the model representation affects the fitting criterion.

This feature of the classical approach is undesirable: it is more natural to specify a desired fitting criterion independently of how the model happens to be parametrized. In many data modeling methods, however, a model representation is a priori fixed and implicitly corresponds to a particular fitting criterion.

The total least squares method is an alternative to the least squares method for solving approximately an overdetermined system of linear equations. In terms of data fitting, the total least squares method minimizes the sum of the squared orthogonal distances from the data points to the fitting line. Using the system of equations (lse), line fitting by the total least squares method leads to the problem

    minimize   over x ∈ R, â ∈ R^N, and b̂ ∈ R^N   ∑_{j=1}^N ‖ d_j − col(â_j, b̂_j) ‖_2^2    (tls)
    subject to â_j x = b̂_j,  for j = 1, ..., N.

However, for the data in Figure 1.1 the total least squares problem has no solution. Informally, the approximate solution is x_tls = ∞, which corresponds to a fit by a vertical line. Formally,

the total least squares problem (tls) may have no solution and therefore fail to give a model.

The use of (lse) in the definition of the total least squares line fitting problem restricts the fitting line to be a graph of a function ax = b for some x ∈ R. Thus, the vertical line is a priori excluded as a possible solution. In the example, the line minimizing the sum of the squared orthogonal distances happens to be the vertical line. For this reason, x_tls does not exist.
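The two least squares fits, and the failure mode of the total least squares fit, are easy to reproduce numerically. The sketch below uses simulated data chosen to mimic Figure 1.1; it is an added illustration, not the book's example code:

    % N = 10 points clustered around the vertical axis, as in Figure 1.1
    a = 0.1 * randn(10, 1); b = randn(10, 1);   % d_j = col(a_j, b_j)
    x_ls  = a \ b;    % (lse):  a * x = b, minimizes vertical distances
    x_ls2 = b \ a;    % (lse'): a = b * x, minimizes horizontal distances
    % x_ls varies wildly with the noise realization, while x_ls2 is close
    % to zero, i.e., (lse') recovers the vertical axis; an orthogonal-
    % distance (tls) fit of this data has no solution, because the slope x
    % would have to grow unboundedly to approach the vertical line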
Any line B passing through the origin can be represented as an image and a kernel, i.e., there exist matrices P ∈ R^{2×1} and R ∈ R^{1×2}, such that

    B = image(P) := { d = Pℓ ∈ R^2 | ℓ ∈ R }

and

    B = ker(R) := { d ∈ R^2 | Rd = 0 }.

Using the image representation of the model, the line fitting problem of minimizing the sum of the squared orthogonal distances is

    minimize   over P ∈ R^{2×1} and [ℓ_1 ⋯ ℓ_N] ∈ R^{1×N}   ∑_{j=1}^N ‖ d_j − d̂_j ‖_2^2    (lra′_P)
    subject to d̂_j = P ℓ_j,  for j = 1, ..., N.

With

    D := [d_1 ⋯ d_N],    D̂ := [d̂_1 ⋯ d̂_N],

and the Frobenius norm ‖·‖_F,

    ‖E‖_F := ‖vec(E)‖_2 = ‖ [e_11 ⋯ e_q1 ⋯ e_1N ⋯ e_qN]⊤ ‖_2,  for all E ∈ R^{q×N},

(lra′_P) is more compactly written as

    minimize   over P ∈ R^{2×1} and L ∈ R^{1×N}   ‖D − D̂‖_F^2    (lra_P)
    subject to D̂ = PL.

Similarly, using a kernel representation, the line fitting problem of minimizing the sum of squares of the orthogonal distances is

    minimize   over R ∈ R^{1×2}, R ≠ 0, and D̂ ∈ R^{2×N}   ‖D − D̂‖_F^2    (lra_R)
    subject to RD̂ = 0.

Contrary to the total least squares problem (tls), problems (lra_P) and (lra_R) always have (nonunique) solutions. In the example, solutions are, e.g., P* = col(0, 1) and R* = [1 0], which describe the vertical line

    B* := image(P*) = ker(R*).
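A solution of (lra_P) and (lra_R) is computable from the singular value decomposition of D: by the Eckart-Young-Mirsky theorem, the dominant left singular vector solves (lra_P) and the remaining left singular vector gives (lra_R). A sketch under the same simulated-data assumptions as above (not the book's implementation):

    % same simulated data as above, collected in a 2 x N matrix D = [a'; b']
    D = [0.1 * randn(1, 10); randn(1, 10)];
    [U, S, V] = svd(D);
    P = U(:, 1);                     % solves (lra_P), with L = S(1,1) * V(:,1)'
    R = U(:, 2)';                    % solves (lra_R), since R * P = 0
    Dh = P * S(1, 1) * V(:, 1)';     % optimal rank-1 approximation of D
    % for this data, P is close to col(0, 1) and R to [1 0]: the vertical line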
The constraints

    D̂ = PL, with P ∈ R^{2×1}, L ∈ R^{1×N}    and    RD̂ = 0, with R ∈ R^{1×2}, R ≠ 0

are equivalent to the constraint rank(D̂) ≤ 1, which shows that the points {d̂_1, ..., d̂_N} being fitted exactly by a line passing through the origin is equivalent to

    rank([d̂_1 ⋯ d̂_N]) ≤ 1.

Thus, (lra_P) and (lra_R) are instances of one and the same abstract problem: approximate the data matrix D by a low rank matrix D̂.

In Chapter 2, the observations made in the line fitting example are generalized to modeling of q-dimensional data. The underlying goal is:

Given a set of points in R^q (the data), find a subspace of R^q of bounded dimension (a model) that has the least (2-norm) distance to the data points.

Such a subspace is a (2-norm) optimal fitting model. General representations of a subspace in R^q are the kernel or the image of a matrix. The classical least squares and total least squares formulations of the data modeling problem exclude some subspaces. The equations AX = B and A = BX′, used in the least squares and total least squares problem formulations to represent the subspace, might fail to represent the optimal solution, while the kernel and image representations do not have such a deficiency. This suggests that the kernel and image representations are better suited for data modeling.

The equations AX = B and A = BX′ were introduced from an algorithmic point of view—by using them, the data fitting problem is turned into the standard problem of solving approximately an overdetermined system of linear equations. An interpretation of these equations in the data modeling context is that in the model represented by the equation AX = B, the variable A is an input and the variable B is an output. Similarly, in the model represented by the equation A = BX′, A is an output and B is an input. The input/output interpretation has an intuitive appeal because it implies a causal dependence of the variables: the input is causing the output.

Representing the model by an equation AX = B or A = BX′, as done in the classical approach, one a priori assumes that the optimal fitting model has a certain input/output structure. The consequences are:

• existence of exceptional (nongeneric) cases, which complicate the theory,
• ill-conditioning caused by "nearly" exceptional cases, which leads to lack of numerical robustness of the algorithms, and
• need of regularization, which leads to a change of the specified fitting criterion.

These aspects of the classical approach are generally considered as inherent to the data modeling problem. By choosing the alternative image and kernel model representations, the problem of solving approximately an overdetermined system of equations becomes a low-rank approximation problem, where the nongeneric cases (and the related issues of ill-conditioning and need of regularization) are avoided.

1.3 Overview of applications

In this section, examples of low-rank approximation drawn from different application areas are listed. The fact that a matrix constructed from exact data is low rank and the approximate modeling problem is low-rank approximation is sometimes well known (e.g., in realization theory, model reduction, and approximate greatest common divisor). In other cases (e.g., natural language processing and conic section fitting), the link to low-rank approximation is less well known and is not exploited. The applications can be read in any order or skipped without loss of continuity.

Common pattern in data modeling

The motto of the book is:

Behind every data modeling problem there is a (hidden) low-rank approximation problem: the model imposes relations on the data which render a matrix constructed from exact data rank deficient.

Although an exact data matrix is low rank, a matrix constructed from observed data is generically full rank due to measurement noise, unaccounted effects, and assumptions about the data generating system that are not satisfied in practice. Therefore, generically, the observed data does not have an exact low complexity model. This leads to the problem of approximate modeling, which can be formulated as a low-rank approximation problem as follows. Modify the data as little as possible, so that the matrix constructed from the modified data has a specified low rank. The modified data matrix being low rank implies that there is an exact model for the modified data. This model is by definition an approximate model for the given data. The transition from exact to approximate modeling is an important step in building a coherent theory for data modeling and is emphasized in this book.

In all applications, the exact modeling problem is discussed before the practically more important approximate modeling problem. This is done because 1) exact modeling is simpler than approximate modeling, so that it is the right starting place, and 2) exact modeling is a part of optimal approximate modeling and suggests ways of solving such problems suboptimally. Indeed, small modifications of exact modeling algorithms lead to effective approximate modeling algorithms. Well known examples of the transition from exact to approximate modeling in systems theory are the progressions from realization theory to model reduction and from deterministic subspace identification to approximate and stochastic subspace identification.

The estimator consistency question in stochastic estimation problems corresponds to exact data modeling because asymptotically the true data generating system is recovered from observed data. Estimation with finite sample size, however, necessarily involves approximation. Thus in stochastic estimation theory there is also a step of transition from exact to approximate, see Figure 1.2.

Fig. 1.2 Transitions among exact deterministic, approximate deterministic, exact stochastic, and approximate stochastic modeling problems. The arrows show progression from simple to complex:

    exact deterministic  →  approximate deterministic
            ↓                          ↓
    exact stochastic     →  approximate stochastic

Applications in systems and control

Deterministic system realization and model reduction

Realization theory addresses the problem of finding a state representation of a linear time-invariant dynamical system defined by a transfer function or impulse response representation. The key result in realization theory is that a sequence

    H = ( H(0), H(1), ..., H(t), ... )

is an impulse response of a discrete-time linear time-invariant system of order n if and only if the two sided infinite Hankel matrix

    H(H) := [ H(1)  H(2)  H(3)  ⋯
              H(2)  H(3)  ⋰
              H(3)  ⋰
              ⋮              ],

constructed from H, has rank n, i.e.,

    rank(H(H)) = order of a minimal realization of H.

Therefore, existence of a finite dimensional realization of H (exact low complexity linear time-invariant model for H) is equivalent to rank deficiency of a Hankel matrix constructed from the data. A minimal state representation can be obtained from a rank revealing factorization of H(H).

When there is no exact finite dimensional realization of the data or the exact realization is of high order, one may want to find an approximate realization of a specified low order n. These, respectively, approximate realization and model reduction problems naturally lead to Hankel structured low-rank approximation. The deterministic system realization and model reduction problems are further considered in Sections 2.2, 3.1, and 4.2.
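As a sanity check of the rank property (a toy first-order example, added for illustration and not taken from the book's software), a finite Hankel matrix with sufficiently many block rows and columns reveals the order:

    % impulse response of the system x(t+1) = a x(t) + b u(t), y(t) = c x(t)
    a = 0.5; b = 1; c = 1;                % assumed first-order example
    H = c * a.^(0:19) * b;                % H(1), H(2), ... = c a^(t-1) b
    Hmat = hankel(H(1:10), H(10:19));     % finite 10 x 10 part of H(H)
    rank(Hmat)                            % returns 1, the order of the system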
Stochastic system realization

Let y be the output of an nth order linear time-invariant system, driven by white noise (a stochastic system), and let E be the expectation operator. The sequence

    R = ( R(0), R(1), ..., R(t), ... )

defined by

    R(τ) := E( y(t) y⊤(t − τ) )

is called the autocorrelation sequence of y. Stochastic realization theory is concerned with the problem of finding a state representation of a stochastic system that could have generated the observed output y, i.e., a linear time-invariant system driven by white noise, whose output correlation sequence is equal to R.

An important result in stochastic realization theory is that R is the output correlation sequence of an nth order stochastic system if and only if the Hankel matrix H(R) constructed from R has rank n, i.e.,

    rank(H(R)) = order of a minimal stochastic realization of R.

Therefore, stochastic realization of a random process y is equivalent to deterministic realization of its autocorrelation sequence R. When it exists, the finite dimensional stochastic realization can be obtained from a rank revealing factorization of the matrix H(R).

In practice, only a finite number of finite length realizations of the output y are available, so that the autocorrelation sequence is estimated from y. With an estimate R̂ of the autocorrelation R, the Hankel matrix H(R̂) is almost certainly full rank, which implies that a finite dimensional stochastic realization cannot be found. Therefore, the problem of finding an approximate stochastic realization occurs. This problem is again Hankel structured low-rank approximation.

System identification

Realization theory considers a system representation problem: pass from one representation of a system to another. Alternatively, it can be viewed as a special exact identification problem: find from impulse response data (a special trajectory of the system) a state space representation of the data generating system. The exact identification problem (also called the deterministic identification problem) is to find from a general response of a system a representation of that system. Let

    w = col(u, y),  where u = ( u(1), ..., u(T) ) and y = ( y(1), ..., y(T) ),

be an input/output trajectory of a discrete-time linear time-invariant system of order n with m inputs and p outputs and let n_max be a given upper bound on n. Then the Hankel matrix

    H_{n_max+1}(w) := [ w(1)          w(2)          ⋯  w(T − n_max)
                        w(2)          w(3)          ⋯  w(T − n_max + 1)
                        ⋮             ⋮                 ⋮
                        w(n_max + 1)  w(n_max + 2)  ⋯  w(T)             ]    (H_i)

with n_max + 1 block rows, constructed from the trajectory w, is rank deficient:

    rank( H_{n_max+1}(w) ) ≤ rank( H_{n_max+1}(u) ) + order of the system.    (SYSID)

Conversely, if the Hankel matrix H_{n_max+1}(w) has rank (n_max + 1)m + n and the matrix H_{2n_max+1}(u) is full row rank (persistency of excitation of u), then w is a trajectory of a controllable linear time-invariant system of order n. Under the above assumptions, the data generating system can be identified from a rank revealing factorization of the matrix H_{n_max+1}(w).

When there are measurement errors or the data generating system is not a low complexity linear time-invariant system, the data matrix H_{n_max+1}(w) is generically full rank. In such cases, an approximate low-complexity linear time-invariant model for w can be derived by finding a Hankel structured low-rank approximation of H_{n_max+1}(w). Therefore, Hankel structured low-rank approximation can be applied also for approximate system identification. Linear time-invariant system identification is a main topic of the book and appears frequently in the following chapters.

Similarly to the analogy between deterministic and stochastic system realization, there is an analogy between deterministic and stochastic system identification. The latter analogy suggests an application of Hankel structured low-rank approximation to stochastic system identification.
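The rank condition (SYSID) can be verified numerically. The sketch below simulates an assumed second-order single-input single-output system (the system matrices are arbitrary stable choices made for illustration; the code is not from the book's software):

    A = [0.6 -0.2; 0.2 0.5]; B = [1; 0]; C = [1 1];   % assumed system, n = 2
    n = 2; T = 100; u = randn(1, T); y = zeros(1, T); x = zeros(n, 1);
    for t = 1:T                       % simulate x(t+1) = A x(t) + B u(t), y = C x
        y(t) = C * x;
        x = A * x + B * u(t);
    end
    w = [u; y];                       % trajectory w(t) = col(u(t), y(t))
    nmax = 5; Hw = [];
    for i = 1:nmax + 1                % stack nmax + 1 block rows of H_{nmax+1}(w)
        Hw = [Hw; w(:, i:T - nmax - 1 + i)];
    end
    rank(Hw)                          % 8 = (nmax + 1) * 1 + n, though Hw has 12 rows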
Applications in computer algebra

Greatest common divisor of two polynomials

The greatest common divisor of the polynomials

    p(z) = p_0 + p_1 z + ⋯ + p_n z^n    and    q(z) = q_0 + q_1 z + ⋯ + q_m z^m

is a polynomial c of maximal degree that divides both p and q, i.e., a maximal degree polynomial c, for which there are polynomials r and s, such that

    p = rc    and    q = sc.

Define the Sylvester matrix of the polynomials p and q,

    R(p, q) := [ p_0            q_0
                 p_1  p_0       q_1  q_0
                 ⋮    p_1  ⋱    ⋮    q_1  ⋱
                 p_n  ⋮    ⋱ p_0  q_m  ⋮    ⋱ q_0
                      p_n     p_1       q_m    q_1
                         ⋱    ⋮            ⋱   ⋮
                           p_n               q_m ]  ∈ R^{(n+m)×(n+m)},    (R)

whose first m columns contain shifted copies of the coefficients of p and whose last n columns contain shifted copies of the coefficients of q. (By convention, in this book, all missing entries in a matrix are assumed to be zeros.) A well known fact in algebra is that the degree of the greatest common divisor of p and q is equal to the rank deficiency (co-rank) of R(p, q), i.e.,

    degree(c) = n + m − rank( R(p, q) ).    (GCD)

Suppose that p and q have a greatest common divisor of degree d > 0, but the coefficients of the polynomials p and q are imprecise, resulting in perturbed polynomials p_d and q_d. Generically, the matrix R(p_d, q_d), constructed from the perturbed polynomials, is full rank, implying that the greatest common divisor of p_d and q_d has degree zero. The problem of finding an approximate common divisor of p_d and q_d with degree d can be formulated as follows. Modify the coefficients of p_d and q_d as little as possible, so that the resulting polynomials, say, p̂ and q̂, have a greatest common divisor of degree d. This problem is a Sylvester structured low-rank approximation problem. Therefore, Sylvester structured low-rank approximation can be applied for computing an approximate common divisor with a specified degree. The approximate greatest common divisor ĉ for the perturbed polynomials p_d and q_d is the exact greatest common divisor of p̂ and q̂. The approximate greatest common divisor problem is considered in Section 3.2.
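For example (a small computation added for illustration, with an assumed pair of polynomials sharing the root z = −1; not from the book's software):

    p = conv([1 -2], [1 1]);            % p(z) = (z - 2)(z + 1), degree n = 2
    q = conv([1  3], [1 1]);            % q(z) = (z + 3)(z + 1), degree m = 2
    n = length(p) - 1; m = length(q) - 1;
    S = zeros(n + m);                   % Sylvester matrix R(p, q)
    for j = 1:m, S(j:j + n, j)     = p(end:-1:1).'; end  % shifted copies of p
    for j = 1:n, S(j:j + m, m + j) = q(end:-1:1).'; end  % shifted copies of q
    d = n + m - rank(S)                 % degree of the gcd: d = 1 here

Note that MATLAB stores polynomial coefficients in descending order of powers, hence the flip p(end:-1:1) when filling in (R).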
Applications in signal processing

Array signal processing

An array of antennas or sensors is used for direction of arrival estimation and adaptive beamforming. Consider q antennas in a fixed configuration and a wave propagating from distant sources, see Figure 1.3.

Fig. 1.3 Antenna array processing setup: sources with intensities ℓ_1, ..., ℓ_m transmitting to an array of antennas with responses w_1, ..., w_q.

Consider, first, the case of a single source. The source intensity ℓ_1 (the signal) is a function of time. Let w(t) ∈ R^q be the response of the array at time t (w_i being the response of the ith antenna). Assuming that the source is far from the array (relative to the array's length), the array's response is proportional to the source intensity:

    w(t) = p_1 ℓ_1(t − τ_1),

where τ_1 is the time needed for the wave to travel from the source to the array and p_1 ∈ R^q is the array's response to the source emitting at a unit intensity. The vector p_1 depends only on the array geometry and the source location and is therefore constant in time. Measurements of the antenna at time instants t = 1, ..., T give a data matrix

    D := [ w(1) ⋯ w(T) ] = p_1 [ ℓ_1(1 − τ_1) ⋯ ℓ_1(T − τ_1) ],

which has rank equal to one.

Consider now m < q distant sources emitting with intensities ℓ_1, ..., ℓ_m. Let p_k be the response of the array to the kth source emitting alone with unit intensity. Assuming that the array responds linearly to a mixture of sources, we have

    D = [ w(1) ⋯ w(T) ] = ∑_{k=1}^m p_k [ ℓ_k(1 − τ_k) ⋯ ℓ_k(T − τ_k) ] = PL,

where P := [ p_1 ⋯ p_m ], L := col(ℓ_1, ..., ℓ_m), and τ_k is the delay of the wave coming from the kth source. This shows that the rank of D is less than or equal to the number of sources m. If the number of sources m is less than the number of antennas q and m is less than the number of samples T, the source intensities ℓ_1, ..., ℓ_m are linearly independent, and the unit intensity array patterns p_1, ..., p_m are linearly independent, then we have that

    rank(D) = the number of sources transmitting to the array.

Moreover, the factors P and L in a rank revealing factorization PL of D carry information about the source locations.

With noisy observations, the matrix D is generically a full rank matrix. Then, assuming that the array's geometry is known, low-rank approximation can be used to estimate the number of sources and their locations.
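A closing numerical sketch (an assumed setup added for illustration, not the book's code) of the rank property and of its use with noisy measurements:

    q = 5; m = 2; T = 50;             % 5 antennas, 2 sources, 50 samples
    P = randn(q, m); L = randn(m, T); % array patterns and source intensities
    D = P * L;                        % noise-free data matrix
    rank(D)                           % returns m = 2, the number of sources
    Dn = D + 0.01 * randn(q, T);      % noisy data: generically full rank
    svd(Dn)                           % m dominant singular values reveal m

Low-rank approximation of the noisy matrix then recovers estimates of the factors P and L, from which the source parameters can be inferred when the array geometry is known.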
