ebook img

Algebraic Foundations for Applied Topology and Data Analysis PDF

231 Pages·2022·4.879 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Algebraic Foundations for Applied Topology and Data Analysis

Mathematics of Data Hal Schenck Algebraic Foundations for Applied Topology and Data Analysis Mathematics of Data Volume 1 Editor-in-Chief JürgenJost,indenNaturwissenschaften,Max-Planck-InstitutfürMathematik, Leipzig,Germany SeriesEditors BenjaminGess,FakultätfürMathematik,UniversitätBielefeld,Bielefeld, Germany HeatherHarrington,MathematicalInstitute,UniversityofOxford,Oxford,UK KathrynHess,LaboratoryforTopologyandNeuroscience,ÉcolePolytechnique FédéraledeLausanne,Lausanne,Switzerland GittaKutyniok,InstitutfürMathematik,TechnischeUniversitätBerlin,Berlin, Germany BerndSturmfels,indenNaturwissenschaften,Max-Planck-Institutfür Mathematik,Leipzig,Germany ShmuelWeinberger,DepartmentofMathematics,UniversityofChicago,Chicago, IL,USA Howtoreveal,characterize,andexploitthestructureindata?Meetingthiscentral challenge of modern data science requires the development of new mathemat- ical approaches to data analysis, going beyond traditional statistical methods. Fruitfulmathematicalmethodscan originate in geometry,topology,algebra, anal- ysis, stochastics, combinatorics, or indeed virtually any field of mathematics. Confronting the challenge of structure in data is already leading to productive new interactions among mathematics, statistics, and computer science, notably in machine learning. We invite novel contributions (research monographs, advanced textbooks, and lecture notes) presenting substantial mathematics that is relevant for data science. Since the methods required to understand data depend on the source and type of the data, we very much welcome contributions comprising significant discussions of the problems presented by particular applications. We also encourage the use of online resources for exercises, software and data sets. Contributions from all mathematical communities that analyze structures in data are welcome. Examples of potential topics include optimization, topological data analysis, compressedsensing, algebraicstatistics, informationgeometry,manifold learning, tensor decomposition, support vector machines, neural networks, and manymore. Hal Schenck Algebraic Foundations for Applied Topology and Data Analysis HalSchenck DepartmentofMathematicsandStatistics AuburnUniversity Auburn,AL,USA ISSN2731-4103 ISSN2731-4111 (electronic) MathematicsofData ISBN978-3-031-06663-4 ISBN978-3-031-06664-1 (eBook) https://doi.org/10.1007/978-3-031-06664-1 This work was supported by National Science Foundation (2006410) (http://dx.doi.org/10.13039/ 100000001)andRosemaryKopelBrownEndowment MathematicsSubjectClassification:55N31,18G99 ©TheEditor(s)(ifapplicable)andTheAuthor(s),underexclusivelicensetoSpringerNatureSwitzerland AG2022 Thisworkissubjecttocopyright.AllrightsaresolelyandexclusivelylicensedbythePublisher,whether thewhole orpart ofthematerial isconcerned, specifically therights oftranslation, reprinting, reuse ofillustrations, recitation, broadcasting, reproductiononmicrofilmsorinanyotherphysicalway,and transmissionorinformationstorageandretrieval,electronicadaptation,computersoftware,orbysimilar ordissimilarmethodologynowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthors,andtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressedorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. ThisSpringerimprintispublishedbytheregisteredcompanySpringerNatureSwitzerlandAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface This book is a mirror of applied topology and data analysis: it covers a wide range of topics, at levels of sophistication varying from the elementary (matrix algebra)tothe esoteric(Grothendieckspectralsequence).Myhopeisthatthereis somethingforeveryone,fromundergraduatesimmersedinafirstlinearalgebraclass to sophisticates investigating higher dimensional analogs of the barcode. Readers areencouragedtobarhop;thegoalistogiveanintuitiveandhands-onintroduction tothetopics,ratherthanapunctiliouslyprecisepresentation. The notes grew out of a class taught to statistics graduate students at Auburn UniversityduringtheCOVIDsummerof2020.Thebookreflectsthat:itiswritten for a mathematicallyengagedaudienceinterested in understandingthe theoretical underpinnings of topological data analysis. Because the field draws practitioners withabroadrangeofexperience,thebookassumeslittlebackgroundattheoutset. However,theadvancedtopicsinthelatterpartofthebookrequireawillingnessto tackletechnicallydifficultmaterial. To treat the algebraic foundations of topological data analysis, we need to introduceafairnumberofconceptsandtechniques,sotokeepfromboggingdown, proofsaresometimessketchedoromitted.Therearemanyexcellenttextsonupper- levelalgebraandtopologywhereadditionaldetailscanbefound,forexample: AlgebraReferences TopologyReferences Aluffi [2] Fulton [76] Artin [3] Greenberg–Harper [83] Eisenbud [70] Hatcher [91] Hungerford [93] Munkres [119] Lang [102] Spanier [137] Rotman [132] Weibel [149] v vi Preface Techniquesfromlinearalgebrahavebeenessentialtoolsin dataanalysisfromthe birthofthefield,andsothebookkicksoffwiththebasics: (cid:129) LeastSquaresApproximation (cid:129) CovarianceMatrixandSpreadofData (cid:129) SingularValueDecomposition Tools from topology have recently made an impact on data analysis. This text providesthebackgroundtounderstanddevelopmentssuchaspersistenthomology. Supposewearegivenpointclouddata,thatis,asetofpointsX: IfX wassampledfromsomeobjectY,we’dliketouseXtoinferpropertiesof Y. Persistent homologyappliestoolsof algebraictopologyto do this. We start by usingXasaseedfromwhichtogrowafamilyofspaces (cid:2) X = N (p), whereN (p)denotesan(cid:2) ballaroundp. (cid:2) (cid:2) (cid:2) p∈X AsX(cid:2) ⊆X(cid:2)(cid:4) if(cid:2) ≤(cid:2)(cid:4),wehaveafamilyoftopologicalspacesandinclusionmaps. Preface vii As Weinbergernotes in [150], persistent homologyis a type of Morse theory: therearea finitenumberofvaluesof(cid:2) wherethetopologyofX changes.Notice (cid:2) that when (cid:2) (cid:6) 0, X is a giant blob; so (cid:2) is typically restricted to a range (cid:2) [0,x]. Topological features which “survive” to the parameter value x are said to bepersistent;intheexampleabove,thecircleS1 isapersistentfeature. Thefirstthreechaptersofthebookareanalgebra-topologybootcamp.Chapter1 providesa brisk reviewof the tools fromlinear algebramost relevantfor applica- tions, such as webpage ranking. Chapters 2 and 3 cover the results we need from upper-levelclassesin(respectively)algebraandtopology.Appliedtopologyappears inSect.3.4,whichtiestogethersheaves,theheatequation,andsocialmedia.Some readersmaywanttoskipthefirstthreechapters,andjumpinatChap.4. The main techniques appear in Chap. 4, which defines simplicial complexes, simplicialhomology,andtheCˇech&Ripscomplexes.Theseconceptsareillustrated with a description of the work of de Silva–Ghrist using the Rips complex to analyzesensornetworks.Chapter5furtherdevelopsthealgebraictopologytoolkit, introducingseveralcohomologytheoriesandhighlightingtheworkofJiang–Lim– Yao–YeinapplyingHodgetheorytovoterranking(theNetflixproblem). We return to algebra in Chap. 6, which is devoted to modules over a principal ideal domain—the structure theorem for modules over a principal ideal domain plays a central role in persistent homology. Chapters 7 and 8 cover, respectively, persistent and multiparameter persistent homology. Chapter 9 is the pièce de résistance (or perhaps the coup de grâce)—a quick and dirty guide to derived functors and spectral sequences. Appendix A illustrates several of the software packageswhichcanbeusedtoperformcomputations. There are a numberof texts which tackle data analysisfrom the perspectiveof pure mathematics. Elementary Applied Topology [79] by Rob Ghrist is closest in spirittothesenotes;ithasamoretopologicalflavor(andwonderfullyilluminating illustrations!). Other texts with a similar slant include Topological Data Analysis withApplications [30]byCarlsson–Vejdemo-Johanssen,ComputationalTopology for Data Analysis [60] by Dey–Wang, and Computational Topology [67] by Edelsbrunner–Harer. At the other end of the spectrum, Persistence Theory: From Quiver Representations to Data Analysis [124] by Steve Oudot is intended for moreadvancedreaders;thebooksofPolterovich–Rosen–Samvelyan–Zhang [126], Rabadan–Blumberg [127], and Robinson [131] focus on applications. Statistical methodsarenottreatedhere;theymeritaseparatevolume. Many folks deserve acknowledgment for their contributions to these notes, starting with my collaborators in the area: Heather Harrington, Nina Otter, and UlrikeTillmann.A precursorto thesummerTDAclasswas aminicourseI taught at CIMAT, organizedby AbrahamMartín delCampo, AntonioRieser, and Carlos VargasObieta. ThesummerTDA class wasitself madepossible bythe Rosemary KopelBrown endowment,and writing by NSF grant2048906.Thanksto Hannah Alpert, Ulrich Bauer, Michael Catanzaro, David Cox, Ketai Chen, Carl de Boor, VindeSilva,RobertDixon,JordanEckert,ZiqinFeng,OliverGäfvert,RobGhrist, Sean Grate, Anil Hirani, Yang HuiHe, Michael Lesnick,Lek-HengLim, Dmitriy Morozov,SteveOudot,AlexSuciu,andMatthewWrightforusefulfeedback. viii Preface Data science is a moving target: one of today’s cutting-edge tools may be relegated to the ash bin of history tomorrow. For this reason, the text aims to highlightmathematicallyimportantconceptswhichhaveprovenorpotentialutility inappliedtopologyanddataanalysis.Butmathematicsisahumanendeavor,soit iswisetorememberSeneca:“Omniahumanabreviaetcaducasunt.” Auburn,AL,USA HalSchenck September2022 Contents 1 LinearAlgebraToolsforDataAnalysis................................... 1 1.1 LinearEquations,GaussianElimination,MatrixAlgebra ........... 1 1.2 VectorSpaces,LinearTransformations,BasisandChange ofBasis ................................................................. 5 1.2.1 BasisofaVectorSpace ....................................... 6 1.2.2 LinearTransformations ....................................... 7 1.2.3 ChangeofBasis ............................................... 8 1.3 Diagonalization,WebpageRanking,DataandCovariance .......... 9 1.3.1 EigenvaluesandEigenvectors ................................ 10 1.3.2 Diagonalization ................................................ 11 1.3.3 RankingUsingDiagonalization .............................. 13 1.3.4 Data Application: Diagonalization of the CovarianceMatrix ............................................. 14 1.4 Orthogonality,Least SquaresFitting, SingularValue Decomposition ......................................................... 16 1.4.1 LeastSquares .................................................. 17 1.4.2 SubspacesandOrthogonality ................................. 18 1.4.3 SingularValueDecomposition................................ 20 2 BasicsofAlgebra:Groups,Rings,Modules............................... 23 2.1 Groups,RingsandHomomorphisms ................................. 23 2.1.1 Groups ......................................................... 23 2.1.2 Rings ........................................................... 25 2.2 ModulesandOperationsonModules ................................. 27 2.2.1 Ideals ........................................................... 29 2.2.2 TensorProduct ................................................. 31 2.2.3 Hom ............................................................ 33 2.3 LocalizationofRingsandModules ................................... 35 2.4 NoetherianRings,HilbertBasisTheorem,Varieties ................. 37 2.4.1 NoetherianRings .............................................. 37 2.4.2 SolutionstoaPolynomialSystem:Varieties ................. 38 ix

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.