Mathematical Problems in Data Science: Theoretical and Practical Methods



Li M. Chen · Zhixun Su · Bo Jiang

Mathematical Problems in Data Science
Theoretical and Practical Methods

Li M. Chen
Department of Computer Science and Information Technology
The University of the District of Columbia
Washington, DC, USA

Zhixun Su
School of Mathematical Sciences
Dalian University of Technology
Dalian, China

Bo Jiang
School of Information Science and Technology
Dalian Maritime University
Dalian, China

ISBN 978-3-319-25125-7
ISBN 978-3-319-25127-1 (eBook)
DOI 10.1007/978-3-319-25127-1
Library of Congress Control Number: 2015953100

Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

To ICM 2014 and Its Satellite Conference on Data Science

Preface

Modern data science is related to massive data sets (BigData), machine learning, and cloud computing.
There are multiple ways of understanding data science: (1) BigData with small cloud computational power, which requires very fast algorithms; (2) relatively small data sets with large cloud computational power, which we can compute by distributing data to the cloud without a very efficient algorithm; (3) BigData with a big cloud, which requires both algorithmic techniques and architectural infrastructure (new computing models); and (4) small data sets with a small cloud, which just requires the standard methods.

This book contains state-of-the-art knowledge for researchers in data science. It also presents various problems in BigData and data science. We first introduce important statistical and computational methods for data analysis. For example, we discuss principal component analysis for the dimension reduction of massive data sets. Then, we introduce graph theoretical methods such as GraphCut, the Laplacian matrix, and Google PageRank for data search and classification. We also discuss efficient algorithms, the hardness of problems involving various types of BigData, and geometric data structures. This book pays particular attention to incomplete data sets and partial connectedness among data points or data sets. The second part of the book focuses on special topics, which cover topological analysis and machine learning, business and financial data recovery, and massive data classification and prediction for high-dimensional data sets. Another purpose of this book is to challenge the major ongoing and unsolved problems in data science and provide some prospective solutions to these problems.

This book is a concise and quick introduction to the hottest topic in mathematics, computer science, and information technology today: data science. Data science first emerged in mathematics and computer science out of the research need for the numerous applications of BigData in the information technology, business, and medical industries. This book has two main objectives.
The first objective of this book is to cover necessary knowledge in statistics, graph theory, algorithms, and computational science. There is also a specific focus on the internal connectivity of incomplete data sets, which could be one of the central topics of future data science, unlike the existing method of data processing where data modeling is at the center. The second objective is to discuss major ongoing and unsolved problems in data science and provide some prospective solutions for these problems.

The book also collects some research papers from the talks given at the International Congress of Mathematicians (ICM) 2014 Satellite Conference on Mathematical Foundation of Modern Data Sciences: Computing, Logic, and Education, Dalian Maritime University, Dalian, China, which took place from July 27 to August 1, 2014. We are grateful to the Seoul ICM 2014 organizing committee and the National Science Foundation of China for their support. Many thanks go to Professor Reinhard Klette at the University of Auckland and Professor Wen Gao at Beijing University for giving excellent invited talks. Special thanks to Professors Shi-Qiang Wang (Beijing Normal University), Steven G. Krantz (Washington University), Shmuel Weinberger (University of Chicago), and Hanan Samet (University of Maryland) for their support. Special thanks also go to Dalian University of Technology, Dalian Maritime University, Southeast University of China, and University of the District of Columbia for their support of this conference.

This book has three parts. The first part contains the basics in data science; the second part mainly deals with computing, learning, and problems in data science; the third part is selected topics.

Chapter 1: Introduction (L. Chen)
Chapter 2: Overview of Basic Methods for Data Science (L. Chen)
Chapter 3: Relationship and Connectivity of Incomplete Data Collection (L. Chen)
Chapter 4: Machine Learning for Data Science (L. Chen)
Chapter 5: Images, Videos, and BigData (L. Chen)
Chapter 6: Topological Data Analysis (L. Chen)
Chapter 7: Monte Carlo Methods and Their Applications in Big Data Analysis (H. Ji and Y. Li)
Chapter 8: Feature Extraction via Vector Bundle Learning (R. Liu and Z. Su)
Chapter 9: Curve Interpolation and Positivity-Preserving Financial Curve Construction (P. Huang, H. Wang, P. Wu, and Y. Li)
Chapter 10: Advanced Methods in Variational Learning (J. Spencer and K. Chen)
Chapter 11: On-line Strategies of Groups Evacuation from a Convex Region in the Plane (B. Jiang, Y. Liu, and H. Zhang)
Chapter 12: A New Computational Model of Bigdata (B. Zhu)

Washington, DC, USA    Li M. Chen
Dalian, China          Zhixun Su
                       Bo Jiang

Contents

Part I Basic Data Science

1 Introduction: Data Science and BigData Computing
  Li M. Chen
  1.1 Data Mining and Cloud Computing: The Prelude of BigData and Data Science
  1.2 BigData Era
  1.3 The Meaning of Data Sciences
  1.4 Problems Related to Data Science
  1.5 Mathematical Problems in Data Science
  1.6 Mathematics, Data Science, and Data Scientists in Industry
  1.7 Remark: Discussion on the Future Problems in Data Science
  References

2 Overview of Basic Methods for Data Science
  Li M. Chen
  2.1 "Hardware" and "Software" of Data Science
    2.1.1 Searching and Optimization
    2.1.2 Decision Making
    2.1.3 Classification
    2.1.4 Learning
  2.2 Graph-Theoretic Methods
    2.2.1 Review of Graphs
    2.2.2 Breadth First Search and Depth First Search
    2.2.3 Dijkstra's Algorithm for the Shortest Path
    2.2.4 Minimum Spanning Tree
  2.3 Statistical Methods
  2.4 Classification, Clustering, and Pattern Recognition
    2.4.1 k-Nearest Neighbor Method
    2.4.2 k-Means Method
  2.5 Numerical Methods and Data Reconstruction in Science and Engineering
