Thomas A. Runkler Data Analytics Models and Algorithms for Intelligent Data Analysis 2nd Edition Data Analytics Thomas A. Runkler Data Analytics Models and Algorithms for Intelligent Data Analysis 2nd Edition ThomasA.Runkler SiemensAG München,Germany ISBN:978-3-658-14074-8 ISBN:978-3-658-14075-5(eBook) DOI10.1007/978-3-658-14075-5 LibraryofCongressControlNumber:2016942272 SpringerVieweg ©SpringerFachmedienWiesbaden2012,2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformationstorage andretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodologynowknownor hereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublicationdoes notimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevantprotective lawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbookare believedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsortheeditors giveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforanyerrorsoromissions thatmayhavebeenmade. Printedonacid-freepaper SpringerViewegisabrandofSpringerNature TheregisteredcompanyisSpringerFachmedienWiesbadenGmbH Preface The information in the world doubles every 20 months. Important data sources are business and industrial processes, text and structured databases, images and videos, and physical and biomedical data. Data analytics allows to find relevant information, structures, and patterns, to gain new insights, to identify causes and effects, to predict future developments, or to suggest optimal decisions. We need models and algorithms to collect, preprocess, analyze, and evaluate data, from various fields such as statistics, machine learning, pattern recognition, system theory, operations research, or artificial intelligence. With this book, you will learn about the most important methods and algorithmsfordataanalytics.Youwillbeabletochooseappropriatemethodsforspecific tasks and apply these in your own data analytics projects. You will understand the basic conceptsofthegrowingfieldofdataanalytics,whichwillallowyoutokeeppaceandto activelycontributetotheadvancementofthefield. This text is designed for undergraduate and graduate courses on data analytics for engineering, computer science, and math students. It is also suitable for practitioners working on data analytics projects. The book is structured according to typical practical data analytics projects. Only basic mathematics is required. This material has been used for more than ten years in numerous courses at the Technical University of Munich, Germany, in short courses at several other universities, and in tutorials at international scientific conferences. Much of the content is based on the results of industrial research anddevelopmentprojectsatSiemens. Historyofthebookversions: (cid:129) DataAnalytics,secondedition,2016,English (cid:129) DataMining,secondedition,2015,German (cid:129) DataAnalytics,2012,English (cid:129) DataMining,2010,German (cid:129) InformationMining,2000,German v vi Preface Thanks go to everybody who has contributed to this material, in particular to the reviewers and my students for suggesting improvements and pointing out errors and to theeditorialandpublisherteamfortheirprofessionalcollaboration. Munich,Germany ThomasA.Runkler April2016 Contents 1 Introduction ......................................................................... 1 1.1 It’sAllAboutData ............................................................ 1 1.2 DataAnalytics,DataMining,andKnowledgeDiscovery................... 2 References............................................................................ 3 2 DataandRelations ................................................................. 5 2.1 TheIrisDataSet............................................................... 5 2.2 DataScales..................................................................... 8 2.3 SetandMatrixRepresentations............................................... 10 2.4 Relations....................................................................... 11 2.5 DissimilarityMeasures........................................................ 12 2.6 SimilarityMeasures........................................................... 14 2.7 SequenceRelations............................................................ 16 2.8 SamplingandQuantization ................................................... 18 Problems.............................................................................. 21 References............................................................................ 22 3 DataPreprocessing ................................................................. 23 3.1 ErrorTypes .................................................................... 23 3.2 ErrorHandling................................................................. 26 3.3 Filtering........................................................................ 27 3.4 DataTransformation .......................................................... 32 3.5 DataIntegration ............................................................... 35 Problems.............................................................................. 35 References............................................................................ 36 4 DataVisualization .................................................................. 37 4.1 Diagrams....................................................................... 37 4.2 PrincipalComponentAnalysis ............................................... 39 4.3 MultidimensionalScaling..................................................... 43 4.4 SammonMapping............................................................. 47 vii viii Contents 4.5 Auto-Associator ............................................................... 51 4.6 Histograms..................................................................... 51 4.7 SpectralAnalysis.............................................................. 54 Problems.............................................................................. 57 References............................................................................ 57 5 Correlation........................................................................... 59 5.1 LinearCorrelation............................................................. 59 5.2 CorrelationandCausality..................................................... 61 5.3 Chi-SquareTestforIndependence............................................ 62 Problems.............................................................................. 65 References............................................................................ 65 6 Regression ........................................................................... 67 6.1 LinearRegression ............................................................. 67 6.2 LinearRegressionwithNonlinearSubstitution.............................. 71 6.3 RobustRegression............................................................. 72 6.4 NeuralNetworks............................................................... 73 6.5 RadialBasisFunctionNetworks.............................................. 77 6.6 Cross-Validation............................................................... 79 6.7 FeatureSelection .............................................................. 81 Problems.............................................................................. 82 References............................................................................ 82 7 Forecasting........................................................................... 85 7.1 FiniteStateMachines ......................................................... 85 7.2 RecurrentModels.............................................................. 86 7.3 AutoregressiveModels........................................................ 88 Problems.............................................................................. 89 References............................................................................ 89 8 Classification ........................................................................ 91 8.1 ClassificationCriteria ......................................................... 91 8.2 NaiveBayesClassifier ........................................................ 95 8.3 LinearDiscriminantAnalysis................................................. 97 8.4 SupportVectorMachine ...................................................... 99 8.5 NearestNeighborClassifier................................................... 102 8.6 LearningVectorQuantization................................................. 103 8.7 DecisionTrees................................................................. 104 Problems.............................................................................. 107 References............................................................................ 108 Contents ix 9 Clustering............................................................................ 111 9.1 ClusterPartitions .............................................................. 111 9.2 SequentialClustering.......................................................... 113 9.3 Prototype-BasedClustering................................................... 115 9.4 FuzzyClustering............................................................... 117 9.5 RelationalClustering.......................................................... 123 9.6 ClusterTendencyAssessment ................................................ 127 9.7 ClusterValidity................................................................ 128 9.8 Self-OrganizingMap.......................................................... 129 Problems.............................................................................. 130 References............................................................................ 131 AppendixA BriefReviewofSomeOptimizationMethods....................... 133 A.1 OptimizationwithDerivatives................................................ 133 A.2 GradientDescent .............................................................. 134 A.3 LagrangeOptimization........................................................ 135 References............................................................................ 137 Solutions.................................................................................. 139 Index...................................................................................... 143
Description: