Intelligent Systems Reference Library 109

Achim Zielesny

From Curve Fitting to Machine Learning
An Illustrative Guide to Scientific Data Analysis and Computational Intelligence
Second Edition

Intelligent Systems Reference Library
Volume 109

Series editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]
Lakhmi C. Jain, Bournemouth University, Fern Barrow, Poole, Australia, and University of Canberra, Canberra, Australia
e-mail: [email protected]

About this Series
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well-structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well-integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included.
More information about this series at http://www.springer.com/series/8578

Achim Zielesny
Institut für biologische und chemische Informatik
Westfälische Hochschule
Recklinghausen
Germany

ISSN 1868-4394 ISSN 1868-4408 (electronic)
Intelligent Systems Reference Library
ISBN 978-3-319-32544-6 ISBN 978-3-319-32545-3 (eBook)
DOI 10.1007/978-3-319-32545-3

Library of Congress Control Number: 2016936957

© Springer International Publishing Switzerland 2011, 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland

To my parents

Preface

Preface to the first edition

The analysis of experimental data has been at the heart of science from its beginnings. But it was the advent of digital computers in the second half of the 20th century that revolutionized scientific data analysis twofold: Tedious pencil-and-paper work could successively be transferred to the emerging software applications, so sweat and tears turned into automated routines. Along with automation, the manageable data volumes could be increased dramatically due to the exponential growth of computational memory and speed. Moreover, highly non-linear and complex data analysis problems that were completely infeasible before came within reach. Non-linear curve fitting, clustering and machine learning belong to these modern techniques that entered the agenda and considerably widened the range of scientific data analysis applications. Last but not least, they are a further step towards computational intelligence.

The goal of this book is to provide an interactive and illustrative guide to these topics. It concentrates on the road from two-dimensional curve fitting to multidimensional clustering and machine learning with neural networks or support vector machines. Along the way, topics like mathematical optimization or evolutionary algorithms are touched upon.
All concepts and ideas are outlined in a clear-cut manner with graphically depicted plausibility arguments and a little elementary mathematics. Difficult mathematical and algorithmic details are consequently omitted for the sake of simplicity but are accessible through the cited literature. The major topics are extensively outlined with exploratory examples and applications. The primary goal is to be as illustrative as possible without hiding problems and pitfalls, but rather to address them. The character of an illustrative cookbook is complemented by specific sections that address more fundamental questions like the relation between machine learning and human intelligence. These sections may be skipped without affecting the main road, but they may open up interesting insights beyond mere data massage.

All topics are completely demonstrated with the aid of the computing platform Mathematica and the Computational Intelligence Packages (CIP), a high-level function library developed with Mathematica's programming language on top of Mathematica's algorithms. CIP is open-source, so the detailed code of every method is freely accessible. All examples and applications shown throughout the book may be used and customized by the reader without any restrictions. This leads to an interactive environment which allows individual manipulations, from the rotation of 3D graphics or the evaluation of different settings up to tailored enhancements for specific functionality.

The book tries to be as introductory as possible, calling only for a basic mathematical background of the reader: a level that is typically taught in the first year of scientific education. The target readership comprises students of (computer) science and engineering as well as scientific practitioners in industry and academia who seek an illustrative introduction to these topics.
Readers with programming skills may easily port and customize the provided code. The majority of the examples and applications originate from teaching efforts or from providing solutions. The outline of the book is as follows:

• The introductory chapter 1 provides the necessary basics that underlie the discussions of the following chapters, such as an initial motivation for the interplay of data and models with respect to the molecular sciences, mathematical optimization methods or data structures. The chapter may be skipped on a first reading but should be consulted if things become unclear in a subsequent chapter.
• The main chapters that describe the road from curve fitting to machine learning are chapters 2 to 4. The curve fitting chapter 2 outlines the various aspects of adjusting linear and non-linear model functions to experimental data. A section about mere data smoothing with cubic splines complements the fitting discussions.
• The clustering chapter 3 sketches the problems of assigning data to different groups in an unsupervised manner with clustering methods. Unsupervised clustering may be viewed as a logical first step towards supervised machine learning, and may be able to construct predictive systems on its own. Machine learning methods may also need clustered data to produce successful results.
• The machine learning chapter 4 comprises supervised learning techniques, in particular multiple linear regression, three-layer feed-forward neural networks and support vector machines. Adequate data preprocessing, the use of these techniques for regression and classification tasks, and the recurring pitfalls and problems are introduced and thoroughly discussed.
• The discussions chapter 5 supplements the topics of the main road. It collects some open issues neglected in the previous chapters and opens up the scope with more general sections about the possible discovery of new knowledge or the emergence of computational intelligence.
The scientific fields touched upon in the present book are extensive and, in addition, constantly and progressively refined. Therefore it is inevitable that an awful lot of important topics and aspects are neglected. The concrete selection always mirrors an author's preferences as well as his personal knowledge and overview. Since the missing parts unfortunately exceed the selected ones, and people always have strong feelings about what is of importance, the final statement has to be a request for indulgence.

Recklinghausen, April 2011
Achim Zielesny

Preface to the second edition

The first edition was favorably reviewed as a useful introductory cookbook for the novice reader. The second edition tries to keep this character and resists the temptation to heavily expand topics or lift the discussion to more subtle academic levels. Besides numerous minor additions and corrections throughout the whole book (together with the unavoidable introduction of some new errors), the only substantial extension of the second edition is the addition of Multiple Polynomial Regression (MPR) in order to support the discussions concerning the method crossover from linear and near-linear up to highly non-linear machine learning approaches. As a consequence, several examples and applications have been reworked to improve readability and line of reasoning. The construction of minimal predictive models is also outlined in an updated and more comprehensible manner.

The second edition is based on the extended version 2.0 of the Computational Intelligence Packages (CIP), which now allows parallelized calculations that often lead to considerably improved performance with multiple (or multicore) processors. Specific parallelization notes are given throughout the book, the description of CIP is extended accordingly, and reworked examples and applications now make use of the new functionality.

With this second edition the book hopefully strengthens its original intent to provide a clear and straightforward introduction to the fascinating road from curve fitting to machine learning.
Recklinghausen, February 2016
Achim Zielesny

Acknowledgements

Certain authors, speaking of their works, say, "My book", "My commentary", "My history", etc. They resemble middle-class people who have a house of their own, and always have "My house" on their tongue. They would do better to say, "Our book", "Our commentary", "Our history", etc., because there is in them usually more of other people's than their own.
Pascal

Acknowledgements to the first edition

I would like to thank Lhoussaine Belkoura, Manfred L. Ristig and Dietrich Woermann, who kindled my interest in data analysis and machine learning in chemistry and physics a long time ago.

My mathematical colleagues Heinrich Brinck and Soeren W. Perrey contributed a lot, be it in deep canyons, remote jungles or at our institute's coffee kitchen. To them and my IBCI collaborators Mirco Daniel and Rebecca Schultz as well as the GNWI team with Stefan Neumann, Jan-Niklas Schäfer, Holger Schulte and Thomas Kuhn I am deeply thankful.

The cooperation with Christoph Steinbeck was very fruitful and an exceptional pleasure: I owe a lot to his support and kindness.

Karina van den Broek, Mareike Dörrenberg, Saskia Faassen, Jenny Grote, Jennifer Makalowski, Stefanie Kleiber and Andreas Truszkowski corrected the manuscript with benevolence and strong commitment: Many thanks to all of them.

Last but not least I want to express deep gratitude and love to my companion Daniela Beisser, who not only had to bear an overworked book writer but supported all stages of the book and its contents with great passion.

Every book is a piece of collaborative work, but all mistakes and errors are of course mine.