Ansgar Steland • Kwok-Leung Tsui
Editors

Artificial Intelligence, Big Data and Data Science in Statistics
Challenges and Solutions in Environmetrics, the Natural Sciences and Technology

Ansgar Steland
Institute of Statistics and AI Center
RWTH Aachen University
Aachen, Germany

Kwok-Leung Tsui
Grado Department of Industrial and Systems Engineering
Virginia Polytechnic Institute and State University
Blacksburg, VA, USA

ISBN 978-3-031-07154-6
ISBN 978-3-031-07155-3 (eBook)
https://doi.org/10.1007/978-3-031-07155-3

© The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The change to data-centrism in many fields, the need to extract information and knowledge from big data, and the increasing success of machine learning (ML) and artificial intelligence (AI) have created both opportunities and challenges for the field of statistics. These developments have, to some extent, led to the creation of data science, partially regarded as a new discipline, related to statistics and computer science. The intersections among ML/AI, data science, and statistics are much larger than people expect, particularly in theory, models, practical methods, and the problems under investigation. All communities can learn a lot from each other.

The impressive successes of ML and AI methods, especially deep learners and convolutional networks, in many practical problems might seem to devalue statistical approaches. Quite a few researchers as well as practitioners regard machine learning as being more focused on problem solving and benchmark data sets than statistics. But, on the other hand, ML solutions are often tailored to a specific problem and thus can be difficult to generalize and implement for a wide range of applications.

Further, there is a wide range of problems related to data for which statistics provides more appropriate or even optimal solutions and allows specific interpretable models.
Stochastic models often provide mathematical descriptions of physical processes rather than relying on black boxes. Indeed, lack of model interpretability, potential bias, causality, and stability, as well as why and when deep learners may work, are common questions for the ML approaches. Statistical thinking and approaches are good alternatives to rectify these problems, in terms of theories, models, and practical methods. A further issue where statistics is indispensable is the question of whether a given data set satisfies proper sampling designs, as studied by statistical sampling theory, and the sound statistical preprocessing, handling, and cleaning of data. Both topics are important to evaluate given data, to ensure high data quality, and to clarify what can be learnt from a certain data set. On the other hand, the flexibility of many ML and AI methods may yield superior results when reliable first-class data from well-selected variables are not available and one has to rely on noisy and surrogate data.

Focusing on environmental science, natural science, and technology, this book contributes to the discussions of various issues and the general interplay among statistics, data science, machine learning, and artificial intelligence. The chapters cover theoretical studies of machine learning methods, expositions of general methodologies for sound statistical analyses of data, as well as novel approaches for modeling and analyzing data in specific areas and problems. In terms of applications, the chapters deal with data as arising in industrial quality control, autonomous driving, transportation and traffic, chip manufacturing, photovoltaics, football, transmission of infectious diseases, Covid-19, and public health.
The idea for this volume came from the meetings of the Section on Environmetrics, Natural Science and Technology of Deutsche Statistische Gesellschaft of the last few years, and most authors have presented research at the annual conferences Statistische Woche. All chapters of this volume have been peer reviewed, and the editors are grateful to those colleagues who helped in the evaluation process as anonymous reviewers. Nevertheless, the authors of each chapter are solely responsible for their work.

Aachen, Germany    Ansgar Steland
Blacksburg, VA, USA    Kwok-Leung Tsui
November 2021

Contents

Part I Methodologies and Theoretical Studies

One-Round Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors
  Ansgar Steland and Bart E. Pieters

Scale Invariant and Robust Pattern Identification in Univariate Time Series, with Application to Growth Trend Detection in Music Streaming Data
  Nermina Mumic, Oliver Leodolter, Alexander Schwaiger, and Peter Filzmoser

Fine-Tuned Parallel Piecewise Sequential Confidence Interval and Point Estimation Strategies for the Mean of a Normal Population: Big Data Context
  Nitis Mukhopadhyay and Chen Zhang

Statistical Learning for Change Point and Anomaly Detection in Graphs
  Anna Malinovskaya, Philipp Otto, and Torben Peters

On the Robustness of Kernel-Based Pairwise Learning
  Patrick Gensler and Andreas Christmann

Global Sensitivity Analysis for the Interpretation of Machine Learning Algorithms
  Sonja Kuhnt and Arkadius Kalka

Improving Gaussian Process Emulators with Boundary Information
  Zhaohui Li and Matthias Hwai Yong Tan

Part II Challenges and Solutions in Applications

An Overview and General Framework for Spatiotemporal Modeling and Applications in Transportation and Public Health
  Lishuai Li, Kwok-Leung Tsui, and Yang Zhao

Introduction to Wafer Tomography: Likelihood-Based Prediction of Integrated-Circuit Yield
  Michael Baron, Emmanuel Yashchin, and Asya Takken

Uncertainty Quantification Based on Bayesian Neural Networks for Predictive Quality
  Simon Cramer, Meike Huber, and Robert H. Schmitt

Two Statistical Degradation Models of Batteries Under Different Operating Conditions
  Jin-Zhen Kong and Dong Wang

Detecting Diamond Breakouts of Diamond Impregnated Tools for Core Drilling of Concrete by Force Measurements
  Christine H. Müller, Hendrik Dohme, Dennis Malcherczyk, Dirk Biermann, and Wolfgang Tillmann

Visualising Complex Data Within a Data Science Loop: A Spatio-Temporal Example from Football
  Leo N. Geppert, Katja Ickstadt, Fabian Karl, Jonas Münch, and Michael Steinbrecher

Application of the Singular Spectrum Analysis on Electroluminescence Images of Thin-Film Photovoltaic Modules
  Evgenii Sovetkin and Bart E. Pieters

The Impact of the Lockdown Restrictions on Air Quality During COVID-19 Pandemic in Lombardy, Italy
  Paolo Maranzano and Alessandro Fassó

Author Index
Part I
Methodologies and Theoretical Studies

One-Round Cross-Validation and Uncertainty Determination for Randomized Neural Networks with Applications to Mobile Sensors

Ansgar Steland and Bart E. Pieters

Abstract Randomized artificial neural networks such as extreme learning machines provide an attractive and efficient method for supervised learning under limited computing resources and for green machine learning. This especially applies when equipping mobile devices (sensors) with weak artificial intelligence. Results on supervised learning with such networks and regression methods are discussed in terms of consistency and bounds for the generalization and prediction error. In particular, some recent results are reviewed that address learning with data sampled by moving sensors, which leads to non-stationary and dependent samples. As randomized networks lead to random out-of-sample performance measures, we study a cross-validation approach to handle the randomness and make use of it to improve out-of-sample performance. Additionally, a computationally efficient approach to determine the resulting uncertainty in terms of a confidence interval for the mean out-of-sample prediction error is discussed based on two-stage estimation. The approach is applied to a prediction problem arising in vehicle integrated photovoltaics.

Keywords Cross-validation · Extreme learning · Model comparison · Neural network · Photovoltaics · Uncertainty interval

1 Introduction

Artificial neural networks are an attractive class of models for supervised learning tasks arising in data science such as nonlinear regression and predictive analytics.
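To make the notion of a randomized network from the abstract concrete, the following minimal sketch fits a single-hidden-layer network whose hidden weights are drawn at random and never trained, so that only the output weights are estimated by ridge-regularized least squares. It then repeats the random draw several times to show that the validation error is itself random and that one may pick the draw with the smallest validation error. This is only an illustration under stated assumptions: the synthetic sine data, the tanh activation, the ridge penalty, the number of hidden units, and the function names `fit_elm`, `predict_elm`, and `mse` are all hypothetical choices, and the simple selection step is a stand-in rather than the chapter's one-round cross-validation or two-stage confidence interval procedure.

```python
import numpy as np


def fit_elm(X, y, n_hidden, rng, ridge=1e-3):
    """Fit a randomized single-hidden-layer network (illustrative sketch)."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, not trained
    b = rng.normal(size=n_hidden)                # random biases, not trained
    H = np.tanh(X @ W + b)                       # random hidden-layer features
    # Only the output weights are estimated, via ridge-regularized least squares.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta


def predict_elm(model, X):
    """Predict with a fitted (W, b, beta) triple."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


def mse(model, X, y):
    """Mean squared prediction error of the model on (X, y)."""
    return float(np.mean((predict_elm(model, X) - y) ** 2))


# Synthetic regression data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

# Each draw of the hidden weights yields a different validation error.
models, val_errors = [], []
for _ in range(20):
    model = fit_elm(X_train, y_train, n_hidden=25, rng=rng)
    models.append(model)
    val_errors.append(mse(model, X_val, y_val))

# Keep the draw with the smallest validation error.
best = models[int(np.argmin(val_errors))]
print("min / mean / max validation MSE:",
      min(val_errors), float(np.mean(val_errors)), max(val_errors))
```

The spread between the smallest and largest validation error in the printed summary reflects the randomness of the hidden layer alone, since the training and validation data are held fixed across draws.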
A. Steland
Institute of Statistics and AI Center, RWTH Aachen University, Aachen, Germany
e-mail: [email protected]

B. E. Pieters
Forschungszentrum Jülich, Institut für Energie- und Klimaforschung, Jülich, Germany
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
A. Steland, K.-L. Tsui (eds.), Artificial Intelligence, Big Data and Data Science in Statistics, https://doi.org/10.1007/978-3-031-07155-3_1

There is a growing interest which is mainly driven by the development of highly