Computational Statistics in Data Science PDF

673 Pages·2022·27.187 MB·English

Preview Computational Statistics in Data Science

Computational Statistics in Data Science

Edited by
Walter W. Piegorsch, University of Arizona
Richard A. Levine, San Diego State University
Hao Helen Zhang, University of Arizona
Thomas C. M. Lee, University of California–Davis

This edition first published 2022
© 2022 John Wiley & Sons, Ltd.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Walter W. Piegorsch, Richard A. Levine, Hao Helen Zhang, Thomas C. M. Lee to be identified as the author(s) of the editorial material in this work has been asserted in accordance with law.

Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
ISBN 9781119561071 (hardback)
Cover Design: Wiley
Cover Image: © goja1/Shutterstock
Set in 9.5/12.5pt STIX Two Text by Straive, Chennai, India

Contents

List of Contributors
Preface

Part I Computational Statistics and Data Science

1 Computational Statistics and Data Science in the Twenty-first Century
Andrew J. Holbrook, Akihiko Nishimura, Xiang Ji, and Marc A. Suchard
  1 Introduction
  2 Core Challenges 1–3
    2.1 Big N
    2.2 Big P
    2.3 Big M
  3 Model-Specific Advances
    3.1 Bayesian Sparse Regression in the Age of Big N and Big P
      3.1.1 Continuous shrinkage: alleviating big M
      3.1.2 Conjugate gradient sampler for structured high-dimensional Gaussians
    3.2 Phylogenetic Reconstruction
  4 Core Challenges 4 and 5
    4.1 Fast, Flexible, and Friendly Statistical Algo-Ware
    4.2 Hardware-Optimized Inference
  5 Rise of Data Science
  Acknowledgments
  Notes
  References

2 Statistical Software
Alfred G. Schissler and Alexander D. Knudson
  1 User Development Environments
    1.1 Extensible Text Editors: Emacs and Vim
    1.2 Jupyter Notebooks
    1.3 RStudio and Rmarkdown
  2 Popular Statistical Software
    2.1 R
      2.1.1 Why use R over Python or Minitab?
      2.1.2 Where can users find R support?
      2.1.3 How easy is R to develop?
      2.1.4 What is the downside of R?
      2.1.5 Summary of R
    2.2 Python
    2.3 SAS®
    2.4 SPSS®
  3 Noteworthy Statistical Software and Related Tools
    3.1 BUGS/JAGS
    3.2 C++
    3.3 Microsoft Excel/Spreadsheets
    3.4 Git
    3.5 Java
    3.6 JavaScript, Typescript
    3.7 Maple
    3.8 MATLAB, GNU Octave
    3.9 Minitab®
    3.10 Workload Managers: SLURM/LSF
    3.11 SQL
    3.12 Stata®
    3.13 Tableau®
  4 Promising and Emerging Statistical Software
    4.1 Edward, Pyro, NumPyro, and PyMC3
    4.2 Julia
    4.3 NIMBLE
    4.4 Scala
    4.5 Stan
  5 The Future of Statistical Computing
  6 Concluding Remarks
  Acknowledgments
  References
  Further Reading

3 An Introduction to Deep Learning Methods
Yao Li, Justin Wang and Thomas C. M. Lee
  1 Introduction
  2 Machine Learning: An Overview
    2.1 Introduction
    2.2 Supervised Learning
    2.3 Gradient Descent
  3 Feedforward Neural Networks
    3.1 Introduction
    3.2 Model Description
    3.3 Training an MLP
  4 Convolutional Neural Networks
    4.1 Introduction
    4.2 Convolutional Layer
    4.3 LeNet-5
  5 Autoencoders
    5.1 Introduction
    5.2 Objective Function
    5.3 Variational Autoencoder
  6 Recurrent Neural Networks
    6.1 Introduction
    6.2 Architecture
    6.3 Long Short-Term Memory Networks
  7 Conclusion
  References

4 Streaming Data and Data Streams
Taiwo Kolajo, Olawande Daramola, and Ayodele Adebiyi
  1 Introduction
  2 Data Stream Computing
  3 Issues in Data Stream Mining
    3.1 Scalability
    3.2 Integration
    3.3 Fault-Tolerance
    3.4 Timeliness
    3.5 Consistency
    3.6 Heterogeneity and Incompleteness
    3.7 Load Balancing
    3.8 High Throughput
    3.9 Privacy
    3.10 Accuracy
  4 Streaming Data Tools and Technologies
  5 Streaming Data Pre-Processing: Concept and Implementation
  6 Streaming Data Algorithms
    6.1 Unsupervised Learning
    6.2 Semi-Supervised Learning
    6.3 Supervised Learning
    6.4 Ontology-Based Methods
  7 Strategies for Processing Data Streams
  8 Best Practices for Managing Data Streams
  9 Conclusion and the Way Forward
  References

Part II Simulation-Based Methods

5 Monte Carlo Simulation: Are We There Yet?
Dootika Vats, James M. Flegal, and Galin L. Jones
  1 Introduction
  2 Estimation
    2.1 Expectations
    2.2 Quantiles
    2.3 Other Estimators
  3 Sampling Distribution
    3.1 Means
    3.2 Quantiles
    3.3 Other Estimators
    3.4 Confidence Regions for Means
  4 Estimating Σ
  5 Stopping Rules
    5.1 IID Monte Carlo
    5.2 MCMC
  6 Workflow
  7 Examples
    7.1 Action Figure Collector Problem
    7.2 Estimating Risk for Empirical Bayes
    7.3 Bayesian Nonlinear Regression
  Note
  References

6 Sequential Monte Carlo: Particle Filters and Beyond
Adam M. Johansen
  1 Introduction
  2 Sequential Importance Sampling and Resampling
    2.1 Extended State Spaces and SMC Samplers
    2.2 Particle MCMC and Related Methods
  3 SMC in Statistical Contexts
    3.1 SMC for Hidden Markov Models
      3.1.1 Filtering
      3.1.2 Smoothing
      3.1.3 Parameter estimation
    3.2 SMC for Bayesian Inference
      3.2.1 SMC for model comparison
      3.2.2 SMC for ABC
    3.3 SMC for Maximum-Likelihood Estimation
    3.4 SMC for Rare Event Estimation
  4 Selected Recent Developments
  Acknowledgments
  Note
  References

7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings
Christian P. Robert and Wu Changye
  1 Introduction
  2 Monte Carlo Methods
  3 Markov Chain Monte Carlo Methods
    3.1 Metropolis–Hastings Algorithms
    3.2 Gibbs Sampling
    3.3 Hamiltonian Monte Carlo
  4 Approximate Bayesian Computation
  5 Further Reading
  Abbreviations and Acronyms
  Notes
  References

8 Bayesian Inference with Adaptive Markov Chain Monte Carlo
Matti Vihola
  1 Introduction
  2 Random-Walk Metropolis Algorithm
  3 Adaptation of Random-Walk Metropolis
    3.1 Adaptive Metropolis (AM)
    3.2 Adaptive Scaling Metropolis (ASM)
    3.3 Robust Adaptive Metropolis (RAM)
    3.4 Rationale behind the Adaptations
    3.5 Summary and Discussion on the Methods
  4 Multimodal Targets with Parallel Tempering
  5 Dynamic Models with Particle Filters
  6 Discussion
  Acknowledgments
  Notes
  References

9 Advances in Importance Sampling
Víctor Elvira and Luca Martino
  1 Introduction and Problem Statement
    1.1 Standard Monte Carlo Integration
  2 Importance Sampling
    2.1 Origins
    2.2 Basics
    2.3 Theoretical Analysis
    2.4 Diagnostics
