
Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets

403 Pages·2021·6.213 MB·English

Preview Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets

Data Science in Theory and Practice
Techniques for Big Data Analytics and Complex Data Sets

Maria Cristina Mariani
University of Texas, El Paso
El Paso, United States

Osei Kofi Tweneboah
Ramapo College of New Jersey
Mahwah, United States

Maria Pia Beccar-Varela
University of Texas, El Paso
El Paso, United States

This first edition first published 2022
© 2022 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions

The right of Maria Cristina Mariani, Osei Kofi Tweneboah, and Maria Pia Beccar-Varela to be identified as the authors of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data applied for
ISBN: 9781119674689
Cover Design: Wiley
Cover Image: © nobeastsofierce/Shutterstock
Set in 9.5/12.5pt STIXTwoText by Straive, Chennai, India
10 9 8 7 6 5 4 3 2 1

Contents

List of Figures
List of Tables
Preface

1 Background of Data Science
  1.1 Introduction
  1.2 Origin of Data Science
  1.3 Who is a Data Scientist?
  1.4 Big Data
    1.4.1 Characteristics of Big Data
    1.4.2 Big Data Architectures

2 Matrix Algebra and Random Vectors
  2.1 Introduction
  2.2 Some Basics of Matrix Algebra
    2.2.1 Vectors
    2.2.2 Matrices
  2.3 Random Variables and Distribution Functions
    2.3.1 The Dirichlet Distribution
    2.3.2 Multinomial Distribution
    2.3.3 Multivariate Normal Distribution
  2.4 Problems

3 Multivariate Analysis
  3.1 Introduction
  3.2 Multivariate Analysis: Overview
  3.3 Mean Vectors
  3.4 Variance–Covariance Matrices
  3.5 Correlation Matrices
  3.6 Linear Combinations of Variables
    3.6.1 Linear Combinations of Sample Means
    3.6.2 Linear Combinations of Sample Variance and Covariance
    3.6.3 Linear Combinations of Sample Correlation
  3.7 Problems

4 Time Series Forecasting
  4.1 Introduction
  4.2 Terminologies
  4.3 Components of Time Series
    4.3.1 Seasonal
    4.3.2 Trend
    4.3.3 Cyclical
    4.3.4 Random
  4.4 Transformations to Achieve Stationarity
  4.5 Elimination of Seasonality via Differencing
  4.6 Additive and Multiplicative Models
  4.7 Measuring Accuracy of Different Time Series Techniques
    4.7.1 Mean Absolute Deviation
    4.7.2 Mean Absolute Percent Error
    4.7.3 Mean Square Error
    4.7.4 Root Mean Square Error
  4.8 Averaging and Exponential Smoothing Forecasting Methods
    4.8.1 Averaging Methods
      4.8.1.1 Simple Moving Averages
      4.8.1.2 Weighted Moving Averages
    4.8.2 Exponential Smoothing Methods
      4.8.2.1 Simple Exponential Smoothing
      4.8.2.2 Adjusted Exponential Smoothing
  4.9 Problems

5 Introduction to R
  5.1 Introduction
  5.2 Basic Data Types
    5.2.1 Numeric Data Type
    5.2.2 Integer Data Type
    5.2.3 Character
    5.2.4 Complex Data Types
    5.2.5 Logical Data Types
  5.3 Simple Manipulations – Numbers and Vectors
    5.3.1 Vectors and Assignment
    5.3.2 Vector Arithmetic
    5.3.3 Vector Index
    5.3.4 Logical Vectors
    5.3.5 Missing Values
    5.3.6 Index Vectors
      5.3.6.1 Indexing with Logicals
      5.3.6.2 A Vector of Positive Integral Quantities
      5.3.6.3 A Vector of Negative Integral Quantities
      5.3.6.4 Named Indexing
    5.3.7 Other Types of Objects
      5.3.7.1 Matrices
      5.3.7.2 List
      5.3.7.3 Factor
      5.3.7.4 Data Frames
    5.3.8 Data Import
      5.3.8.1 Excel File
      5.3.8.2 CSV File
      5.3.8.3 Table File
      5.3.8.4 Minitab File
      5.3.8.5 SPSS File
  5.4 Problems

6 Introduction to Python
  6.1 Introduction
  6.2 Basic Data Types
    6.2.1 Number Data Type
      6.2.1.1 Integer
      6.2.1.2 Floating-Point Numbers
      6.2.1.3 Complex Numbers
    6.2.2 Strings
    6.2.3 Lists
    6.2.4 Tuples
    6.2.5 Dictionaries
  6.3 Number Type Conversion
  6.4 Python Conditions
    6.4.1 If Statements
    6.4.2 The Else and Elif Clauses
    6.4.3 The While Loop
      6.4.3.1 The Break Statement
      6.4.3.2 The Continue Statement
    6.4.4 For Loops
      6.4.4.1 Nested Loops
  6.5 Python File Handling: Open, Read, and Close
  6.6 Python Functions
    6.6.1 Calling a Function in Python
    6.6.2 Scope and Lifetime of Variables
  6.7 Problems

7 Algorithms
  7.1 Introduction
  7.2 Algorithm – Definition
  7.3 How to Write an Algorithm
    7.3.1 Algorithm Analysis
    7.3.2 Algorithm Complexity
    7.3.3 Space Complexity
    7.3.4 Time Complexity
  7.4 Asymptotic Analysis of an Algorithm
    7.4.1 Asymptotic Notations
      7.4.1.1 Big O Notation
      7.4.1.2 The Omega Notation, Ω
      7.4.1.3 The Θ Notation
  7.5 Examples of Algorithms
  7.6 Flowchart
  7.7 Problems

8 Data Preprocessing and Data Validations
  8.1 Introduction
  8.2 Definition – Data Preprocessing
  8.3 Data Cleaning
    8.3.1 Handling Missing Data
    8.3.2 Types of Missing Data
      8.3.2.1 Missing Completely at Random
      8.3.2.2 Missing at Random
      8.3.2.3 Missing Not at Random
    8.3.3 Techniques for Handling the Missing Data
      8.3.3.1 Listwise Deletion
      8.3.3.2 Pairwise Deletion
      8.3.3.3 Mean Substitution
      8.3.3.4 Regression Imputation
      8.3.3.5 Multiple Imputation
    8.3.4 Identifying Outliers and Noisy Data
      8.3.4.1 Binning
