
Intelligent Data Analysis: From Data Gathering to Data Comprehension PDF

418 Pages·2020·15.415 MB·English

Preview Intelligent Data Analysis: From Data Gathering to Data Comprehension

Intelligent Data Analysis
From Data Gathering to Data Comprehension

Edited by

Deepak Gupta
Maharaja Agrasen Institute of Technology
Delhi, India

Siddhartha Bhattacharyya
CHRIST (Deemed to be University)
Bengaluru, India

Ashish Khanna
Maharaja Agrasen Institute of Technology
Delhi, India

Kalpna Sagar
KIET Group of Institutions
Uttar Pradesh, India

This edition first published 2020
© 2020 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Deepak Gupta, Siddhartha Bhattacharyya, Ashish Khanna, and Kalpna Sagar to be identified as the authors of the editorial material in this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This work’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Gupta, Deepak, editor.
Title: Intelligent data analysis : from data gathering to data comprehension / edited by Dr. Deepak Gupta, Dr. Siddhartha Bhattacharyya, Dr. Ashish Khanna, Ms. Kalpna Sagar.
Description: Hoboken, NJ, USA : Wiley, 2020. | Series: The Wiley series in intelligent signal and data processing | Includes bibliographical references and index.
Identifiers: LCCN 2019056735 (print) | LCCN 2019056736 (ebook) | ISBN 9781119544456 (hardback) | ISBN 9781119544449 (adobe pdf) | ISBN 9781119544463 (epub)
Subjects: LCSH: Data mining. | Computational intelligence.
Classification: LCC QA76.9.D343 I57435 2020 (print) | LCC QA76.9.D343 (ebook) | DDC 006.3/12–dc23
LC record available at https://lccn.loc.gov/2019056735
LC ebook record available at https://lccn.loc.gov/2019056736

Cover Design: Wiley
Cover Image: © gremlin/Getty Images

Set in 9.5/12.5pt STIXTwoText by SPi Global, Chennai, India

10 9 8 7 6 5 4 3 2 1

Deepak Gupta would like to dedicate this book to his father, Sh. R.K. Gupta, his mother, Smt. Geeta Gupta, his mentors for their constant encouragement, and his family members, including his wife, brothers, sisters, kids and the students.

Siddhartha Bhattacharyya would like to dedicate this book to his parents, the late Ajit Kumar Bhattacharyya and the late Hashi Bhattacharyya, his beloved wife, Rashni, and his research scholars, Sourav, Sandip, Hrishikesh, Pankaj, Debanjan, Alokananda, Koyel, and Tulika.

Ashish Khanna would like to dedicate this book to his parents, the late R.C. Khanna and Smt. Surekha Khanna, for their constant encouragement and support, and to his wife, Sheenu, and children, Master Bhavya and Master Sanyukt.

Kalpna Sagar would like to dedicate this book to her father, Mr. Lekh Ram Sagar, and her mother, Smt. Gomti Sagar, the strongest persons of her life.
Contents

List of Contributors
Series Preface
Preface

1 Intelligent Data Analysis: Black Box Versus White Box Modeling
Sarthak Gupta, Siddhant Bagga, and Deepak Kumar Sharma
1.1 Introduction
1.1.1 Intelligent Data Analysis
1.1.2 Applications of IDA and Machine Learning
1.1.3 White Box Models Versus Black Box Models
1.1.4 Model Interpretability
1.2 Interpretation of White Box Models
1.2.1 Linear Regression
1.2.2 Decision Tree
1.3 Interpretation of Black Box Models
1.3.1 Partial Dependence Plot
1.3.2 Individual Conditional Expectation
1.3.3 Accumulated Local Effects
1.3.4 Global Surrogate Models
1.3.5 Local Interpretable Model-Agnostic Explanations
1.3.6 Feature Importance
1.4 Issues and Further Challenges
1.5 Summary
References

2 Data: Its Nature and Modern Data Analytical Tools
Ravinder Ahuja, Shikhar Asthana, Ayush Ahuja, and Manu Agarwal
2.1 Introduction
2.2 Data Types and Various File Formats
2.2.1 Structured Data
2.2.2 Semi-Structured Data
2.2.3 Unstructured Data
2.2.4 Need for File Formats
2.2.5 Various Types of File Formats
2.2.5.1 Comma Separated Values (CSV)
2.2.5.2 ZIP
2.2.5.3 Plain Text (txt)
2.2.5.4 JSON
2.2.5.5 XML
2.2.5.6 Image Files
2.2.5.7 HTML
2.3 Overview of Big Data
2.3.1 Sources of Big Data
2.3.1.1 Media
2.3.1.2 The Web
2.3.1.3 Cloud
2.3.1.4 Internet of Things
2.3.1.5 Databases
2.3.1.6 Archives
2.3.2 Big Data Analytics
2.3.2.1 Descriptive Analytics
2.3.2.2 Predictive Analytics
2.3.2.3 Prescriptive Analytics
2.4 Data Analytics Phases
2.5 Data Analytical Tools
2.5.1 Microsoft Excel
2.5.2 Apache Spark
2.5.3 OpenRefine
2.5.4 R Programming
2.5.4.1 Advantages of R
2.5.4.2 Disadvantages of R
2.5.5 Tableau
2.5.5.1 How Tableau Works
2.5.5.2 Tableau Feature
2.5.5.3 Advantages
2.5.5.4 Disadvantages
2.5.6 Hadoop
2.5.6.1 Basic Components of Hadoop
2.5.6.2 Benefits
2.6 Database Management System for Big Data Analytics
2.6.1 Hadoop Distributed File System
2.6.2 NoSql
2.6.2.1 Categories of NoSql
2.7 Challenges in Big Data Analytics
2.7.1 Storage of Data
2.7.2 Synchronization of Data
2.7.3 Security of Data
2.7.4 Fewer Professionals
2.8 Conclusion
References

3 Statistical Methods for Intelligent Data Analysis: Introduction and Various Concepts
Shubham Kumaram, Samarth Chugh, and Deepak Kumar Sharma
3.1 Introduction
3.2 Probability
3.2.1 Definitions
3.2.1.1 Random Experiments
3.2.1.2 Probability
3.2.1.3 Probability Axioms
3.2.1.4 Conditional Probability
3.2.1.5 Independence
3.2.1.6 Random Variable
3.2.1.7 Probability Distribution
3.2.1.8 Expectation
3.2.1.9 Variance and Standard Deviation
3.2.2 Bayes’ Rule
3.3 Descriptive Statistics
3.3.1 Picture Representation
3.3.1.1 Frequency Distribution
3.3.1.2 Simple Frequency Distribution
3.3.1.3 Grouped Frequency Distribution
3.3.1.4 Stem and Leaf Display
3.3.1.5 Histogram and Bar Chart
3.3.2 Measures of Central Tendency
3.3.2.1 Mean
3.3.2.2 Median
3.3.2.3 Mode
3.3.3 Measures of Variability
3.3.3.1 Range
3.3.3.2 Box Plot
3.3.3.3 Variance and Standard Deviation
3.3.4 Skewness and Kurtosis
3.4 Inferential Statistics
3.4.1 Frequentist Inference
3.4.1.1 Point Estimation
3.4.1.2 Interval Estimation
3.4.2 Hypothesis Testing
3.4.3 Statistical Significance
3.5 Statistical Methods
3.5.1 Regression
3.5.1.1 Linear Model
3.5.1.2 Nonlinear Models
3.5.1.3 Generalized Linear Models
3.5.1.4 Analysis of Variance
3.5.1.5 Multivariate Analysis of Variance
3.5.1.6 Log-Linear Models
3.5.1.7 Logistic Regression
3.5.1.8 Random Effects Model
3.5.1.9 Overdispersion
3.5.1.10 Hierarchical Models
3.5.2 Analysis of Survival Data
3.5.3 Principal Component Analysis
3.6 Errors
3.6.1 Error in Regression
3.6.2 Error in Classification
3.7 Conclusion
References

4 Intelligent Data Analysis with Data Mining: Theory and Applications
Shivam Bachhety, Ramneek Singhal, and Rachna Jain
Objective
4.1 Introduction to Data Mining
4.1.1 Importance of Intelligent Data Analytics in Business
4.1.2 Importance of Intelligent Data Analytics in Health Care
4.2 Data and Knowledge
4.3 Discovering Knowledge in Data Mining
4.3.1 Process Mining
4.3.2 Process of Knowledge Discovery
4.4 Data Analysis and Data Mining
4.5 Data Mining: Issues
4.6 Data Mining: Systems and Query Language
4.6.1 Data Mining Systems
4.6.2 Data Mining Query Language
4.7 Data Mining Methods
4.7.1 Classification
4.7.2 Cluster Analysis
4.7.3 Association
4.7.4 Decision Tree Induction
4.8 Data Exploration
4.9 Data Visualization
4.10 Probability Concepts for Intelligent Data Analysis (IDA)
Reference

5 Intelligent Data Analysis: Deep Learning and Visualization
Than D. Le and Huy V. Pham
5.1 Introduction
5.2 Deep Learning and Visualization
5.2.1 Linear and Logistic Regression and Visualization
5.2.2 CNN Architecture
5.2.2.1 Vanishing Gradient Problem
5.2.2.2 Convolutional Neural Networks (CNNs)
5.2.3 Reinforcement Learning
5.2.4 Inception and ResNet Networks
5.2.5 Softmax
5.3 Data Processing and Visualization
5.3.1 Regularization for Deep Learning and Visualization
5.3.1.1 Regularization for Linear Regression
5.4 Experiments and Results
5.4.1 Mask RCNN Based on Object Detection and Segmentation
5.4.2 Deep Matrix Factorization
5.4.2.1 Network Visualization
5.4.3 Deep Learning and Reinforcement Learning
5.5 Conclusion
References

6 A Systematic Review on the Evolution of Dental Caries Detection Methods and Its Significance in Data Analysis Perspective
Soma Datta, Nabendu Chaki, and Biswajit Modak
6.1 Introduction
6.1.1 Analysis of Dental Caries
6.2 Different Caries Lesion Detection Methods and Data Characterization
6.2.1 Point Detection Method
6.2.2 Visible Light Property Method
6.2.3 Radiographs
6.2.4 Light-Emitting Devices
6.2.5 Optical Coherent Tomography (OCT)
6.2.6 Software Tools
6.3 Technical Challenges with the Existing Methods
6.3.1 Challenges in Data Analysis Perspective
6.4 Result Analysis
6.5 Conclusion
Acknowledgment
References

7 Intelligent Data Analysis Using Hadoop Cluster–Inspired MapReduce Framework and Association Rule Mining on Educational Domain
Pratiyush Guleria and Manu Sood
7.1 Introduction
7.1.1 Research Areas of IDA
7.1.2 The Need for IDA in Education
7.2 Learning Analytics in Education
7.2.1 Role of Web-Enabled and Mobile Computing in Education
7.2.2 Benefits of Learning Analytics
