
The Handbook of Data Mining PDF

722 Pages·2003·8.832 MB·English
by Nong Ye

Preview The Handbook of Data Mining

THE HANDBOOK OF DATA MINING
Edited by Nong Ye

Human Factors and Ergonomics
Gavriel Salvendy, Series Editor

Hendrick, H., and Kleiner, B. (Eds.): Macroergonomics: Theory, Methods, and Applications
Hollnagel, E. (Ed.): Handbook of Cognitive Task Design
Jacko, J. A., and Sears, A. (Eds.): The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications
Meister, D., and Enderwick, T. (Eds.): Human Factors in System Design, Development, and Testing
Stanney, Kay M. (Ed.): Handbook of Virtual Environments: Design, Implementation, and Applications
Stephanidis, C. (Ed.): User Interfaces for All: Concepts, Methods, and Tools
Ye, Nong (Ed.): The Handbook of Data Mining

Also in this Series

HCI 1999 Proceedings 2-Volume Set
  Bullinger, H.-J., and Ziegler, J. (Eds.): Human-Computer Interaction: Ergonomics and User Interfaces
  Bullinger, H.-J., and Ziegler, J. (Eds.): Human-Computer Interaction: Communication, Cooperation, and Application Design

HCI 2001 Proceedings 3-Volume Set
  Smith, M. J., Salvendy, G., Harris, D., and Koubek, R. J. (Eds.): Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents, and Virtual Reality
  Smith, M. J., and Salvendy, G. (Eds.): Systems, Social, and Internationalization Design Aspects of Human-Computer Interaction
  Stephanidis, C. (Ed.): Universal Access in HCI: Towards an Information Society for All

For more information about LEA titles, please contact Lawrence Erlbaum Associates, Publishers, at www.erlbaum.com.

THE HANDBOOK OF DATA MINING
Edited by Nong Ye
Arizona State University

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
2003
Mahwah, New Jersey    London

Senior Acquisitions Editor: Debra Riegert
Editorial Assistant: Jason Planer
Cover Design: Kathryn Houghtaling Lacey
Textbook Production Manager: Paul Smolenski
Full-Service Compositor: TechBooks
Text and Cover Printer: Hamilton Printing Company

This book was typeset in 10/12 pt. Times, Italic, Bold, Bold Italic, and Courier. The heads were typeset in Americana Bold and Americana Bold Italic.
Copyright © 2003 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430

The editor, authors, and the publisher have made every effort to provide accurate and complete information in this handbook but the handbook is not intended to serve as a replacement for professional advice. Any use of this information is at the reader’s discretion. The editor, authors, and the publisher specifically disclaim any and all liability arising directly or indirectly from the use or application of any information contained in this handbook. An appropriate professional should be consulted regarding your specific situation.

Library of Congress Cataloging-in-Publication Data

The handbook of data mining / edited by Nong Ye.
p. cm. -- (Human factors and ergonomics)
Includes bibliographical references and index.
ISBN 0-8058-4081-8
1. Data mining. I. Ye, Nong. II. Series.
QA76.9.D343 H385 2003
006.3--dc21
2002156029

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Contents

Foreword (Gavriel Salvendy)
Preface (Nong Ye)
About the Editor
Advisory Board
Contributors

I: METHODOLOGIES OF DATA MINING

1 Decision Trees
Johannes Gehrke
  Introduction
  Problem Definition
  Classification Tree Construction
  Split Selection
  Data Access
  Tree Pruning
  Missing Values
  A Short Introduction to Regression Trees
  Problem Definition
  Split Selection
  Data Access
  Applications and Available Software
  Cataloging Sky Objects
  Decision Trees in Today’s Data Mining Tools
  Summary
  References

2 Association Rules
Geoffrey I. Webb
  Introduction
  Market Basket Analysis
  Association Rule Discovery
  The Apriori Algorithm
  The Power of the Frequent Item Set Strategy
  Measures of Interestingness
  Lift
  Leverage
  Item Set Discovery
  Techniques for Frequent Item Set Discovery
  Closed Item Set Strategies
  Long Item Sets
  Sampling
  Techniques for Discovering Association Rules without Item Set Discovery
  Associations with Numeric Values
  Applications of Association Rule Discovery
  Summary
  References

3 Artificial Neural Network Models for Data Mining
Jennie Si, Benjamin J. Nelson, and George C. Runger
  Introduction to Multilayer Feedforward Networks
  Gradient Based Training Methods for MFN
  The Partial Derivatives
  Nonlinear Least Squares Methods
  Batch versus Incremental Learning
  Comparison of MFN and Other Classification Methods
  Decision Tree Methods
  Discriminant Analysis Methods
  Multiple Partition Decision Tree
  A Growing MFN
  Case Study 1: Classifying Surface Texture
  Experimental Conditions
  Quantitative Comparison Results of Classification Methods
  Closing Discussions on Case 1
  Introduction to SOM
  The SOM Algorithm
  SOM Building Blocks
  Implementation of the SOM Algorithm
  Case Study 2: Decoding Monkey’s Movement Directions from Its Cortical Activities
  Trajectory Computation from Motor Cortical Discharge Rates
  Using Data from Spiral Tasks to Train the SOM
  Using Data from Spiral and Center→Out Tasks to Train the SOM
  Average Testing Result Using the Leave-K-Out Method
  Closing Discussions on Case 2
  Final Conclusions and Discussions
  References

4 Statistical Analysis of Normal and Abnormal Data
Connie M. Borror
  Introduction
  Univariate Control Charts
  Variables Control Charts
  Attributes Control Charts
  Cumulative Sum Control Charts
  Exponentially Weighted Moving Average Control Charts
  Choice of Control Charting Techniques
  Average Run Length
  Multivariate Control Charts
  Data Description
  Hotelling T² Control Chart
  Multivariate EWMA Control Charts
  Summary
  References

5 Bayesian Data Analysis
David Madigan and Greg Ridgeway
  Introduction
  Fundamentals of Bayesian Inference
  A Simple Example
  A More Complicated Example
  Hierarchical Models and Exchangeability
  Prior Distributions in Practice
  Bayesian Model Selection and Model Averaging
  Model Selection
  Model Averaging
  Model Assessment
  Bayesian Computation
  Importance Sampling
  Markov Chain Monte Carlo (MCMC)
  An Example
  Application to Massive Data
  Importance Sampling for Analysis of Massive Data Sets
  Variational Methods
  Bayesian Modeling
  BUGS and Models of Realistic Complexity via MCMC
  Bayesian Predictive Modeling
  Bayesian Descriptive Modeling
  Available Software
  Discussion and Future Directions
  Summary
  Acknowledgments
  References

6 Hidden Markov Processes and Sequential Pattern Mining
Steven L. Scott
  Introduction to Hidden Markov Models
  Parameter Estimation in the Presence of Missing Data
  The EM Algorithm
  MCMC Data Augmentation
  Missing Data Summary
  Local Computation
  The Likelihood Recursion
  The Forward-Backward Recursions
  The Viterbi Algorithm
  Understanding the Recursions
  A Numerical Example Illustrating the Recursions
  Illustrative Examples and Applications
  Fetal Lamb Movements
  The Business Cycle
  HMM Stationary and Predictive Distributions
  Stationary Distribution of d_t
  Predictive Distributions
  Posterior Covariance of h
  Available Software
  Summary
  References

7 Strategies and Methods for Prediction
Greg Ridgeway
  Introduction to the Prediction Problem
  Guiding Examples
  Prediction Model Components
  Loss Functions: What We are Trying to Accomplish
  Common Regression Loss Functions
  Common Classification Loss Functions
  Cox Loss Function for Survival Data
  Linear Models
  Linear Regression
  Classification
  Generalized Linear Model
  Nonlinear Models
  Nearest Neighbor and Kernel Methods
  Tree Models
  Smoothing, Basis Expansions, and Additive Models
  Neural Networks
  Support Vector Machines
  Boosting
  Availability of Software
  Summary
  References

8 Principal Components and Factor Analysis
Daniel W. Apley
  Introduction
  Examples of Variation Patterns in Correlated Multivariate Data
  Overview of Methods for Identifying Variation Patterns
  Representation and Illustration of Variation Patterns in Multivariate Data
  Principal Components Analysis
  Definition of Principal Components
  Using Principal Components as Estimates of the Variation Patterns
