
Practical Data Mining PDF

294 Pages·2011·3.96 MB·English

Preview Practical Data Mining

Information Technology / Database

Achieves a unique and delicate balance between depth, breadth, and clarity.
—Stefan Joe-Yen, Cognitive Research Engineer, Northrop Grumman Corporation & Adjunct Professor, Department of Computer Science, Webster University

Used as a primer for the recent graduate or as a refresher for the grizzled veteran, Practical Data Mining is a must-have book for anyone in the field of data mining and analytics.
—Chad Sessions, Program Manager, Advanced Analytics Group (AAG)

Used by corporations, industry, and government to inform and fuel everything from focused advertising to homeland security, data mining can be a very useful tool across a wide range of applications. Unfortunately, most books on the subject are designed for the computer scientist and statistical illuminati and leave the reader largely adrift in technical waters.

Revealing the lessons known to the seasoned expert, yet rarely written down for the uninitiated, Practical Data Mining explains the ins-and-outs of the detection, characterization, and exploitation of actionable patterns in data. This working field manual outlines the what, when, why, and how of data mining and offers an easy-to-follow, six-step spiral process.

Helping you avoid common mistakes, the book describes specific genres of data mining practice. Most chapters contain one or more case studies with detailed project descriptions, methods used, challenges encountered, and results obtained. The book includes working checklists for each phase of the data mining process. Your passport to successful technical and planning discussions with management, senior scientists, and customers, these checklists lay out the right questions to ask and the right points to make from an insider’s point of view.

Visit the book’s webpage for access to additional resources—including checklists, figures, PowerPoint® slides, and a small set of simple prototype data mining tools.
http://www.celestech.com/PracticalDataMining

K13109 ISBN: 978-1-4398-6836-2
www.crcpress.com
www.auerbach-publications.com

Practical Data Mining
Monte F. Hancock, Jr.
Chief Scientist, Celestech, Inc.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20111031
International Standard Book Number-13: 978-1-4398-6837-9 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Dedication

This book is dedicated to my beloved wife, Sandy, and to my dear little sister, Dr. Angela Lobreto. You make life a joy.

Also, to my professional mentors George Milligan, Dr. Craig Price, and Tell Gates, three of the finest men I have ever known, or ever hope to know: May God bless you richly, gentlemen; He has blessed me richly through you.

Contents

Dedication
Preface
About the Author
Acknowledgments

Chapter 1 What Is Data Mining and What Can It Do?
  Purpose
  Goals
  1.1 Introduction
  1.2 A Brief Philosophical Discussion
  1.3 The Most Important Attribute of the Successful Data Miner: Integrity
  1.4 What Does Data Mining Do?
  1.5 What Do We Mean By Data?
    1.5.1 Nominal Data vs. Numeric Data
    1.5.2 Discrete Data vs. Continuous Data
    1.5.3 Coding and Quantization as Inverse Processes
    1.5.4 A Crucial Distinction: Data and Information Are Not the Same Thing
    1.5.5 The Parity Problem
    1.5.6 Five Riddles about Information
    1.5.7 Seven Riddles about Meaning
  1.6 Data Complexity
  1.7 Computational Complexity
    1.7.1 Some NP-Hard Problems
    1.7.2 Some Worst-Case Computational Complexities
  1.8 Summary

Chapter 2 The Data Mining Process
  Purpose
  Goals
  2.1 Introduction
  2.2 Discovery and Exploitation
  2.3 Eleven Key Principles of Information Driven Data Mining
  2.4 Key Principles Expanded
  2.5 Type of Models: Descriptive, Predictive, Forensic
    2.5.1 Domain Ontologies as Models
    2.5.2 Descriptive Models
    2.5.3 Predictive Models
    2.5.4 Forensic Models
  2.6 Data Mining Methodologies
    2.6.1 Conventional System Development: Waterfall Process
    2.6.2 Data Mining as Rapid Prototyping
  2.7 A Generic Data Mining Process
  2.8 RAD Skill Set Designators
  2.9 Summary

Chapter 3 Problem Definition (Step 1)
  Purpose
  Goals
  3.1 Introduction
  3.2 Problem Definition Task 1: Characterize Your Problem
  3.3 Problem Definition Checklist
    3.3.1 Identify Previous Work
    3.3.2 Data Demographics
    3.3.3 User Interface
    3.3.4 Covering Blind Spots
    3.3.5 Evaluating Domain Expertise
    3.3.6 Tools
    3.3.7 Methodology
    3.3.8 Needs
  3.4 Candidate Solution Checklist
    3.4.1 What Type of Data Mining Must the System Perform?
    3.4.2 Multifaceted Problems Demand Multifaceted Solutions
    3.4.3 The Nature of the Data
  3.5 Problem Definition Task 2: Characterizing Your Solution
    3.5.1 Candidate Solution Checklist
  3.6 Problem Definition Case Study
    3.6.1 Predictive Attrition Model: Summary Description
    3.6.2 Glossary
    3.6.3 The ATM Concept
    3.6.4 Operational Functions
    3.6.5 Predictive Modeling and ATM
    3.6.6 Cognitive Systems and Predictive Modeling
    3.6.7 The ATM Hybrid Cognitive Engine
    3.6.8 Testing and Validation of Cognitive Systems
    3.6.9 Spiral Development Methodology
  3.7 Summary

Chapter 4 Data Evaluation (Step 2)
  Purpose
  Goals
  4.1 Introduction
  4.2 Data Accessibility Checklist
  4.3 How Much Data Do You Need?
  4.4 Data Staging
  4.5 Methods Used for Data Evaluation
  4.6 Data Evaluation Case Study: Estimating the Information Content Features
  4.7 Some Simple Data Evaluation Methods
  4.8 Data Quality Checklist
  4.9 Summary

Chapter 5 Feature Extraction and Enhancement (Step 3)
  Purpose
  Goals
  5.1 Introduction: A Quick Tutorial on Feature Space
    5.1.1 Data Preparation Guidelines

