Decision Trees for Analytics Using SAS Enterprise Miner

269 Pages·2013·6.44 MB·English

Preview Decision Trees for Analytics Using SAS Enterprise Miner

Decision Trees for Analytics Using SAS® Enterprise Miner™
Barry de Ville and Padraic Neville
support.sas.com/bookstore

deVille, Barry, and Padraic Neville. Decision Trees for Analytics Using SAS® Enterprise Miner™. Copyright © 2013, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. For additional SAS resources, visit support.sas.com/bookstore.

The correct bibliographic citation for this manual is as follows: de Ville, Barry, and Padraic Neville. 2013. Decision Trees for Analytics Using SAS® Enterprise Miner. Cary, NC: SAS Institute Inc.

Decision Trees for Analytics Using SAS® Enterprise Miner
Copyright © 2013, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-61290-252-4 (electronic book)
ISBN 978-1-61290-315-6
All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414

1st printing, June 2013

SAS provides a complete selection of books and electronic products to help customers use SAS® software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit support.sas.com/bookstore or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents

Preface
About This Book
About These Authors
Acknowledgments
Chapter 1: Decision Trees—What Are They?
  Introduction
  Using Decision Trees with Other Modeling Approaches
  Why Are Decision Trees So Useful?
  Level of Measurement

Chapter 2: Descriptive, Predictive, and Explanatory Analyses
  Introduction
  The Importance of Showing Context
    Antecedents
    Intervening Factors
  A Classic Study and Illustration of the Need to Understand Context
  The Effect of Context
  How Do Misleading Results Appear?
    Automatic Interaction Detection
  The Role of Validation and Statistics in Growing Decision Trees
  The Application of Statistical Knowledge to Growing Decision Trees
    Significance Tests
  Validation to Determine Tree Size and Quality
    What Is Validation?
  Pruning
  Machine Learning, Rule Induction, and Statistical Decision Trees
    Rule Induction
    Rule Induction and the Work of Ross Quinlan
    The Use of Multiple Trees
  A Review of the Major Features of Decision Trees
    Roots and Trees
    Branches
    Similarity Measures
    Recursive Growth
    Shaping the Decision Tree
    Deploying Decision Trees
  A Brief Review of the SAS Enterprise Miner ARBORETUM Procedure

Chapter 3: The Mechanics of Decision Tree Construction
  The Basics of Decision Trees
  Step 1—Preprocess the Data for the Decision Tree Growing Engine
  Step 2—Set the Input and Target Modeling Characteristics
    Targets
    Inputs
  Step 3—Select the Decision Tree Growth Parameters
  Step 4—Cluster and Process Each Branch-Forming Input Field
    Clustering Algorithms
    The Kass Merge-and-Split Heuristic
    Dealing with Missing Data and Missing Inputs in Decision Trees
  Step 5—Select the Candidate Decision Tree Branches
  Step 6—Complete the Form and Content of the Final Decision Tree
  Switching Targets
    Example of Multiple Target Selection Using the Home Equity Demonstration Data
    Synergy, Functionality, and the Wisdom of the End User

Chapter 4: Business Intelligence and Decision Trees
  Introduction
  A Decision Tree Approach to Cube Construction
    Multidimensional Cubes and Decision Trees Compared: A Small Business Example
    Multidimensional Cubes and Decision Trees: A Side-By-Side Comparison
    The Main Difference between Decision Trees and Multidimensional Cubes
  Regression as a Business Tool
    Decision Trees and Regression Compared
  Multidimensional Analysis with Trees
    An Example with Multiple Targets

Chapter 5: Theoretical Issues in the Decision Tree Growing Process
  Introduction
  Crafting the Decision Tree Structure for Insight and Exposition
    Conceptual Model
    Predictive Issues: Accuracy, Reliability, Reproducibility, and Performance
    Choosing the Right Number of Branches
  Perspectives on Selection Bias
    Potential Remedies to Variable Selection Bias
  Multiple Decision Trees
    Ensembles

Chapter 6: The Integration of Decision Trees with Other Data Mining Approaches
  Introduction
  Decision Trees in Stratified Regression
    Time-Ordered Data
  Decision Trees in Forecasting Applications
  Decision Trees in Variable Selection
    Decision Tree Results
    Interactions
    Cross-Contributions of Decision Trees and Other Approaches
  Decision Trees in Analytical Model Development
  The Use of Decision Trees in Rule Induction
    Iterative Removal of Observations
  Conclusion
    Business Intelligence
    Data Mining

Glossary
References
Index

Preface

This updated book on decision trees combines the talents and knowledge of two of the most experienced decision tree practitioners in the field today. Barry de Ville was a pioneer in the implementation of the CHAID and XAID approaches advocated by Kass (1975, 1980) and Hawkins and Kass (1982). He led the development of the first commercial implementation of an integrated CHAID/XAID approach to decision trees (deVille, 1990). Padraic Neville began developing decision tree algorithms in 1983 and served on the development team for the commercial implementation of the approach described in Classification and Regression Trees (Breiman et al. 1984).
Padraic has served as the primary developer of the SAS decision tree software since the inauguration of the procedure in the first SAS Enterprise Miner release in 1999. He has seamlessly and artfully blended the foundational decision tree traditions together in the development of an integrated package that offers the best of all traditions in one place. This creates a unique ability to mix and match approaches, to make the appropriate adjustments, and to create an optimal decision tree.

In addition to providing an exhaustive treatment of the end-to-end process of decision tree construction and the respective considerations and algorithms (as provided in the first edition), this edition adds up-to-date treatments of boosting and forest approaches, rule induction, and a dedicated section on the most recent findings related to bias reduction in variable selection.

The coverage includes discussions of key issues in decision tree practice: how to protect against overfitting by proactively adjusting p-values (from the CHAID/XAID tradition) or by retrospectively pruning with validation (or cross-validation) in the Breiman et al. tradition. In the same fashion, multiple methods of dealing with missing values are described (treat missing as valid, assign it to the closest branch, use a surrogate, or distribute the values across branches). The various aspects of these approaches are all covered here, in one place. The overall framework that enables the user to incorporate prior probabilities or costs in the split search, pruning, and tree formation is also discussed.

New additions to the SAS Enterprise Miner decision tree capabilities, such as support for multiple targets and switching targets mid-tree, are also provided. Additional sections on recent developments in rule induction are included.

As with the first edition, this current edition remains the most comprehensive treatment of decision tree theory, use, and applications available in one easy-to-access place.

Description:
Decision Trees for Analytics Using SAS Enterprise Miner is the most comprehensive treatment of decision tree theory, use, and applications available in one easy-to-access place. This book illustrates the application and operation of decision trees in business intelligence, data mining, business anal