Feature Engineering and Selection
A Practical Approach for Predictive Models
CHAPMAN & HALL/CRC DATA SCIENCE SERIES
Reflecting the interdisciplinary nature of the field, this book series brings together
researchers, practitioners, and instructors from statistics, computer science, machine
learning, and analytics. The series will publish cutting-edge research, industry
applications, and textbooks in data science.
The inclusion of concrete examples, applications, and methods is highly encouraged.
The scope of the series includes titles in the areas of machine learning, pattern
recognition, predictive analytics, business analytics, Big Data, visualization, programming,
software, learning analytics, data wrangling, interactive graphics, and reproducible
research.
Published Titles
Feature Engineering and Selection: A Practical Approach for Predictive Models
Max Kuhn and Kjell Johnson
Probability and Statistics for Data Science: Math + R + Data
Norman Matloff
Feature Engineering and Selection
A Practical Approach for Predictive Models
Max Kuhn
Kjell Johnson
Map tiles on front cover: © OpenStreetMap © CartoDB.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-138-07922-9 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may
rectify this in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To:
Maryann Kuhn
Valerie, Truman, and Louie
Contents
Preface
Author Bios
1 Introduction
1.1 A Simple Example
1.2 Important Concepts
1.3 A More Complex Example
1.4 Feature Selection
1.5 An Outline of the Book
1.6 Computing
2 Illustrative Example: Predicting Risk of Ischemic Stroke
2.1 Splitting
2.2 Preprocessing
2.3 Exploration
2.4 Predictive Modeling across Sets
2.5 Other Considerations
2.6 Computing
3 A Review of the Predictive Modeling Process
3.1 Illustrative Example: OkCupid Profile Data
3.2 Measuring Performance
3.3 Data Splitting
3.4 Resampling
3.5 Tuning Parameters and Overfitting
3.6 Model Optimization and Tuning
3.7 Comparing Models Using the Training Set
3.8 Feature Engineering without Overfitting
3.9 Summary
3.10 Computing
4 Exploratory Visualizations
4.1 Introduction to the Chicago Train Ridership Data
4.2 Visualizations for Numeric Data: Exploring Train Ridership Data
4.3 Visualizations for Categorical Data: Exploring the OkCupid Data
4.4 Postmodeling Exploratory Visualizations
4.5 Summary
4.6 Computing
5 Encoding Categorical Predictors
5.1 Creating Dummy Variables for Unordered Categories
5.2 Encoding Predictors with Many Categories
5.3 Approaches for Novel Categories
5.4 Supervised Encoding Methods
5.5 Encodings for Ordered Data
5.6 Creating Features from Text Data
5.7 Factors versus Dummy Variables in Tree-Based Models
5.8 Summary
5.9 Computing
6 Engineering Numeric Predictors
6.1 1:1 Transformations
6.2 1:Many Transformations
6.3 Many:Many Transformations
6.4 Summary
6.5 Computing
7 Detecting Interaction Effects
7.1 Guiding Principles in the Search for Interactions
7.2 Practical Considerations
7.3 The Brute-Force Approach to Identifying Predictive Interactions
7.4 Approaches when Complete Enumeration Is Practically Impossible
7.5 Other Potentially Useful Tools
7.6 Summary
7.7 Computing
8 Handling Missing Data
8.1 Understanding the Nature and Severity of Missing Information
8.2 Models that Are Resistant to Missing Values
8.3 Deletion of Data
8.4 Encoding Missingness
8.5 Imputation Methods
8.6 Special Cases
8.7 Summary
8.8 Computing
9 Working with Profile Data
9.1 Illustrative Data: Pharmaceutical Manufacturing Monitoring
9.2 What Are the Experimental Unit and the Unit of Prediction?
9.3 Reducing Background
9.4 Reducing Other Noise
9.5 Exploiting Correlation
9.6 Impacts of Data Processing on Modeling
9.7 Summary
9.8 Computing
10 Feature Selection Overview
10.1 Goals of Feature Selection
10.2 Classes of Feature Selection Methodologies
10.3 Effect of Irrelevant Features
10.4 Overfitting to Predictors and External Validation
10.5 A Case Study
10.6 Next Steps
10.7 Computing
11 Greedy Search Methods
11.1 Illustrative Data: Predicting Parkinson’s Disease
11.2 Simple Filters
11.3 Recursive Feature Elimination
11.4 Stepwise Selection
11.5 Summary
11.6 Computing
12 Global Search Methods
12.1 Naive Bayes Models
12.2 Simulated Annealing
12.3 Genetic Algorithms
12.4 Test Set Results
12.5 Summary
12.6 Computing
Bibliography
Index