
Mastering Machine Learning with R: Advanced machine learning techniques for building smart applications with R 3.5, 3rd Edition PDF

344 Pages·2019·5.821 MB·English


Mastering Machine Learning with R
Third Edition
Advanced machine learning techniques for building smart applications with R 3.5
Cory Lesmeister
BIRMINGHAM - MUMBAI

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Sunith Shetty
Acquisition Editor: Devika Battike
Content Development Editor: Unnati Guha
Technical Editor: Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Jisha Chirayil

First published: October 2015
Second edition: April 2017
Third edition: January 2019
Production reference: 1310119

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78961-800-6
www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  Improve your learning with Skill Plans built especially for you
  Get a free eBook or video every month
  Mapt is fully searchable
  Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Cory Lesmeister has over fourteen years of quantitative experience and is currently a senior data scientist for the Advanced Analytics team at Cummins, Inc. in Columbus, Indiana. Cory spent 16 years at Eli Lilly and Company in sales, market research, Lean Six Sigma, marketing analytics, and new product forecasting. He also has several years of experience in the insurance and banking industries, both as a consultant and as a manager of marketing analytics. A former US Army active duty and reserve officer, Cory was stationed in Baghdad, Iraq, in 2009 serving as the strategic advisor to the 29,000-person Iraqi Oil Police, succeeding where others failed by acquiring and delivering promised equipment to help the country secure and protect its oil infrastructure. Cory has a BBA in Aviation Administration from the University of North Dakota and a commercial helicopter license.

About the reviewers

Subhash Shah works as a head of technology at AIMDek Technologies Pvt. Ltd. He is an experienced solutions architect with over 12 years of experience. He holds a degree in information technology from a reputable university. He is an advocate of open source development and its use in solving critical business problems at a reduced cost. His interests include microservices, data analysis, machine learning, artificial intelligence, and databases. He is an admirer of quality code and TDD. His technical skills include translating business requirements into scalable architecture, designing sustainable solutions, and project delivery. He is a co-author of MySQL 8 Administrator's Guide and Hands-on High Performance with Spring 5.

Doug Ortiz is an experienced enterprise cloud, big data, data analytics, and solutions architect who has architected, designed, developed, re-engineered, and integrated enterprise solutions. His other expertise includes Amazon Web Services, Azure, Google Cloud Platform, business intelligence, Hadoop, Spark, NoSQL databases, and SharePoint. He is the founder of Illustris.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents

Preface

Chapter 1: Preparing and Understanding Data
  Overview
  Reading the data
  Handling duplicate observations
  Descriptive statistics
  Exploring categorical variables
  Handling missing values
  Zero and near-zero variance features
  Treating the data
  Correlation and linearity
  Summary

Chapter 2: Linear Regression
  Univariate linear regression
  Building a univariate model
  Reviewing model assumptions
  Multivariate linear regression
  Loading and preparing the data
  Modeling and evaluation – stepwise regression
  Modeling and evaluation – MARS
  Reverse transformation of natural log predictions
  Summary

Chapter 3: Logistic Regression
  Classification methods and linear regression
  Logistic regression
  Model training and evaluation
  Training a logistic regression algorithm
  Weight of evidence and information value
  Feature selection
  Cross-validation and logistic regression
  Multivariate adaptive regression splines
  Model comparison
  Summary

Chapter 4: Advanced Feature Selection in Linear Models
  Regularization overview
  Ridge regression
  LASSO
  Elastic net
  Data creation
  Modeling and evaluation
  Ridge regression
  LASSO
  Elastic net
  Summary

Chapter 5: K-Nearest Neighbors and Support Vector Machines
  K-nearest neighbors
  Support vector machines
  Manipulating data
  Dataset creation
  Data preparation
  Modeling and evaluation
  KNN modeling
  Support vector machine
  Summary

Chapter 6: Tree-Based Classification
  An overview of the techniques
  Understanding a regression tree
  Classification trees
  Random forest
  Gradient boosting
  Datasets and modeling
  Classification tree
  Random forest
  Extreme gradient boosting – classification
  Feature selection with random forests
  Summary

Chapter 7: Neural Networks and Deep Learning
  Introduction to neural networks
  Deep learning – a not-so-deep overview
  Deep learning resources and advanced methods
  Creating a simple neural network
  Data understanding and preparation
  Modeling and evaluation
  An example of deep learning
  Keras and TensorFlow background
  Loading the data
  Creating the model function
  Model training
  Summary

Chapter 8: Creating Ensembles and Multiclass Methods
  Ensembles
  Data understanding
  Modeling and evaluation
  Random forest model
  Creating an ensemble
  Summary

Chapter 9: Cluster Analysis
  Hierarchical clustering
  Distance calculations
  K-means clustering
  Gower and PAM
  Gower
  PAM
  Random forest
  Dataset background
  Data understanding and preparation
  Modeling
  Hierarchical clustering
  K-means clustering
  Gower and PAM
  Random forest and PAM
  Summary

Chapter 10: Principal Component Analysis
  An overview of the principal components
  Rotation
  Data
  Data loading and review
  Training and testing datasets
  PCA modeling
  Component extraction
  Orthogonal rotation and interpretation
  Creating scores from the components
  Regression with MARS
  Test data evaluation
  Summary

Chapter 11: Association Analysis
  An overview of association analysis
  Creating transactional data
  Data understanding
  Data preparation
  Modeling and evaluation
  Summary

Chapter 12: Time Series and Causality
  Univariate time series analysis
  Understanding Granger causality
  Time series data
  Data exploration
  Modeling and evaluation
  Univariate time series forecasting
  Examining the causality
  Linear regression
  Vector autoregression
  Summary

Chapter 13: Text Mining
  Text mining framework and methods
  Topic models
  Other quantitative analysis
  Data overview
  Data frame creation
  Word frequency
  Word frequency in all addresses
  Lincoln's word frequency
  Sentiment analysis
  N-grams
  Topic models
  Classifying text
  Data preparation
  LASSO model
  Additional quantitative analysis
  Summary

Appendix A: Creating a Package
  Creating a new package
  Summary

Other Books You May Enjoy

Index
