
Statistics for Machine Learning (Python, R) PDF

438 Pages·2017·16.457 MB·English

Preview Statistics for Machine Learning (Python, R)

Statistics for Machine Learning

Build supervised, unsupervised, and reinforcement learning models using both Python and R

Pratap Dangeti

BIRMINGHAM - MUMBAI

Statistics for Machine Learning

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2017
Production reference: 1180717

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78829-575-8
www.packtpub.com

Credits

Author: Pratap Dangeti
Reviewer: Manuel Amunategui
Commissioning Editor: Veena Pagare
Acquisition Editor: Aman Singh
Content Development Editor: Mayur Pawanikar
Technical Editor: Dinesh Pawar
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

About the Author

Pratap Dangeti develops machine learning and deep learning solutions for structured, image, and text data at TCS, analytics and insights, innovation lab in Bangalore. He has acquired a lot of experience in both analytics and data science.
He received his master's degree from IIT Bombay in its industrial engineering and operations research program. He is an artificial intelligence enthusiast. When not working, he likes to read about next-gen technologies and innovative methodologies.

First and foremost, I would like to thank my mom, Lakshmi, for her support throughout my career and in writing this book. She has been my inspiration and motivation for continuing to improve my knowledge and helping me move ahead in my career. She is my strongest supporter, and I dedicate this book to her. I also thank my family and friends for their encouragement, without which it would not be possible to write this book.

I would like to thank my acquisition editor, Aman Singh, and content development editor, Mayur Pawanikar, who chose me to write this book and encouraged me constantly throughout the period of writing with their invaluable feedback and input.

About the Reviewer

Manuel Amunategui is vice president of data science at SpringML, a startup offering Google Cloud TensorFlow and Salesforce enterprise solutions. Prior to that, he worked as a quantitative developer on Wall Street for a large equity-options market-making firm and as a software developer at Microsoft. He holds master's degrees in predictive analytics and international administration. He is a data science advocate, blogger/vlogger (amunategui.github.io), a trainer on Udemy and O'Reilly Media, and a technical reviewer at Packt Publishing.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788295757.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Table of Contents

Preface

Chapter 1: Journey from Statistics to Machine Learning
  Statistical terminology for model building and validation
  Machine learning
  Major differences between statistical modeling and machine learning
  Steps in machine learning model development and deployment
  Statistical fundamentals and terminology for model building and validation
  Bias versus variance trade-off
  Train and test data
  Machine learning terminology for model building and validation
  Linear regression versus gradient descent
  Machine learning losses
  When to stop tuning machine learning models
  Train, validation, and test data
  Cross-validation
  Grid search
  Machine learning model overview
  Summary

Chapter 2: Parallelism of Statistics and Machine Learning
  Comparison between regression and machine learning models
  Compensating factors in machine learning models
  Assumptions of linear regression
  Steps applied in linear regression modeling
  Example of simple linear regression from first principles
  Example of simple linear regression using the wine quality data
  Example of multilinear regression - step-by-step methodology of model building
  Backward and forward selection
  Machine learning models - ridge and lasso regression
  Example of ridge regression machine learning
  Example of lasso regression machine learning model
  Regularization parameters in linear regression and ridge/lasso regression
  Summary

Chapter 3: Logistic Regression Versus Random Forest
  Maximum likelihood estimation
  Logistic regression – introduction and advantages
  Terminology involved in logistic regression
  Applying steps in logistic regression modeling
  Example of logistic regression using German credit data
  Random forest
  Example of random forest using German credit data
  Grid search on random forest
  Variable importance plot
  Comparison of logistic regression with random forest
  Summary

Chapter 4: Tree-Based Machine Learning Models
  Introducing decision tree classifiers
  Terminology used in decision trees
  Decision tree working methodology from first principles
  Comparison between logistic regression and decision trees
  Comparison of error components across various styles of models
  Remedial actions to push the model towards the ideal region
  HR attrition data example
  Decision tree classifier
  Tuning class weights in decision tree classifier
  Bagging classifier
  Random forest classifier
  Random forest classifier - grid search
  AdaBoost classifier
  Gradient boosting classifier
  Comparison between AdaBoosting versus gradient boosting
  Extreme gradient boosting - XGBoost classifier
  Ensemble of ensembles - model stacking
  Ensemble of ensembles with different types of classifiers
  Ensemble of ensembles with bootstrap samples using a single type of classifier
  Summary

Chapter 5: K-Nearest Neighbors and Naive Bayes
  K-nearest neighbors
  KNN voter example
  Curse of dimensionality
