Machine Learning Algorithms PDF

514 Pages·2018·65.309 MB·English

Preview Machine Learning Algorithms

Machine Learning Algorithms
Second Edition

Popular algorithms for data science and machine learning

Giuseppe Bonaccorso

BIRMINGHAM - MUMBAI

Machine Learning Algorithms
Second Edition

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Divya Poojari
Content Development Editor: Eisha Dsouza
Technical Editor: Jovita Alva
Copy Editor: Safis Editing
Project Coordinator: Namrata Swetta
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Nilesh Mohite

First published: July 2017
Second edition: August 2018
Production reference: 1280818

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78934-799-9
www.packtpub.com

To my family and to all the people who always believed in me and encouraged me in this long journey!
– Giuseppe Bonaccorso

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career.
For more information, please visit our website.

Why subscribe?

  Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  Improve your learning with Skill Plans built especially for you
  Get a free eBook or video every month
  Mapt is fully searchable
  Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Giuseppe Bonaccorso is an experienced team leader/manager in AI, machine/deep learning solution design, management, and delivery. He got his MScEng in electronics in 2005 from the University of Catania, Italy, and continued his studies at the University of Rome Tor Vergata and the University of Essex, UK. His main interests include machine/deep learning, reinforcement learning, big data, bio-inspired adaptive systems, cryptocurrencies, and NLP.

I want to thank the people who have been close to me and have supported me, especially my parents, who never stopped encouraging me.

About the reviewer

Doug Ortiz is an experienced enterprise cloud, big data, data analytics, and solutions architect who has architected, designed, developed, re-engineered, and integrated enterprise solutions. Other expertise includes Amazon Web Services, Azure, Google Cloud, business intelligence, Hadoop, Spark, NoSQL databases, and SharePoint, to name a few. He is the founder of Illustris, LLC and is reachable at [email protected].
Huge thanks to my wonderful wife, Milla, Maria, Nikolay, and our children for all their support.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Preface

Chapter 1: A Gentle Introduction to Machine Learning
  Introduction – classic and adaptive machines
  Descriptive analysis
  Predictive analysis
  Only learning matters
  Supervised learning
  Unsupervised learning
  Semi-supervised learning
  Reinforcement learning
  Computational neuroscience
  Beyond machine learning – deep learning and bio-inspired adaptive systems
  Machine learning and big data
  Summary

Chapter 2: Important Elements in Machine Learning
  Data formats
  Multiclass strategies
  One-vs-all
  One-vs-one
  Learnability
  Underfitting and overfitting
  Error measures and cost functions
  PAC learning
  Introduction to statistical learning concepts
  MAP learning
  Maximum likelihood learning
  Class balancing
  Resampling with replacement
  SMOTE resampling
  Elements of information theory
  Entropy
  Cross-entropy and mutual information
  Divergence measures between two probability distributions
  Summary

Chapter 3: Feature Selection and Feature Engineering
  scikit-learn toy datasets
  Creating training and test sets
  Managing categorical data
  Managing missing features
  Data scaling and normalization
  Whitening
  Feature selection and filtering
  Principal Component Analysis
  Non-Negative Matrix Factorization
  Sparse PCA
  Kernel PCA
  Independent Component Analysis
  Atom extraction and dictionary learning
  Visualizing high-dimensional datasets using t-SNE
  Summary

Chapter 4: Regression Algorithms
  Linear models for regression
  A bidimensional example
  Linear regression with scikit-learn and higher dimensionality
  R2 score
  Explained variance
  Regressor analytic expression
  Ridge, Lasso, and ElasticNet
  Ridge
  Lasso
  ElasticNet
  Robust regression
  RANSAC
  Huber regression
  Bayesian regression
  Polynomial regression
  Isotonic regression
  Summary

Chapter 5: Linear Classification Algorithms
  Linear classification
  Logistic regression
  Implementation and optimizations
  Stochastic gradient descent algorithms
  Passive-aggressive algorithms
  Passive-aggressive regression
  Finding the optimal hyperparameters through a grid search
  Classification metrics
  Confusion matrix
  Precision
  Recall
  F-Beta
  Cohen's Kappa
  Global classification report
  Learning curve
  ROC curve
  Summary

Chapter 6: Naive Bayes and Discriminant Analysis
  Bayes' theorem
  Naive Bayes classifiers
  Naive Bayes in scikit-learn
  Bernoulli Naive Bayes
  Multinomial Naive Bayes
  An example of Multinomial Naive Bayes for text classification
  Gaussian Naive Bayes
  Discriminant analysis
  Summary

Chapter 7: Support Vector Machines
  Linear SVM
  SVMs with scikit-learn
  Linear classification
  Kernel-based classification
  Radial Basis Function
  Polynomial kernel
  Sigmoid kernel
  Custom kernels
  Non-linear examples
  ν-Support Vector Machines
  Support Vector Regression
  An example of SVR with the Airfoil Self-Noise dataset
  Introducing semi-supervised Support Vector Machines (S3VM)
  Summary

Chapter 8: Decision Trees and Ensemble Learning
  Binary Decision Trees
  Binary decisions
  Impurity measures
  Gini impurity index
  Cross-entropy impurity index
  Misclassification impurity index
  Feature importance
  Decision Tree classification with scikit-learn
  Decision Tree regression
