
Applied Unsupervised Learning with Python PDF

483 Pages·2019·10.692 MB·English

Preview Applied Unsupervised Learning with Python

Applied Unsupervised Learning with Python

Discover hidden patterns and relationships in unstructured data with Python

Benjamin Johnston, Aaron Jones, and Christopher Kruger

Applied Unsupervised Learning with Python

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Authors: Benjamin Johnston, Aaron Jones, and Christopher Kruger
Technical Reviewer: Jay Kim
Managing Editor: Rutuja Yerunkar
Acquisitions Editor: Aditya Date
Production Editor: Nitesh Thakur
Editorial Board: David Barnes, Mayank Bhardwaj, Ewan Buckingham, Simon Cox, Mahesh Dhyani, Taabish Khan, Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary, Erol Staveley, Ankita Thakur, Mohita Vyas, and Jonathan Wray

First Published: May 2019
Production Reference: 1240519
ISBN: 978-1-78995-229-2

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK

Table of Contents

Preface ... i

Introduction to Clustering ... 1
  Introduction ... 2
  Unsupervised Learning versus Supervised Learning ... 2
  Clustering ... 4
  Identifying Clusters ... 4
  Two-Dimensional Data ... 6
  Exercise 1: Identifying Clusters in Data ... 6
  Introduction to k-means Clustering ... 10
  No-Math k-means Walkthrough ... 10
  k-means Clustering In-Depth Walkthrough ... 12
  Alternative Distance Metric – Manhattan Distance ... 12
  Deeper Dimensions ... 13
  Exercise 2: Calculating Euclidean Distance in Python ... 14
  Exercise 3: Forming Clusters with the Notion of Distance ... 15
  Exercise 4: Implementing k-means from Scratch ... 16
  Exercise 5: Implementing k-means with Optimization ... 18
  Clustering Performance: Silhouette Score ... 22
  Exercise 6: Calculating the Silhouette Score ... 23
  Activity 1: Implementing k-means Clustering ... 24
  Summary ... 27

Hierarchical Clustering ... 29
  Introduction ... 30
  Clustering Refresher ... 30
  k-means Refresher ... 31
  The Organization of Hierarchy ... 31
  Introduction to Hierarchical Clustering ... 33
  Steps to Perform Hierarchical Clustering ... 34
  An Example Walk-Through of Hierarchical Clustering ... 34
  Exercise 7: Building a Hierarchy ... 38
  Linkage ... 42
  Activity 2: Applying Linkage Criteria ... 43
  Agglomerative versus Divisive Clustering ... 45
  Exercise 8: Implementing Agglomerative Clustering with scikit-learn ... 46
  Activity 3: Comparing k-means with Hierarchical Clustering ... 48
  k-means versus Hierarchical Clustering ... 51
  Summary ... 52

Neighborhood Approaches and DBSCAN ... 55
  Introduction ... 56
  Clusters as Neighborhoods ... 57
  Introduction to DBSCAN ... 58
  DBSCAN In-Depth ... 60
  Walkthrough of the DBSCAN Algorithm ... 61
  Exercise 9: Evaluating the Impact of Neighborhood Radius Size ... 62
  DBSCAN Attributes – Neighborhood Radius ... 65
  Activity 4: Implement DBSCAN from Scratch ... 66
  DBSCAN Attributes – Minimum Points ... 67
  Exercise 10: Evaluating the Impact of Minimum Points Threshold ... 68
  Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering ... 72
  DBSCAN Versus k-means and Hierarchical Clustering ... 73
  Summary ... 74

Dimension Reduction and PCA ... 77
  Introduction ... 78
  What Is Dimensionality Reduction? ... 78
  Applications of Dimensionality Reduction ... 80
  The Curse of Dimensionality ... 82
  Overview of Dimensionality Reduction Techniques ... 84
  Dimensionality Reduction and Unsupervised Learning ... 86
  PCA ... 87
  Mean ... 87
  Standard Deviation ... 87
  Covariance ... 88
  Covariance Matrix ... 88
  Exercise 11: Understanding the Foundational Concepts of Statistics ... 89
  Eigenvalues and Eigenvectors ... 93
  Exercise 12: Computing Eigenvalues and Eigenvectors ... 94
  The Process of PCA ... 97
  Exercise 13: Manually Executing PCA ... 99
  Exercise 14: Scikit-Learn PCA ... 104
  Activity 6: Manual PCA versus scikit-learn ... 109
  Restoring the Compressed Dataset ... 111
  Exercise 15: Visualizing Variance Reduction with Manual PCA ... 111
  Exercise 16: Visualizing Variance Reduction with ... 118
  Exercise 17: Plotting 3D Plots in Matplotlib ... 121
  Activity 7: PCA Using the Expanded Iris Dataset ... 124
  Summary ... 127

Autoencoders ... 129
  Introduction ... 130
  Fundamentals of Artificial Neural Networks ... 131
  The Neuron ... 132
  Sigmoid Function ... 133
  Rectified Linear Unit (ReLU) ... 134
  Exercise 18: Modeling the Neurons of an Artificial Neural Network ... 134
  Activity 8: Modeling Neurons with a ReLU Activation Function ... 138
  Neural Networks: Architecture Definition ... 139
  Exercise 19: Defining a Keras Model ... 140
  Neural Networks: Training ... 142
  Exercise 20: Training a Keras Neural Network Model ... 144
  Activity 9: MNIST Neural Network ... 153
  Autoencoders ... 154
  Exercise 21: Simple Autoencoder ... 155
  Activity 10: Simple MNIST Autoencoder ... 159
  Exercise 22: Multi-Layer Autoencoder ... 161
  Convolutional Neural Networks ... 165
  Exercise 23: Convolutional Autoencoder ... 166
  Activity 11: MNIST Convolutional Autoencoder ... 171
  Summary ... 173

t-Distributed Stochastic Neighbor Embedding (t-SNE) ... 175
  Introduction ... 176
  Stochastic Neighbor Embedding (SNE) ... 178
  t-Distributed SNE ... 179
  Exercise 24: t-SNE MNIST ... 180
  Activity 12: Wine t-SNE ... 191
  Interpreting t-SNE Plots ... 193
  Perplexity ... 193
  Exercise 25: t-SNE MNIST and Perplexity ... 193
  Activity 13: t-SNE Wine and Perplexity ... 199
  Iterations ... 200
  Exercise 26: t-SNE MNIST and Iterations ... 200
  Activity 14: t-SNE Wine and Iterations ... 204
  Final Thoughts on Visualizations ... 205
  Summary ... 205

Topic Modeling ... 207
  Introduction ... 208
  Topic Models ... 209
  Exercise 27: Setting Up the Environment ... 210
  A High-Level Overview of Topic Models ... 211
  Business Applications ... 215
  Exercise 28: Data Loading ... 217
  Cleaning Text Data ... 220
  Data Cleaning Techniques ... 220
  Exercise 29: Cleaning Data Step by Step ... 221
  Exercise 30: Complete Data Cleaning ... 226
  Activity 15: Loading and Cleaning Twitter Data ... 228
  Latent Dirichlet Allocation ... 230
  Variational Inference ... 232
  Bag of Words ... 233
  Exercise 31: Creating a Bag-of-Words Model Using the Count Vectorizer ... 234
  Perplexity ... 235
  Exercise 32: Selecting the Number of Topics ... 236
  Exercise 33: Running Latent Dirichlet Allocation ... 238
  Exercise 34: Visualize LDA ... 243
  Exercise 35: Trying Four Topics ... 247
  Activity 16: Latent Dirichlet Allocation and Health Tweets ... 251
  Bag-of-Words Follow-Up ... 252
  Exercise 36: Creating a Bag-of-Words Using TF-IDF ... 253
  Non-Negative Matrix Factorization ... 254
  Frobenius Norm ... 255
  Multiplicative Update ... 256
  Exercise 37: Non-negative Matrix Factorization ... 257
  Exercise 38: Visualizing NMF ... 260
  Activity 17: Non-Negative Matrix Factorization ... 262
  Summary ... 263

Market Basket Analysis ... 265
  Introduction ... 266
  Market Basket Analysis ... 266
  Use Cases ... 269
  Important Probabilistic Metrics ... 270
  Exercise 39: Creating Sample Transaction Data ... 271
  Support ... 272
  Confidence ... 273
  Lift and Leverage ... 273
  Conviction ... 274
  Exercise 40: Computing Metrics ... 275
  Characteristics of Transaction Data ... 277
  Exercise 41: Loading Data ... 278
  Data Cleaning and Formatting ... 281
  Exercise 42: Data Cleaning and Formatting ... 281
  Data Encoding ... 286
  Exercise 43: Data Encoding ... 287
  Activity 18: Loading and Preparing Full Online Retail Data ... 289
  Apriori Algorithm ... 290
  Computational Fixes ... 293
  Exercise 44: Executing the Apriori algorithm ... 294
  Activity 19: Apriori on the Complete Online Retail Dataset ... 300
  Association Rules ... 302
  Exercise 45: Deriving Association Rules ... 303
  Activity 20: Finding the Association Rules on the Complete Online Retail Dataset ... 309
  Summary ... 310

Hotspot Analysis ... 313
  Introduction ... 314
  Spatial Statistics ... 315
  Probability Density Functions ... 316
  Using Hotspot Analysis in Business ... 317
