
Machine Learning with Go PDF

287 Pages · 2017 · 4.525 MB · English

Preview Machine Learning with Go

Machine Learning With Go
Implement Regression, Classification, Clustering, Time-series Models, Neural Networks and more using the Go Programming Language
Daniel Whitenack
BIRMINGHAM - MUMBAI

Machine Learning With Go
Copyright © 2017 Packt Publishing
First published: September 2017
Production reference: 1210917
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78588-210-4
www.packtpub.com

Contents

Preface

Chapter 1: Gathering and Organizing Data
  Handling data - Gopher style
  Best practices for gathering and organizing data with Go
  CSV files
  Reading in CSV data from a file
  Handling unexpected fields
  Handling unexpected types
  Manipulating CSV data with data frames
  JSON
  Parsing JSON
  JSON output
  SQL-like databases
  Connecting to an SQL database
  Querying the database
  Modifying the database
  Caching
  Caching data in memory
  Caching data locally on disk
  Data versioning
  Pachyderm jargon
  Deploying/installing Pachyderm
  Creating data repositories for data versioning
  Putting data into data repositories
  Getting data out of versioned data repositories
  References
  Summary

Chapter 2: Matrices, Probability, and Statistics
  Matrices and vectors
  Vectors
  Vector operations
  Matrices
  Matrix operations
  Statistics
  Distributions
  Statistical measures
  Measures of central tendency
  Measures of spread or dispersion
  Visualizing distributions
  Histograms
  Box plots
  Probability
  Random variables
  Probability measures
  Independent and conditional probability
  Hypothesis testing
  Test statistics
  Calculating p-values
  References
  Summary

Chapter 3: Evaluation and Validation
  Evaluation
  Continuous metrics
  Categorical metrics
  Individual evaluation metrics for categorical variables
  Confusion matrices, AUC, and ROC
  Validation
  Training and test sets
  Holdout set
  Cross validation
  References
  Summary

Chapter 4: Regression
  Understanding regression model jargon
  Linear regression
  Overview of linear regression
  Linear regression assumptions and pitfalls
  Linear regression example
  Profiling the data
  Choosing our independent variable
  Creating our training and test sets
  Training our model
  Evaluating the trained model
  Multiple linear regression
  Nonlinear and other types of regression
  References
  Summary

Chapter 5: Classification
  Understanding classification model jargon
  Logistic regression
  Overview of logistic regression
  Logistic regression assumptions and pitfalls
  Logistic regression example
  Cleaning and profiling the data
  Creating our training and test sets
  Training and testing the logistic regression model
  k-nearest neighbors
  Overview of kNN
  kNN assumptions and pitfalls
  kNN example
  Decision trees and random forests
  Overview of decision trees and random forests
  Decision tree and random forest assumptions and pitfalls
  Decision tree example
  Random forest example
  Naive bayes
  Overview of naive bayes and its big assumption
  Naive bayes example
  References
  Summary

Chapter 6: Clustering
  Understanding clustering model jargon
  Measuring Distance or Similarity
  Evaluating clustering techniques
  Internal clustering evaluation
  External clustering evaluation
  k-means clustering
  Overview of k-means clustering
  k-means assumptions and pitfalls
  k-means clustering example
  Profiling the data
  Generating clusters with k-means
  Evaluating the generated clusters
  Other clustering techniques
  References
  Summary

Chapter 7: Time Series and Anomaly Detection
  Representing time series data in Go
  Understanding time series jargon
  Statistics related to time series
  Autocorrelation
  Partial autocorrelation
  Auto-regressive models for forecasting
  Auto-regressive model overview
  Auto-regressive model assumptions and pitfalls
  Auto-regressive model example
  Transforming to a stationary series
  Analyzing the ACF and choosing an AR order
  Fitting and evaluating an AR(2) model
  Auto-regressive moving averages and other time series models
  Anomaly detection
  References
  Summary

Chapter 8: Neural Networks and Deep Learning
  Understanding neural net jargon
  Building a simple neural network
  Nodes in the network
  Network architecture
  Why do we expect this architecture to work?
  Training our neural network
  Utilizing the simple neural network
  Training the neural network on real data
  Evaluating the neural network
  Introducing deep learning
  What is a deep learning model?
  Deep learning with Go
  Setting up TensorFlow for use with Go
  Retrieving and calling a pretrained TensorFlow model
  Object detection using TensorFlow from Go
  References
  Summary

Chapter 9: Deploying and Distributing Analyses and Models
  Running models reliably on remote machines
  A brief introduction to Docker and Docker jargon
  Docker-izing a machine learning application
  Docker-izing the model training and export
  Docker-izing model predictions
  Testing the Docker images locally
  Running the Docker images on remote machines
  Building a scalable and reproducible machine learning pipeline
  Setting up a Pachyderm and Kubernetes cluster
  Building a Pachyderm machine learning pipeline
  Creating and filling the input repositories
  Creating and running the processing stages
  Updating pipelines and examining provenance
  Scaling pipeline stages
  References
  Summary

Chapter 10: Algorithms/Techniques Related to Machine Learning
  Gradient descent
  Entropy, information gain, and related methods
  Backpropagation

Index

Preface

It seems like machine learning and artificial intelligence is all the rage, both in hip tech companies and
increasingly in larger enterprise companies. Data scientists are using machine learning to do everything from driving cars to drawing cats. However, if you follow the data science community, you have very likely seen something like language wars unfold between Python and R users. These languages dominate the machine learning conversation and often seem to be the only choices for integrating machine learning in your organization. We will explore a third option in this book: Go, the open source programming language created at Google.

The unique features of Go, along with the mindset of Go programmers, can help data scientists overcome some of the common struggles that they encounter. In particular, data scientists are (unfortunately) known to produce bad, inefficient, and unmaintainable code. This book will address this issue, and will clearly show you how to be productive in machine learning while also producing applications that maintain a high level of integrity. It will also allow you to overcome the common challenges of integrating analysis and machine learning code within an existing engineering organization.

This book will develop readers into productive, innovative data analysts who leverage Go to build robust and valuable applications. To this end, the book will clearly introduce the technical, programming aspects of machine learning in Go, but it will also guide the reader to understand sound workflows and philosophies for real-world analysis.

What this book covers

Preparing and analyzing data in machine learning workflows:

Chapter 1, Gathering and Organizing Data, covers the gathering, organization, and parsing of data to/from local and remote sources. Once the reader is done with this chapter, they will understand how to interact with data stored in various places and in various formats, how to parse and clean that data, and how to output that cleaned and parsed data.
Chapter 2, Matrices, Probability, and Statistics, covers the organization of data into matrices and matrix operations. Once the reader is done with this material, they will understand how to form matrices within Go programs and how to utilize these matrices to perform various types of matrix operations. This chapter also covers statistical measures and operations key to day-to-day data analysis work. Once the reader is done with this chapter, they will understand how to perform solid summary data analysis, describe and visualize distributions, quantify hypotheses, and transform datasets with, for example, dimensionality reductions.

Chapter 3, Evaluation and Validation, covers evaluation and validation, which are key to measuring the performance of machine learning applications and ensuring that they generalize. Once the reader is done with this chapter, they will understand various metrics to gauge the performance of models (in other words, evaluate the model) as well as various techniques to validate the model more generally.

Machine learning techniques:

Chapter 4, Regression, explains regression, a widely used technique to model continuous variables and a basis for other models. Regression produces models that are immediately interpretable. Thus, it can provide an excellent starting point when introducing predictive capabilities in an organization.

Chapter 5, Classification, covers classification, a machine learning technique distinct from regression in that the target variable is typically categorical or labeled. For example, a classification model may classify emails into spam and not spam categories or classify network traffic as fraudulent or not fraudulent.

Chapter 6, Clustering, covers clustering, an unsupervised machine learning technique used to form groupings of samples. At the end of this chapter, readers will be able to automatically form groupings of data points to better understand their structure.
Chapter 7, Time Series and Anomaly Detection, introduces techniques utilized to model time series data, such as stock prices, user events, and so on. After reading this chapter, the reader will understand how to evaluate various terms in a time series, build up a model of the time series, and detect anomalies in a time series.

Taking machine learning to the next level:

Chapter 8, Neural Networks and Deep Learning, introduces techniques utilized to perform regression, classification, and image processing with neural networks. After reading this chapter, the reader will understand how and when to apply these more complicated modeling techniques.

Chapter 9, Deploying and Distributing Analyses and Models, empowers readers to deploy the models that we have developed throughout the book to production environments and distribute processing over production scale data. This chapter illustrates how both of these things can be done easily, without significant modifications to the code utilized throughout the book.

The Appendix, Algorithms/Techniques Related to Machine Learning, can be referenced throughout the text of the book and will provide information about algorithms, optimizations, and techniques that are relevant to machine learning workflows.

What you need for this book

To run the examples in this book and experiment with the techniques covered in the book, you will generally need the following:

Access to a bash-like shell.
A complete Go environment including Go, an editor, and related default or custom environment variables defined. You can, for example, follow this guide at https://www.goinggo.net/2016/05/installing-go-and-your-workspace.html.
Various Go dependencies. These can be obtained as they are needed via go get ....

Then, to run the examples related to some of the advanced topics, such as data pipelining and deep learning, you will need a few additional things:

An installation or deployment of Pachyderm.
You can follow these docs to get Pachyderm up and running locally or in the cloud: http://pachyderm.readthedocs.io/en/latest/.
A working Docker installation (https://www.docker.com/community-edition#/download).
An installation of TensorFlow. To install TensorFlow locally, you can follow this guide at https://www.tensorflow.org/install/.
