Early Praise for Programming Machine Learning

As a developer with more than 20 years of experience but with no background in machine learning, I found this book to be pure gold. It explains the math behind machine learning in a very intuitive way that is easy to understand.

➤ Giancarlo Valente
Agile Coach and auLAB Co-Founder

Let me say that I think this is a brilliant book. It takes the reader step by step through the thinking behind machine learning. Combine that with Paolo’s fun approach, and this is the book I’d suggest every machine learning neophyte start with.

➤ Russ Olsen
Author, Getting Clojure and Eloquent Ruby

This book is totally engaging. I love the humor, and the way Paolo talks as a buddy who understands your fears and guides you through as someone who has gone through the same learning process.

➤ Alberto Lumbreras
Research Scientist, Criteo AI Lab

Programming Machine Learning is a well-organized and accessible introduction to machine learning for programmers. The book eschews traditional mathematically centric explanations for programming-centric ones, and as a result, it makes foundational concepts readily accessible.

➤ Dan Sheikh
Lead Engineer, BCG Digital Ventures

Programming Machine Learning
From Coding to Deep Learning

Paolo Perrotta

The Pragmatic Bookshelf
Raleigh, North Carolina

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trademarks of The Pragmatic Programmers, LLC.

Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein.

Our Pragmatic books, screencasts, and audio books can help you and your team create better software and have more fun. Visit us at https://pragprog.com.

The team that produced this book includes:
Publisher: Andy Hunt
VP of Operations: Janet Furlow
Executive Editor: Dave Rankin
Development Editor: Katharine Dvorak
Copy Editor: Jasmine Kwityn
Indexing: Potomac Indexing, LLC
Layout: Gilson Graphics

For sales, volume licensing, and support, please contact [email protected].
For international rights, please contact [email protected].

Copyright © 2020 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher.

ISBN-13: 978-1-68050-660-0
Encoded using the finest acid-free high-entropy binary digits.
Book version: P1.0—March 2020

To my wife Irene, making my every day.

Contents

Acknowledgments
How the Heck Is That Possible?

Part I — From Zero to Image Recognition

1. How Machine Learning Works
   Programming vs. Machine Learning
   Supervised Learning
   The Math Behind the Magic
   Setting Up Your System

2. Your First Learning Program
   Getting to Know the Problem
   Coding Linear Regression
   Adding a Bias
   What You Just Learned
   Hands On: Tweaking the Learning Rate
3. Walking the Gradient
   Our Algorithm Doesn’t Cut It
   Gradient Descent
   What You Just Learned
   Hands On: Basecamp Overshooting

4. Hyperspace!
   Adding More Dimensions
   Matrix Math
   Upgrading the Learner
   Bye Bye, Bias
   A Final Test Drive
   What You Just Learned
   Hands On: Field Statistician

5. A Discerning Machine
   Where Linear Regression Fails
   Invasion of the Sigmoids
   Classification in Action
   What You Just Learned
   Hands On: Weighty Decisions

6. Getting Real
   Data Come First
   Our Own MNIST Library
   The Real Thing
   What You Just Learned
   Hands On: Tricky Digits

7. The Final Challenge
   Going Multiclass
   Moment of Truth
   What You Just Learned
   Hands On: Minesweeper

8. The Perceptron
   Enter the Perceptron
   Assembling Perceptrons
   Where Perceptrons Fail
   A Tale of Perceptrons

Part II — Neural Networks

9. Designing the Network
   Assembling a Neural Network from Perceptrons
   Enter the Softmax
   Here’s the Plan
   What You Just Learned
   Hands On: Network Adventures

10. Building the Network
    Coding Forward Propagation
    Cross Entropy
    What You Just Learned
    Hands On: Time Travel Testing

11. Training the Network
    The Case for Backpropagation
    From the Chain Rule to Backpropagation
    Applying Backpropagation
    Initializing the Weights
    The Finished Network
    What You Just Learned
    Hands On: Starting Off Wrong

12. How Classifiers Work
    Tracing a Boundary
    Bending the Boundary
    What You Just Learned
    Hands On: Data from Hell

13. Batchin’ Up
    Learning, Visualized
    Batch by Batch
    Understanding Batches
    What You Just Learned
    Hands On: The Smallest Batch

14. The Zen of Testing
    The Threat of Overfitting
    A Testing Conundrum
    What You Just Learned
    Hands On: Thinking About Testing

15. Let’s Do Development
    Preparing Data
    Tuning Hyperparameters
    The Final Test
    Hands On: Achieving 99%
    What You Just Learned… and the Road Ahead

Part III — Deep Learning

16. A Deeper Kind of Network
    The Echidna Dataset
    Building a Neural Network with Keras
    Making It Deep
    What You Just Learned
    Hands On: Keras Playground

17. Defeating Overfitting
    Overfitting Explained
    Regularizing the Model
    A Regularization Toolbox
    What You Just Learned
    Hands On: Keeping It Simple

18. Taming Deep Networks
    Understanding Activation Functions
    Beyond the Sigmoid
    Adding More Tricks to Your Bag
    What You Just Learned
    Hands On: The 10 Epochs Challenge

19. Beyond Vanilla Networks
    The CIFAR-10 Dataset
    The Building Blocks of CNNs
    Running on Convolutions
    What You Just Learned
    Hands On: Hyperparameters Galore

20. Into the Deep
    The Rise of Deep Learning
    Unreasonable Effectiveness
    Where Now?
    Your Journey Begins

A1. Just Enough Python
    What Python Looks Like
    Python’s Building Blocks
    Defining and Calling Functions
    Working with Modules and Packages
    Creating and Using Objects
    That’s It, Folks!

A2. The Words of Machine Learning

Index

Acknowledgments

A shout out to my tech reviewers: Alessandro Bahgat, Arno Bastenhof, Roberto Bettazzoni, Guido “Zen” Bolognesi, Juan de Bravo, Simone Busoli, Pieter Buteneers, Andrea Cisternino, Sebastian Hennebrüder, Alberto Lumbreras, Russ Olsen, Luca Ongaro, Pierpaolo Pantone, Karol Przystalski, Dan Sheikh, Leonie Sieger, Gal Tsubery, l’ùmarèin pugnàtta di Casalecchio, and Giancarlo Valente. All of them should be thanked for making this book better with their insightful comments, with the exception of Roberto Bettazzoni. Like a good friend, he should instead take the blame for any mistake in these pages.

Thank you to the generous readers who sent comments and errata during the beta phase: Marco Arena, Glen Aultman-Bettridge, Juanjo Bazan, Zbynek Bazanowski, Jamis Buck, Charles de Bueger, Leonardo Carotti, Amir Ebrahimi, Helge Eichhorn, George Ellis, Bruno Girin, Elton Goci, Dave Halliday, Darren Hunt, Peter Lin, Karen Mauney, Bradley Mirly, Vasileios Ntarlagiannis, Volkmar Petschnig, David Pinto, Conlan Rios, Roman Romanchuk, Ionut Simion, Drew Thomas, and Jeroen Wenting. If you ever come across me at a conference, or anywhere else, tap me on the shoulder. Beer (or your favorite beverage) will follow.

Thanks to my friend Annamaria Di Sebastiano, who shared with me the story that opens the first chapter. Long time no see!

Thank you to Marc Schnierle for his excellent LaTeX4technics web app¹ that I used to generate formulae; to Kimberly Geswein, who designed the Indie Flower font that I used in the diagrams; to the team that designed DejaVu Sans and DejaVu Sans Mono, which I used extensively throughout the book; and to the Brisbane City Council, which published the beautiful echidna picture² that appears, in a slightly modified version, in one of the last chapters.

1. https://www.latex4technics.com
2. https://www.flickr.com/photos/brisbanecitycouncil/6971519658