Deep Learning from the Basics
Python and Deep Learning: Theory and Implementation

Koki Saitoh

Deep Learning from the Basics

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Author: Koki Saitoh
Managing Editor: Ashish James
Acquisitions Editor: Bridget Neale
Production Editor: Salma Patel
Editorial Board: Megan Carlisle, Mahesh Dhyani, Heather Gopsill, Manasa Kumar, Alex Mazonowicz, Monesh Mirpuri, Bridget Neale, Abhishek Rane, Brendan Rodrigues, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

First Published: March 2021
Production Reference: 1040321
ISBN: 978-1-80020-613-7

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK

Table of Contents

Preface  i
Introduction  1

Chapter 1: Introduction to Python  9
  What is Python?  9
  Installing Python  10
    Python Versions  10
    External Libraries That We Use  10
    Anaconda Distribution  11
  Python Interpreter  11
    Mathematical Operations  12
    Data Types  12
    Variables  13
    Lists  13
    Dictionaries  14
    Boolean  15
    if Statements  15
    for Statements  16
    Functions  16
  Python Script Files  16
    Saving in a File  17
    Classes  17
  NumPy  18
    Importing NumPy  18
    Creating a NumPy Array  19
    Mathematical Operations in NumPy  19
    N-Dimensional NumPy Arrays  20
    Broadcasting  21
    Accessing Elements  21
  Matplotlib  23
    Drawing a Simple Graph  23
    Features of pyplot  24
    Displaying Images  25
  Summary  26

Chapter 2: Perceptrons  29
  What Is a Perceptron?  29
  Simple Logic Circuits  31
    AND Gate  31
    NAND and OR gates  32
  Implementing Perceptrons  33
    Easy Implementation  33
    Introducing Weights and Bias  34
    Implementation with Weights and Bias  35
  Limitations of Perceptrons  36
    XOR Gate  37
    Linear and Nonlinear  39
  Multilayer Perceptrons  40
    Combining the Existing Gates  41
    Implementing an XOR Gate  43
  From NAND to a Computer  44
  Summary  46

Chapter 3: Neural Networks  49
  From Perceptrons to Neural Networks  50
    Neural Network Example  50
    Reviewing the Perceptron  51
    Introducing an Activation Function  53
  Activation Function  54
    Sigmoid Function  55
    Implementing a Step Function  55
    Step Function Graph  57
    Implementing a Sigmoid Function  58
    Comparing the Sigmoid Function and the Step Function  59
    Nonlinear Function  61
    ReLU Function  62
  Calculating Multidimensional Arrays  63
    Multidimensional Arrays  63
    Matrix Multiplication  65
    Matrix Multiplication in a Neural Network  68
  Implementing a Three-Layer Neural Network  69
    Examining the Symbols  70
    Implementing Signal Transmission in Each Layer  71
    Implementation Summary  76
  Designing the Output Layer  77
  Identity Function and Softmax Function  78
    Issues when Implementing the Softmax Function  80
    Characteristics of the Softmax Function  81
    Number of Neurons in the Output Layer  83
  Handwritten Digit Recognition  83
    MNIST Dataset  84
    Inference for Neural Network  87
    Batch Processing  89
  Summary  93

Chapter 4: Neural Network Training  95
  Learning from Data  96
    Data-Driven  96
    Training Data and Test Data  99
  Loss Function  99
    Sum of Squared Errors  100
    Cross-Entropy Error  101
    Mini-Batch Learning  103
    Implementing Cross-Entropy Error (Using Batches)  105
    Why Do We Configure a Loss Function?  106
  Numerical Differentiation  108
    Derivative  108
    Examples of Numerical Differentiation  111
    Partial Derivative  113
  Gradient  115
    Gradient Method  117
    Gradients for a Neural Network  121
  Implementing a Training Algorithm  124
    A Two-Layer Neural Network as a Class  125
    Implementing Mini-Batch Training  129
    Using Test Data for Evaluation  131
  Summary  134

Chapter 5: Backpropagation  137
  Computational Graphs  138
    Using Computational Graphs to Solve Problems  138
    Local Calculation  140
    Why Do We Use Computational Graphs?  141
  Chain Rule  142
    Backward Propagation in a Computational Graph  143
    What Is the Chain Rule?  143
    The Chain Rule and Computational Graphs  144
  Backward Propagation  145
    Backward Propagation in an Addition Node  146
    Backward Propagation in a Multiplication Node  148
    Apples Example  149
  Implementing a Simple Layer  150
    Implementing a Multiplication Layer  151
    Implementing an Addition Layer  153
  Implementing the Activation Function Layer  155
    ReLU Layer  155
    Sigmoid Layer  156
  Implementing the Affine and Softmax Layers  161
    Affine Layer  161
    Batch-Based Affine Layer  164
    Softmax-with-Loss Layer  166
  Implementing Backpropagation  170
    Overall View of Neural Network Training  170
    Presupposition  170
    Implementing a Neural Network That Supports Backpropagation  171
    Gradient Check  175
    Training Using Backpropagation  176
  Summary  178

Chapter 6: Training Techniques  181
  Updating Parameters  182
    Story of an Adventurer  182
    SGD  183
    Disadvantage of SGD  184
    Momentum  186
    AdaGrad  188
    Adam  191
    Which Update Technique Should We Use?  192
    Using the MNIST Dataset to Compare the Update Techniques  193
  Initial Weight Values  194
    How About Setting the Initial Weight Values to 0?  194
    Distribution of Activations in the Hidden Layers  195
    Initial Weight Values for ReLU  199
    Using the MNIST Dataset to Compare the Weight Initializers  200
  Batch Normalization  202
    Batch Normalization Algorithm  203
    Evaluating Batch Normalization  205
  Regularization  206
    Overfitting  206
    Weight Decay  210
    Dropout  211
  Validating Hyperparameters  215
    Validation Data  215
    Optimizing Hyperparameters  216
    Implementing Hyperparameter Optimization  217
  Summary  219

Chapter 7: Convolutional Neural Networks  221
  Overall Architecture  221
  The Convolution Layer  223
    Issues with the Fully Connected Layer  223
    Convolution Operations  224
    Padding  226
    Stride  227
    Performing a Convolution Operation on Three-Dimensional Data  229
    Thinking in Blocks  231
    Batch Processing  233
  The Pooling Layer  234
    Characteristics of a Pooling Layer  234
  Implementing the Convolution and Pooling Layers  235
    Four-Dimensional Arrays  236
    Expansion by im2col  236
    Implementing a Convolution Layer  238
    Implementing a Pooling Layer  241
  Implementing a CNN  244
  Visualizing a CNN  248
    Visualizing the Weight of the First Layer  248
    Using a Hierarchical Structure to Extract Information  249
  Typical CNNs  250
    LeNet  250
    AlexNet  250
  Summary  251