
Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning PDF

256 Pages · 2019 · 15.327 MB

Preview Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning

Natural Language Processing with PyTorch
Build Intelligent Language Applications Using Deep Learning

Delip Rao and Brian McMahan

Beijing · Boston · Farnham · Sebastopol · Tokyo

Natural Language Processing with PyTorch
by Delip Rao and Brian McMahan

Copyright © 2019 Delip Rao and Brian McMahan. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Acquisition Editor: Rachel Roumeliotis
Development Editor: Jeff Bleiel
Production Editor: Nan Barber
Copyeditor: Octal Publishing, LLC
Proofreader: Rachel Head
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

February 2019: First Edition

Revision History for the First Edition
2019-01-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491978238 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Natural Language Processing with PyTorch, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-97823-8 [LSI]

Table of Contents

Preface  ix

1. Introduction  1
   The Supervised Learning Paradigm  2
   Observation and Target Encoding  5
   One-Hot Representation  6
   TF Representation  7
   TF-IDF Representation  8
   Target Encoding  9
   Computational Graphs  10
   PyTorch Basics  11
   Installing PyTorch  13
   Creating Tensors  13
   Tensor Types and Size  16
   Tensor Operations  17
   Indexing, Slicing, and Joining  19
   Tensors and Computational Graphs  22
   CUDA Tensors  24
   Exercises  26
   Solutions  26
   Summary  27
   References  27

2. A Quick Tour of Traditional NLP  29
   Corpora, Tokens, and Types  29
   Unigrams, Bigrams, Trigrams, …, N-grams  32
   Lemmas and Stems  33
   Categorizing Sentences and Documents  34
   Categorizing Words: POS Tagging  34
   Categorizing Spans: Chunking and Named Entity Recognition  35
   Structure of Sentences  36
   Word Senses and Semantics  37
   Summary  38
   References  38

3. Foundational Components of Neural Networks  39
   The Perceptron: The Simplest Neural Network  39
   Activation Functions  41
   Sigmoid  41
   Tanh  42
   ReLU  43
   Softmax  44
   Loss Functions  45
   Mean Squared Error Loss  45
   Categorical Cross-Entropy Loss  46
   Binary Cross-Entropy Loss  48
   Diving Deep into Supervised Training  49
   Constructing Toy Data  49
   Putting It Together: Gradient-Based Supervised Learning  51
   Auxiliary Training Concepts  53
   Correctly Measuring Model Performance: Evaluation Metrics  53
   Correctly Measuring Model Performance: Splitting the Dataset  53
   Knowing When to Stop Training  54
   Finding the Right Hyperparameters  55
   Regularization  55
   Example: Classifying Sentiment of Restaurant Reviews  56
   The Yelp Review Dataset  57
   Understanding PyTorch’s Dataset Representation  59
   The Vocabulary, the Vectorizer, and the DataLoader  62
   A Perceptron Classifier  67
   The Training Routine  68
   Evaluation, Inference, and Inspection  74
   Summary  78
   References  79

4. Feed-Forward Networks for Natural Language Processing  81
   The Multilayer Perceptron  82
   A Simple Example: XOR  84
   Implementing MLPs in PyTorch  85
   Example: Surname Classification with an MLP  89
   The Surnames Dataset  90
   Vocabulary, Vectorizer, and DataLoader  92
   The SurnameClassifier Model  94
   The Training Routine  95
   Model Evaluation and Prediction  97
   Regularizing MLPs: Weight Regularization and Structural Regularization (or Dropout)  99
   Convolutional Neural Networks  100
   CNN Hyperparameters  101
   Implementing CNNs in PyTorch  107
   Example: Classifying Surnames by Using a CNN  110
   The SurnameDataset Class  111
   Vocabulary, Vectorizer, and DataLoader  111
   Reimplementing the SurnameClassifier with Convolutional Networks  113
   The Training Routine  114
   Model Evaluation and Prediction  115
   Miscellaneous Topics in CNNs  116
   Pooling  116
   Batch Normalization (BatchNorm)  117
   Network-in-Network Connections (1x1 Convolutions)  118
   Residual Connections/Residual Block  118
   Summary  119
   References  120

5. Embedding Words and Types  121
   Why Learn Embeddings?  122
   Efficiency of Embeddings  123
   Approaches to Learning Word Embeddings  124
   The Practical Use of Pretrained Word Embeddings  124
   Example: Learning the Continuous Bag of Words Embeddings  130
   The Frankenstein Dataset  131
   Vocabulary, Vectorizer, and DataLoader  133
   The CBOWClassifier Model  134
   The Training Routine  135
   Model Evaluation and Prediction  136
   Example: Transfer Learning Using Pretrained Embeddings for Document Classification  137
   The AG News Dataset  137
   Vocabulary, Vectorizer, and DataLoader  138
   The NewsClassifier Model  141
   The Training Routine  144
   Model Evaluation and Prediction  145
   Evaluating on the test dataset  145
   Summary  146
   References  147

6. Sequence Modeling for Natural Language Processing  149
   Introduction to Recurrent Neural Networks  150
   Implementing an Elman RNN  153
   Example: Classifying Surname Nationality Using a Character RNN  155
   The SurnameDataset Class  155
   The Vectorization Data Structures  156
   The SurnameClassifier Model  158
   The Training Routine and Results  160
   Summary  161
   References  161

7. Intermediate Sequence Modeling for Natural Language Processing  163
   The Problem with Vanilla RNNs (or Elman RNNs)  164
   Gating as a Solution to a Vanilla RNN’s Challenges  165
   Example: A Character RNN for Generating Surnames  166
   The SurnameDataset Class  167
   The Vectorization Data Structures  168
   From the ElmanRNN to the GRU  170
   Model 1: The Unconditioned SurnameGenerationModel  170
   Model 2: The Conditioned SurnameGenerationModel  172
   The Training Routine and Results  173
   Tips and Tricks for Training Sequence Models  179
   References  180

8. Advanced Sequence Modeling for Natural Language Processing  183
   Sequence-to-Sequence Models, Encoder–Decoder Models, and Conditioned Generation  183
   Capturing More from a Sequence: Bidirectional Recurrent Models  187
   Capturing More from a Sequence: Attention  189
   Attention in Deep Neural Networks  190
   Evaluating Sequence Generation Models  193
   Example: Neural Machine Translation  195
   The Machine Translation Dataset  196
   A Vectorization Pipeline for NMT  197
   Encoding and Decoding in the NMT Model  201
   The Training Routine and Results  212
   Summary  214
   References  215

9. Classics, Frontiers, and Next Steps  217
   What Have We Learned so Far?  217
   Timeless Topics in NLP  218
   Dialogue and Interactive Systems  218
   Discourse  219
   Information Extraction and Text Mining  220
   Document Analysis and Retrieval  220
   Frontiers in NLP  221
   Design Patterns for Production NLP Systems  222
   Where Next?  227
   References  228

Index  229
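As a small taste of the “PyTorch Basics” material listed under Chapter 1 (Installing PyTorch, Creating Tensors, CUDA Tensors), the following minimal sketch shows the flavor of that topic. It is illustrative only and not excerpted from the book; it assumes a working PyTorch installation and uses only standard torch calls (torch.tensor, torch.rand, torch.cuda.is_available).

   import torch

   # Create a 2x3 tensor of uniform random values
   x = torch.rand(2, 3)

   # Create a tensor directly from a nested Python list
   y = torch.tensor([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

   # Elementwise addition produces a new tensor of the same shape
   z = x + y
   print(z.shape)   # torch.Size([2, 3])
   print(z.dtype)   # torch.float32

   # Move the tensor to a CUDA device when one is available
   device = "cuda" if torch.cuda.is_available() else "cpu"
   z = z.to(device)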
