
Transformers for Natural Language Processing: Second Edition


Transformers for Natural Language Processing, Second Edition
Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3
Denis Rothman
BIRMINGHAM - MUMBAI

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Producer: Tushar Gupta
Acquisition Editor - Peer Reviews: Saby Dsilva
Project Editor: Janice Gonsalves
Content Development Editor: Bhavesh Amin
Copy Editor: Safis Editing
Technical Editor: Karan Sonawane
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Presentation Designer: Pranit Padwal

First published: January 2021
Second edition: March 2022
Production reference: 3170322

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street, Birmingham, B3 2PB, UK
ISBN 978-1-80324-733-5
www.packt.com

Foreword

In less than four years, transformers took the NLP community by storm, breaking every record achieved in the previous 30 years. Models such as BERT, T5, and GPT now constitute the fundamental building blocks for new applications in everything from computer vision to speech recognition, translation, protein sequencing, and writing code. For this reason, Stanford has recently introduced the term foundation models to define a set of large language models based on giant pre-trained transformers. All of this progress is thanks to a few simple ideas.

This book is a reference for everyone interested in understanding how transformers work, both from a theoretical and from a practical perspective. The author does a tremendous job of explaining how to use transformers step by step with a hands-on approach. After reading this book, you will be ready to use this state-of-the-art set of techniques to empower your deep learning applications.

In particular, this book gives a solid background on the architecture of transformers before covering, in detail, popular models such as BERT, RoBERTa, T5, and GPT-3. It also explains many use cases that transformers can cover: text summarization, image labeling, question answering, sentiment analysis, and fake news analysis. If these topics interest you, then this is definitely a worthwhile book. The first edition always has a place on my desk, and the same is going to happen with the second edition.

Antonio Gulli
Engineering Director for the Office of the CTO, Google

Contributors

About the author

Denis Rothman graduated from Sorbonne University and Paris Diderot University, designing one of the first patented encoding and embedding systems. He authored one of the first patented AI cognitive robots and bots.
He began his career delivering Natural Language Processing (NLP) chatbots for Moët et Chandon and an AI tactical defense optimizer for Airbus (formerly Aerospatiale). Denis then authored an AI resource optimizer for IBM and luxury brands, leading to an Advanced Planning and Scheduling (APS) solution used worldwide.

I want to thank the corporations that trusted me from the start to deliver artificial intelligence solutions and shared the risks of continuous innovation. I also want to thank my family, who always believed I would make it.

About the reviewer

George Mihaila is a Ph.D. candidate at the University of North Texas in the Department of Computer Science, where he also earned his master's degree in computer science. He received his bachelor's degree in electrical engineering in his home country, Romania.

He worked for 10 months at TCF Bank, where he helped put together the machine learning operations framework for automatic model deployment and monitoring. He did three internships at State Farm as a data scientist and machine learning engineer. He worked as a data scientist and machine learning engineer at the University of North Texas' High-Performance Computing Center for 2 years. He has been working in the research field of natural language processing for 5 years, with the last 3 years spent working with transformer models. His research interests are in dialogue generation with persona.

He was a technical reviewer for the first edition of Transformers for Natural Language Processing by Denis Rothman. He is currently working toward his doctoral thesis in casual dialog generation with persona.

In his free time, George likes to share his knowledge of state-of-the-art language models through tutorials and articles, and to help other researchers in the field of NLP.

Join our book's Discord space

Join the book's Discord workspace for a monthly Ask me Anything session with the authors: https://www.packt.link/Transformers

Table of Contents

Preface

Chapter 1: What are Transformers?
  The ecosystem of transformers
    • Industry 4.0
    • Foundation models
    • Is programming becoming a sub-domain of NLP?
    • The future of artificial intelligence specialists
  Optimizing NLP models with transformers
    • The background of transformers
  What resources should we use?
    • The rise of Transformer 4.0 seamless APIs
    • Choosing ready-to-use API-driven libraries
    • Choosing a Transformer Model
    • The role of Industry 4.0 artificial intelligence specialists
  Summary
  Questions
  References

Chapter 2: Getting Started with the Architecture of the Transformer Model
  The rise of the Transformer: Attention is All You Need
    • The encoder stack
    • Input embedding
    • Positional encoding
    • Sublayer 1: Multi-head attention
    • Sublayer 2: Feedforward network
    • The decoder stack
    • Output embedding and position encoding
    • The attention layers
    • The FFN sublayer, the post-LN, and the linear layer
  Training and performance
  Transformer models in Hugging Face
  Summary
  Questions
  References

Chapter 3: Fine-Tuning BERT Models
  The architecture of BERT
    • The encoder stack
    • Preparing the pretraining input environment
    • Pretraining and fine-tuning a BERT model
  Fine-tuning BERT
    • Hardware constraints
    • Installing the Hugging Face PyTorch interface for BERT
    • Importing the modules
    • Specifying CUDA as the device for torch
    • Loading the dataset
    • Creating sentences, label lists, and adding BERT tokens
    • Activating the BERT tokenizer
    • Processing the data
    • Creating attention masks
    • Splitting the data into training and validation sets
    • Converting all the data into torch tensors
    • Selecting a batch size and creating an iterator
    • BERT model configuration
    • Loading the Hugging Face BERT uncased base model
    • Optimizer grouped parameters
    • The hyperparameters for the training loop
    • The training loop
    • Training evaluation
    • Predicting and evaluating using the holdout dataset
    • Evaluating using the Matthews Correlation Coefficient
    • The scores of individual batches
    • Matthews evaluation for the whole dataset
  Summary
  Questions
  References

Chapter 4: Pretraining a RoBERTa Model from Scratch
  Training a tokenizer and pretraining a transformer
  Building KantaiBERT from scratch
    • Step 1: Loading the dataset
    • Step 2: Installing Hugging Face transformers
    • Step 3: Training a tokenizer
    • Step 4: Saving the files to disk
    • Step 5: Loading the trained tokenizer files
    • Step 6: Checking resource constraints: GPU and CUDA
    • Step 7: Defining the configuration of the model
    • Step 8: Reloading the tokenizer in transformers
    • Step 9: Initializing a model from scratch
    • Exploring the parameters
    • Step 10: Building the dataset
    • Step 11: Defining a data collator
    • Step 12: Initializing the trainer
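The Chapter 3 outline above walks through fine-tuning BERT with the Hugging Face PyTorch interface, from tokenization and attention masks to the training loop. The following is a minimal illustrative sketch of that kind of workflow, not the book's notebook: the example sentences, labels, and learning rate are invented placeholders, and the training loop is reduced to a single step on one tiny batch.

    import torch
    from torch.optim import AdamW
    from transformers import BertTokenizer, BertForSequenceClassification

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the pretrained uncased base model with a two-label classification head.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    model.to(device)

    # Placeholder sentences and acceptability labels (1 = acceptable, 0 = not).
    sentences = ["The book explains transformers clearly.", "Grammar sentence this is not."]
    labels = torch.tensor([1, 0]).to(device)

    # Tokenize, pad, truncate, and build attention masks in one call.
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to(device)

    optimizer = AdamW(model.parameters(), lr=2e-5)

    # One illustrative training step: the forward pass returns the loss when
    # labels are supplied, followed by backpropagation and a parameter update.
    model.train()
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

A real fine-tuning run would iterate over batches of a labeled dataset for several epochs and evaluate on a holdout set, as the chapter's section titles suggest.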
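Chapter 4's step list outlines pretraining a small RoBERTa-style model (KantaiBERT) from scratch. Below is a condensed, hypothetical sketch of the general shape of Steps 3 through 9 using the tokenizers and transformers libraries; the corpus path "kant.txt", the output directory, and all hyperparameter values are illustrative assumptions rather than the book's exact code.

    import os
    from tokenizers import ByteLevelBPETokenizer
    from transformers import RobertaConfig, RobertaForMaskedLM

    # Steps 3-4: train a byte-level BPE tokenizer on a plain-text corpus and
    # save vocab.json and merges.txt to disk ("kant.txt" is a placeholder path).
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(
        files=["kant.txt"],
        vocab_size=52_000,
        min_frequency=2,
        special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
    )
    os.makedirs("KantaiBERT", exist_ok=True)
    tokenizer.save_model("KantaiBERT")

    # Step 7: define a small RoBERTa configuration (illustrative sizes).
    config = RobertaConfig(
        vocab_size=52_000,
        max_position_embeddings=514,
        num_attention_heads=12,
        num_hidden_layers=6,
        type_vocab_size=1,
    )

    # Step 9: initialize the model from scratch with random weights.
    model = RobertaForMaskedLM(config=config)
    print(model.num_parameters())  # explore the parameter count

    # Steps 10-12 would then build a dataset, define a data collator for
    # masked language modeling, and initialize a transformers Trainer.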
