Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision

Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision provides an overview of general deep learning methodology and its applications to natural language processing (NLP), speech, and computer vision tasks. It presents the concepts of deep learning in a clear, comprehensive manner, with full-fledged examples of deep learning models, and aims to bridge the gap between theory and application through case studies with code, experiments, and supporting analysis.

Features:

• Covers the latest developments in deep learning techniques as applied to audio analysis, computer vision, and natural language processing.
• Introduces contemporary applications of deep learning techniques to audio, textual, and visual processing.
• Explores deep learning frameworks and libraries for NLP, speech, and computer vision in Python.
• Gives insights into using these tools and libraries in Python for real-world applications.
• Provides easily accessible tutorials and real-world case studies with code for hands-on experience.

This book is aimed at researchers and graduate students in computer engineering and in image, speech, and text processing.

Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision
Techniques and Use Cases
L. Ashok Kumar and D. Karthika Renuka

First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 L. Ashok Kumar and D. Karthika Renuka

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-39165-6 (hbk)
ISBN: 978-1-032-39166-3 (pbk)
ISBN: 978-1-003-34868-9 (ebk)

DOI: 10.1201/9781003348689

Typeset in Times by Apex CoVantage, LLC

Dedication

To my wife Ms. Y. Uma Maheswari and daughter A. K. Sangamithra, for their constant support and love.
—Dr. L. Ashok Kumar
To my parents Mr. N. Dhanaraj and Ms. D. Anuradha, who laid the foundation for all my success; to my husband Mr. R. Sathish Kumar and my daughter P.S. Preethi, for their unconditional love and support for the completion of this book; and to my friend, brother, and co-author of this book, Dr. L. Ashok Kumar, who has been highly credible, my greatest source of motivation, and a pillar of inspiration.
—Dr. D. Karthika Renuka

Contents

About the Authors
Preface
Acknowledgments

Chapter 1  Introduction
    Learning Outcomes
    1.1  Introduction
        1.1.1  Subsets of Artificial Intelligence
        1.1.2  Three Horizons of Deep Learning Applications
        1.1.3  Natural Language Processing
        1.1.4  Speech Recognition
        1.1.5  Computer Vision
    1.2  Machine Learning Methods for NLP, Computer Vision (CV), and Speech
        1.2.1  Support Vector Machine (SVM)
        1.2.2  Bagging
        1.2.3  Gradient-boosted Decision Trees (GBDTs)
        1.2.4  Naïve Bayes
        1.2.5  Logistic Regression
        1.2.6  Dimensionality Reduction Techniques
    1.3  Tools, Libraries, Datasets, and Resources for the Practitioners
        1.3.1  TensorFlow
        1.3.2  Keras
        1.3.3  Deeplearning4j
        1.3.4  Caffe
        1.3.5  ONNX
        1.3.6  PyTorch
        1.3.7  scikit-learn
        1.3.8  NumPy
        1.3.9  Pandas
        1.3.10  NLTK
        1.3.11  Gensim
        1.3.12  Datasets
    1.4  Summary
    Bibliography

Chapter 2  Natural Language Processing
    Learning Outcomes
    2.1  Natural Language Processing
    2.2  Generic NLP Pipeline
        2.2.1  Data Acquisition
        2.2.2  Text Cleaning
    2.3  Text Pre-processing
        2.3.1  Noise Removal
        2.3.2  Stemming
        2.3.3  Tokenization
        2.3.4  Lemmatization
        2.3.5  Stop Word Removal
        2.3.6  Parts of Speech Tagging
    2.4  Feature Engineering
    2.5  Modeling
        2.5.1  Start with Simple Heuristics
        2.5.2  Building Your Model
        2.5.3  Metrics to Build Model
    2.6  Evaluation
    2.7  Deployment
    2.8  Monitoring and Model Updating
    2.9  Vector Representation for NLP
        2.9.1  One Hot Vector Encoding
        2.9.2  Word Embeddings
        2.9.3  Bag of Words
        2.9.4  TF-IDF
        2.9.5  N-gram
        2.9.6  Word2Vec
        2.9.7  GloVe
        2.9.8  ELMo
    2.10  Language Modeling with n-grams
        2.10.1  Evaluating Language Models
        2.10.2  Smoothing
        2.10.3  Kneser-Ney Smoothing
    2.11  Vector Semantics and Embeddings
        2.11.1  Lexical Semantics
        2.11.2  Vector Semantics
        2.11.3  Cosine for Measuring Similarity
        2.11.4  Bias and Embeddings
    2.12  Summary
    Bibliography

Chapter 3  State-of-the-Art Natural Language Processing
    Learning Outcomes
    3.1  Introduction
    3.2  Sequence-to-Sequence Models
        3.2.1  Sequence
        3.2.2  Sequence Labeling
        3.2.3  Sequence Modeling
    3.3  Recurrent Neural Networks
        3.3.1  Unrolling RNN
        3.3.2  RNN-based POS Tagging Use Case
        3.3.3  Challenges in RNN
    3.4  Attention Mechanisms
        3.4.1  Self-attention Mechanism
        3.4.2  Multi-head Attention Mechanism
        3.4.3  Bahdanau Attention
        3.4.4  Luong Attention
        3.4.5  Global Attention versus Local Attention
        3.4.6  Hierarchical Attention
    3.5  Transformer Model
        3.5.1  Bidirectional Encoder Representations from Transformers (BERT)
        3.5.2  GPT-3
    3.6  Summary
    Bibliography

Chapter 4  Applications of Natural Language Processing
    Learning Outcomes
    4.1  Introduction
    4.2  Word Sense Disambiguation
        4.2.1  Word Senses
        4.2.2  WordNet: A Database of Lexical Relations
        4.2.3  Approaches to Word Sense Disambiguation
        4.2.4  Applications of Word Sense Disambiguation
    4.3  Text Classification
        4.3.1  Building the Text Classification Model
        4.3.2  Applications of Text Classification
        4.3.3  Other Applications
    4.4  Sentiment Analysis
        4.4.1  Types of Sentiment Analysis
    4.5  Spam Email Classification
        4.5.1  History of Spam
        4.5.2  Spamming Techniques
        4.5.3  Types of Spams
    4.6  Question Answering
        4.6.1  Components of Question Answering System
        4.6.2  Information Retrieval-based Factoid Question and Answering
        4.6.3  Entity Linking
        4.6.4  Knowledge-based Question Answering
    4.7  Chatbots and Dialog Systems
        4.7.1  Properties of Human Conversation
        4.7.2  Chatbots
        4.7.3  The Dialog-state Architecture