Table Of Content

Applied Text Analysis with Python Enabling Language-Aware Data Products with Machine Learning Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda BBeeiijjiinngg BBoossttoonn FFaarrnnhhaamm SSeebbaassttooppooll TTookkyyoo Applied Text Analysis with Python by Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda Copyright © 2018 Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/insti‐ tutional sales department: 800-998-9938 or corporate@oreilly.com. Editor: Nicole Tache Indexer: WordCo Indexing Services, Inc. Production Editor: Nicholas Adams Interior Designer: David Futato Copyeditor: Jasmine Kwityn Cover Designer: Karen Montgomery Proofreader: Christina Edwards Illustrator: Rebecca Demarest June 2018: First Edition Revision History for the First Edition 2018-06-08: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491963043 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Applied Text Analysis with Python, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-491-96304-3 [LSI] Table of Contents Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1. Language and Computation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 The Data Science Paradigm 2 Language-Aware Data Products 4 The Data Product Pipeline 5 Language as Data 8 A Computational Model of Language 8 Language Features 10 Contextual Features 13 Structural Features 15 Conclusion 16 2. Building a Custom Corpus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 What Is a Corpus? 19 Domain-Specific Corpora 20 The Baleen Ingestion Engine 21 Corpus Data Management 22 Corpus Disk Structure 24 Corpus Readers 27 Streaming Data Access with NLTK 28 Reading an HTML Corpus 31 Reading a Corpus from a Database 34 Conclusion 36 3. Corpus Preprocessing and Wrangling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 Breaking Down Documents 38 Identifying and Extracting Core Content 38 iii Deconstructing Documents into Paragraphs 39 Segmentation: Breaking Out Sentences 42 Tokenization: Identifying Individual Tokens 43 Part-of-Speech Tagging 44 Intermediate Corpus Analytics 45 Corpus Transformation 47 Intermediate Preprocessing and Storage 48 Reading the Processed Corpus 51 Conclusion 53 4. Text Vectorization and Transformation Pipelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Words in Space 56 Frequency Vectors 57 One-Hot Encoding 59 Term Frequency–Inverse Document Frequency 62 Distributed Representation 65 The Scikit-Learn API 68 The BaseEstimator Interface 68 Extending TransformerMixin 70 Pipelines 74 Pipeline Basics 75 Grid Search for Hyperparameter Optimization 76 Enriching Feature Extraction with Feature Unions 77 Conclusion 79 5. Classification for Text Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Text Classification 82 Identifying Classification Problems 82 Classifier Models 84 Building a Text Classification Application 85 Cross-Validation 86 Model Construction 89 Model Evaluation 91 Model Operationalization 94 Conclusion 95 6. Clustering for Text Similarity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Unsupervised Learning on Text 97 Clustering by Document Similarity 99 Distance Metrics 99 Partitive Clustering 102 Hierarchical Clustering 107 iv | Table of Contents Modeling Document Topics 111 Latent Dirichlet Allocation 111 Latent Semantic Analysis 119 Non-Negative Matrix Factorization 121 Conclusion 123 7. Context-Aware Text Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Grammar-Based Feature Extraction 126 Context-Free Grammars 126 Syntactic Parsers 127 Extracting Keyphrases 128 Extracting Entities 131 n-Gram Feature Extraction 132 An n-Gram-Aware CorpusReader 133 Choosing the Right n-Gram Window 135 Significant Collocations 136 n-Gram Language Models 139 Frequency and Conditional Frequency 140 Estimating Maximum Likelihood 143 Unknown Words: Back-off and Smoothing 145 Language Generation 147 Conclusion 149 8. Text Visualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Visualizing Feature Space 152 Visual Feature Analysis 152 Guided Feature Engineering 162 Model Diagnostics 170 Visualizing Clusters 170 Visualizing Classes 172 Diagnosing Classification Error 173 Visual Steering 177 Silhouette Scores and Elbow Curves 177 Conclusion 180 9. Graph Analysis of Text. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Graph Computation and Analysis 185 Creating a Graph-Based Thesaurus 185 Analyzing Graph Structure 186 Visual Analysis of Graphs 187 Extracting Graphs from Text 189 Creating a Social Graph 189 Table of Contents | v Insights from the Social Graph 192 Entity Resolution 200 Entity Resolution on a Graph 201 Blocking with Structure 202 Fuzzy Blocking 202 Conclusion 205 10. Chatbots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Fundamentals of Conversation 208 Dialog: A Brief Exchange 210 Maintaining a Conversation 213 Rules for Polite Conversation 215 Greetings and Salutations 216 Handling Miscommunication 220 Entertaining Questions 222 Dependency Parsing 223 Constituency Parsing 225 Question Detection 227 From Tablespoons to Grams 229 Learning to Help 233 Being Neighborly 235 Offering Recommendations 238 Conclusion 240 11. Scaling Text Analytics with Multiprocessing and Spark. . . . . . . . . . . . . . . . . . . . . . . . . . 241 Python Multiprocessing 242 Running Tasks in Parallel 244 Process Pools and Queues 249 Parallel Corpus Preprocessing 251 Cluster Computing with Spark 253 Anatomy of a Spark Job 254 Distributing the Corpus 255 RDD Operations 257 NLP with Spark 259 Conclusion 270 12. Deep Learning and Beyond. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 Applied Neural Networks 274 Neural Language Models 274 Artificial Neural Networks 275 Deep Learning Architectures 280 Sentiment Analysis 284 vi | Table of Contents Deep Structure Analysis 286 The Future Is (Almost) Here 291 Glossary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 Table of Contents | vii

Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning PDF

332 Pages·2018·13.97 MB·English

by Benjamin Bengfort

#python for data analysis #chatbot #vector analysis

Checking for file health...

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Download Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning PDF Free - Full Version

by Benjamin Bengfort| 2018| 332 pages| 13.97| English

Download Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning by Benjamin Bengfort in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning

No description available for this book.

Detailed Information

Author:	Benjamin Bengfort
Publication Year:	2018
Pages:	332
Language:	English
File Size:	13.97
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning PDF?

Yes, on https://PDFdrive.to you can download Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning by Benjamin Bengfort completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning on my mobile device?

After downloading Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning?

Yes, this is the complete PDF version of Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning by Benjamin Bengfort. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.