Text Analysis with Python: A Research Oriented Guide PDF

268 Pages·2022·27.793 MB·English

Preview Text Analysis with Python: A Research Oriented Guide

Text Analysis with Python: A Research Oriented Guide

Authored by

Mamta Mittal
Delhi Skill & Entrepreneurship University, New Delhi, India

Gopi Battineni
University of Camerino, Camerino, Italy

Bhimavarapu Usharani
Department of CSE, Koneru Lakshmaiah Education Foundation at Vaddeswaram, Andhra Pradesh, India

&

Lalit Mohan Goyal
Department of Computer Engineering, J.C. Bose University of Science & Technology, YMCA Faridabad (Hr.), India

Text Analysis with Python: A Research-Oriented Guide
Authors: Mamta Mittal, Gopi Battineni, Bhimavarapu Usharani and Lalit Mohan Goyal
ISBN (Online): 978-981-5049-60-2
ISBN (Print): 978-981-5049-61-9
ISBN (Paperback): 978-981-5049-62-6
© 2022, Bentham Books imprint. Published by Bentham Science Publishers Pte. Ltd. Singapore. All Rights Reserved. First published in 2022.

BSP-EB-PRO-9789815049602-TP-268-TC-08-PD-20220812

BENTHAM SCIENCE PUBLISHERS LTD.
End User License Agreement (for non-institutional, personal use)

This is an agreement between you and Bentham Science Publishers Ltd. Please read this License Agreement carefully before using the ebook/echapter/ejournal (“Work”). Your use of the Work constitutes your agreement to the terms and conditions set forth in this License Agreement. If you do not agree to these terms and conditions then you should not use the Work.

Bentham Science Publishers agrees to grant you a non-exclusive, non-transferable limited license to use the Work subject to and in accordance with the following terms and conditions. This License Agreement is for non-library, personal use only. For a library / institutional / multi user license in respect of the Work, please contact: [email protected].

Usage Rules:

1. All rights reserved: The Work is the subject of copyright and Bentham Science Publishers either owns the Work (and the copyright in it) or is licensed to distribute the Work.
You shall not copy, reproduce, modify, remove, delete, augment, add to, publish, transmit, sell, resell, create derivative works from, or in any way exploit the Work or make the Work available for others to do any of the same, in any form or by any means, in whole or in part, in each case without the prior written permission of Bentham Science Publishers, unless stated otherwise in this License Agreement.

2. You may download a copy of the Work on one occasion to one personal computer (including tablet, laptop, desktop, or other such devices). You may make one back-up copy of the Work to avoid losing it.

3. The unauthorised use or distribution of copyrighted or other proprietary content is illegal and could subject you to liability for substantial money damages. You will be liable for any damage resulting from your misuse of the Work or any violation of this License Agreement, including any infringement by you of copyrights or proprietary rights.

Disclaimer: Bentham Science Publishers does not guarantee that the information in the Work is error-free, or warrant that it will meet your requirements or that access to the Work will be uninterrupted or error-free. The Work is provided "as is" without warranty of any kind, either express or implied or statutory, including, without limitation, implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the results and performance of the Work is assumed by you. No responsibility is assumed by Bentham Science Publishers, its staff, editors and/or authors for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products instruction, advertisements or ideas contained in the Work.
Limitation of Liability: In no event will Bentham Science Publishers, its staff, editors and/or authors, be liable for any damages, including, without limitation, special, incidental and/or consequential damages and/or damages for lost data and/or profits arising out of (whether directly or indirectly) the use or inability to use the Work. The entire liability of Bentham Science Publishers shall be limited to the amount actually paid by you for the Work.

General:

1. Any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims) will be governed by and construed in accordance with the laws of Singapore. Each party agrees that the courts of the state of Singapore shall have exclusive jurisdiction to settle any dispute or claim arising out of or in connection with this License Agreement or the Work (including non-contractual disputes or claims).

2. Your rights under this License Agreement will automatically terminate without notice and without the need for a court order if at any point you breach any terms of this License Agreement. In no event will any delay or failure by Bentham Science Publishers in enforcing your compliance with this License Agreement constitute a waiver of any of its rights.

3. You acknowledge that you have read this License Agreement, and agree to be bound by its terms and conditions. To the extent that any other terms and conditions presented on any website of Bentham Science Publishers conflict with, or are inconsistent with, the terms and conditions set out in this License Agreement, you acknowledge that the terms and conditions set out in this License Agreement shall prevail.

Bentham Science Publishers Pte. Ltd.
80 Robinson Road #02-00
Singapore 068898
Singapore
Email: [email protected]

CONTENTS

PREFACE
  CONSENT FOR PUBLICATION
  CONFLICT OF INTEREST
  ACKNOWLEDGEMENT

CHAPTER 1 INTRODUCTION
  1.1. INTRODUCTION
  1.2. NATURAL LANGUAGE
    1.2.1. From Linguistics to Natural Language Processing (NLP)
    1.2.2. Natural Language Processing (NLP)
  1.3. TEXT ANALYSIS
    1.3.1. Advantages
    1.3.2. Methods & Techniques
    1.3.3. Sentiment Analysis (SA)
    1.3.4. Topic Modelling
    1.3.5. Intent Identification
    1.3.6. Keyword Extraction
    1.3.7. Entity Recognition
    1.3.8. Text Analysis Functionality
  1.4. TEXT SUMMARIZATION
    1.4.1. Extraction
    1.4.2. Abstractive Summarization
  1.5. TEXT MINING AND WORKFLOW
    1.5.1. Data Recovery
    1.5.2. Data Extraction
    1.5.3. Data Mining
  CONCLUSION
  REFERENCES

CHAPTER 2 INTRODUCTION TO PYTHON
  2.1. INTRODUCTION
  2.2. WORKING ENVIRONMENTS OF PYTHON
    Google Colab
    Features of Google Collaboratory (COLAB)
  2.3. WORKING WITH ANACONDA
    Steps to Anaconda Installation
  2.4. CREATING THE FIRST PROJECT IN GOOGLE COLAB
  2.5. MATHEMATICAL OPERATIONS
  2.6. PYTHON LIBRARIES AND CONCEPTS
    Libraries
      a). Math and CMath Libraries
      b). SciPy Library
      c). ScikitLearn Library
      d). NumPy Library
  2.7. BASIC CONCEPTS IN PYTHON
    a). Arrays
    b). Data Frames
    c). Loops
    For loop
    While Loop and the Else Branch
    Program:
  CONCLUSION
  REFERENCES

CHAPTER 3 DATA LOADING AND PRE-PROCESSING
  3.1. INTRODUCTION
  3.1. IMPORTING DATASETS
  3.2. DATA RESHAPING
  3.3. PIVOT AND MELT FUNCTIONS
  3.4. STACKING AND UNSTACKING
  3.5. DATA PRE-PROCESSING
    Outliers
    Missing Value Imputation
    Handling of Missing Data
      Mean Calculation
    Deleting of Specific Row
    Dummy Variables
    One Hot Encoding
  3.6. DATA VISUALIZATION
    • Matplotlib
    • ggplot Visualization
    • Geoplot Visualization
    • Regression Plots
  CONCLUSION
  REFERENCES

CHAPTER 4 TEXT MINING
  INTRODUCTION
    The Steps Followed for Text Mining are:
    Why Should we use Text Mining?
    Benefits of Text Mining
    Text Analysis in Real-Time
    Text Mining Applications
    Issues in Text Mining
  4.1. TEXT MINING WITH PYTHON
    Program:
    Program:
    Program:
    Gensim Library
    Program:
      Output:
    Program
      Output
  4.2. DATA GATHERING
    Reading a Text File
      Steps for Reading a Text File in Python
    Open() Function
      Syntax
    Reading Text File
    Close()
    Syntax: close()
    Reading a CSV File
    Steps
    Reading Text from a PDF File
    import PyPDF2
      Program
  4.3. TEXT MINING PRE-PROCESSING TECHNIQUES
    Program:
      Output:
    Program:
      Output
    Program:
    Program:
    Program:
    Output
    Program:
      Output:
    Program:
    Program:
  4.4. FEATURE SELECTION IN TEXT MINING
    Program
      Output:
  4.5. TEXT SUMMARIZATION
    Program
    Program:
  4.6. TEXT EXTRACTION
    4.6.1. Bag of Words
    Program:
      Limitations of Bag of Words
    4.6.2. TF-IDF
    Program
      Output
    Program:
      Output:
    Word2vec
    Program:
      Output
    Document Term Matrix
    Program:
      Output
  4.7. TEXT VISUALIZATION
    Program
      Output
    Program
      Output:
    Program:
      Output
    Program
      Output
    Program
      Output
    Program
      Output
  CONCLUSION
  REFERENCES

CHAPTER 5 TEXT CLASSIFICATION IN PYTHON
  5.1. INTRODUCTION
  5.2. TEXT CLASSIFICATION
  5.3. MACHINE LEARNING-BASED TEXT CLASSIFICATION
    Step by Step Explanation
  5.4. APPLICATIONS OF TEXT MINING
    5.4.1. Email Spam Detection
    5.4.2. Social Media Reviews
    5.4.3. Google Translator
    5.4.4. Text labelling Based on Content
  5.5. CLASSIFICATION ALGORITHMS
    5.5.1. Naïve Bayes (NB) Classifiers
    Case Study: Text Classification With Naïve Bayes
      Movie Review Classification Dataset
    5.5.2. DECISION TREE CLASSIFIERS
    Case Study Text Classification with Decision Tree Algorithms
    5.5.3. Nearest Neighbour Classifier
      How KNN will Work in Text Classifications
      Useful Information with KNN
      Case Study Text Classification with KNN
    5.5.4. Support Vector Machines
      From Texts to Vectors
      Advantages
    Case Study Text Classification with KNN
  CONCLUSIONS
  CHAPTER HIGHLIGHTS
  REFERENCES

CHAPTER 6 TEXT CLUSTERING IN PYTHON
  6.1. INTRODUCTION
  6.2. CLUSTERING PROCESS
    6.2.1. Word Clustering
    6.2.2. Document Clustering
    6.2.3. Term Frequency-Inverse Document Frequency (tf-idf)
  6.3. APPLICATIONS OF TEXT CLUSTERING IN REAL-TIME
    Identifying Fake News
    Spam Filter
    Marketing and Sales
    Classifying Website Traffic
    Identifying Fraudulent or Criminal Activity
    Document Analysis
  6.4. CLUSTERING ALGORITHMS WITH CODE IMPLEMENTATION
    6.4.1. K-means Clustering
      Advantages
      Disadvantages of k-means Clustering
      K means Clustering in Scikit-learn
    6.4.2. Hierarchical Clustering
      How it Works
      Hierarchical Clustering Applications
      Hierarchical Clustering with Scikit-learn
    6.4.3. Fuzzy C-means Clustering
      Stepwise Approach To Performing fuzzy C-means Clustering
      Fuzzy C means Clustering via Scikit-learn
  CONCLUSIONS
  REFERENCES

CHAPTER 7 FUZZY LOGIC IN TEXT MINING USING PYTHON
  7.1. INTRODUCTION TO FUZZY LOGIC
    Steps to be Followed in the Fuzzy System
    Fuzzy Membership Functions
    Trapezoidal Membership Function
    Gaussian Membership Function
    Generalised Bell Membership Function
    Sigmoid Membership Function
    Fuzzy Set Operations
    Why do we use Fuzzy Logic?
    Uses of Fuzzy Logic in Text Mining
    Applications of Fuzzy System
    Issues in Fuzzy Logic
  7.2. FUZZY LOGIC WITH PYTHON
    FuzzyWuzzy Library
    Program
    Program
    Program
    Program
    Program
    Program
  7.3. PREPROCESSING
    Program
    Program
    Program
  7.4. FEATURE EXTRACTION
    Program
    Program
  7.5. FUZZY CLUSTERING
174 Fuzzy C-Means Clustering ..................................................................................................... 175 Steps to Perform the fuzzy C-means Clustering Algorithm ................................................... 175 Program ................................................................................................................................... 175 Program ................................................................................................................................... 176 Fuzzy K-Means Clustering ..................................................................................................... 178 Program ................................................................................................................................... 178 7.6. CLASSIFICATION ................................................................................................................. 179 Program ................................................................................................................................... 179 Program ................................................................................................................................... 180 7.7. FUZZY ASSOCIATION RULES .......................................................................................... 181 Program ................................................................................................................................... 181 Program ................................................................................................................................... 182 7.8. FUZZY VISUALIZATION .................................................................................................... 183
