Computational Techniques
for Text Summarization
based on Cognitive
Intelligence
The book is concerned with contemporary methodologies used for automatic text
summarization. It proposes approaches to well-known problems in text
summarization using computational intelligence (CI) techniques, including
cognitive approaches. The cognitive basis of the summarization task is still an
open research issue, and the extent of its use in text summarization is
highlighted for further exploration. As the volume of text keeps growing,
researchers have little time for extensive reading, and summarized information
helps them understand the context in less time.
This book helps students and researchers summarize text documents automatically,
efficiently, and effectively. The computational approaches and research
techniques presented guide the reader toward performing text summarization with
ease. The generated summaries help readers learn the context or the domain at a
quicker pace. The book includes illustrations and examples that readers can
understand and adapt for their own use. Its aim is not to explain what text
summarization is, but to enable readers to perform text summarization using
various approaches. It also describes measures for evaluating and comparing
summarization approaches so that the most suitable one can be chosen for a
specific purpose. The illustrations are drawn from the social media and
healthcare domains, which shows that the approaches can be applied to any
domain. A new approach to text summarization based on cognitive intelligence is
presented for further exploration in the field.
V. Priya and K. Umamaheswari
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 V. Priya and K. Umamaheswari
Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used
only for identification and explanation without intent to infringe.
ISBN: 9781032392820 (hbk)
ISBN: 9781032442471 (pbk)
ISBN: 9781003371199 (ebk)
DOI: 10.1201/9781003371199
Typeset in Times
by Deanta Global Publishing Services, Chennai, India
Contents
Preface
About This Book
Chapter 1 Concepts of Text Summarization
1.1 Introduction
1.2 Need for Text Summarization
1.3 Approaches to Text Summarization
1.3.1 Extractive Summarization
1.3.2 Abstractive Summarization
1.4 Text Modeling for Extractive Summarization
1.4.1 Bag-of-Words Model
1.4.2 Vector Space Model
1.4.3 Topic Representation Schemes
1.4.4 Real-Valued Model
1.5 Preprocessing for Extractive Summarization
1.6 Emerging Techniques for Summarization
1.7 Scope of the Book
References
Sample Code
Sample Screenshots
Chapter 2 Large-Scale Summarization Using Machine Learning Approach
2.1 Scaling to Summarize Large Text
2.2 Machine Learning Approaches
2.2.1 Different Approaches for Modeling Text Summarization Problem
2.2.2 Classification as Text Summarization
2.2.2.1 Data Representation
2.2.2.2 Text Feature Extraction
2.2.2.3 Classification Techniques
2.2.3 Clustering as Text Summarization
2.2.4 Deep Learning Approach for Text Summarization
References
Sample Code
Chapter 3 Sentiment Analysis Approach to Text Summarization
3.1 Introduction
3.2 Sentiment Analysis: Overview
3.2.1 Sentiment Extraction and Summarization
3.2.1.1 Sentiment Extraction from Text
3.2.1.2 Classification
3.2.1.3 Score Computation
3.2.1.4 Summary Generation
3.2.2 Sentiment Summarization: An Illustration
Summarized Output
3.2.3 Methodologies for Sentiment Summarization
3.3 Implications of Sentiments in Text Summarization
Cognition-Based Sentiment Analysis and Summarization
3.4 Summary
Practical Examples
Example 1
Example 2
Sample Code (Run Using GraphLab)
Example 3
References
Sample Code
Chapter 4 Text Summarization Using Parallel Processing Approach
4.1 Introduction
Parallelizing Computational Tasks
Parallelizing for Distributed Data
4.2 Parallel Processing Approaches
4.2.1 Parallel Algorithms for Text Summarization
4.2.2 Parallel Bisection k-Means Method
4.3 Parallel Data Processing Algorithms for Large-Scale Summarization
4.3.1 Designing MapReduce Algorithm for Text Summarization
4.3.2 Key Concepts in Mapper
4.3.3 Key Concepts in Reducer
4.3.4 Summary Generation
An Illustrative Example for MapReduce
Good Time: Movie Review
4.4 Other MR-Based Methods
4.5 Summary
4.6 Examples
K-Means Clustering Using MapReduce
Parallel LDA Example (Using Gensim Package)
Sample Code: (Using Gensim Package)
Example: Creating an Inverted Index
Example: Relational Algebra (Table JOIN)
References
Sample Code
Chapter 5 Optimization Approaches for Text Summarization
5.1 Introduction
5.2 Optimization for Summarization
5.2.1 Modeling Text Summarization as Optimization Problem
5.2.2 Various Approaches for Optimization
5.3 Formulation of Various Approaches
5.3.1 Sentence Ranking Approach
5.3.1.1 Stages and Illustration
5.3.2 Evolutionary Approaches
5.3.2.1 Stages
5.3.2.2 Demonstration
5.3.3 MapReduce-Based Approach
5.3.3.1 In-Node Optimization Illustration
5.3.4 Multi-objective-Based Approach
Summary
Exercises
References
Sample Code
Chapter 6 Performance Evaluation of Large-Scale Summarization Systems
6.1 Evaluation of Summaries
6.1.1 CNN Dataset
6.1.2 Daily Mail Dataset
6.1.3 Description
6.2 Methodologies
6.2.1 Intrinsic Methods
6.2.2 Extrinsic Methods
6.3 Intrinsic Methods
6.3.1 Text Quality Measures
6.3.1.1 Grammaticality
6.3.1.2 Non-redundancy
6.3.1.3 Referential Clarity
6.3.1.4 Structure and Coherence
6.3.2 Co-selection-Based Methods
6.3.2.1 Precision, Recall, and F-score
6.3.2.2 Relative Utility
6.3.3 Content-Based Methods
6.3.3.1 Content-Based Measures
6.3.3.2 Cosine Similarity
6.3.3.3 Unit Overlap
6.3.3.4 Longest Common Subsequence
6.3.3.5 N-Gram Co-occurrence Statistics: ROUGE
6.3.3.6 Pyramids
6.3.3.7 LSA-Based Measure
6.3.3.8 Main Topic Similarity
6.3.3.9 Term Significance Similarity
6.4 Extrinsic Methods
6.4.1 Document Categorization
6.4.1.1 Information Retrieval
6.4.1.2 Question Answering
6.4.2 Summary
6.4.3 Examples
Bibliography
Chapter 7 Applications and Future Directions
7.1 Possible Directions in Modeling Text Summarization
7.2 Scope of Summarization Systems in Different Applications
7.3 Healthcare Domain
Future Directions for Medical Document Summarization
7.4 Social Media
Challenges in Social Media Text Summarization
Domain Knowledge and Transfer Learning
Online Learning
Information Credibility
Applications of Deep Learning
Implicit and Explicit Information for Actionable Insights
7.5 Research Directions for Text Summarization
7.6 Further Scope of Research on Large-Scale Summarization
Conclusion
References
Appendix A: Python Projects and Useful Links on Text Summarization
Appendix B: Solutions to Selected Exercises
Index
Preface
People have traditionally used written documents to convey important facts,
viewpoints, and feelings. New technologies have caused an exponential rise in
the volume of documents produced. In social networks, markets, production
platforms, and websites, a tremendous volume of messages, product reviews, news
pieces, and scientific documents is created and published every day. Although
often too verbose for readers, this unstructured material can be quite helpful.
Summaries present the most pertinent material succinctly and expose the reader
to the key ideas. The field of automatic text summarization was made possible
by advances in text mining, machine learning, and natural language processing.
These methods allow for the automatic production of summaries that typically
contain either the most pertinent sentences or the most salient keywords from
the document or collection. For readers to become familiar with the content of
interest rapidly, it is essential to extract a brief but informative
description of a single document or a collection.
For instance, a summary of a group of news articles on the same subject can
provide a synthesized overview of the most important aspects of the news.
Similarly, a summary of social network data can help with discovering pertinent
details about a particular event and inferring user and community interests and
viewpoints. Several automatic summarization techniques have been put forth in
recent years; they are broadly categorized into extractive and abstractive
summarization techniques.
This book offers a thorough examination of state-of-the-art methods for text
summarization. For both extractive and abstractive summarization tasks, the
reader will find in-depth treatment of several methodologies that use machine
learning, natural language processing, and data mining techniques. It also
shows how summarization methodologies can be used in a variety of applications,
including the healthcare and social media domains, along with possible research
directions and future scope.
The book comprises seven chapters and is organized as follows. Chapter 1
‘Concepts of Text Summarization’ presents the basic ideas and principles of
text summarization, with a detailed treatment of text representation. A
detailed discussion of the ideas, with practical examples, is included for
clear understanding. Some exercises related to text representation models are
provided for practitioners in the domain.
Chapter 2 ‘Large-Scale Summarization Using Machine Learning Approach’
covers the formulation of text summarization as machine learning problems such
as classification and clustering, along with deep learning and other
approaches. It also examines the complexities and challenges encountered when
using machine learning for text summarization.
Chapter 3 ‘Sentiment Analysis Approach to Text Summarization’ addresses
sentiment-based text summarization. Sentiment extraction and summarization