Experiment and Evaluation in Information Retrieval Models PDF

277 pages · 2017 · 2.841 MB · English

Experiment and Evaluation in Information Retrieval Models
K. Latha

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

International Standard Book Number-13: 978-1-138-03231-6 (Hardback)

Library of Congress Cataloging-in-Publication Data
Names: Latha, K., author.
Title: Experiment and evaluation in information retrieval models / K. Latha.
Description: Boca Raton : CRC Press, Taylor & Francis Group, [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2017004392 | ISBN 9781138032316 (hardback : alk. paper) | ISBN 9781315392622 (ebook) | ISBN 9781315392615 (ebook) | ISBN 9781315392608 (ebook) | ISBN 9781315392592 (ebook)
Subjects: LCSH: Data mining. | Querying (Computer science) | Big data. | Information retrieval--Experiments. | Information storage and retrieval systems--Evaluation.
Classification: LCC QA76.9.D343 L384 2016 | DDC 006.3/12--dc23
LC record available at https://lccn.loc.gov/2017004392

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Acknowledgments
About the Author

Section I Foundations
1 Introduction
  1.1 Motivation
    1.1.1 Web Search
  1.2 Evolutionary Search and IR
  1.3 Applications of IR
    1.3.1 Other Search Applications

Section II Preliminaries
2 Preliminaries
  2.1 Information Retrieval
  2.2 Information Retrieval versus Data Retrieval
  2.3 Information Retrieval (IR) versus Information Extraction (IE)
  2.4 Components of an Information Retrieval System
    2.4.1 Document Processing
    2.4.2 Query Processing
    2.4.3 Retrieval and Feedback Generation Component
3 Contextual and Conceptual Information Retrieval
  3.1 Context Search
    3.1.1 Need for Contextual Search
    3.1.2 Graphical Representation of Context-Based Search
    3.1.3 Architecture of Context-Based Indexing
    3.1.4 Approaches for Context Search
      3.1.4.1 Searching Based on Explicitly Specifying User Context
      3.1.4.2 Searching Based on Automatically Derived Context
    3.1.5 Traditional Method for Context-Based Search: User Profile-Based Context Search
  3.2 Conceptual Search
    3.2.1 The Semantic Web
    3.2.2 Ontology
    3.2.3 Approaches to Conceptual Search
    3.2.4 Types of Conceptual Structures
    3.2.5 Features of Conceptual Structures
    3.2.6 Framework for Concept-Based Search
    3.2.7 Concept Chain Graphs
4 Information Retrieval Models
  4.1 Boolean Model
  4.2 Vector Model
    4.2.1 The Vector Space Model
    4.2.2 Similarity Measures
      4.2.2.1 Cosine Similarity
      4.2.2.2 Jaccard Coefficient
      4.2.2.3 Dice Coefficient
  4.3 Fixing the Term Weights
    4.3.1 Term Frequency
    4.3.2 Inverse Document Frequency
    4.3.3 tf-idf
  4.4 Probabilistic Models
    4.4.1 Probabilistic Ranking Principle (PRP)
    4.4.2 Binary Independence Retrieval (BIR) Model
    4.4.3 The Probabilistic Indexing Model
  4.5 Language Model
    4.5.1 Multinomial Distributions Model
    4.5.2 The Query Likelihood Model
    4.5.3 Extended Language Modeling Approaches
    4.5.4 Translation Model
    4.5.5 Comparisons with Traditional Probabilistic IR Approaches
5 Evaluation of Information Retrieval Systems
  5.1 Ranked and Unranked Results
    5.1.1 Relevance
  5.2 Unranked Retrieval System
    5.2.1 Precision
    5.2.2 Recall
    5.2.3 Accuracy
    5.2.4 F-Measure
    5.2.5 G-Measure
    5.2.6 Prevalence
    5.2.7 Error Rate
    5.2.8 Fallout
    5.2.9 Miss Rate
  5.3 Ranked Retrieval System
    5.3.1 Precision and Recall Curves
    5.3.2 Average Precision
    5.3.3 Precision at k
    5.3.4 R-Precision
    5.3.5 Mean Average Precision (MAP)
    5.3.6 Breakeven Point
    5.3.7 ROC Curve
      5.3.7.1 Relationship between PR and ROC Curves
6 Fundamentals of Evolutionary Algorithms
  6.1 Combinatorial Optimization Problems
    6.1.1 Heuristics
    6.1.2 Metaheuristics
    6.1.3 Case-Based Reasoning (CBR)
  6.2 Evolutionary Programming
  6.3 Evolutionary Computation
    6.3.1 Single-Objective Optimization
    6.3.2 Multi-Objective Optimization
  6.4 Role of Evolutionary Algorithms in Information Retrieval
  6.5 Evolutionary Algorithms
    6.5.1 Firefly Algorithm
    6.5.2 Particle Swarm Optimization
    6.5.3 Genetic Algorithms
    6.5.4 Genetic Programming
    6.5.5 Applications of Genetic Programming
    6.5.6 Simulated Annealing
    6.5.7 Harmony Search
    6.5.8 Differential Evolution
    6.5.9 Tabulated Search

Section III Demand of Evolutionary Algorithms in IR
7 Demand of Evolutionary Algorithms in Information Retrieval
  7.1 Document Ranking
    7.1.1 Retrieval Effectiveness
  7.2 Relevance Feedback Approach
    7.2.1 Relevance Feedback in Text IR
      7.2.1.1 Query Expansion
    7.2.2 Relevance Feedback in Content-Based Image Retrieval
    7.2.3 Relevance Feedback in Region-Based Image Retrieval
  7.3 Term-Weighting Approaches
    7.3.1 Term Frequency
    7.3.2 Inverse Document Frequency
  7.4 Document Retrieval
  7.5 Feature Selection Approach
    7.5.1 Filter Method for Feature Selection
    7.5.2 Wrapper Method for Feature Selection
    7.5.3 Embedded Method for Feature Selection
  7.6 Image Retrieval
    7.6.1 Content-Based Image Retrieval
      7.6.1.1 Feature Extraction
      7.6.1.2 Color Descriptor
      7.6.1.3 Texture Descriptor
      7.6.1.4 Shape Descriptor
      7.6.1.5 Similarity Measure
    7.6.2 Region-Based Image Retrieval
      7.6.2.1 Image Segmentation
      7.6.2.2 Similarity Measure
    7.6.3 Image Summarization
      7.6.3.1 Multimodal Image Collection Summarization
      7.6.3.2 Bag of Words
      7.6.3.3 Dictionary Learning for Calculating Sparse Approximately
  7.7 Web-Based Recommendation System
  7.8 Web Page Classification
  7.9 Facet Generation
  7.10 Duplicate Detection System
  7.11 Improvisation of Seeker Satisfaction in Community Question Answering Systems
  7.12 Abstract Generation

Section IV Model Formulations of Information Retrieval Techniques
8 TABU Annealing: An Efficient and Scalable Strategy for Document Retrieval
  8.1 Simulated Annealing
    8.1.1 The Simulated Annealing Algorithm
    8.1.2 Cooling Schedules
  8.2 TABU Annealing Algorithm
  8.3 Empirical Results and Discussion
9 Efficient Latent Semantic Indexing-Based Information Retrieval Framework Using Particle Swarm Optimization and Simulated Annealing
  9.1 Architecture of Proposed Information Retrieval System
  9.2 Methodology and Solutions
    9.2.1 Text Preprocessing
    9.2.2 Dimensionality Reduction
      9.2.2.1 Dimensionality Reduction Using Latent Semantic Indexing
      9.2.2.2 Query Conversion Using LSI
    9.2.3 Clustering of Dimensionally Reduced Documents
      9.2.3.1 Background of Particle Swarm Optimization (PSO) Algorithm
      9.2.3.2 Background of K-Means
      9.2.3.3 Hybrid PSO + K-Means Algorithm
    9.2.4 Simulated Annealing for Document Retrieval
  9.3 Experimental Results and Discussion
    9.3.1 Performance Evaluation for Clustering
    9.3.2 Performance Evaluation for Document Retrieval
10 Music-Inspired Optimization Algorithm: Harmony-TABU for Document Retrieval Using Rhetorical Relations and Relevance Feedback
  10.1 The Basic Harmony Search Clustering Algorithm
    10.1.1 Basic Structure of Harmony Search Algorithm
    10.1.2 Representation of Documents and Queries
    10.1.3 Representation of Solutions
    10.1.4 Features of Harmony Search
    10.1.5 Initialize the Problem and HS Parameters
    10.1.6 Harmony Memory Initialization
    10.1.7 New Harmony Improvisation
    10.1.8 Hybridization
    10.1.9 Evaluation of Solutions
  10.2 Harmony-TABU Algorithm
  10.3 Relevance Feedback and Query Expansion in IR
    10.3.1 Presentation Term Selection
    10.3.2 Direct Term Feedback (TFB)
    10.3.3 Cluster Feedback (CFB)
    10.3.4 Term-Cluster Feedback (TCFB)
  10.4 Empirical Results and Discussion
    10.4.1 Document Collections
    10.4.2 Experimental Setup
  10.5 Rhetorical Structure
  10.6 Abstract Generation
11 Evaluation of Light Inspired Optimization Algorithm-Based Image Retrieval
  11.1 Query Selection and Distance Calculation
  11.2 Optimization Using a Stochastic Firefly Algorithm
    11.2.1 Agents Initialization and Fitness Evaluation
    11.2.2 Variation in Brightness of Firefly
    11.2.3 Strategy for Searching New Swarms
  11.3 Experimental Setup
  11.4 Visual Signature
  11.5 Performance Measures
  11.6 Parameter Settings of Firefly Algorithm
  11.7 Performance Evaluation
12 An Evolutionary Approach for Optimizing Content-Based Image Retrieval Using Support Vector Machine
  12.1 Relevance Feedback Learning via Support Vector Machine
  12.2 Optimization Using a Stochastic Firefly Algorithm
  12.3 Image Database
  12.4 Baselines
  12.5 Comparison Methods
13 An Application of Firefly Algorithm to Region-Based Image Retrieval
  13.1 Image Retrieval
    13.1.1 Image Segmentation
    13.1.2 Image Representation
    13.1.3 Similarity Measure
  13.2 Optimization Using a Stochastic Firefly Algorithm
    13.2.1 Firefly Agent's Initialization and Fitness Evaluation
    13.2.2 Attraction toward New Firefly
    13.2.3 Movement of Fireflies
  13.3 Image Databases
  13.4 Performance Evaluation
14 An Evolutionary Approach for Optimizing Region-Based Image Retrieval Using Support Vector Machine
  14.1 Region-Based Image Retrieval
  14.2 Behavior of Fireflies
  14.3 Why Is the Firefly Algorithm So Efficient?
  14.4 Machine Learning
  14.5 Support Vector Machines
  14.6 Optimization of SVM by PSO
    14.6.1 SVM-Based RF
  14.7 Optimization Using a Stochastic Firefly Algorithm
  14.8 Image Databases
    14.8.1 COIL Database
    14.8.2 The Corel Database
  14.9 Baselines
    14.9.1 The Proposed SVM: FA Approach
  14.10 Discussion
    14.10.1 Comparison of FA with PSO and GA
15 Optimization of Sparse Dictionary Model for Multimodal Image Summarization Using Firefly Algorithm
  15.1 Image Representation
  15.2 Problem Formulation
  15.3 Optimization of Dictionary Learning
  15.4 Sparse Coding
  15.5 Iterative Dictionary Selection Stage
  15.6 Performance Analysis
    15.6.1 Experiment Setup
    15.6.2 Experimental Specification
    15.6.3 Baseline Algorithms
    15.6.4 Mean Square Error Performance

Section V Algorithmic Solutions to the Problems in Advanced IR Concepts
16 A Dynamic Feature Selection Method for Document Ranking with Relevance Feedback Approach
  16.1 Overview
  16.2 Feature Selection Procedures
    16.2.1 Markov Random Field (MRF) Model for Feature Selection
    16.2.2 Correlation-Based Feature Selection
    16.2.3 Count Difference-Based Feature Selection
  16.3 Proposed Approach for Feature Selection
    16.3.1 Feature Generalization with Association Rule Induction
    16.3.2 Ranking
      16.3.2.1 Document Ranking Using BM25 Weighting Function
      16.3.2.2 Expectation Maximization for Relevance Feedback
  16.4 Empirical Results and Discussion
    16.4.1 Dataset Used for Feature Selection
    16.4.2 n-Gram Generation
    16.4.3 Evaluation
17 TDCCREC: An Efficient and Scalable Web-Based Recommendation System
  17.1 Recommendation Methodologies
    17.1.1 Learning Automata (LA)
    17.1.2 Weighted Association Rule
    17.1.3 Content-Based Recommendation
    17.1.4 Collaborative Filtering-Based Recommendation
  17.2 Proposed Approach: Truth Discovery-Based Content and Collaborative Recommender System (TDCCREC)
  17.3 Empirical Results and Discussion
18 An Automatic Facet Generation Framework for Document Retrieval
  18.1 Baseline Approach
    18.1.1 Drawbacks
  18.2 Greedy Algorithm
    18.2.1 Drawbacks
  18.3 Feedback Language Model
  18.4 Proposed Method: Automatic Facet Generation Framework (AFGF)
  18.5 Empirical Results and Discussion
19 ASPDD: An Efficient and Scalable Framework for Duplication Detection
  19.1 Duplication Detection Techniques
    19.1.1 Prior Work
      19.1.1.1 Similarity Measures
      19.1.1.2 Shingling Techniques
    19.1.2 Proposed Approach (ASPDD)
  19.2 Empirical Results and Discussion
20 Improvisation of Seeker Satisfaction in Yahoo! Community Question Answering Using Automatic Ranking, Abstract Generation, and History Updation
  20.1 The Asker Satisfaction Problem
  20.2 Community Question Answering Problems
  20.3 Methodologies
  20.4 Experimental Setup
  20.5 Empirical Results and Discussion

Section VI Findings and Summary
21 Findings and Summary of Text Information Retrieval Chapters
  21.1 Findings and Summary
  21.2 Future Directions
22 Findings and Summary of Image Retrieval and Assessment of Image Mining Systems Chapters
  22.1 Experimental Setup
  22.2 Results and Discussions
  22.3 Findings 1: Average Precision-Recall Curves of Proposed Image Retrieval Systems for Pascal Database
  22.4 Findings 2: Average Precision and Average Recall of Proposed Methods for Different Semantic Classes
  22.5 Findings 3: Average Precision and Average Recall of Top-Ranked Results after the Ninth Feedback for Corel Database
  22.6 Findings 4: Average Precision of Top-Ranked Results after the Ninth Feedback for IR with Summarization and IR without Summarization
  22.7 Findings 5: Average Execution Time of Proposed Methods
  22.8 Findings 6: Performance Analysis of Top Retrieval Results Obtained with the Proposed Image Retrieval Systems
  22.9 Summary
  22.10 Future Scope

Appendix: Abbreviations, Acronyms and Symbols
Bibliography
Index
