Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval PDF

317 Pages·2002·9.95 MB·English

ADVANCES IN INFORMATION RETRIEVAL
Recent Research from the Center for Intelligent Information Retrieval

THE KLUWER INTERNATIONAL SERIES ON INFORMATION RETRIEVAL

Series Editor
W. Bruce Croft
University of Massachusetts, Amherst

Also in the Series:

MULTIMEDIA INFORMATION RETRIEVAL: Content-Based Information Retrieval from Large Text and Audio Databases, by Peter Schäuble; ISBN: 0-7923-9899-8
INFORMATION RETRIEVAL SYSTEMS, by Gerald Kowalski; ISBN: 0-7923-9926-9
CROSS-LANGUAGE INFORMATION RETRIEVAL, edited by Gregory Grefenstette; ISBN: 0-7923-8122-X
TEXT RETRIEVAL AND FILTERING: Analytic Models of Performance, by Robert M. Losee; ISBN: 0-7923-8177-7
INFORMATION RETRIEVAL: UNCERTAINTY AND LOGICS: Advanced Models for the Representation and Retrieval of Information, by Fabio Crestani, Mounia Lalmas, and Cornelis Joost van Rijsbergen; ISBN: 0-7923-8302-8
DOCUMENT COMPUTING: Technologies for Managing Electronic Document Collections, by Ross Wilkinson, Timothy Arnold-Moore, Michael Fuller, Ron Sacks-Davis, James Thom, and Justin Zobel; ISBN: 0-7923-8357-5
AUTOMATIC INDEXING AND ABSTRACTING OF DOCUMENT TEXTS, by Marie-Francine Moens; ISBN 0-7923-7793-1

ADVANCES IN INFORMATION RETRIEVAL
Recent Research from the Center for Intelligent Information Retrieval

Edited by
W. Bruce Croft
University of Massachusetts, Amherst

KLUWER ACADEMIC PUBLISHERS
New York / Boston / Dordrecht / London / Moscow

eBook ISBN: 0-306-47019-5
Print ISBN: 0-792-37812-1

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2000 Kluwer Academic Publishers
Massachusetts

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

Contents

Preface
Contributing Authors

1 Combining Approaches to Information Retrieval
  W. Bruce Croft
    1 Introduction
    2 Combining Representations
    3 Combining Queries
    4 Combining Ranking Algorithms
    5 Combining Search Systems
    6 Combining Belief
    7 Language Models
    8 Conclusion

2 The Use of Exploratory Data Analysis in Information Retrieval Research
  Warren R. Greiff
    1 Introduction
    2 Exploratory Data Analysis
    3 Weight of Evidence
    4 Analysis of the Relationship between Document Frequency and the Weight of Evidence of Term Occurrence
    5 Probabilistic Modeling of Multiple Sources of Evidence
    6 Conclusions

3 Language Models for Relevance Feedback
  Jay M. Ponte
    1 Introduction
    2 The Language Modeling Approach to IR
    3 Related Work
    4 Query Expansion in the Language Modeling Approach
    5 Discussion and Future Work

4 Topic Detection and Tracking: Event Clustering as a Basis for First Story Detection
  Ron Papka, James Allan
    1 Topic Detection and Tracking
    2 On-line Clustering Algorithms
    3 Experimental Setting
    4 Event Clustering
    5 First Story Detection
    6 Discussion of First Story Detection
    7 Conclusion
    8 Future Work

5 Distributed Information Retrieval
  Jamie Callan
    1 Introduction
    2 Multi-Database Testbeds
    3 Resource Description
    4 Resource Selection
    5 Merging Document Rankings
    6 Acquiring Resource Descriptions
    7 Summary and Conclusions

6 Topic-Based Language Models for Distributed Retrieval
  Jinxi Xu, W. Bruce Croft
    1 Introduction
    2 Topic Models
    3 K-Means Clustering
    4 Four Methods of Distributed Retrieval
    5 Experimental Setup
    6 Global Clustering
    7 Recall-based Retrieval
    8 Distributed Retrieval in Dynamic Environments
    9 More Clusters
    10 Better Choice of Initial Clusters
    11 Local Clustering
    12 Multiple-Topic Representation
    13 Efficiency
    14 Related Work
    15 Conclusion and Future Work

7 The Effect of Collection Organization and Query Locality on Information Retrieval System Performance
  Zhihong Lu, Kathryn S. McKinley
    1 Introduction
    2 Related Work
    3 System Architectures
    4 Configuration with Respect to Collection Organization, Collection Access Skew, and Query Locality
    5 Simulation Model
    6 Experiments
    7 Conclusions

8 Cross-Language Retrieval via Transitive Translation
  Lisa A. Ballesteros
    1 Introduction
    2 Translation Resources
    3 Dictionary Translation and Ambiguity
    4 Resolving Ambiguity
    5 Addressing Limited Resources
    6 Summary

9 Building, Testing, and Applying Concept Hierarchies
  Mark Sanderson, Dawn Lawrie
    1 Introduction
    2 Building a Concept Hierarchy
    3 Presenting a Concept Hierarchy
    4 Evaluating the Structures
    5 Future Work
    6 Conclusions
    Appendix: ANOVA analysis

10 Appearance-Based Global Similarity Retrieval of Images
  S. Ravela, C. Luo
    1 Introduction
    2 Appearance Related Representations
    3 Computing Global Appearance Similarity
    4 Trademark Retrieval
    5 Conclusions and Limitations

Index

Preface

The Center for Intelligent Information Retrieval (CIIR) was formed in the Computer Science Department of the University of Massachusetts, Amherst in 1992. The core support for the Center came from a National Science Foundation State/Industry/University Cooperative Research Center (S/IUCRC) grant, although there had been a sizeable information retrieval (IR) research group for over 10 years prior to that grant. The basic goal of these Centers is to combine basic research, applied research, and technology transfer. The CIIR has been successful in each of these areas, in that it has produced over 270 research papers, has been involved in many successful government and industry collaborations, and has had a significant role in high-visibility Internet sites and start-ups. As a result of these efforts, the CIIR has become known internationally as one of the leading research groups in the area of information retrieval.

The CIIR focuses on research that results in more effective and efficient access and discovery in large, heterogeneous, distributed, text and multimedia databases. The scope of the work that is done in the CIIR is broad and goes significantly beyond “traditional” areas of information retrieval such as retrieval models, cross-lingual search, and automatic query expansion. The research includes both low-level systems issues such as the design of protocols and architectures for distributed search, as well as more human-centered topics such as user interface design, visualization and data mining with text, and multimedia retrieval.

The papers in this book contain some of the more recent research results from the CIIR. The first group of papers presents new research related to the retrieval models that underlie IR systems. The first paper, Combining Approaches to Information Retrieval by Croft, discusses retrieval models and strategies for combining evidence from multiple document representations, queries, ranking algorithms and search systems. This has been an important line of research for more than 10 years, and this paper provides a framework for understanding the many experimental results in this area and indicates how recent work on language models contributes to these results. Greiff’s paper, The Use of Exploratory Data Analysis in Information Retrieval Research, introduces a data-driven approach to developing retrieval models and uses this approach to derive a probabilistic ranking formula from an analysis of TREC data. A number of retrieval experiments are used to validate this new model. In the third paper of this group, Language Models for Relevance Feedback, Ponte describes the language modeling approach to IR that he introduced in his thesis work, and then shows how this approach can be used for relevance feedback and filtering environments. A number of experiments demonstrate the effectiveness of this conceptually simple, but potentially very powerful retrieval model.

The next paper, Topic Detection and Tracking: Event Clustering as a Basis for First Story Detection by Allan and Papka, describes a relatively new area of research that focuses on detecting significant events in broadcast news. New algorithms and modifications of existing IR techniques are presented and evaluated in the context of this novel task.

The next three papers deal with a range of topics related to distributed information retrieval. In Distributed Information Retrieval, Callan gives an overview of this area of research and summarizes the results related to database description and selection, and merging rankings from multiple systems. Xu and Croft, in their paper Topic-Based Language Models for Distributed Retrieval, present recent results from an approach to describing databases that is based on identifying language models through clustering. Lu and McKinley discuss performance-related issues in their paper The Effect of Collection Organization and Query Locality on Information Retrieval System Performance. They show that the use of database replication combined with selection algorithms can significantly improve the efficiency and scalability of distributed retrieval.

The next paper, Cross-Language Retrieval via Transitive Translation by Ballesteros, discusses the language resources and techniques that are used for IR in multiple languages. The paper describes a series of experiments using a dictionary-based approach to transitive translation and retrieval through an intermediate language. In Building, Testing and Applying Concept Hierarchies, Sanderson and Lawrie describe research in the important new area of summarization. They focus specifically on a technique for constructing a hierarchy of concepts to summarize the contents of a group of documents, such as those retrieved by a query.

The last paper, Appearance-Based Global Similarity Retrieval of Images by Ravela and Luo, presents new research in the important, emerging area of image retrieval. Because many of the techniques used for image indexing and comparison are very different to those used for text retrieval, the paper contains an extensive introduction to those techniques. Retrieval evaluations of a new indexing technique based on image “appearance” are described and discussed.

These papers, like the research in the CIIR, cover a wide variety of topics in the general area of IR. Together, they represent a snapshot of the “state-of-the-art” in information retrieval at the turn of the century and at the end of a decade that has seen the advent of the World-Wide Web. The papers have been written to provide overviews of their subareas and to serve as source material for graduate and undergraduate courses in information retrieval.

Finally, I would like to acknowledge the faculty, staff, and students associated with the CIIR since 1992 who have contributed enormously to its success. In particular, Jean Joyce, Kate Moruzzi, and Glenn Stowell have been instrumental to the Center’s operation. I would also like to thank Win Aung, our NSF Program Manager, for his support over the years.

BRUCE CROFT
