
Estimating the Query Difficulty for Information Retrieval PDF

89 Pages·2010·1.005 MB·English

Preview Estimating the Query Difficulty for Information Retrieval

Copyright © 2010 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.

Estimating the Query Difficulty for Information Retrieval
David Carmel and Elad Yom-Tov
www.morganclaypool.com

ISBN: 9781608453573 paperback
ISBN: 9781608453580 ebook
DOI 10.2200/S00235ED1V01Y201004ICR015

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND SERVICES
Lecture #15
Series Editor: Gary Marchionini, University of North Carolina, Chapel Hill

Series ISSN
Synthesis Lectures on Information Concepts, Retrieval, and Services
Print 1947-945X
Electronic 1947-9468

Synthesis Lectures on Information Concepts, Retrieval, and Services

Editor
Gary Marchionini, University of North Carolina, Chapel Hill

Synthesis Lectures on Information Concepts, Retrieval, and Services is edited by Gary Marchionini of the University of North Carolina. The series will publish 50- to 100-page publications on topics pertaining to information science and applications of technology to information discovery, production, distribution, and management. The scope will largely follow the purview of premier information and computer science conferences, such as ASIST, ACM SIGIR, ACM/IEEE JCDL, and ACM CIKM. Potential topics include, but are not limited to: data models, indexing theory and algorithms, classification, information architecture, information economics, privacy and identity, scholarly communication, bibliometrics and webometrics, personal information management, human information behavior, digital libraries, archives and preservation, cultural informatics, information retrieval evaluation, data fusion, relevance feedback, recommendation systems, question answering, natural language processing for retrieval, text summarization, multimedia retrieval, multilingual retrieval, and exploratory search.
Estimating the Query Difficulty for Information Retrieval
David Carmel and Elad Yom-Tov
2010

iRODS Primer: Integrated Rule-Oriented Data System
Arcot Rajasekar, Reagan Moore, Chien-Yi Hou, Christopher A. Lee, Richard Marciano, Antoine de Torcy, Michael Wan, Wayne Schroeder, Sheau-Yen Chen, Lucas Gilbert, Paul Tooby, and Bing Zhu
2010

Collaborative Web Search: Who, What, Where, When, and Why
Meredith Ringel Morris and Jaime Teevan
2009

Multimedia Information Retrieval
Stefan Rüger
2009

Online Multiplayer Games
William Sims Bainbridge
2009

Information Architecture: The Design and Integration of Information Spaces
Wei Ding and Xia Lin
2009

Reading and Writing the Electronic Book
Catherine C. Marshall
2009

Hypermedia Genes: An Evolutionary Perspective on Concepts, Models, and Architectures
Nuno M. Guimarães and Luís M. Carrico
2009

Understanding User-Web Interactions via Web Analytics
Bernard J. (Jim) Jansen
2009

XML Retrieval
Mounia Lalmas
2009

Faceted Search
Daniel Tunkelang
2009

Introduction to Webometrics: Quantitative Web Research for the Social Sciences
Michael Thelwall
2009

Exploratory Search: Beyond the Query-Response Paradigm
Ryen W. White and Resa A. Roth
2009

New Concepts in Digital Reference
R. David Lankes
2009

Automated Metadata in Multimedia Information Systems: Creation, Refinement, Use in Surrogates, and Evaluation
Michael G. Christel
2009

Estimating the Query Difficulty for Information Retrieval
David Carmel and Elad Yom-Tov
IBM Research, Israel

SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND SERVICES #15
Morgan & Claypool Publishers

ABSTRACT

Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users’ queries. Even for systems that succeed very well on average, the quality of results returned for some of the queries is poor. Thus, it is desirable that IR systems will be able to identify "difficult" queries so they can be handled properly. Understanding why some queries are inherently more difficult than others is essential for IR, and a good answer to this important question will help search engines to reduce the variance in performance, hence better servicing their customer needs.
Estimating the query difficulty is an attempt to quantify the quality of search results retrieved for a query from a given collection of documents. This book discusses the reasons that cause search engines to fail for some of the queries, and then reviews recent approaches for estimating query difficulty in the IR field. It then describes a common methodology for evaluating the prediction quality of those estimators, and experiments with some of the predictors applied by various IR methods over several TREC benchmarks. Finally, it discusses potential applications that can utilize query difficulty estimators by handling each query individually and selectively, based upon its estimated difficulty.

KEYWORDS

information retrieval, retrieval robustness, query difficulty estimation, performance prediction

Contents

Acknowledgments

1 Introduction - The Robustness Problem of Information Retrieval
  1.1 Reasons for retrieval failures - the RIA workshop
  1.2 Instability in retrieval - the TREC’s Robust tracks
  1.3 Estimating the query difficulty
  1.4 Summary

2 Basic Concepts
  2.1 The retrieval task
  2.2 The prediction task
    2.2.1 Linear correlation
    2.2.2 Rank correlation
  2.3 Prediction robustness
  2.4 Summary

3 Query Performance Prediction Methods

4 Pre-Retrieval Prediction Methods
  4.1 Linguistic approaches
  4.2 Statistical approaches
    4.2.1 Definitions
    4.2.2 Specificity
    4.2.3 Similarity
    4.2.4 Coherency
    4.2.5 Term relatedness
  4.3 Evaluating pre-retrieval methods
  4.4 Summary

5 Post-Retrieval Prediction Methods
  5.1 Clarity
    5.1.1 Definition
    5.1.2 Examples
    5.1.3 Other Clarity measures
  5.2 Robustness
    5.2.1 Query perturbation
    5.2.2 Document perturbation
    5.2.3 Retrieval perturbation
    5.2.4 Cohesion
  5.3 Score distribution analysis
  5.4 Evaluating post-retrieval methods
  5.5 Prediction sensitivity
  5.6 Summary

6 Combining Predictors
  6.1 Linear regression
  6.2 Combining pre-retrieval predictors
  6.3 Combining post-retrieval predictors
    6.3.1 Combining predictors based on statistical decision theory
    6.3.2 Evaluating the UEF framework
    6.3.3 Results
    6.3.4 Combining predictors in the UEF model
  6.4 Summary

7 A General Model for Query Difficulty
  7.1 Geometrical illustration
  7.2 General model
  7.3 Validating the general model
  7.4 The relationship between aspect coverage and query difficulty
  7.5 Validating the relationship between aspect coverage and query difficulty
  7.6 Summary

8 Applications of Query Difficulty Estimation
  8.1 Feedback: To the user and to the system
  8.2 Federation and metasearch
  8.3 Content enhancement using missing content analysis
  8.4 Selective query expansion
    8.4.1 Selective expansion based on query drift estimation
    8.4.2 Adaptive use of pseudo relevance feedback
  8.5 Other uses of query difficulty prediction
  8.6 Summary

9 Summary and Conclusions
  9.1 Summary
  9.2 What next?
  9.3 Concluding remarks

Bibliography

Authors’ Biographies
