The Thirteenth Text REtrieval Conference, TREC 2004

NIST Special Publication 500-261
Information Technology: The Thirteenth Text REtrieval Conference, TREC 2004
Ellen M. Voorhees and Lori P. Buckland, Editors
National Institute of Standards and Technology
Technology Administration, U.S. Department of Commerce

The National Institute of Standards and Technology was established in 1988 by Congress to "assist industry in the development of technology ... needed to improve product quality, to modernize manufacturing processes, to ensure product reliability ... and to facilitate rapid commercialization ... of products based on new scientific discoveries." NIST, originally founded as the National Bureau of Standards in 1901, works to strengthen U.S. industry's competitiveness; advance science and engineering; and improve public health, safety, and the environment. One of the agency's basic functions is to develop, maintain, and retain custody of the national standards of measurement, and provide the means and methods for comparing standards used in science, engineering, manufacturing, commerce, industry, and education with the standards adopted or recognized by the Federal Government. As an agency of the U.S. Commerce Department's Technology Administration, NIST conducts basic and applied research in the physical sciences and engineering, and develops measurement techniques, test methods, standards, and related services. The Institute does generic and precompetitive work on new and advanced technologies. NIST's research facilities are located at Gaithersburg, MD 20899, and at Boulder, CO 80303. Major technical operating units and their principal activities are listed below. For more information visit the NIST Web site at http://www.nist.gov, or contact the Publications and Program Inquiries Desk, 301-975-3058.
Office of the Director: National Quality Program • International and Academic Affairs
Technology Services: Standards Services • Technology Partnerships • Measurement Services • Information Services • Weights and Measures
Advanced Technology Program: Economic Assessment • Information Technology and Applications • Chemistry and Life Sciences • Electronics and Photonics Technology
Manufacturing Extension Partnership Program: Regional Programs • National Programs • Program Development
Electronics and Electrical Engineering Laboratory: Microelectronics • Law Enforcement Standards • Electricity • Semiconductor Electronics • Radio-Frequency Technology¹ • Electromagnetic Technology¹ • Optoelectronics¹ • Magnetic Technology¹
Materials Science and Engineering Laboratory: Intelligent Processing of Materials • Ceramics • Materials Reliability¹ • Polymers • Metallurgy • NIST Center for Neutron Research
Chemical Science and Technology Laboratory: Biotechnology • Process Measurements • Surface and Microanalysis Science • Physical and Chemical Properties² • Analytical Chemistry
Physics Laboratory: Electron and Optical Physics • Atomic Physics • Optical Technology • Ionizing Radiation • Time and Frequency¹ • Quantum Physics¹
Manufacturing Engineering Laboratory: Precision Engineering • Manufacturing Metrology • Intelligent Systems • Fabrication Technology • Manufacturing Systems Integration
Building and Fire Research Laboratory: Applied Economics • Materials and Construction Research • Building Environment • Fire Research
Information Technology Laboratory: Mathematical and Computational Sciences² • Advanced Network Technologies • Computer Security • Information Access • Convergent Information Systems • Information Services and Computing • Software Diagnostics and Conformance Testing • Statistical Engineering

¹At Boulder, CO 80303
²Some elements at Boulder, CO

NIST Special Publication 500-261
Information Technology: The Thirteenth Text Retrieval Conference, TREC 2004
Ellen M. Voorhees and Lori P. Buckland, Editors
Information Technology Laboratory
Information Access Division
National Institute of Standards and Technology
Gaithersburg, MD 20899-8940
August 2005

U.S. Department of Commerce
Carlos M. Gutierrez, Secretary
Technology Administration
Michelle O'Neill, Acting Under Secretary of Commerce for Technology
National Institute of Standards and Technology
William A. Jeffrey, Director

Reports on Information Technology

The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) stimulates U.S. economic growth and industrial competitiveness through technical leadership and collaborative research in critical infrastructure technology, including tests, test methods, reference data, and forward-looking standards, to advance the development and productive use of information technology. To overcome barriers to usability, scalability, interoperability, and security in information systems and networks, ITL programs focus on a broad range of networking, security, and advanced information technologies, as well as the mathematical, statistical, and computational sciences. This Special Publication 500-series reports on ITL's research in tests and test methods for information technology, and its collaborative activities with industry, government, and academic organizations.

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

National Institute of Standards and Technology Special Publication 500-261
Natl. Inst. Stand. Technol. Spec. Publ. 500-261, 132 pages (August 2005)
CODEN: NSPUE2
U.S. GOVERNMENT PRINTING OFFICE, WASHINGTON: 2005
For sale by the Superintendent of Documents, U.S. Government Printing Office
Internet: bookstore.gpo.gov; Phone: (202) 512-1800; Fax: (202) 512-2250
Mail: Stop SSOP, Washington, DC 20402-0001

Foreword

This report constitutes the proceedings of the 2004 edition of the Text REtrieval Conference, TREC 2004, held in Gaithersburg, Maryland, November 16-19, 2004. The conference was co-sponsored by the National Institute of Standards and Technology (NIST), the Advanced Research and Development Activity (ARDA), and the Defense Advanced Research Projects Agency (DARPA). Approximately 200 people attended the conference, including representatives from 21 different countries. The conference was the thirteenth in an ongoing series of workshops to evaluate new technologies for text retrieval and related information-seeking tasks.

The workshop included plenary sessions, discussion groups, a poster session, and demonstrations. Because the participants in the workshop drew on their personal experiences, they sometimes cite specific vendors and commercial products. The inclusion or omission of a particular company or product implies neither endorsement nor criticism by NIST. Any opinions, findings, and conclusions or recommendations expressed in the individual papers are the authors' own and do not necessarily reflect those of the sponsors.

The sponsorship of the U.S. Department of Defense is gratefully acknowledged, as is the tremendous work of the program committee and the track coordinators.

Ellen Voorhees
August 2, 2005

TREC 2004 Program Committee

Ellen Voorhees, NIST, chair
James Allan, University of Massachusetts at Amherst
Chris Buckley, Sabir Research, Inc.
Gordon Cormack, University of Waterloo
Susan Dumais, Microsoft
Donna Harman, NIST
David Hawking, CSIRO
Bill Hersh, Oregon Health & Science University
David Lewis, Omarose Inc.
John Prager, IBM
John Prange, U.S. Department of Defense
Steve Robertson, Microsoft
Mark Sanderson, University of Sheffield, UK
Karen Sparck Jones, University of Cambridge
Ross Wilkinson, CSIRO

TREC 2004 Proceedings

Foreword
Listing of contents of Appendix
Listing of papers, alphabetical by organization
Listing of papers, organized by track
Abstract

Overview Papers

Overview of TREC 2004
  E.M. Voorhees, National Institute of Standards and Technology (NIST)
TREC 2004 Genomics Track Overview
  W.R. Hersh, R.T. Bhuptiraju, L. Ross, A.M. Cohen, D.F. Kraemer, Oregon Health & Science University; P. Johnson, Biogen Idec Corporation
HARD Track Overview in TREC 2004: High Accuracy Retrieval from Documents
  J. Allan, University of Massachusetts Amherst
Overview of the TREC 2004 Novelty Track
  I. Soboroff, NIST
Overview of the TREC 2004 Question Answering Track
  E.M. Voorhees, NIST
Overview of the TREC 2004 Robust Track
  E.M. Voorhees, NIST
Overview of the TREC 2004 Terabyte Track
  C. Clarke, University of Waterloo; N. Craswell, Microsoft Research; I. Soboroff, National Institute of Standards and Technology
Overview of the TREC 2004 Web Track
  N. Craswell, MSR Cambridge; D. Hawking, CSIRO

Other Papers (contents of these papers are found on the TREC 2004 Proceedings CD)

Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval
  B. Carpenter, Alias-i, Inc.
Experiments with Web QA System and TREC 2004 Questions
  D. Roussinov, Y. Ding, J.A. Robles-Flores, Arizona State University
Categorization of Genomics Text Based on Decision Rules
  R. Guillen, California State University San Marcos
Initial Results with Structured Queries and Language Models on Half a Terabyte of Text
  K. Collins-Thompson, P. Ogilvie, J. Callan, Carnegie Mellon University
Experiments in TREC 2004 Novelty Track at CAS-ICT
  H.-P. Zhang, H.-B. Xu, S. Bai, B. Wang, X.-Q. Cheng, Chinese Academy of Sciences
TREC 2004 Web Track Experiments at CAS-ICT
  Z. Zhou, Y. Guo, B. Wang, X. Cheng, H. Xu, G. Zhang, Chinese Academy of Sciences
NLPR at TREC 2004: Robust Experiments
  J. Xu, J. Zhao, B. Xu, Chinese Academy of Science
ISCAS at TREC 2004: HARD Track
  L. Sun, J. Zhang, Y. Sun, Chinese Academy of Sciences
TREC 2004 HARD Track Experiments in Clustering
  D.A. Evans, J. Bennett, J. Montgomery, V. Sheftel, D.A. Hull, J.G. Shanahan, Clairvoyance Corporation
Evolving XML and Dictionary Strategies for Question Answering and Novelty Tasks
  K.C. Litkowski, CL Research
Columbia University in the Novelty Track at TREC 2004
  B. Schiffman, K.R. McKeown, Columbia University
Concept Extraction and Synonymy Management for Biomedical Information Retrieval
  C. Crangle, A. Zbyslaw, ConverSpeech LLC; M. Cherry, E. Hong, Stanford University
DalTREC 2004: Question Answering Using Regular Expression Rewriting
  V. Keselj, A. Cox, Dalhousie University, Halifax
Experiments in Terabyte Searching, Genomic Retrieval and Novelty Detection for TREC 2004
  S. Blott, F. Camous, P. Ferguson, G. Gaughan, C. Gurrin, J.G.F. Jones, N. Murphy, N. O'Connor, A.F. Smeaton, P. Wilkins, Dublin City University; O. Boydell, B. Smyth, University College Dublin
Amberfish at the TREC 2004 Terabyte Track
  N. Nassar, Etymon Systems
Fondazione Ugo Bordoni at TREC 2004
  G. Amati, C. Carpineto, G. Romano, Fondazione Ugo Bordoni
FDUQA on TREC 2004 QA Track
  L. Wu, X. Huang, L. You, Z. Zhang, X. Li, Y. Zhou, Fudan University
The GUC Goes to TREC 2004: Using Whole or Partial Documents for Retrieval and Classification in the Genomics Track
  K. Darwish, A. Madkour, The German University in Cairo
The Hong Kong Polytechnic University at the TREC 2004 Robust Track
  D.Y. Wang, R.W.P. Luk, The Hong Kong Polytechnic University; K.F. Wong, The Chinese University of Hong Kong
Robust, Web and Terabyte Retrieval with Hummingbird SearchServer at TREC 2004
  S. Tomlinson, Hummingbird
Juru at TREC 2004: Experiments with Prediction of Query Difficulty
  E. Yom-Tov, S. Fine, D. Carmel, A. Darlow, E. Amitay, IBM Haifa Research Labs
IBM's PIQUANT II in TREC 2004
  J. Chu-Carroll, K. Czuba, J. Prager, A. Ittycheriah, IBM T.J. Watson Research Center; S. Blair-Goldensohn, Columbia University
A Hidden Markov Model for the TREC Novelty Task
  J.M. Conroy, IDA/CCS
IIT at TREC 2004: Standard Retrieval Models Over Partitioned Indices for the Terabyte Track
  J. Heard, O. Frieder, D. Grossman, Illinois Institute of Technology
WIDIT in TREC 2004 Genomics, Hard, Robust and Web Tracks
  K. Yang, N. Yu, A. Wead, G. La Rowe, Y.-H. Li, C. Friend, Y. Lee, Indiana University
TREC 2004 Genomics Track Experiments at IUB
  K. Seki, J.C. Costello, V.R. Singan, J. Mostafa, Indiana University Bloomington
TREC Novelty Track at IRIT-SIG
  T. Dkaki, Institut de Recherche en Informatique de Toulouse (IRIT) and Universite Toulouse le Mirail; J. Mothe, Institut de Recherche en Informatique de Toulouse (IRIT) and Institut Universitaire de Formation des Maitres Midi-Pyrenees
Combining Linguistic Processing and Web Mining for Question Answering: ITC-irst at TREC 2004
  H. Tanev, M. Kouylekov, B. Magnini, ITC-irst
JHU/APL at TREC 2004: Robust and Terabyte Tracks
  C. Piatko, J. Mayfield, P. McNamee, S. Cost, The Johns Hopkins University Applied Physics Laboratory
Korea University Question Answering System at TREC 2004
  K.-S. Han, H. Chung, S.-B. Kim, Y.-I. Song, J.-Y. Lee, H.-C. Rim, Korea University
Novel Approaches in Text Information Retrieval: Experiments in the Web Track of TREC 2004
  M. Farah, D. Vanderpooten, LAMSADE, University of Paris Dauphine
LexiClone Inc. and NIST TREC
  I. Geller, LexiClone
AnswerFinder at TREC 2004
  D. Molla, M. Gardiner, Macquarie University
Meiji University Web, Novelty and Genomic Track Experiments
  T. Tomiyama, K. Karoji, T. Kondo, Y. Kakuta, T. Takagi, Meiji University
Microsoft Research Asia at Web Track and Terabyte Track of TREC 2004
  R. Song, J.-R. Wen, S. Shi, G. Xin, T.-Y. Liu, T. Qin, X. Zheng, J. Zhang, G. Xue, W.-Y. Ma, Microsoft Research Asia
Microsoft Cambridge at TREC 13: Web and Hard Tracks
  H. Zaragoza, N. Craswell, M. Taylor, S. Saria, S. Robertson, Microsoft Research Ltd.
Answering Multiple Questions on a Topic From Heterogeneous Resources
  B. Katz, M. Bilotti, S. Felshin, A. Fernandes, W. Hildebrandt, R. Katzir, J. Lin, D. Loreto, G. Marton, F. Mora, O. Uzuner, MIT Computer Science and Artificial Intelligence Laboratory
Experience of Using SVM for the Triage Task in TREC 2004 Genomics Track
  D. Zhang, W.S. Lee, National University of Singapore