
Conceptual Information Retrieval: A Case Study in Adaptive Partial Parsing PDF

229 Pages·1991·16.528 MB·English

Preview Conceptual Information Retrieval: A Case Study in Adaptive Partial Parsing

CONCEPTUAL INFORMATION RETRIEVAL
A Case Study in Adaptive Partial Parsing

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE
NATURAL LANGUAGE PROCESSING AND MACHINE TRANSLATION
Consulting Editor: Jaime Carbonell

Other books in the series:
EFFICIENT PARSING FOR NATURAL LANGUAGE: A FAST ALGORITHM FOR PRACTICAL SYSTEMS, M. Tomita. ISBN 0-89838-202-5
A NATURAL LANGUAGE INTERFACE FOR COMPUTER AIDED DESIGN, T. Samad. ISBN 0-89838-222-X
INTEGRATED NATURAL LANGUAGE DIALOGUE: A COMPUTATIONAL MODEL, R. E. Frederking. ISBN 0-89838-255-6
NAIVE SEMANTICS FOR NATURAL LANGUAGE UNDERSTANDING, K. Dahlgren. ISBN 0-89838-287-4
UNDERSTANDING EDITORIAL TEXT: A Computer Model of Argument Comprehension, S. J. Alvarado. ISBN 0-7923-9123-3
NATURAL LANGUAGE GENERATION IN ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL LINGUISTICS, Paris/Swartout/Mann. ISBN 0-7923-9098-9
CURRENT ISSUES IN PARSING TECHNOLOGY, M. Tomita. ISBN 0-7923-9131-4

CONCEPTUAL INFORMATION RETRIEVAL
A Case Study in Adaptive Partial Parsing
by Michael L. Mauldin
Carnegie Mellon University
with a foreword by Jaime G. Carbonell
SPRINGER SCIENCE+BUSINESS MEDIA, LLC

Library of Congress Cataloging-in-Publication Data
Mauldin, Michael L., 1959-
Conceptual information retrieval : a case study in adaptive partial parsing / by Michael L. Mauldin.
p. cm. -- (The Kluwer international series in engineering and computer science; internal. vol. # 152. Natural language processing and machine translation)
Includes bibliographical references and index.
ISBN 978-1-4613-6790-1
ISBN 978-1-4615-4004-5 (eBook)
DOI 10.1007/978-1-4615-4004-5
1. Natural language processing (Computer science) 2. FERRET (Information retrieval system) 3. Computational linguistics. I. Title. II. Series: Kluwer international series in engineering and computer science; SECS 152. III. Series: Kluwer international series in engineering and computer science. Natural language processing and machine translation.
QA76.9.N38M38 1991 006.3'5--dc20 91-27108 CIP

Copyright © 1991 by Springer Science+Business Media New York
Originally published by Kluwer Academic Publishers in 1991
Softcover reprint of the hardcover 1st edition 1991
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photo-copying, recording, or otherwise, without the prior written permission of the publisher, Springer Science+Business Media, LLC.
Printed on acid-free paper.

To my wife Michelle

TABLE OF CONTENTS

LIST OF FIGURES
FOREWORD (Jaime Carbonell)
PREFACE
ACKNOWLEDGMENTS

1. INTRODUCTION
    1.1. Motivation
    1.2. The Keyword Barrier
    1.3. Beyond the Keyword Barrier: Conceptual Matching
    1.4. Definitions
    1.5. Performance Measures for Information Retrieval
    1.6. Goals of the FERRET project
    1.7. Learning
    1.8. Chapter Overview
2. CURRENT APPROACHES
    2.1. Postcoordinate Systems
    2.2. Precoordinate Systems
    2.3. Knowledge Rich Representations for Retrieval
    2.4. Problems with the State of the Art
    2.5. What Kinds of Learning are Possible
3. CONCEPTUAL UNDERSTANDING OF TEXT
    3.1. Natural Language Parsing
    3.2. Case Frames for Understanding
    3.3. FRUMP
4. THE FERRET SYSTEM
    4.1. Architectural Overview
    4.2. Scanner
    4.3. The McFRUMP Parser
    4.4. Dictionary Interface
    4.5. Case Frame Matching
    4.6. User Interface
    4.7. Summary
5. SCRIPT LEARNING
    5.1. Approaches to Machine Learning
    5.2. Tools for Learning: Genetic Algorithms
    5.3. Design of Learning Component
    5.4. Example of New Script Being Learned
    5.5. Caveat
    5.6. Summary
6. EMPIRICAL STUDIES
    6.1. Domains for Studying Retrieval
    6.2. Parsing Effectiveness
    6.3. Effect of Using Dictionary
    6.4. FERRET versus Boolean Keyword Query
    6.5. Possible Sources of Error
7. CONCLUSION
    7.1. Objectives Met
    7.2. Applications
    7.3. Future Work
    7.4. Conclusion
Appendix A. SAMPLE DATA STRUCTURES
    A.1. Sample Texts
    A.2. Sample Texts after Scanning
    A.3. Sample Lexical Entries
    A.4. Webster's Seventh Dictionary
    A.5. Sample Sketchy Scripts
    A.6. Sample Texts after Parsing
    A.7. Sample Retrieval
Appendix B. USER QUERIES USED IN THE STUDIES
    B.1. Sample User Survey Form
    B.2. User Queries Used in the Study
Appendix C. RAW DATA FOR EMPIRICAL STUDIES
Appendix D. THE NUMBER AND DATE GRAMMARS
REFERENCES
INDEX

LIST OF FIGURES

Figure 1-1: Synonymy and Polysemy
Figure 1-2: The Keyword Barrier
Figure 1-3: Canonicality of CD representation improves recall
Figure 2-1: An Example of Inverted Files
Figure 2-2: Major Divisions of the Dewey Decimal System
Figure 2-3: AND/OR tree for text classification
Figure 3-1: Sample Input Story for FRUMP
Figure 3-2: FRUMP's output
Figure 3-3: Portion of an SSIDT
Figure 4-1: Simple Diagram of FERRET
Figure 4-2: Relevance Homomorphism Assumption
Figure 4-3: A Sample STARDATE Text
Figure 4-4: Annotated Trace of a Sample Parser Run
Figure 4-5: Sample Retrieval by Script
Figure 4-6: Sample Retrieval by CD graph
Figure 4-7: Requesting an Abstract
Figure 4-8: Sample Scanned Text
Figure 4-9: Sample Scanner Output
Figure 4-10: The McFRUMP Parser
Figure 4-11: Types of Lexical Information in McFRUMP
Figure 4-12: Part of the McFRUMP IS-A Hierarchy
Figure 4-13: The Original FRUMP Vehicle Accident Script
Figure 4-14: The McFRUMP Vehicle Accident Script
Figure 4-15: A Portion of FERRET's SSIDT
Figure 4-16: Sample Input and Output of Dictionary Lookup
Figure 4-17: FERRET without Learning
Figure 5-1: R1 - A Single Replacement Genetic Algorithm
Figure 5-2: Crossover as Sexual Reproduction
Figure 5-3: Learning Component Architecture
Figure 5-4: New Script Hypothesis Algorithm
Figure 5-5: Hypothetical Example of Script Specialization
Figure 5-6: Hypothetical Example of Script Crossover
Figure 5-7: The Original $LAUNCH Script
Figure 5-8: Spacecraft vs Telescope in the IS-A Hierarchy
Figure 5-9: The Learned $LAUNCH Script
Figure 5-10: Retrieval using newly learned Launch script
Figure 5-11: FERRET plus Learning
Figure 6-1: Sample Dow Jones News Story
Figure 6-2: FERRET's definitions for "rise"
Figure 6-3: Parsing the Sample Dow Jones Story
Figure 6-4: Astronomy story using "rise"
Figure 6-5: Parsing the Astronomy Story
Figure 6-6: Parsing Effectiveness
Figure 6-7: Effect of Dictionary with time limits
Figure 6-8: Effect of Dictionary without time limits
Figure 6-9: Faking a query parser with McFRUMP
Figure 6-10: Ferret versus Keywords - Training Set
Figure 6-11: Retrieval Performance without Learning - Training Set
Figure 6-12: Ferret versus Keywords - Evaluation Set
Figure 6-13: Retrieval Performance without Learning - Evaluation Set
Figure 6-14: Effect of Script Learning - Training Set
Figure 6-15: Retrieval Performance with Learning
Figure 6-16: Effect of Script Learning - Evaluation Set
