
Corpus-Based and Computational Approaches to Discourse Anaphora

265 pages · 2000 · English


Studies in Corpus Linguistics (SCL)

Studies in Corpus Linguistics aims to provide insights into the way a corpus can be used, the type of findings that can be obtained, the possible applications of these findings as well as the theoretical changes that corpus work can bring into linguistics and language engineering. The main concern of SCL is to present findings based on, or related to, the cumulative effect of naturally occurring language and on the interpretation of frequency and distributional data.

General Editor: Elena Tognini-Bonelli
Consulting Editor: Wolfgang Teubert

Advisory Board:
Michael Barlow (Rice University, Houston)
Robert de Beaugrande (UAE)
Douglas Biber (Northern Arizona University)
Wallace Chafe (University of California)
Stig Johansson (Oslo University)
M.A.K. Halliday (University of Sydney)
Graeme Kennedy (Victoria University of Wellington)
John Laffling (Heriot-Watt University, Edinburgh)
Geoffrey Leech (University of Lancaster)
John Sinclair (University of Birmingham)
Piet van Sterkenburg (Institute for Dutch Lexicology, Leiden)
Michael Stubbs (University of Trier)
Jan Svartvik (University of Lund)
H-Z. Yang (Jiao Tong University, Shanghai)
Antonio Zampolli (University of Pisa)

Volume 3: Simon Botley and Anthony Mark McEnery (eds), Corpus-based and Computational Approaches to Discourse Anaphora

Corpus-based and Computational Approaches to Discourse Anaphora
Edited by Simon Botley and Anthony Mark McEnery
John Benjamins Publishing Company, Amsterdam/Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences: Permanence of Paper for Printed Library Materials, ANSI Z39.48–1984.

Cover design: Françoise Berserik
Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.
Library of Congress Cataloging-in-Publication Data
Corpus-based and computational approaches to discourse anaphora / edited by Simon Botley, Anthony Mark McEnery.
p. cm. -- (Studies in corpus linguistics, ISSN 1388-0373 ; v. 3)
Includes bibliographical references and indexes.
1. Anaphora (Linguistics)--Data processing. 2. Discourse analysis--Data processing. I. Botley, Simon. II. McEnery, Tony, 1964- . III. Series.
P299.A5C675 1999
401’.41'0285--dc21 99-43484
ISBN 90 272 2272 X (Eur.) / 1 55619 397 1 (US) (alk. paper) CIP

© 2000 John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

John Benjamins Publishing Co. · P.O. Box 75577 · 1070 AN Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of Contents
1. Discourse anaphora: The need for synthesis (Simon Botley and Tony McEnery)
2. Demonstrative expressions in argumentative discourse: A computer corpus-based comparison of non-native and native English (Stephanie Petch-Tyson)
3. Is it possible to predetermine a referent included in a French N De N structure? (Didier Baltazart and Laurence Kister)
4. A corpus-based study of anaphora in English and Portuguese (Marco Rocha)
5. Conversational strategies using full NP anaphors (Hussein Shokouhi)
6. Some uses of demonstratives in spoken Swedish (Eva Lindström)
7. Pronoun resolution: The practical alternative (Ruslan Mitkov)
8. Quantitative evaluation of coreference algorithms in an information extraction system (Robert Gaizauskas and Kevin Humphreys)
9. Anaphoric reference and ellipsis resolution in a telephone-based spoken language system for accessing email (Sandra Williams)
10. Processing definite descriptions in corpora (Renata Vieira and Massimo Poesio)
11. Indirect reference in Japanese sentences (Masaki Murata and Makoto Nagao)
12. Generating coreferential anaphoric definite NPs (Agnès Tutin and Evelyne Viegas)
Name Index
Subject Index

1 Discourse anaphora: The need for synthesis
Simon Botley and Tony McEnery

1.1 Introduction
This book is concerned with anaphora, a phenomenon which has given rise to a great deal of intellectual activity in several fields, notably linguistics, computational linguistics and cognitive science. Because of the diversity of approaches these fields bring to the anaphora problem, we feel that a synthesis, or at least a principled attempt to draw the differing strands of anaphora research together, is long overdue. Hence this book.

1.1.1 About this book
Broadly speaking, we can divide this book between empirical descriptive chapters, which make arguments based on corpus evidence (Chapters 2, 3, 4, 5 and 6), and those chapters which describe computer systems for automatically processing anaphors in texts (Chapters 7, 8, 9, 10, 11 and 12). There are chapters which look at anaphora in spoken language (for example Chapters 4, 5 and 6) and those which deal with written texts (for example Chapters 2 and 3, and many of the chapters dealing with computer systems). Not every chapter deals with English only: anaphora in various languages such as Persian (Chapter 5), Swedish (Chapter 6) and Japanese (Chapter 11) is also examined, introducing an important multilingual dimension to the book.

Each chapter in this book has a short preface which sets the contribution in its place within the book and orients the reader concerning the evidence and arguments used and the chapter's methodological and theoretical framework.

The rest of this introductory chapter is concerned with placing the work in this volume in context. It begins with a detailed review of the literature on discourse anaphora, taking into account the major strands represented in this book: linguistic, cognitive, computational and corpus-based approaches.
After this review, we argue that what is needed is a synthesis covering all of these approaches.

1.2 What is discourse anaphora?
Anaphora allows a speaker/writer to recall to the consciousness of a hearer/reader entities or concepts that have already been introduced into a discourse. In English, for instance, anaphora can be realised by many different linguistic markers, such as pronouns, demonstrative pronouns, pronominal substitutions or ellipses. Some English examples from the APHB¹ corpus and the Lancaster Anaphoric Treebank² illustrate these different types of anaphora. Antecedents are underlined, and the anaphors are in bold italic type:³

Anaphora involving pronouns:
(1) A tall woman in a long rustling gown appeared. “Hotchkiss!” she said in a hushed but concerned voice.

Anaphora involving demonstrative determiners:
(2) “It is no great matter to me,” Hotchkiss concluded, “for I had only the wages of my Portland engagement, and that was no great sum, I assure you”.

Anaphora involving pronominal substitution:
(3) About 3,500 anti-Klu Klux Klan demonstrators, some carrying pictures of five persons slain three months ago in a “Death to the Klan” rally, marched through Greensboro Saturday in frigid weather.

Anaphora involving ellipsis (ellipsis marked by 0):
(4) “The Groundhog may be in the hole, but Steeler fans are not 0,” Mayor Richard Caliguiri told the crowd…

As can be seen from the examples above, anaphora is a phenomenon that is both syntactic and discoursal in nature. This distinction is particularly important in this book, as some chapters deal with anaphora between sentences (inter-sentential anaphora) and others with anaphora within sentences (intra-sentential anaphora). Note, however, that the way in which anaphora is realised varies across languages, as several chapters of this book show.

1.3 The importance of anaphora
Anaphora has received a great deal of attention from linguists, cognitive psychologists, philosophers and computer scientists.
Firstly, in linguistics, anaphoric phenomena are of interest because they tell us about how discourse is constructed and maintained, that is, how linguistic patterning above and beyond the sentence is arranged. Secondly, at the sentence level, anaphoric features function to bind structural elements together, and can therefore play an important role in the syntactic description of languages. The treatment of anaphora within linguistics is an important element of this book, especially with regard to corpus-based linguistics (McEnery & Wilson 1996). With this in mind, Chapters 2–6 of this book deal with corpus-based descriptive approaches to anaphora.

Thirdly, as well as being of interest to linguists, anaphora has been of great interest to computational linguists, because of the immense challenges that anaphora presents for natural language processing. Identifying the correct or most probable antecedent of an anaphoric proform is difficult for a computer to achieve, and many complex algorithms have been proposed and implemented, some of which are reviewed in this chapter. For now, however, a computer system that can resolve all anaphors remains some way off.

Finally, cognitive psychology, and those branches of linguistics that share many of its aims and assumptions, are also interested in anaphora because it tells us something about how language is understood and processed. In particular, researchers in these fields are interested in how anaphoric phenomena function to recall concepts in discourse to the consciousness of the hearer or reader, to facilitate understanding. Some chapters in this book, notably Chapters 3, 4 and 5, make reference to these cognitive approaches to anaphora, though within a corpus-based linguistics framework. Such studies show the relevance of corpus-based work in evaluating such theories of anaphora (though see Botley 1999).
Now that the various reasons for interest in anaphora have been outlined, we will give a more detailed overview of the field, in order to give some historical context to this book and to further illustrate the need for a synthesis of different approaches to discourse anaphora. The next three sections will deal in turn with approaches to anaphora within linguistics, within computational linguistics and computer science, and finally within corpus linguistics. After this review, we will be able to justify the importance of this book more clearly, and will present an outline of its chapters, to show how they relate to the wider issues discussed in this review.
