The Use of Databases in Cross-Linguistic Studies PDF

415 Pages·5.464 MB·English

Preview The Use of Databases in Cross-Linguistic Studies

Empirical Approaches to Language Typology 41
Editors: Georg Bossong, Bernard Comrie, Yaron Matras

The Use of Databases in Cross-Linguistic Studies, edited by Martin Everaert, Simon Musgrave and Alexis Dimitriadis
Mouton de Gruyter, Berlin · New York

Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.

Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data
The use of databases in cross-linguistic studies / edited by Martin Everaert, Simon Musgrave, Alexis Dimitriadis.
p. cm. (Empirical approaches to language typology; 41)
Includes bibliographical references and index.
ISBN 978-3-11-019308-4 (hardcover: alk. paper)
1. Linguistics – Databases. 2. Typology (Linguistics) I. Everaert, Martin. II. Musgrave, Simon. III. Dimitriadis, Alexis, 1963–
P128.D37U74 2009
410'.285574–dc22
2009004481

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at http://dnb.d-nb.de.

ISBN 978-3-11-019308-4
ISSN 0933-761X

© Copyright 2009 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin.
All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher.
Printed in Germany.

Contents

Introduction (Simon Musgrave, Alexis Dimitriadis and Martin Everaert)
Designing linguistic databases: A primer for linguists (Alexis Dimitriadis and Simon Musgrave)
A typological database of personal and demonstrative pronouns (Heather Bliss and Elizabeth Ritter)
Databases designed for investigating specific phenomena (Dunstan Brown, Carole Tiberius, Marina Chumakina, Greville Corbett and Alexander Krasovitsky)
How to integrate databases without starting a typology war: The Typological Database System (Alexis Dimitriadis, Menzo Windhouwer, Adam Saulwick, Rob Goedemans and Tamás Bíró)
A contribution to ‘two-dimensional’ language description: The Typological Database of Intensifiers and Reflexives (Volker Gast)
StressTyp: A database for word accentual patterns in the world’s languages (Rob Goedemans and Harry van der Hulst)
The typological database of the World Atlas of Language Structures (Martin Haspelmath)
Typology of reduplication: The Graz database (Bernhard Hurch and Veronika Mattes)
The Romani Morpho-Syntax (RMS) database (Yaron Matras, Christopher White and Viktor Elšík)
A database on personal pronouns in African languages (Guillaume Segerer)
Contributors
Index of subjects
Index of languages
Index of persons

Introduction
Simon Musgrave, Alexis Dimitriadis and Martin Everaert

1. Using databases in cross-linguistic research

The core of modern (formal and functionalist/cognitive) linguistic theory is concerned with phenomena that we find in many, if not all, languages of the world.
Cross-linguistic comparison has firmly established that linguistic representations do not in fact vary randomly across languages, and over the last twenty years, theoretical linguistics, psycholinguistics and computational linguistics have become more and more sensitive to cross-linguistic variation. However, this cross-linguistic orientation is still disproportionately focused on languages that are well-known and well-described, but are unlikely to be fully representative of linguistic diversity as manifested in the world’s more than 6000 languages. As theoretical treatises incorporate more and stronger claims about the limits of variation, the need for the systematic evaluation of theories goes hand in hand with the need for more systematic language data. This holds true for research where differences between the various modalities of language are relevant, but also for research requiring knowledge of the types of variation found among human languages. For functional and formal typologists alike, a systematic body of collected data on languages is essential in order to gain a proper understanding of what is truly universal in language and what is determined by specific cultural settings. Such data sets are also increasingly needed in order to make it possible to systematically evaluate contrasting theoretical claims. Databases are an ideal tool for supporting these activities.

Electronic databases have long been used as a tool in linguistic research. Nerbonne (1998) presented a collection of papers reporting work to that date, and also pointed to even earlier work in phonetics and psycholinguistics (Liberman 1997; MacWhinney 1995). More recent workshops on the use of databases include those organised by the Institute for Research in Cognitive Science (University of Pennsylvania) in 2001 (IRCS 2001), by the Language Typology Resource Centre EU-project at CIL 17 (Prague) in 2003, and by E-MELD in 2004 (E-MELD 2004).
At a more specialised level, we also note that a volume of the journal Sign Language & Linguistics was devoted in part to the particular issues which arise in dealing with sign language data in databases (Bergman et al. 2001). Linguists’ interest in the use of databases is easily understood; as Nerbonne points out, the amount of linguistic data which exists is enormous and the use of computational tools can make handling these amounts of data significantly easier.

A database is a general-purpose data-management tool, and can support a limitless variety of data-oriented enterprises; but while large language corpora, experimental apparatus, statistical analysis programs, and other similar applications can make use of databases, the focus of this book is what we might call the “pure” cross-linguistic research database: a body of collected linguistic research data with a user interface whose purpose is to present that data. Numerous typological databases have been developed by researchers in the field, often for personal or small-group use. Increasingly, these databases are being made available to the linguistic community over the Internet, providing the potential for enormous increases in the power of exploratory typological investigation.

The fit between database technology and the interests and needs of cross-linguistic researchers is especially close. An electronic database is an obvious and appropriate tool: it can store the attribute values (= grammatical properties) of entities (= languages, constructions, etc.), and execute queries which recover information about the entities meeting a set of criteria. Once an adequate amount of data has been entered into the database for a workable sample of languages, the researcher can quickly and easily find out which languages in the sample have any combination of the described properties.
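The idea of storing attribute values of entities and querying for the entities that meet a set of criteria can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of any database in this volume: it assumes a single entity-attribute-value table in SQLite, and the table name `feature`, the functions `build_db` and `languages_matching`, and all language names, attributes and values are hypothetical placeholders.

```python
import sqlite3

# Hypothetical sample data: (language, attribute, value) triples.
# These are placeholders, not claims about real languages.
ROWS = [
    ("Language A", "has_dual", "yes"),
    ("Language A", "has_inclusive", "yes"),
    ("Language B", "has_dual", "yes"),
    ("Language B", "has_inclusive", "no"),
    ("Language C", "has_dual", "no"),
    ("Language C", "has_inclusive", "yes"),
]


def build_db(rows):
    """Store one grammatical property (attribute) of one language (entity) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature ("
        "language TEXT, attribute TEXT, value TEXT, "
        "PRIMARY KEY (language, attribute))"
    )
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", rows)
    return conn


def languages_matching(conn, criteria):
    """Return, sorted, the languages that meet every (attribute, value) criterion."""
    if not criteria:
        rows = conn.execute("SELECT DISTINCT language FROM feature ORDER BY language")
        return [row[0] for row in rows]
    # The primary key allows at most one row per (language, attribute), so a
    # language meets all criteria exactly when it matches len(criteria) rows.
    conditions = " OR ".join("(attribute = ? AND value = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = (
        f"SELECT language FROM feature WHERE {conditions} "
        "GROUP BY language HAVING COUNT(*) = ? ORDER BY language"
    )
    return [row[0] for row in conn.execute(sql, params + [len(criteria)])]


conn = build_db(ROWS)
print(languages_matching(conn, {"has_dual": "yes", "has_inclusive": "yes"}))
# prints ['Language A']
```

Adding a criterion narrows the result set, which is the "any combination of the described properties" query in its simplest form. Keeping every property in one attribute/value table means new properties need no schema change; a separate column per property is a common alternative, and which to choose is exactly the kind of design decision a database forces its creators to make explicit.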
A well-designed database can simplify the process of collecting and managing data, reduce the chance of errors, and greatly facilitate analysis and presentation of the collected data. Just as important, data in electronic form can be easily shared with others over the Internet; all of the databases described in this book are now accessible over the Internet.

The above description is neutral with regard to the type of research being undertaken. Most, if not all, of the research described in this volume falls under the label of linguistic typology, but similar strategies and procedures have been employed with great success in other areas, such as comparative linguistics. Researchers in that area are often interested in what form a language uses to express a particular meaning, and this information can easily be coded as an attribute of the language; that is, a wordlist can be seen as a simple database structure. An easily accessible example of a wordlist database is the Austronesian Basic Vocabulary Database (Greenhill, Gray and Blust 2003–2008; the website includes links to publications utilising the dataset), and more general information about techniques for utilising such data can be found in Kessler (2001) and in A. McMahon and R. McMahon (2005).

In addition to the practical advantages which come from the use of electronic databases, we would suggest that there can also be conceptual advantages. As with any type of computational modelling in linguistics, designing a database imposes a certain level of rigour and explicitness on our conceptualisation of the domain. A database embodies a very specific abstract model of some portion of reality, and the process of designing that model will inevitably lead the researchers to the consideration of questions about their view of that reality which might otherwise have remained in the background.
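To make the point that a wordlist is a simple database structure concrete, here is a minimal sketch. It assumes one form per language and meaning, stored as (language, meaning, form) records; the names `WORDLIST`, `forms_for` and `languages_sharing_form` are hypothetical, and every language, meaning and form is an invented placeholder rather than data from the Austronesian Basic Vocabulary Database or any other source.

```python
from collections import defaultdict

# Hypothetical wordlist records: (language, meaning, form).
# Languages, meanings and forms are invented placeholders, not real data.
WORDLIST = [
    ("Language A", "hand", "kapa"),
    ("Language A", "water", "nuru"),
    ("Language B", "hand", "kapa"),
    ("Language B", "water", "sima"),
    ("Language C", "hand", "tolo"),
]


def forms_for(wordlist, meaning):
    """Return {language: form} for each language with a recorded form for `meaning`."""
    return {lang: form for lang, gloss, form in wordlist if gloss == meaning}


def languages_sharing_form(wordlist, meaning):
    """Group languages whose form for `meaning` is identical; keep groups of two or more."""
    groups = defaultdict(list)
    for lang, form in forms_for(wordlist, meaning).items():
        groups[form].append(lang)
    return {form: sorted(langs) for form, langs in groups.items() if len(langs) > 1}


print(forms_for(WORDLIST, "water"))
# prints {'Language A': 'nuru', 'Language B': 'sima'}
print(languages_sharing_form(WORDLIST, "hand"))
# prints {'kapa': ['Language A', 'Language B']}
```

Grouping by identical strings is only a stand-in here: comparative work relies on cognacy judgements rather than surface identity, which a real wordlist database would record as a further attribute of each form.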
Answering those questions will be an important step in designing the database, and in many cases the process will continue in a cyclical fashion: the data entered into the database will provide new insights into the questions asked in the design stage, and these insights may lead to further refinements in how the domain is modelled (and possibly concomitant changes to the database design – often a troublesome development!). Entering data into a database can be laborious compared to keeping informal records in a notebook or a computer text document, but this is precisely because of the greater degree of precision that the structure of the database imposes. Of course a poorly designed database could ask the wrong questions, provide the wrong choice of answers, or fail to provide a place for storing important information; but a successful database will enhance, rather than impede, the quality and efficiency of data collection and utilization.

In the next section of this introduction, we consider in detail how a linguistic domain can be conceptualised in different ways by different researchers, by way of a comparison of two of the databases described in this volume. The section after that gives some thoughts on the (sometimes) vexed question of whether designing and populating a database should be considered a legitimate research activity, and whether it should be recognised as such by the institutions of academia. In the final section, we briefly introduce the chapters which make up the volume.

2. Databases as models of reality

We suggested earlier that a database is an abstract model of some portion of the world, that portion that is of interest to the creators of the database. In the contribution of Dimitriadis and Musgrave, there is an introduction to one well-accepted way of constructing such an abstraction. There is, however, no procedure which, given sufficient information about the relevant domain, will produce an optimal design modelling the domain. Rather, the theoretical commitments of the creators and the nature of the data which they take to be of interest will play a large part in influencing the design which is adopted. We will illustrate this point with a detailed examination of the way in which two different databases model a rather small domain of linguistic reality, pronoun systems. The two databases in question are Les marques personnelles dans les langues africaines (MPLA for short; Segerer, this volume) and the Calgary Pronoun Database (CPD; Bliss and Ritter, this volume).

Pronoun systems are, to quote Segerer, “in some respects ideal candidates for typological studies: they form closed sets with strong structural organization”. Nevertheless, there are interesting differences in the way in which the two databases model the domain, both in terms of the categories which are considered important, and also in terms of the possible values which can apply in the categories. The similarities and differences are summarised in Table 1.

Table 1. A comparison of the encoding schemes of the Calgary Pronoun Database and the database Marques personnelles dans les langues africaines.

CPD
  Person: 1st (speaker); inclusive (speaker + addressee); 2nd (addressee); 3rd (other); 4th (another)
  Number: singular; dual; trial; paucal; plural; general
  Gender: (open class)
  Case: (open class)
  Formality: (open class)

MPLA
  Person: 1; 2; 3
  Number: s(ingular); p(lural); d(ual)
  Specification: animacy; gender; inclusivity/exclusivity; definiteness; logophoricity
  Function: tonic; subject; object; possessive; reflexive
