
Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014

221 Pages·2020·2.276 MB·English


Overcoming Challenges in Corpus Construction

This volume offers a critical examination of the construction of the Spoken British National Corpus 2014 (Spoken BNC2014) and points the way forward toward a more informed understanding of corpus linguistic methodology more broadly. The book begins by situating the creation of this second corpus, a compilation of new, publicly-accessible Spoken British English from the 2010s, within the context of the first, created in 1994, talking through the need to balance backward capability and optimal practice for today’s users. Chapters subsequently use the Spoken BNC2014 as a focal point around which to discuss the various considerations taken into account in corpus construction, including design, data collection, transcription and annotation. The volume concludes by reflecting on the successes and limitations of the project, as well as the broader utility of the corpus in linguistic research, both in current examples and future possibilities. This exciting new contribution to the literature on linguistic methodology is a valuable resource for students and researchers in corpus linguistics, applied linguistics and English language teaching.

Robbie Love is Research Fellow in Applied and Corpus Linguistics at the University of Leeds, UK.

Routledge Advances in Corpus Linguistics
Edited by Tony McEnery, Lancaster University, UK, and Michael Hoey, Liverpool University, UK

Understanding Metaphor through Corpora: A Case Study of Metaphors in Nineteenth Century Writing
Katie J. Patterson

TESOL Student Teacher Discourse: A Corpus-Based Analysis of Online and Face-to-Face Interactions
Elaine Riordan

Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014
Edited by Vaclav Brezina, Robbie Love and Karin Aijmer

Multilayer Corpus Studies
Amir Zeldes

Self-Reflexive Journalism: A Corpus Study of Journalistic Culture and Community in The Guardian
Anna Marchi

The Religious Rhetoric of U.S. Presidential Candidates: A Corpus Linguistics Approach to the Rhetorical God Gap
Arnaud Vincent

Triangulating Corpus Linguistics with other Linguistic Research Methods
Edited by Jesse Egbert and Paul Baker

Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014
Robbie Love

For more information about this series, please visit: https://www.routledge.com/Routledge-Advances-in-Corpus-Linguistics/book-series/SE0593

Overcoming Challenges in Corpus Construction
The Spoken British National Corpus 2014
Robbie Love

First published 2020 by Routledge, 52 Vanderbilt Avenue, New York, NY 10017, and by Routledge, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2020 Taylor & Francis

The right of Robbie Love to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested

ISBN: 978-1-138-36737-1 (hbk)
ISBN: 978-0-429-42981-1 (ebk)

Typeset in Sabon by Apex CoVantage, LLC

For Mum and Dad.

Contents

List of Tables
List of Figures
Foreword
Preface
Acknowledgments

1 Introduction

PART I Before Corpus Construction: Theory and Design
2 Why a New Spoken BNC and Why Now?
3 Theoretical Challenges in Corpus Design

PART II During Corpus Construction: Theory Meets Practice
4 Challenges in Data Collection
5 Challenges in Transcription, Part I – Conventions
6 Challenges in Transcription, Part II – Who Said What?
7 Challenges in Corpus Processing and Dissemination

PART III After Corpus Construction: Evaluating the Corpus
8 Evaluating the Spoken BNC2014
9 Conclusions and Further Construction Work

Index

Tables

2.1 Proportion of selected articles relative to the total number of articles published by each journal between 1994 and 2016.
3.1 Comparison of Spoken BNC1994 token counts to the UK population in 1994 (ONS, 2019).
4.1 Number of words categorised as ‘unknown’ or ‘info missing’ for the three main demographic categories in the Spoken BNC1994DS and the Spoken BNC2014.
4.2 National Readership Survey Social Grade Classifications (NRS, 2014).
4.3 Social Class based on Occupation (SC) (Stuchbury, 2013b).
4.4 Socio-economic group (SEG) (Stuchbury, 2013a).
4.5 The nine major analytic classes of the NS-SEC (ONS, 2010c).
4.6 Mapping between the NS-SEC and Social Grade assumed for Spoken BNC2014 speaker metadata.
6.1 Speaker metadata for the gold standard recording.
6.2 Distribution of code types in the Spoken BNC2014 transcripts.
6.3 Total distribution of code types for the eight Spoken BNC2014 transcripts.
6.4 Distribution of code types in the gold standard transcripts.
6.5 Total distribution of code types for the eight gold standard test transcripts.
6.6 Categories of speaker ID code for which agreement between the gold standard and test transcripts could occur.
6.7 Inter-rater agreement (i.e. accuracy) of speaker identification between the test transcripts and the gold standard transcript.
6.8 Accuracy of the phonetician’s transcript compared to the gold standard test transcript extracts.
6.9 Frequency of corpus texts per number of speakers per recording in the Spoken BNC2014.
