CORPUS APPROACHES TO DISCOURSE

‘In this welcome new contribution to the emerging field of meta-reflection on corpus linguistic methodology, the editors expertly bring together a range of studies that address “dusty corners” (for example, neglected aspects such as similarity and absence) and “blind spots” (for example, the omission of non-verbal elements) as well as the pitfalls of research design. A convincing argument for accountability, self-reflexivity and triangulation, this volume is a must-read for any researcher in corpus-based discourse analysis and corpus linguistics more generally.’
Monika Bednarek, University of Sydney, Australia

‘Readers interested in corpora and discourse will find that this book is a new departure. Insightful and thought-provoking, each chapter deals with a key methodological issue. Under the headings of “dusty corners”, “blind spots” and “pitfalls”, the volume addresses under-researched topics, genres and analytical tools. It is a prime example of self-reflexive research.’
Gerlinde Mautner, Vienna University of Economics and Business, Austria

Corpus linguistics has now come of age and Corpus Approaches to Discourse equips students with the means to question, defend and refine the methodology. Looking at corpus linguistics in discourse research from a critical perspective, this volume is a call for greater reflexivity in the field. The chapters, each written by leading authorities, contain an overview of an emerging area and a case-study, presenting practical advice alongside theoretical reflection. Carefully structured with an introduction by the editors and a conclusion by leading researcher Paul Baker, this is key reading for advanced students and researchers of corpus linguistics and discourse analysis.

Charlotte Taylor is Senior Lecturer at the University of Sussex.
She is the author of Mock Politeness in English and Italian (2016), co-author of Patterns and Meanings in Discourse (with Alan Partington and Alison Duguid, 2013) and The Language of Persuasion in Politics (with Alan Partington, 2017) and co-editor of Exploring Silence and Absence in Discourse (with Melani Schroeter, 2018).

Anna Marchi is an Adjunct Lecturer at the University of Bologna. She is the author of Self-reflexive Journalism: A Corpus Study of Journalistic Culture and Community in The Guardian (Routledge, forthcoming).

CORPUS APPROACHES TO DISCOURSE
A Critical Review
Edited by Charlotte Taylor and Anna Marchi

First published 2018 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2018 selection and editorial matter, Charlotte Taylor and Anna Marchi; individual chapters, the contributors

The right of the editors to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Names: Taylor, Charlotte, 1977- editor. | Marchi, Anna, (Professor of linguistics) editor.
Title: Corpus approaches to discourse: a critical review / Charlotte Taylor and Anna Marchi [editors].
Description: Milton Park, Abingdon, Oxon; New York: Routledge, 2018. | Includes index.
Identifiers: LCCN 2017043814 | ISBN 9781138895782 (hardback) | ISBN 9781138895805 (pbk.)
Subjects: LCSH: Corpora (Linguistics) | Discourse analysis.
Classification: LCC P128.C68 C655 2018 | DDC 410.1/88–dc23
LC record available at https://lccn.loc.gov/2017043814

ISBN: 978-1-138-89578-2 (hbk)
ISBN: 978-1-138-89580-5 (pbk)
ISBN: 978-1-315-17934-6 (ebk)

Typeset in Bembo by Deanta Global Publishing Services, Chennai, India

CONTENTS

List of figures
List of tables
Contributors
Acknowledgements

1 Introduction: partiality and reflexivity
  Anna Marchi and Charlotte Taylor

PART A
Overlooked areas (checking the dusty corners)

2 Similarity
  Charlotte Taylor
3 Absence: you don’t know what you’re missing. Or do you?
  Alison Duguid and Alan Partington
4 Overlooked text types: from fictional texts to real-world discourses
  Alon Lischinsky

PART B
Triangulation (identifying blind spots)

5 Analysing the multimodal text
  Helen Caple
6 Using multiple data sets
  Sylvia Jaworska and Karen Kinloch
7 Interdisciplinary approaches in corpus linguistics and CADS
  Clyde Ancarno

PART C
Research design (avoiding pitfalls/re-examining the foundations)

8 The role of the text in corpus and discourse analysis: missing the trees for the forest
  Jesse Egbert and Erin Schnur
9 Dividing up the data: epistemological, methodological and practical impact of diachronic segmentation
  Anna Marchi
10 Visualisation in corpus-based discourse studies
  Laurence Anthony
11 Keyness analysis: nature, metrics and techniques
  Costas Gabrielatos
12 Statistical choices in corpus-based discourse analysis
  Vaclav Brezina
13 Conclusion: reflecting on reflective research
  Paul Baker

Index

FIGURES

2.1 Comparison of immigrant
2.2 Sketch Thesaurus for immigrant
2.3 Naming choices
2.4 Percentage of collocates of refugees
2.5 Shared collocates between the Times and Hansard
3.1 Mentions of the items Middle East and North Africa quarter-yearly in 2010 in the US newspapers New York Times and Washington Post
3.2 Mentions of the items Middle East and North Africa quarter-yearly in 2010 in the UK newspapers the Guardian and the Telegraph
4.1 References to characters per 10,000 words
4.2 References to body parts per million words
4.3 Pronouns per million words
4.4 References to body parts per thousand 3PS references
5.1 Instagram posts commenting multimodally on the size of the ballot paper
5.2 An Instagram post showing a classic democracy sausage and one showing an aberrant version
5.3 Number of posts collected using #ausvotes
5.4 Total number of Instagram posts per hashtag, June 30 to July 4, 2016
5.5 A topology for situating linguistic research
5.6 Examples of political party representations in images
5.7 Relational Database Interface for multimodal analysis of political affiliation in #ausvotes corpus
5.8 Fake medicare cards, a negative meme, and the Malcolm Turnbull Fizza poster all targeting the Liberal party
5.9 (Dis)affiliative strategies in Instagram posts made by members of the public
5.10 The semiotic distribution of affiliating strategies with the Greens
5.11 One screen taken from the Kaleidographic view of political affiliation in the #ausvotes corpus on Instagram
6.1 Semantic categories across contexts
6.2 Concordance lines of the pattern cause and depression in MEDLAY
6.3 Concordance lines of the pattern cause and depression in MEDIA
6.4 Concordance lines of the collocation cause and depression in MUMSNET
9.1 Diachronic plots of the briefings mentioning egypt*, libya* and syria*
9.2 Word cloud of keywords comparing Spring subset vs preceding unit
9.3 Word cloud of keywords comparison Spring subset vs following units
9.4 Word cloud of keywords comparing February subset vs preceding months
9.5 Word cloud of keywords comparing February subset vs Spring
9.6 Mentions of egypt*, libya* and syria* in the WHPB corpus, by month
9.7 Mentions of egypt*, libya* and syria* in the WHPB corpus, by briefing and area overlap for the months of January and February
9.8 Proportion of mentions of Mubarak preceded by President over total mentions, using months as time unit
9.9 Proportion of mentions of Mubarak preceded by President over total mentions, using weeks as time unit
9.10 Naming of the Libyan administration in the WHPB during the first months of 2011
9.11 Naming of the Syrian administration throughout the corpus
9.12 Proportion of mentions of Assad preceded by President over total mentions
9.13 Percentage of so-called mentions of arab spring in the CNN corpus on a timeline
10.1 Example of a Key-Word-In-Context display of language data appearing in the International Journal of Corpus Linguistics
10.2 Word cloud variations
10.3 Frequency of usage of data visualisations in the International Journal of Corpus Linguistics
10.4 Examples of four main categories of visualisation appearing in the International Journal of Corpus Linguistics
10.5 Example of a heat map showing the frequency of occurrence of data visualisation types in the International Journal of Corpus Linguistics
10.6 Time series charts showing frequency of occurrence of data visualisation types in the International Journal of Corpus Linguistics
10.7 Custom data visualisations produced using the R programming language appearing in the International Journal of Corpus Linguistics
10.8 Heat map visualisations produced through the corpus.byu.edu interface appearing in the International Journal of Corpus Linguistics
10.9 Variations of KWIC concordance displays
10.10 ‘KWIC pattern’ concordance displays
10.11 ‘Bar-graph’ dispersion plots for the word ‘this’ in three sub-categories of the Brown Corpus
10.12 ‘Bar-graph’ dispersion plots for the ‘he’–‘said’ collocate pair
10.13 Network (graph) visualisations of word–word collocation pairs in the script for Star Wars: A New Hope
10.14 Time-series histograms of the frequency of usage of ‘banking crisis’ in two corpora
10.15 Growth in usage of the word ‘Yassss’ in the US
10.16 Network maps of Twitter activity related to the two main candidates in the 2016 US Presidential Election
10.17 ‘Kaleidographic’ visualisation of news values
12.1 Passives in BE06
12.2 Development of the form ‘immigrants’ 1800–2000
12.3 Research design: key steps
12.4 Recycling material in newspapers
12.5 Collocates of ‘war’ in BE06 – newspapers
12.6 Collocation network: war, terror, troops, Iraq and civil in BE06 – newspapers
12.7 Collocates of war in BE06 – newspapers: MI score, log likelihood, log Dice and Delta
12.8 Output from #LancsBox
12.9 Passives in BE06