INVESTIGATING VOCABULARY IN ACADEMIC SPOKEN ENGLISH: CORPORA, TEACHERS, AND LEARNERS

BY THI NGOC YEN DANG

A thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Doctor of Philosophy in Applied Linguistics

Victoria University of Wellington
2017

Abstract

Understanding academic spoken English is challenging for second language (L2) learners at English-medium universities. A lack of vocabulary is a major reason for this difficulty. To help these learners overcome this challenge, it is important to examine the nature of vocabulary in academic spoken English. This thesis presents three linked studies which were conducted to address this need.

Study 1 examined the lexical coverage in nine spoken and nine written corpora of four well-known general high-frequency word lists: West’s (1953) General Service List (GSL), Nation’s (2006) BNC2000, Nation’s (2012) BNC/COCA2000, and Brezina and Gablasova’s (2015) New-GSL.

Study 2 further compared the BNC/COCA2000 and the New-GSL, which had the highest coverage in Study 1. It involved 25 English first language (L1) teachers, 26 Vietnamese L1 teachers, 27 various L1 teachers, and 275 Vietnamese learners of English as a Foreign Language. The teachers completed 10 surveys in which they rated, on a five-point Likert scale, the usefulness for their learners of the 973 non-overlapping items between the BNC/COCA2000 and the New-GSL. The learners took the Vocabulary Levels Test (Nation, 1983, 1990; Schmitt, Schmitt, & Clapham, 2001) and 15 Yes/No tests which measured their knowledge of the 973 words.

Study 3 involved compiling two academic spoken corpora, one academic written corpus, and one non-academic spoken corpus. Each contained approximately 13 million running words. The academic spoken corpora contained four equally-sized sub-corpora. From the first academic spoken corpus, 1,741 word families were selected for the Academic Spoken Word List (ASWL).
The coverage of the ASWL and the BNC/COCA2000 in the four corpora and the potential coverage of the ASWL for learners of different vocabulary levels were determined.

Six main findings were drawn from these studies. First, in the first academic spoken corpus, the ASWL and its levels had slightly higher coverage in certain disciplinary sub-corpora than in the others. Yet, the list provided around 90% coverage of each sub-corpus. It helps learners to achieve 92%-96% coverage of academic speech depending on their levels. Second, the BNC/COCA2000 is the most suitable general high-frequency word list for L2 learners from the perspectives of corpus linguistics, teachers, and learners. It provided higher coverage than the GSL and the BNC2000, and had more words known by learners and perceived as being useful by teachers than the New-GSL. Third, general high-frequency words, especially the most frequent 1,000 words, provided much higher coverage in spoken corpora than written corpora in both academic and non-academic discourse. Fourth, despite the importance of general high-frequency words, a reasonable proportion of the learners had insufficient knowledge of these words, which highlights the importance of a word list which is adaptable to learners’ proficiency like the ASWL. Fifth, lexical coverage had significant but small correlations with teacher perception of word usefulness and learner vocabulary knowledge. Sixth, the Vietnamese L1 teachers had the highest correlation between the teacher ratings of word usefulness and the learner vocabulary knowledge. Next came the various L1 teachers, and then the English L1 teachers.

This thesis also provides theoretical, pedagogical, and methodological implications of these findings so that L2 learners can gain better support in their vocabulary development and achieve better comprehension of academic spoken English.

Acknowledgements

I would like to express my sincerest gratitude to my two supervisors, Dr. Averil Coxhead and Professor Stuart Webb, for their great patience, invaluable guidance, and generous support during my PhD. It is my great honour to have them as mentors. Working with them has helped me become a stronger researcher and grow as a person.

My special thanks to Emeritus Professor Paul Nation for helping me to achieve better insight into the nature of word list studies and the RANGE program. It is always a great pleasure to talk with and learn from him about vocabulary research.

I would like to express my heartfelt thanks to all the teacher and learner participants for their enthusiasm and insights during my project. Also, my great thanks to the following publishers and researchers for their generosity in letting me use their materials to create my corpora: Cambridge University Press, Pearson, Dr. Lynn Grant (Auckland University of Technology), Assistant Professor Michael Rodgers (Carleton University), the lecturers at Victoria University of Wellington, the researchers in the British Academic Spoken English corpus project, the British Academic Written English corpus project, the International Corpus of English project, the Massachusetts Institute of Technology Open courseware project, the Open American National corpus project, the Santa Barbara Corpus of Spoken American-English project, the Stanford Engineering Open courseware project, the University of California, Berkeley Open courseware project, and the Yale University Open courseware project. Without this support, it would have been impossible for me to complete this thesis.

I am especially indebted to Victoria University of Wellington for supporting my research financially in the form of a Victoria Doctoral Scholarship, a Postgraduate Research Excellence Award, a Faculty Research Grant, and a Victoria Doctoral Submission Scholarship.
Parts of Section 2.8, Chapter 3, and Section 6.2 have been adapted for a jointly authored article (with Professor Stuart Webb as the second author) under the title ‘Evaluating lists of high frequency words’ that appeared in the ITL International Journal of Applied Linguistics, 167(2) (2016), 132-158. I am very grateful to the Editor and the anonymous Reviewers of this journal for their useful feedback on the article.

My sincere thanks go to Professor Laurence Anthony (Waseda University) and Dr. Anne O’Keeffe (University of Limerick) for suggesting some sources of academic spoken materials, Professor Hilary Nesi (Coventry University) for the useful information about the British Academic Spoken English Corpus and her Spoken Academic Word List, Dr. Lisa Wood (School of Mathematics and Statistics) and Dr. Vlav Brezina (University of Lancaster) for helping me understand more about the nature of some statistical formulas, and Dr. Deborah Laurs, Kirsten Reid, and Emma Rowbotham (Student Learning Support Service) for their useful advice on my writing and oral presentation skills.

My warmest thanks to all the people in Vietnam and New Zealand who have helped and supported me so far, including family members, friends, officemates, Vocab Group members, Thesis Group members, staff at the School of Linguistics and Applied Language Studies, Victoria University of Wellington, and colleagues at the University of Languages and International Studies, Vietnam National University, Hanoi.

My deepest gratitude to Mum and Professor Stuart Webb, the two people who have had the greatest influence on the direction of my life and my ways of thinking. I dedicate this work to them for their continual support, encouragement, trust, and guidance during these times and always.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Abbreviations

Chapter 1 – Introduction
  1.1. Why investigate vocabulary in academic spoken English?
  1.2. Why investigate general high-frequency vocabulary when examining academic spoken English?
  1.3. Why investigate corpora, teachers, and learners?
  1.4. Aims and scope of the present research
  1.5. Significance of the present research
  1.6. Organization of the thesis

Chapter 2 – Literature Review
  2.1. Introduction
  2.2. What is counted as a word?
    2.2.1. Tokens and types
    2.2.2. Lemmas and flemmas
    2.2.3. Word families
    2.2.4. What is the most suitable unit of counting in a word list?
  2.3. What is involved in knowing a word?
  2.4. How to classify vocabulary in academic English?
  2.5. The nature of vocabulary in academic spoken English
    2.5.1. Vocabulary in academic spoken English
    2.5.2. Vocabulary in academic written English
    2.5.3. Summary
  2.6. Teacher perception in word list validation
    2.6.1. Teacher perception in word list evaluation
    2.6.2. Teacher perception of lexical difficulty
    2.6.3. Teacher perception of vocabulary learning and instruction in general
  2.7. Learner proficiency in word list development
    2.7.1. L2 learners’ vocabulary knowledge
    2.7.2. How are existing word lists adaptable to learners’ proficiency?
    2.7.3. Summary
  2.8. General high-frequency word lists
    2.8.1. What are existing general high-frequency word lists?
    2.8.2. What is the best general high-frequency word list for L2 learners?
    2.8.3. What is the value of general high-frequency words in academic spoken English?
    2.8.4. Summary
  2.9. Summary of the chapter and rationale for the next chapter

Chapter 3 – Study 1: Lexical coverage of general high-frequency word lists
  3.1. Introduction
  3.2. Research questions
  3.3. Methodology
    3.3.1. The word lists
    3.3.2. The corpora
    3.3.3. How to compare word lists of different units of counting
    3.3.4. How to compare word lists with different numbers of items
    3.3.5. Procedure
  3.4. Results
    3.4.1. Average coverage
    3.4.2. Coverage provided by the most frequent headwords
  3.5. Summary of main findings
  3.6. Rationale for the next study

Chapter 4 – Study 2: Teacher perception of word usefulness and learner knowledge of general high-frequency words
  4.1. Introduction
  4.2. Research questions
  4.3. Methodology
    4.3.1. Teacher participants
    4.3.2. Learner participants
    4.3.3. Surveys
    4.3.4. Vocabulary Levels Test
    4.3.5. Yes/No tests
    4.3.6. Target words
    4.3.7. Pseudowords
    4.3.8. Procedure
  4.4. Results
    4.4.1. Survey results
    4.4.2. Vocabulary Levels Test results
    4.4.3. Yes/No test results
    4.4.4. Relationships between the corpus, teacher, and learner data
  4.5. Summary of main findings
    4.5.1. The receptive vocabulary levels of Vietnamese EFL learners
    4.5.2. Teacher perception of word usefulness and learner knowledge of the BNC/COCA2000 and New-GSL words
    4.5.3. Relationship between lexical coverage, teacher perception, and learner vocabulary knowledge
    4.5.4. The correlations of three groups of teachers with the learner vocabulary knowledge
  4.6. Rationale for the next study

Chapter 5 – Study 3: Developing and validating an academic spoken word list
  5.1. Introduction
  5.2. Research questions
  5.3. Developing the two academic spoken corpora
    5.3.1. Materials selection for the two academic spoken corpora
    5.3.2. The first academic spoken corpus
    5.3.4. The second academic spoken corpus
  5.4. Developing the academic written corpus and non-academic spoken corpus
  5.5. Determining the unit of counting for the ASWL
    5.5.1. Justifying the unit of counting
    5.5.2. Identifying the word families in the first academic spoken corpus
    5.5.3. Calculating the range, frequency and dispersion of the word families in the first academic spoken corpus
  5.6. Establishing the characteristics of the ASWL
  5.7. Determining the criteria for selecting the ASWL words
    5.7.1. Setting the ASWL range criterion
    5.7.2. Setting the ASWL frequency criterion
    5.7.3. Setting the ASWL dispersion criterion
  5.8. Developing and validating the ASWL
  5.9. Determining the potential coverage that learners may reach by learning the ASWL
  5.10. Results
    5.10.1. The most frequent, wide ranging, and evenly distributed words in academic speech
    5.10.2. Coverage of the ASWL in hard-pure, hard-applied, soft-pure, and soft-applied speech
    5.10.3. Coverage of the ASWL in academic speech, academic writing, and non-academic speech
    5.10.4. General high-frequency words in academic spoken, non-academic spoken English, and written English
    5.10.5. General high-frequency words and academic written words in the ASWL
    5.10.6. Potential coverage of academic speech that learners may reach if they learn the ASWL
  5.11. Summary of main findings
  5.12. Rationale for the next chapter

Chapter 6 – Discussion
  6.1. Introduction
  6.2. What is the importance of general high-frequency words in L2 vocabulary learning?
    6.2.1. What is the value of general high-frequency words in academic spoken English?
    6.2.2. What is the most suitable general high-frequency word list for L2 learners?
    6.2.3. How many items should be in a general high-frequency word list?
  6.3. How can word lists better support L2 vocabulary development?
    6.3.1. How can word lists be adaptable to learners’ proficiency?
    6.3.2. How can word lists be adaptable to learners’ academic disciplines?
    6.3.3. What sources of information should be used in word list validation?
  6.4. How important is teachers’ familiarity with learners’ characteristics in L2 vocabulary teaching and learning?
  6.5. Rationale for the next chapter

Chapter 7 – Conclusion
  7.1. Introduction
  7.2. Theoretical contributions
  7.3. Methodological contributions
    7.3.1. How to make an academic word list adaptable to learners’ proficiency?
    7.3.2. How to validate corpus-based word lists?
    7.3.3. How to compare word lists of different units of counting?
  7.4. Pedagogical implications
    7.4.1. Developing learners’ knowledge of the BNC/COCA2000 and ASWL words
    7.4.2. Combining information from corpora, teachers, and learners in word list implementation
    7.4.3. Enhancing teachers’ knowledge of learners in vocabulary teaching