RATERS’ ACCENT-FAMILIARITY LEVELS AND THEIR EFFECTS ON PRONUNCIATION SCORES AND INTELLIGIBILITY ON HIGH-STAKES ENGLISH TESTS

Thesis submitted for the degree of Doctor of Philosophy at the University of Leicester

by Kevin Cogswell Browne
Department of Education
University of Leicester
2016

Abstract

Raters’ accent-familiarity levels and their effects on pronunciation scores and intelligibility on high-stakes English tests

Kevin C. Browne

Some current high-stakes tests of English have abandoned native-speaker models of pronunciation for scoring purposes, and instead rely largely on raters’ estimations of the ‘listener effort’ needed to cope with test-takers’ speech to determine pronunciation scores. Recent studies within the field of language testing have revealed significant score variance occurring on such tests due to raters’ differing familiarities with test-takers’ accents. The studies that investigated raters’ accent-familiarity differences as a threat to the reliability and validity of scores on high-stakes tests have determined only that significant score differences can occur, and have offered little more than speculation concerning why accent-familiarity affects raters’ score decisions. The purpose of this thesis was not only to investigate the veracity of the threat, but also to attempt to explain why raters’ accent-familiarity differences affect scores. A strong rationale supports the hypothesis that exposure to the speech of a particular group of speakers, or accent, improves listeners’ processing of utterances in that accent by increasing intelligibility.
In order to determine the veracity of the hypothesis, two studies were conducted: a pilot study examined the pronunciation score and intelligibility differences between raters with different levels of accent-familiarity with Japanese-English, and a larger study investigated pronunciation score and intelligibility differences with Arabic-English, Spanish-English and Dhivehi-English. Many-Facets Rasch Measurements of the data revealed that significant differences in both pronunciation scores and intelligibility occurred between accent-familiarity rater groups with all accents. The findings also showed significant correlations between level of accent-familiarity and score leniency, as well as between accent-familiarity level and increased intelligibility, though the measures and effect sizes were not equal across accents. Raters’ accent-familiarity differences were confirmed as a valid threat to pronunciation scores.

Acknowledgements

I would like to first thank ETS for awarding me a TOEFL Small Grant for Doctoral Research in Second or Foreign Language Assessment. It both helped me with the costs of this research and gave me the confidence to see it through. I am indebted to them.

I would, of course, like to thank Professor Glenn Fulcher, who supervised my doctoral studies. His unwavering support and faith in me throughout every step and stage of this project are greatly appreciated. I would also like to thank my assistant-supervisor, Professor Pamela Rogerson-Revell, for sharing her expertise in pronunciation with me.

I am deeply indebted to Azlifa Ahmed for her amazing generosity of time and patience. I never could have included Dhivehi-English in this study without her. Thank you for being such a great friend and colleague. Other colleagues who assisted me and deserve recognition here are Lesley Underwood, Shane Dick, Ahmed Nazif and Nathaniel Owen.
I would also like to give special thanks to Professor Mike Linacre for his generosity with his time in answering all of my many questions about Rasch and FACETS. Dr. Linacre’s online Rasch Forum is a treasure trove of wisdom and information for all things Rasch. Thank you so much.

I would like to thank Jo Singh, Douglas McCall and the crew at the F Bar in Leicester. Thank you for taking me in, calling me friend and giving me memories to last a lifetime. I never would have felt like a student without you guys.

I am most appreciative of my family. To my parents and brother and sister, thank you for believing in me, and for your never-ending enthusiasm toward my life that took me so far away from home so long ago. To my children, Hunter and Noah, I tried my best to balance my life of work, study and family, and my greatest hope is that you never realized I was so busy or how much it hurt when I thought about the time I was missing with you.

Finally, thank you to my wife Yuko. You are the reason I am who I am, and I never could have accomplished this without you. I could never imagine a more supportive and caring partner to go through life with. This dissertation is dedicated to you.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1: Introduction
  1.1 The challenges to high-stakes testing of English pronunciation
  1.2 The role of the rater within pronunciation constructs
  1.3 Conceptualizing accents and accent-familiarity
  1.4 Raters’ accent familiarities as a construct-irrelevant threat to pronunciation scores
  1.5 The problem statement, aim and implications of this study
    1.5.1 The problem statement
    1.5.2 The aim of this research
    1.5.3 The potential implications and importance of this research
  1.6 The research questions and hypotheses
    1.6.1 Main research questions
    1.6.2 Sub-questions
    1.6.3 The hypotheses
  1.7 Testing the hypotheses
    1.7.1 Focusing on pronunciation scores
    1.7.2 Scoring pronunciation in this study
    1.7.3 Isolating accent-familiarity from other forms of familiarity that contribute to speech perception
    1.7.4 Testing raters not speakers
    1.7.5 The problematic nature of rater training in validity studies
    1.7.5 Contributing concepts from outside the language testing literature
  1.8 Original contribution to the literature
  1.9 Important terms defined
  1.10 Outline of the dissertation
Chapter 2: Literature Review
  2.1 High-stakes tests of spoken English
    2.1.1 Direct and semi-direct tests
    2.1.2 Rating scales for measuring speaking in high-stakes tests
  2.2 Measuring pronunciation: traditions and trouble
    2.2.2 The “ownership” of English
  2.3 Accented speech in modern English pronunciation assessment: scoring pronunciation without a model
  2.4 Accent-familiarity and speech perception
    2.4.1 Accent-familiarity
    2.4.2 Gass and Varonis, 1984
    2.4.3 The contributions from Bent and Bradlow and Munro, Derwing and Morton
    2.4.4 Why all accents cannot be considered equally intelligible to all raters
  2.5 Speech processing and speaker normalization: examples from outside the fields of language testing and linguistics
    2.5.1 The Perceptual Magnet Effect
    2.5.2 The Exemplar Theory
    2.5.3 Speech processing and the problematic nature of speaker normalization
  2.6 Investigations into raters’ accent familiarities as a threat to test scores
    2.6.1 Xi and Mollaun 2009
    2.6.2 Kim 2009
    2.6.3 Huang 2013
    2.6.4 Carey, Mannell and Dunn 2011
    2.6.5 Winke, Gass and Myford 2011
  2.7 Clarifying the terms: intelligibility, comprehensibility and interpretability
    2.7.1 Intelligibility
    2.7.2 Comprehensibility
    2.7.3 Interpretability
    2.7.4 Observations of the varied use of the terms
Chapter 3: The pilot study
  3.1 The test
  3.2 The participants
    3.2.1 The speaker-participants
    3.2.2 The rater-participants
  3.4 Results and discussion
    3.4.1 Pronunciation scores
    3.4.2 Intelligibility
  3.5 Conclusions from the pilot study
Chapter 4: Methodology
  4.1 The development of the hypothesis and what prompted this study
  4.2 Non-native accent selection
    4.2.1 Spanish-English
    4.2.2 Arabic-English
    4.2.3 Dhivehi-English
  4.3 The speaker participants
    4.3.1 The Spanish-English speaker participants
    4.3.2 The Arabic-English speaker participants
    4.3.3 The Dhivehi-English speaker participants
  4.4 The Rater Participants
    4.4.1 Why rater training was not employed
  4.5 The Test
    4.5.1 Part one of the test
    4.5.2 Part two of the test
    4.5.3 Part three of the test
  4.6 Analyses
    4.6.1 Many-Facets Rasch Measurement analyses
    4.6.2 Other analyses
Chapter 5: Findings and Discussions
  5.1 The research questions
  5.2 The alpha level and the inclusion of multiple statistical comparisons
  5.3 Determining the appropriateness of the test and rater population
    5.3.1 The test
    5.3.2 The rater population
  5.4 Raters’ accent familiarities and pronunciation scores
    5.4.1 Spanish-English pronunciation scores
    5.4.2 Arabic-English pronunciation scores
    5.4.3 Dhivehi-English pronunciation scores
  5.5 Raters’ accent familiarities and intelligibility
    5.5.1 Spanish-English familiarity and intelligibility
    5.5.2 Arabic-English familiarity and intelligibility
    5.5.3 Dhivehi-English familiarity and intelligibility
  5.6 Accent-familiarity’s correlations with pronunciation scoring and intelligibility
  5.7 The unequal effect of accent familiarity
    5.7.1 Differences in the rater accent-familiarity effect on pronunciation scores
    5.7.2 Differences in the rater accent-familiarity effect on intelligibility
  5.8 The rater accent-familiarity effect and test-takers’ L1 population size considerations
Chapter 6: Conclusion and Implications
  6.1 Review of the research approach and methods
  6.2 Accent-familiarity levels and their effects on pronunciation scores
  6.3 Accent-familiarity levels and their effects on intelligibility
  6.4 The ‘very familiar’ familiarity level and ‘bias for best’ (Fox, 2004)
  6.5 Limitations and recommendations for future research
  6.6 Concluding remarks
Appendix A
Appendix B
Bibliography

List of Tables

Table 1.1: TOEFL iBT Independent Speaking Rubrics for Delivery
Table 1.2: TOEFL iBT Integrated Speaking Rubrics for Delivery
Table 1.3: IELTS Pronunciation score bands and descriptors (public version)
Table 2.1: The TOEFL iBT Independent speaking rubrics “General Description” category
Table 2.2: Kim’s (2009) rating scale for the oral English test
Table 3.1: The pronunciation rating scale
Table 3.2: The sentences included in the test with the intelligibility items underlined
Table 3.3: Rater-participants’ Home Country List
Table 3.4: Pronunciation score Facets rater familiarity level group measures
Table 3.5: ANOVA results of four familiarity groups’ pronunciation scores
Table 3.6: Independent t-test results of ‘very familiar’ and all other raters’ pronunciation scores
Table 3.7: Facets accent-familiarity level measurement report for intelligibility items
Table 3.8: Results of ANOVA tests conducted of the Japanese-English intelligibility items
Table 3.9: Significant results from independent t-tests measuring the intelligibility differences between the raters ‘very familiar’ with Japanese-English and all other raters
Table 3.10: Pearson's correlation results measuring familiarity level with Japanese-English and pronunciation score
Table 3.11: Pearson's correlation results measuring familiarity level with Japanese-English and intelligibility success
Table 4.1: Phonetic inventories of English consonants
Table 4.2: The phonetic inventory of Spanish consonants shown in black and blue; red consonants are English consonants not included in Spanish pronunciation
Table 4.3: The phonetic inventory of Arabic consonants found in most dialects shown in black and blue; red consonants are English consonants not included in Arabic pronunciation
Table 4.4: The phonetic inventory of Dhivehi consonants shown in black and blue; red consonants are English consonants not included in Dhivehi pronunciation
Table 4.5: Final speaker participant information
Table 4.6: Spanish-English speaker candidate information
Table 4.7: Arabic-English speaker candidate information
Table 4.8: Dhivehi-English speaker candidate information
Table 4.9: Raters’ home country (or country they were raised in)
Table 4.10: How the native English speaker rater participants reported their native language
Table 4.11: Reported native language(s) of nonnative English speaking rater participants
Table 4.12: Countries other than their home country rater participants lived one or more years
Table 4.13: Countries the raters were living in at the time of completing the test
Table 4.14: Length of time in the country raters were living at the time of completing the test
Table 4.15: Rater participants’ reported familiarity with Spanish-English
Table 4.16: Rater participants’ reported familiarity with Arabic-English
Table 4.17: Rater participants’ reported familiarity with Dhivehi-English
Table 4.18: Age ranges of the rater participants
Table 4.19: Questions concerning the biographical and professional details of the rater participants from Part One of the test
Table 4.20: Rubrics for raters’ self-scoring of accent-familiarity
Table 4.21: The complete sentence list included in Part 2 of the test
Table 4.22: Sentences constructed for Spanish-English speaker participants
Table 4.23: Sentences constructed for Dhivehi-English speaker participants
Table 4.24: Sentences constructed for Arabic-English speaker participants
Table 4.25: The pronunciation score descriptors for the main test
Table 5.1: Pronunciation scoring item/speaker facet summary statistics
Table 5.2: All intelligibility item Facet summary statistics
Table 5.3: Rasch analyses of all rater participants’ pronunciation ratings
Table 5.4: Outlying rater participants’ details (before removing the outliers)
Table 5.5: One-way analysis of variance (ANOVA) of Spanish-English speakers’ pronunciation scores by familiarity level with Spanish-English
Table 5.6: Independent t-test results examining pronunciation score variance between raters ‘very familiar’ with Spanish-English and all other raters
Table 5.7: Facets bias interaction results for Spanish-English speakers and rater Spanish-English familiarity subgroups
Table 5.8: One-way analysis of variance of Arabic-English speakers’ pronunciation scores by familiarity level with Arabic-English
Table 5.9: Independent t-test results examining pronunciation score variance between raters ‘very familiar’ with Arabic-English and all other raters
Table 5.10: Facets bias interaction results for Arabic-English speakers and rater Arabic-English familiarity subgroup
Table 5.11: One-way analysis of variance of Dhivehi-English speakers’ pronunciation scores by familiarity level with Dhivehi-English
Table 5.12: Independent t-test results examining pronunciation score variance between raters ‘very familiar’ with Dhivehi-English and all other raters
Table 5.13: Facets bias interaction results for Dhivehi-English speakers and rater Dhivehi-English familiarity subgroups
Table 5.14: Facets Spanish-English intelligibility item measurement report
Table 5.15: Facets Spanish-English familiarity level report measures
Table 5.16: Significant results from one-way analyses of variance of the Spanish-English intelligibility items by familiarity level with Spanish-English
Table 5.17: Significant results from independent t-tests measuring the intelligibility differences between the raters ‘very familiar’ with Spanish-English and all other raters
Table 5.18: The results of the two IPA transcriptions of the Spanish-English speakers’ utterances
Table 5.19: Facets Arabic-English intelligibility item measurement report
Table 5.20: Facets Arabic-English familiarity level report measures
Table 5.21: Significant results from one-way analyses of variance of the Arabic-English intelligibility items by familiarity level with Arabic-English
Table 5.22: Significant results from independent t-tests measuring the intelligibility differences between the raters ‘very familiar’ with Arabic-English and all other raters
Table 5.23: The results of the two IPA transcriptions of the Arabic-English speakers’ utterances
Table 5.24: Facets Dhivehi-English intelligibility item measurement report
Table 5.25: Facets Dhivehi-English familiarity level report measures
Table 5.26: Significant results from one-way analyses of variance of the Dhivehi-English intelligibility items by familiarity level with Dhivehi-English
Table 5.27: Significant results from independent t-tests measuring the intelligibility differences between the raters ‘very familiar’ with Dhivehi-English and all other raters
Table 5.28: The results of the two IPA transcriptions of the Dhivehi-English speakers’ utterances
Table 5.29: Pearson's correlation results measuring familiarity level with Spanish-English and pronunciation scores
Table 5.30: Pearson's correlation results measuring familiarity level with Spanish-English and intelligibility success
Table 5.31: Pearson's correlation results measuring familiarity level with Arabic-English and pronunciation scores
Table 5.32: Pearson's correlation results measuring familiarity level with Arabic-English and intelligibility success
Table 5.33: Pearson's correlation results measuring familiarity level with Dhivehi-English and pronunciation scores
Table 5.34: Pearson's correlation results measuring familiarity level with Dhivehi-English and intelligibility success
Table 5.35: Rater-participants’ reported levels of accent-familiarity with nine World English accents and estimated L1 speaker population sizes

List of Figures

Figure 1.1: Illustration of Kachru’s (1985) concentric circles of English
Figure 2.1: A schematic illustration of an exemplar model of speech perception, based on the model illustrated in Johnson, 2006, p. 493.
Figure 3.1: Facets Variable Map of Pronunciation Scores including Four Levels of Familiarity
Figure 3.2: Facets Variable Map of Intelligibility Gap-fill Outcomes Including Four Levels of Familiarity
Figure 3.3: Scatter plot with regression line showing the correlation between raters’ familiarity level with Japanese-English and how they score those speakers’ pronunciation
Figure 3.4: Scatter plot with regression line showing the correlation between raters’ familiarity level with Japanese-English and intelligibility success with the accent
Figure 4.1: The phonetic inventory of vowels and diphthongs used in most dialects of native English. Retrieved from the speech accent archive
Figure 4.2: The phonetic inventory of vowels used by most dialects of Spanish; red symbols represent vowels used in English but not in Spanish, and blue symbols represent vowels used in Spanish but not included in English pronunciation
Figure 4.3: The phonetic inventory of vowels used in most dialects of Arabic; red symbols represent vowels used in English but not in Arabic, and blue symbols represent vowels used in Arabic but not included in English pronunciation
Figure 4.4: The phonetic inventory of vowels used in Dhivehi; red symbols represent vowels used in English but not in Dhivehi, and blue symbols represent vowels used in Dhivehi but not included in English pronunciation
Figure 4.5: Overview of the contents of the three parts of the test
Figure 4.6: Screenshot of the opening and instructions taken from Part 2 of the test
187 Figure 4.7: A visualization of the Rasch Model. From Many-Facet Rasch Measurement: Facets Tutorial by M. Linacre, 2012a ........................................................................................ 203 Figure 5.1: The Facets variable map from the analyses of the Spanish-English intelligibility gap-fill items ...................................................................................................................................... 237 Figure 5.2: The Facets variable map from the analyses of the Arabic-English intelligibility gap-fill items ...................................................................................................................................... 244 Figure 5.3: The Facets variable map from the analyses of the Dhivehi-English intelligibility gap-fill items ...................................................................................................................................... 252 Figure 5.4: Scatterplots with regression lines of the correlations between raters’ familiarity levels with Spanish-English and raters’ mean pronunciation scores given to the Spanish-English speaker-participants (left) and with the raters’ mean intelligibility success rates transcribing the Spanish-English intelligibility items ................................. 259 Figure 5.5: Scatterplots with regression lines of the correlations between raters’ familiarity levels with Arabic-English and raters’ mean pronunciation scores given to the Arabic- English speaker-participants (left) and with the raters’ mean intelligibility success rates transcribing the Arabic-English intelligibility items ................................................... 
261 Figure 5.6: Scatterplots with regression lines of the correlations between raters’ familiarity levels with Dhivehi-English and raters’ mean pronunciation scores given to the Dhivehi-English speaker-participants (left) and with the raters’ mean intelligibility success rates transcribing the Dhivehi-English intelligibility items ................................. 262 List of Abbreviations ACTFL/ETS American Council on the Teaching of Foreign Languages and the Educational Testing Service APU Asia Pacific University (Ritsumeikan) BKB-R Bamford-Kowal-Bench revised CL Common Language effect size EIL English as an International Language EFL English as a foreign language ELF English as a Lingua Franca 10