The Routledge Handbook of Language Testing

The Routledge Handbook of Language Testing offers a critical and comprehensive overview of language testing and assessment within the fields of applied linguistics and language study. An understanding of language testing is essential for applied linguistic research, language education, and a growing range of public policy issues. This handbook is an indispensable introduction and reference to the study of the subject. Specially commissioned chapters by leading academics and researchers of language testing address the most important topics facing researchers and practitioners, including:

• An overview of the key issues in language testing
• Key research methods and techniques in language test validation
• The social and ethical aspects of language testing
• The philosophical and historical underpinnings of assessment practices
• The key literature in the field
• Test design and development practices through use of practical examples

The Routledge Handbook of Language Testing is the ideal resource for postgraduate students, language teachers and those working in the field of applied linguistics.

Glenn Fulcher is Reader in Education (Applied Linguistics and Language Testing) at the University of Leicester in the United Kingdom. His research interests include validation theory, test and rating scale design, retrofit issues, assessment philosophy, and the politics of testing.

Fred Davidson is a Professor of Linguistics at the University of Illinois at Urbana-Champaign. His interests include language test development and the history and philosophy of educational and psychological measurement.
The Routledge Handbook of Language Testing

Edited by Glenn Fulcher and Fred Davidson

First published 2012
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Simultaneously published in the USA and Canada
by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2012 Selection and editorial matter, Glenn Fulcher and Fred Davidson; individual chapters, the contributors.

The right of the editors to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
The Routledge handbook of language testing / edited by Glenn Fulcher and Fred Davidson.
p. cm.
Includes bibliographical references and index.
1. Language and languages – Ability testing. I. Fulcher, Glenn. II. Davidson, Fred.
P53.4.R68 2011
418.0028'7–dc23
2011019617

ISBN: 978-0-415-57063-3 (hbk)
ISBN: 978-0-203-18128-7 (ebk)

Typeset in Times New Roman
by Taylor & Francis Books

Contents

List of illustrations
List of contributors

Introduction
    Glenn Fulcher and Fred Davidson

PART I Validity
1 Conceptions of validity
    Carol A. Chapelle
2 Articulating a validity argument
    Michael Kane
3 Validity issues in designing accommodations for English language learners
    Jamal Abedi

PART II Classroom assessment and washback
4 Classroom assessment
    Carolyn E. Turner
5 Washback
    Dianne Wall
6 Assessing young learners
    Angela Hasselgreen
7 Dynamic assessment
    Marta Antón
8 Diagnostic assessment in language classrooms
    Eunice Eunhee Jang

PART III The social uses of language testing
9 Designing language tests for specific social uses
    Carol Lynn Moder and Gene B. Halleck
10 Language assessment for communication disorders
    John W. Oller, Jr.
11 Language assessment for immigration and citizenship
    Antony John Kunnan
12 Social dimensions of language testing
    Richard F. Young

PART IV Test specifications
13 Test specifications and criterion referenced assessment
    Fred Davidson
14 Evidence-centered design in language testing
    Robert J. Mislevy and Chengbin Yin
15 Claims, evidence, and inference in performance assessment
    Steven J. Ross

PART V Writing items and tasks
16 Item writing and writers
    Dong-il Shin
17 Writing integrated items
    Lia Plakans
18 Test-taking strategies and task design
    Andrew D. Cohen

PART VI Prototyping and field tests
19 Prototyping new item types
    Susan Nissan and Mary Schedl
20 Pre-operational testing
    Dorry M. Kenyon and David MacGregor
21 Piloting vocabulary tests
    John Read

PART VII Measurement theory and practice
22 Classical test theory
    James Dean Brown
23 Item response theory
    Gary J. Ockey
24 Reliability and dependability
    Neil Jones
25 The generalisability of scores from language tests
    Rob Schoonen
26 Scoring performance tests
    Glenn Fulcher

PART VIII Administration and training
27 Quality management in test production and administration
    Nick Saville
28 Interlocutor and rater training
    Annie Brown
29 Technology in language testing
    Yasuyo Sawaki
30 Validity and the automated scoring of performance tests
    Xiaoming Xi

PART IX Ethics and language policy
31 Ethical codes and unexpected consequences
    Alan Davies
32 Fairness
    F. Scott Walters
33 Standards-based testing
    Thom Hudson
34 Language testing and language management
    Bernard Spolsky

Index

Illustrations

Tables

7.1 Examiner–student discourse during DA episodes
8.1 Incremental granularity in score reporting (for the student JK)
14.1 Summary of evidence-centered design layers in the context of language testing
14.2 Design pattern attributes and relationships to assessment argument
14.3 A design pattern for assessing cause and effect reasoning reading comprehension
14.4 Steps taken to redesign TOEFL iBT and TOEIC speaking and writing tests and guided by layers in Evidence-Centered Design
23.1 Test taker response on a multiple-choice listening test
25.1 Scores for 10 persons on a seven-item test (fictitious data)
25.2 Analysis of variance table for the sample data
25.3 Scores for 15 persons on two speaking tasks rated twice on a 30-point scale (fictitious data)
25.4 Analysis of variance table (p × t × r) for the sample data 2 (Table 25.3)
25.5 Analysis of variance table (p × (r:t)) for the sample data 2 (Table 25.3), with raters nested within task
26.1 Clustering scores by levels
33.1 Interagency Language Roundtable Levels and selected contexts – speaking
33.2 The Interagency Language Roundtable (ILR) to the American Council on the Teaching of Foreign Languages (ACTFL) concordance
33.3 Foreign Service Institute descriptor for Level 2 speaking
33.4 American Council on the Teaching of Foreign Languages Advanced descriptor
33.5 Example standards for foreign language learning
33.6 Example descriptors for the intermediate learner range of American Council on the Teaching of Foreign Languages K-12 Guidelines
33.7 Canadian Language Benchmarks speaking and listening competencies
33.8 Example global performance descriptor and performance conditions from the Canadian Language Benchmarks
33.9 Common European Framework – global scale
33.10 Example descriptors for the Common European Framework of Reference for Languages