
Quantitative data analysis for language assessment. Volume I, Fundamental techniques PDF

289 Pages·2019·4.039 MB·Routledge Research in Language Education.

Preview Quantitative data analysis for language assessment. Volume I, Fundamental techniques

Quantitative Data Analysis for Language Assessment Volume I

Quantitative Data Analysis for Language Assessment Volume I: Fundamental Techniques is a resource book that presents the most fundamental techniques of quantitative data analysis in the field of language assessment. Each chapter provides an accessible explanation of the selected technique, a review of language assessment studies that have used the technique, and finally, an example of an authentic study that uses the technique. Readers also get a taste of how to apply each technique through the help of supplementary online resources that include sample datasets and guided instructions. Language assessment students, test designers, and researchers should find this a unique reference, as it consolidates theory and application of quantitative data analysis in language assessment.

Vahid Aryadoust is an Assistant Professor of language assessment literacy at the National Institute of Education of Nanyang Technological University, Singapore. He has led a number of language assessment research projects funded by, for example, the Ministry of Education (Singapore), Michigan Language Assessment (USA), Pearson Education (UK), and Paragon Testing Enterprises (Canada), and published his research in, for example, Language Testing, Language Assessment Quarterly, Assessing Writing, Educational Assessment, Educational Psychology, and Computer Assisted Language Learning. He has also (co)authored a number of book chapters and books that have been published by Routledge, Cambridge University Press, Springer, Cambridge Scholar Publishing, Wiley Blackwell, etc. He is a member of the Advisory Board of multiple international journals including Language Testing, Language Assessment Quarterly, Educational Assessment, Educational Psychology, and Asia Pacific Journal of Education.
In addition, he has been awarded the Intercontinental Academia Fellowship (2018–2019), which is an advanced research program launched by the University-Based Institutes for Advanced Studies. Vahid’s areas of interest include theory-building and quantitative data analysis in language assessment, neuroimaging in language comprehension, and eye-tracking research.

Michelle Raquel is a Senior Lecturer at the Centre of Applied English Studies, University of Hong Kong, where she teaches language testing and assessment to postgraduate students. She has extensive assessment development and management experience in the Hong Kong education and government sector. In particular, she has either led or been part of a group that designed and administered large-scale computer-based language proficiency and diagnostic assessments such as the Diagnostic English Language Tracking Assessment (DELTA). She specializes in data analysis, specifically Rasch measurement, and has published several articles in international journals on this topic as well as academic English, diagnostic assessment, dynamic assessment of English second-language dramatic skills, and English for specific purposes (ESP) testing. Michelle’s research areas are classroom-based assessment, diagnostic assessment, and workplace assessment.

Routledge Research in Language Education

The Routledge Research in Language Education series provides a platform for established and emerging scholars to present their latest research and discuss key issues in Language Education. This series welcomes books on all areas of language teaching and learning, including but not limited to language education policy and politics, multilingualism, literacy, L1, L2 or foreign language acquisition, curriculum, classroom practice, pedagogy, teaching materials, and language teacher education and development. Books in the series are not limited to the discussion of the teaching and learning of English only.
Books in the series include

Interdisciplinary Research Approaches to Multilingual Education
Edited by Vasilia Kourtis-Kazoullis, Themistoklis Aravossitas, Eleni Skourtou and Peter Pericles Trifonas

From language skills to literacy
Broadening the scope of English language education through media literacy
Csilla Weninger

Addressing Difficult Situations in Foreign-Language Learning
Confusion, Impoliteness, and Hostility
Gerrard Mugford

Translanguaging in EFL Contexts
A Call for Change
Michael Rabbidge

Quantitative Data Analysis for Language Assessment Volume I
Fundamental Techniques
Edited by Vahid Aryadoust and Michelle Raquel

For more information about the series, please visit www.routledge.com/Routledge-Research-in-Language-Education/book-series/RRLE

Quantitative Data Analysis for Language Assessment Volume I
Fundamental Techniques
Edited by Vahid Aryadoust and Michelle Raquel

First published 2019 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2019 selection and editorial matter, Vahid Aryadoust and Michelle Raquel; individual chapters, the contributors

The right of Vahid Aryadoust and Michelle Raquel to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested

ISBN: 978-1-138-73312-1 (hbk)
ISBN: 978-1-315-18781-5 (ebk)

Typeset in Galliard by Apex CoVantage, LLC

Visit the eResources: www.routledge.com/9781138733121

Contents

List of figures
List of tables
Preface
Editor and contributor biographies

Introduction
VAHID ARYADOUST AND MICHELLE RAQUEL

SECTION I Test development, reliability, and generalizability

1 Item analysis in language assessment
RITA GREEN

2 Univariate generalizability theory in language assessment
YASUYO SAWAKI AND XIAOMING XI

3 Multivariate generalizability theory in language assessment
KIRBY C. GRABOWSKI AND RONGCHAN LIN

SECTION II Unidimensional Rasch measurement

4 Applying Rasch measurement in language assessment: unidimensionality and local independence
JASON FAN AND TREVOR BOND

5 The Rasch measurement approach to differential item functioning (DIF) analysis in language assessment research
MICHELLE RAQUEL

6 Application of the rating scale model and the partial credit model in language assessment research
IKKYU CHOI

7 Many-facet Rasch measurement: implications for rater-mediated language assessment
THOMAS ECKES

SECTION III Univariate and multivariate statistical analysis

8 Analysis of differences between groups: the t-test and the analysis of variance (ANOVA) in language assessment
TUĞBA ELIF TOPRAK

9 Application of ANCOVA and MANCOVA in language assessment research
ZHI LI AND MICHELLE Y. CHEN

10 Application of linear regression in language assessment
DAERYONG SEO AND HUSEIN TAHERBHAI

11 Application of exploratory factor analysis in language assessment
LIMEI ZHANG AND WENSHU LUO

Index

Figures

1.1 Facility values and distracter analysis
1.2 Discrimination indices
1.3 Facility values, discrimination, and internal consistency (reliability)
1.4 Task statistics
1.5 Distracter problems
2.1 A one-facet crossed design example
2.2 A two-facet crossed design example
2.3 A two-facet partially nested design example
3.1 Observed-score variance as conceptualized through CTT
3.2 Observed-score variance as conceptualized through G theory
4.1 Wright map presenting item and person measures
4.2 Standardized residual first contrast plot
5.1 ICC of an item with uniform DIF
5.2 ICC of an item with non-uniform DIF
5.3 Standardized residual plot of 1st contrast
5.4 ETS DIF categorization of DIF items based on DIF size and statistical significance
5.5 Sample ICC of item with uniform DIF (positive DIF contrast)
5.6 Sample ICC of item with uniform DIF (negative DIF contrast)
5.7 Macau high-ability students (M2) vs. HK high-ability students (H2) sample ICCs of an item with NUDIF (positive DIF contrast)
5.8 Macau high-ability students (M2) vs. HK high-ability students (H2) sample ICCs of an item with NUDIF (negative DIF contrast)
5.9 Plot diagram of person measures with vs. without DIF items
6.1 Illustration of the RSM assumption
6.2 Distributions of item responses
6.3 Estimated response probabilities for Items 1, 2, and 3 from the RSM (dotted lines) and the PCM (solid lines)
6.4 Estimated standard errors for person parameters and test information from the RSM (dotted lines) and the PCM (solid lines)
6.5 Estimated response probabilities for Items 6, 7 and 8 from the RSM (dotted lines) and the PCM (solid lines), with observed response proportions (unfilled circles)
7.1 The basic structure of rater-mediated assessments
7.2 Fictitious dichotomous data: Responses of seven test takers to five items scored as correct (1) or incorrect (0)
7.3 Illustration of a two-facet dichotomous Rasch model (log odds form)
7.4 Fictitious polytomous data: Responses of seven test takers evaluated by three raters on five criteria using a five-category rating scale
7.5 Illustration of a three-facet rating scale measurement model (log odds form)
7.6 Studying facet interrelations within a MFRM framework
7.7 Wright map for the three-facet rating scale analysis of the sample data (FACETS output, Table 6.0: All facet vertical “rulers”)
7.8 Illustration of the MFRM score adjustment procedure
9.1 An example of boxplots
9.2 Temporal distribution of ANCOVA/MANCOVA-based publications in four language assessment journals
9.3 A matrix of scatter plots
10.1 Plot of regression line graphed on a two-dimensional chart representing X and Y axes
10.2 Plot of residuals vs. predicted Y scores where the assumption of linearity holds for the distribution of random errors
10.3 Plot of residuals vs. predicted Y scores where the assumption of linearity does not hold
10.4 Plot of standardized residuals vs. predicted values of the dependent variable that depicts a violation of homoscedasticity
10.5 Histogram of residuals
10.6 Plot of predicted values vs. residuals
11.1 Steps in running EFA
11.2 Scatter plots to illustrate relationships between variables
11.3 Scree plot for the ReTSUQ data

Tables

2.1 Key Steps for Conducting a G Theory Analysis
2.2 Data Setup for the p × i Study Example With 30 Items (n = 35)
2.3 Expected Mean Square (EMS) Equations (the p × i Study Design)
2.4 G-study Results (the p × i Study Design)
2.5 D-study Results (the p × I Study Design)
2.6 Rating Design for the Sample Application
2.7 G- and D-study Variance Component Estimates for the p × r′ Design (Rating Method)
2.8 G- and D-study Variance Component Estimates for the p × r Design (Subdividing Method)
3.1 Areas of Investigation and Associated Research Questions
3.2 Research Questions and Relevant Output to Examine
3.3 Variance Component Estimates for the Four Subscales (p• × T• × R• Design; 2 Tasks and 2 Raters)
3.4 Variance and Covariance Component Estimates for the Four Subscales (p• × T• × R• Design; 2 Tasks and 2 Raters)
3.5 G Coefficients for the Four Subscales (p• × T• × R• Design)
3.6 Universe-Score Correlations Between the Four Subscales (p• × T• × R• Design)
3.7 Effective Weights of Each Subscale to the Composite Universe-Score Variance (p• × T• × R• Design)
3.8 Generalizability Coefficients for the Subscales When Varying the Number of Tasks (p• × T• × R• Design)
4.1 Structure of the FET Listening Test
4.2 Summary Statistics for the Rasch Analysis
4.3 Rasch Item Measures and Fit Statistics (N = 106)
4.4 Standardized Residual Variance
4.5 Largest Standardized Residual Correlations
5.1 Selected Language-Related DIF Studies
5.2 Listening Sub-skills in the DELTA Listening Test
5.3 Rasch Analysis Summary Statistics (N = 2,524)
5.4 Principal Component Analysis of Residuals
5.5 Approximate Relationships Between the Person Measures in PCAR Analysis
