Language Testing and Evaluation
Series editors: Claudia Harsch and Günther Sigott
Volume 46

Charalambos Kollias
Virtual Standard Setting: Setting Cut Scores

Notes on the quality assurance and peer review of this publication: Prior to publication, the quality of the work published in this series is reviewed by the editors of the series.

Bibliographic Information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available online at http://dnb.d-nb.de.

Library of Congress Cataloging-in-Publication Data: A CIP catalog record for this book has been applied for at the Library of Congress.
ISSN 1612-815X
ISBN 978-3-631-80539-8 (Print)
E-ISBN 978-3-631-88904-6 (E-Book)
E-ISBN 978-3-631-88905-3 (E-PUB)
DOI 10.3726/b20407

© Peter Lang GmbH Internationaler Verlag der Wissenschaften Berlin 2023
All rights reserved.
Peter Lang – Berlin · Bern · Bruxelles · New York · Oxford · Warszawa · Wien

All parts of this publication are protected by copyright. Any utilisation outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems.

This publication has been peer reviewed.
www.peterlang.com

To my wife Voula and my sons Raf and Thanos
Abstract: In an attempt to combat the high costs associated with conducting face-to-face (F2F) cut score studies, standard setting practitioners have started exploring other avenues such as virtual standard setting. However, the impact that a virtual communication medium can have on panellists and their cut scores in a virtual standard setting has yet to be fully investigated and understood. Consequently, the aims of this study were to explore whether reliable and valid cut scores could be set in two synchronous e-communication media (audio and video), to explore the panellists’ perceptions towards the two media, and to investigate whether cut scores derived in a virtual environment were comparable with those derived in F2F environments. Forty-five judges were divided into four synchronous virtual standard setting panels, each panel consisting of 9 to 13 judges. Each panel participated in a virtual workshop consisting of two sessions, each conducted through a different e-communication medium (audio or video). In each session, judges employed the modified Yes/No Angoff method for Rounds 1 and 2 and provided an overall judgement for Round 3 to set cut scores on two equated language examination instruments. To cater for order effects, test form effects, and e-communication media effects, an embedded, mixed methods, counterbalanced research design was employed. Data were collected from three main sources: (1) panellists’ judgements; (2) survey responses; and (3) focus group interviews. The panellists’ judgements were evaluated through the many-facet Rasch measurement (MFRM) model and classical test theory (CTT), the survey data were analysed through CTT, and the focus group interview data were analysed through the constant comparison method (CCM). The results were further interpreted through the lens of media naturalness theory (MNT).
To compare virtual cut score results with F2F cut score results, data collected from an earlier F2F cut score study using the modified percentage Angoff method for two rounds were used. The findings from the MFRM and CTT analyses reveal that reliable and valid cut scores can be set in both e-communication media. While no statistically significant differences were observed within and across groups and media regarding the panellists’ overall cut score measures, analysis of the open-ended survey responses and focus group transcripts revealed that judges differed in their perceptions regarding each medium. Overall, the panellists expressed a preference for the video medium, a finding in line with MNT. The comparison of virtual cut score measures with F2F mean cut score measures yielded non-significant results for Round 1 and Round 2. However, when final virtual cut score measures (Round 3) were compared with final F2F cut score measures (Round 2), the same was not observed for one of the two virtual groups in the video medium: Group 3’s video medium cut score measures differed in a statistically significant way from the F2F cut score measures. This difference may be attributed to Group 3’s idiosyncrasies when setting a cut score in the video medium, as Group 3 set the highest cut score measure of all the groups. This study adds to the current limited literature on virtual standard setting, expands the MFRM framework for evaluating multiple virtual cut score studies, and proposes a framework for conducting, analysing, and evaluating virtual cut score studies.
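For readers unfamiliar with the Yes/No Angoff method mentioned above, the following minimal Python sketch illustrates how a raw-score cut score is commonly derived from such judgements. It assumes the usual formulation, in which each judge indicates for every item whether a minimally competent candidate would answer it correctly, a judge's cut score is the count of "yes" ratings, and the panel cut score is the mean across judges. The abstract does not spell out this computation, the study's modifications to the method are not shown, and the study itself reports cut score measures from MFRM rather than raw scores. The function names and the data are hypothetical.

```python
from statistics import mean


def judge_cut_score(judgements):
    """Raw cut score for one judge: the number of items rated 'yes' (True)."""
    return sum(1 for rating in judgements if rating)


def panel_cut_score(panel):
    """Panel cut score: the mean of the individual judges' raw cut scores."""
    return mean(judge_cut_score(judgements) for judgements in panel)


# Hypothetical ratings (not data from the study): 3 judges, 5 items each.
# True means the judge expects a minimally competent candidate to answer
# the item correctly.
panel = [
    [True, True, False, True, False],
    [True, False, False, True, False],
    [True, True, True, True, False],
]

individual = [judge_cut_score(judgements) for judgements in panel]
overall = panel_cut_score(panel)
print(individual, overall)
```

In practice the ratings would be collected separately for each round, so the same two functions could be applied per round to track how the panel cut score shifts after discussion and feedback.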