Margaret Wu · Hak Ping Tam · Tsung-Hau Jen

Educational Measurement for Applied Researchers
Theory into Practice

Margaret Wu
National Taiwan Normal University
Taipei, Taiwan
and
Educational Measurement Solutions
Melbourne, Australia

Hak Ping Tam
Graduate Institute of Science Education
National Taiwan Normal University
Taipei, Taiwan

Tsung-Hau Jen
National Taiwan Normal University
Taipei, Taiwan

ISBN 978-981-10-3300-1
ISBN 978-981-10-3302-5 (eBook)
DOI 10.1007/978-981-10-3302-5
Library of Congress Control Number: 2016958489

© Springer Nature Singapore Pte Ltd. 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #22-06/08 Gateway East, Singapore 189721, Singapore

Preface

This book aims to provide the key concepts of educational and psychological measurement for applied researchers. The authors set themselves the challenge of writing a book that covers measurement issues in some depth, yet is not overly technical. Considerable thought has been put into finding ways of explaining complex statistical analyses to the layperson. In addition to making the underlying statistics accessible to non-mathematicians, the authors take a practical approach by including many lessons learned from real-life measurement projects.

Nevertheless, the book is not a comprehensive text on measurement. For example, derivations of models and estimation methods are not dealt with in detail in this book. Readers are referred to other texts for more technically advanced topics. This does not mean that a less technical approach to presenting measurement can only be at a superficial level. Quite the contrary, this book is written to stimulate deep thinking and vigorous discussion around many measurement topics. For those looking for recipes on how to carry out measurement, this book will not provide answers. In fact, we take the view that simple questions such as “how many respondents are needed for a test?” do not have straightforward answers. But we discuss the factors that affect sample size and provide guidelines on how to work out appropriate sample sizes.

This book is suitable as a textbook for a first-year measurement course at the graduate level, since much of the material in this book has been used by the authors in teaching educational measurement courses. It can also be used by advanced undergraduate students who are interested in this area. While the concepts presented in this book can be applied to psychological measurement more generally, the majority of the examples and contexts are in the field of education.
Some prerequisites for using this book include basic statistical knowledge, such as a grasp of the concepts of variance, correlation, hypothesis testing and introductory probability theory. In addition, this book is for practitioners, and much of the content addresses questions we have received over the years.

We would like to thank those who have made suggestions on earlier versions of the chapters. In particular, we would like to thank Tom Knapp and Matthias von Davier for going through several chapters in an earlier draft. Also, we would like to thank some students who read several early chapters of the book. We benefited from their comments, which helped us improve the readability of some sections of the book. But, of course, any unclear spots or even possible errors are our own responsibility.

Taipei, Taiwan; Melbourne, Australia    Margaret Wu
Taipei, Taiwan    Hak Ping Tam
Taipei, Taiwan    Tsung-Hau Jen

Contents

1 What Is Measurement?
  Measurements in the Physical World
  Measurements in the Psycho-social Science Context
  Psychometrics
  Formal Definitions of Psycho-social Measurement
  Levels of Measurement
  Nominal
  Ordinal
  Interval
  Ratio
  Increasing Levels of Measurement in the Meaningfulness of the Numbers
  The Process of Constructing Psycho-social Measurements
  Define the Construct
  Distinguish Between a General Survey and a Measuring Instrument
  Write, Administer, and Score Test Items
  Produce Measures
  Reliability and Validity
  Reliability
  Validity
  Graphical Representations of Reliability and Validity
  Summary
  Discussion Points
  Car Survey
  Taxi Survey
  Exercises
  References
  Further Reading

2 Construct, Framework and Test Development—From IRT Perspectives
  Introduction
  Linking Validity to Construct
  Construct in the Context of Classical Test Theory (CTT) and Item Response Theory (IRT)
  Unidimensionality in Relation to a Construct
  The Nature of a Construct—Psychological Trait or Arbitrarily Defined Construct?
  Practical Considerations of Unidimensionality
  Theoretical and Practical Considerations in Reporting Sub-scale Scores
  Summary About Constructs
  Frameworks and Test Blueprints
  Writing Items
  Item Format
  Number of Options for Multiple-Choice Items
  How Many Items Should There Be in a Test?
  Scoring Items
  Awarding Partial Credit Scores
  Weights of Items
  Discussion Points
  Exercises
  References
  Further Reading

3 Test Design
  Introduction
  Measuring Individuals
  Magnitude of Measurement Error for Individual Students
  Scores in Standard Deviation Unit
  What Accuracy Is Sufficient?
  Summary About Measuring Individuals
  Measuring Populations
  Computation of Sampling Error
  Summary About Measuring Populations
  Placement of Items in a Test
  Implications of Fatigue Effect
  Balanced Incomplete Block (BIB) Booklet Design
  Arranging Markers
  Summary
  Discussion Points
  Exercises
  Appendix 1: Computation of Measurement Error
  References
  Further Reading

4 Test Administration and Data Preparation
  Introduction
  Sampling and Test Administration
  Sampling
  Field Operations
  Data Collection and Processing
  Capture Raw Data
  Prepare a Codebook
  Data Processing Programs
  Data Cleaning
  Summary
  Discussion Points
  Exercises
  School Questionnaire
  References
  Further Reading

5 Classical Test Theory
  Introduction
  Concepts of Measurement Error and Reliability
  Formal Definitions of Reliability and Measurement Error
  Assumptions of Classical Test Theory
  Definition of Parallel Tests
  Definition of Reliability Coefficient
  Computation of Reliability Coefficient
  Standard Error of Measurement (SEM)
  Correction for Attenuation (Dis-attenuation) of Population Variance
  Correction for Attenuation (Dis-attenuation) of Correlation
  Other CTT Statistics
  Item Difficulty Measures
  Item Discrimination Measures
  Item Discrimination for Partial Credit Items
  Distinguishing Between Item Difficulty and Item Discrimination
  Discussion Points
  Exercises
  References
  Further Reading

6 An Ideal Measurement
  Introduction
  An Ideal Measurement
  Ability Estimates Based on Raw Scores
  Linking People to Tasks
  Estimating Ability Using Item Response Theory
  Estimation of Ability Using IRT
  Invariance of Ability Estimates Under IRT
  Computer Adaptive Tests Using IRT
  Summary
  Hands-on Practices
  Task 1
  Task 2
  Discussion Points
  Exercises
  Reference
  Further Reading

7 Rasch Model (The Dichotomous Case)
  Introduction
  The Rasch Model
  Properties of the Rasch Model
  Specific Objectivity
  Indeterminacy of an Absolute Location of Ability
  Equal Discrimination
  Indeterminacy of an Absolute Discrimination or Scale Factor
  Different Discrimination Between Item Sets
  Length of a Logit
  Building Learning Progressions Using the Rasch Model
  Raw Scores as Sufficient Statistics
  How Different Is IRT from CTT?
  Fit of Data to the Rasch Model
  Estimation of Item Difficulty and Person Ability Parameters
  Weighted Likelihood Estimate of Ability (WLE)
  Local Independence
  Transformation of Logit Scores
  An Illustrative Example of a Rasch Analysis
  Summary
  Hands-on Practices
  Task 1
  Task 2. Compare Logistic and Normal Ogive Functions
  Task 3. Compute the Likelihood Function
  Discussion Points
  References
  Further Reading

8 Residual-Based Fit Statistics
  Introduction
  Fit Statistics