The Application of a Cognitive Diagnosis Model via an Analysis of a Large-Scale Assessment and a Computerized Adaptive Testing Administration

by Meghan Kathleen McGlohen, B.S., M.A.

Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

The University of Texas at Austin, May 2004

ABSTRACT

The Application of a Cognitive Diagnosis Model via an Analysis of a Large-Scale Assessment and a Computerized Adaptive Testing Administration

Publication No. ___________

Meghan Kathleen McGlohen, Ph.D.
The University of Texas at Austin, 2004

Supervisor: Hua Hua Chang

Our society currently relies heavily on test scores to measure individual progress, but typical scores can provide only a limited amount of information. For instance, a test score does not reveal which of the assessed topics were mastered and which were not well understood. According to the U.S. government, this is no longer sufficient. The No Child Left Behind Act of 2001 calls for diagnostic information to be provided for each individual student, along with information for the parents, teachers, and principals to use in addressing individual student needs. This opens the door for a new area of psychometrics that focuses on the inclusion of diagnostic feedback in traditional standardized testing. This diagnostic assessment could even be combined with techniques already developed in the arena of computerized adaptive testing to individualize the assessment process and provide immediate feedback to individual students.

This dissertation comprises two major components. First, a cognitive diagnosis-based model, namely the fusion model, is applied to two large-scale mandated tests administered by the Texas Education Agency; second, computerized adaptive testing technology is incorporated into the diagnostic assessment process as a way to develop a method of providing interactive assessment and feedback on individual examinees' mastery levels of the cognitive skills of interest. The first part requires attribute assignment for the standardized test items and the simultaneous IRT-based estimation of both the item parameters and the examinee variables under the fusion model. Examinees are classified dichotomously into mastery and non-mastery categories for the assigned attributes. Given this information, it is possible to identify the attributes with which a given student needs additional help. The second part focuses on applying CAT-based methodology, and in particular item selection, to the diagnostic testing process to form a dynamic test that is sensitive to individual response patterns while the examinee is being administered the test. This combination of computerized adaptive testing with diagnostic testing will contribute to the research field by enhancing the results that students and their parents and teachers receive from educational measurement.

CHAPTER ONE: INTRODUCTION

Typically, large-scale standardized assessments provide a single summary score to reflect the overall performance level of the examinee in a certain content area. The utility of large-scale standardized assessment would be enhanced if the assessment also provided students and their teachers with useful diagnostic information in addition to the single overall score. Currently, smaller-scale assessments, such as teacher-made tests, are the means of providing such helpful feedback to students throughout the school year.
Little concern is expressed about the considerable classroom time taken by the administration of these formative teacher-made tests, because they are viewed as an integral part of instruction. Conversely, educators view standardized testing of any kind as lost instruction time (Linn, 1990). The advantages of standardized tests over teacher-made tests are that they allow for the comparison of individuals across various educational settings, they are more reliable, and they are objective and equitable (Linn, 1990). The advantage of teacher-made tests, on the other hand, is that they provide very specific information to students regarding their strengths and weaknesses in the tested material. Large-scale standardized testing would be even more beneficial if, while maintaining its existing advantages, it could also contribute to the educational process in a role beyond evaluation, such as the reporting of diagnostic feedback. Students could then use this information to target improvement in the areas where they are deficient.

A new area of educational research has begun to flourish in order to provide the best of both worlds. This research area, dealing with the application of cognitive diagnosis in the assessment process, aims to provide helpful information to parents, teachers, and students, which can be used to direct additional instruction and study to the areas needed most by the individual student. The information provided by diagnostic assessment deals with the fundamental elements, or building blocks, of the content area; the combination of these elements, or attributes, comprises the content domain of interest. This form of diagnostic assessment is an appropriate approach to formative assessment because it provides every examinee with specific information regarding each measured attribute or content element, rather than a single score.

An ideal assessment would not only meet the meticulous psychometric standards of current large-scale assessments, but would also provide specific diagnostic information regarding the individual examinee's educational needs. In fact, the provision of such diagnostic information by large-scale state assessments has recently become a requirement: the No Child Left Behind Act of 2001 mandates that such feedback be provided to parents, teachers, and students. Despite this requirement, constructing diagnostic assessments from scratch is expensive and impractical. A more affordable solution is to incorporate diagnostic measurement into the existing assessments that state and local governments already administer to public school students. So, in order to combine the benefits of diagnostic testing with the current assessment situation, cognitively diagnostic approaches need to be applied to an existing test.

Diagnostic assessment is a very advantageous approach to measurement. In traditional testing, different students may obtain the same score for different reasons (Tatsuoka and Tatsuoka, 1989), but in diagnostic testing, these differences can be discovered and shared with the examinee and his or her teacher. Diagnostic assessment allows the testing process to serve an instructional purpose in addition to the traditional purposes of assessment (Linn, 1990), and can be used to integrate instruction and assessment (Campione and Brown, 1990).
Furthermore, diagnostic testing offers a means of selecting instructional material according to an individual's needs (Embretson, 1990). While traditional tests can accomplish assessment goals, such as a ranked comparison of examinees or grade assignments based on certain criteria, they do not provide individualized information to teachers or test-takers regarding specific content in the domain of interest (Chipman, Nichols, and Brennan, 1995). Traditional assessment determines what an individual has learned, but not what he or she has the capacity to learn (Embretson, 1990). Diagnostic assessment can be used to identify individuals who are likely to experience difficulties in a given content domain, and it can help provide specific information regarding the kinds of help an individual needs. Furthermore, cognitive diagnosis "can be used to gauge an individual's readiness to move on to higher levels of understanding and skill" in the given content domain (Gott, 1990, p. 174).

Current approaches to cognitive diagnosis focus solely on the estimation of the knowledge state, or attribute vector, of the examinees. This dissertation proposes combining the estimation of item response theory (IRT)-based individual ability levels (θ̂) with an emphasis on the diagnostic feedback provided by individual attribute vectors (α̂), thus bridging the current standard in testing technology with a new area of research aimed at helping students benefit from the testing process through diagnostic feedback. The goal of this research is not only to measure individuals' knowledge states and conventional unidimensional IRT ability levels simultaneously, but to do so efficiently.

To accomplish this, the study applies the advantages of computerized adaptive testing (CAT) to the new measurement area of cognitive diagnosis. The goal of computerized adaptive testing is to tailor a test to each individual examinee by allowing the test to home in on the examinee's ability level in an interactive manner. Accordingly, examinees are relieved from answering many items that are not representative of their abilities. The aim of this research is to combine these advantages of computerized adaptive testing with the helpful feedback provided by cognitively diagnostic assessment, thereby enhancing the existing testing process. This dissertation proposes a customized diagnostic testing procedure that provides examinees with both conventional unidimensional ability estimates and a report of attribute mastery status, instead of just one or the other. The key idea of this work is to utilize the shadow test technique to optimize the estimation of the traditional IRT-based ability level, θ̂, and then select an item from this shadow test that is optimal for the cognitive attribute vector, α̂, for each examinee (a rough sketch follows below). But first, the founding concepts of this research must be elucidated.
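As a purely illustrative sketch of the two-stage selection just described, consider the Python fragment below. It is a conceptual stand-in, not the dissertation's procedure: the true shadow test is assembled by constrained optimization at each step, whereas here it is approximated by the k unused items with the greatest Fisher information at θ̂, and the diagnostic criterion attribute_value (counting attributes an item measures whose mastery status is still uncertain) is a hypothetical placeholder.

```python
import math

def fisher_info_2pl(a, b, theta, D=1.7):
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)

def select_next_item(pool, administered, theta_hat, uncertain_attrs, k=10):
    """Two-stage selection mirroring the idea in the text: stage 1 builds
    a stand-in 'shadow test' (the k unused items most informative about
    theta_hat); stage 2 picks, within that set, the item touching the most
    attributes whose mastery status is still uncertain."""
    available = [i for i in range(len(pool)) if i not in administered]
    # Stage 1: rank unused items by Fisher information at the current
    # ability estimate.
    ranked = sorted(available,
                    key=lambda i: fisher_info_2pl(pool[i]['a'], pool[i]['b'],
                                                  theta_hat),
                    reverse=True)
    shadow = ranked[:k]
    # Stage 2: hypothetical attribute-level value of administering item i.
    def attribute_value(i):
        return sum(1 for measures, unsure in zip(pool[i]['q_row'],
                                                 uncertain_attrs)
                   if measures and unsure)
    return max(shadow, key=attribute_value)

# Tiny worked example: a three-item pool and three attributes.
pool = [{'a': 1.2, 'b': 0.3, 'q_row': [1, 0, 1]},
        {'a': 0.8, 'b': -0.5, 'q_row': [0, 1, 0]},
        {'a': 1.5, 'b': 1.1, 'q_row': [1, 1, 0]}]
best = select_next_item(pool, administered={0}, theta_hat=0.4,
                        uncertain_attrs=[True, False, True], k=2)
print(best)  # item 2: informative near theta_hat and probes an uncertain attribute
```

The two stages divide the labor exactly as described above: stage 1 preserves statistical efficiency for θ̂, while stage 2 steers the remaining freedom toward sharpening α̂.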
CHAPTER TWO: LITERATURE REVIEW

Traditional IRT-Based Testing

Item response theory (IRT) is a common foundation for wide-scale testing. IRT is based on the idea of test homogeneity (Loevinger, 1947) and logistic statistical modeling (Birnbaum, 1968), and uses these probabilistic models to describe the relationship between item response patterns and underlying parameters. IRT uses the item as the unit of measure (rather than the entire test) to obtain ability scores that are on the same scale despite differences in item administration across examinees (Wainer and Mislevy, 2000). As outlined in Rogers, Swaminathan and Hambleton's (1991) text, two main axioms are assumed when employing IRT:

(1) The performance of an individual on a set of test items can be rationalized by an underlying construct, latent trait, or set thereof. In the context of educational testing, the trait is the individual's ability level, which accounts for the pattern of correct and incorrect responses to the test items.

(2) The relationship between this performance and the underlying trait or set of traits can be represented by a monotonically increasing function. That is, as the level of the ability or trait of interest increases, the probability of a response reflecting this increase (in this context, a correct response) also increases (Rogers, Swaminathan and Hambleton, 1991).

A plethora of IRT probability models are available, and a few of the most common will be briefly discussed. Each of these three models maps out a probabilistic association between the items and the ability level of examinee j, denoted θ_j. These three models are referred to as the one-, two-, and three-parameter logistic models.

The one-parameter logistic model considers the difficulty level of each item i on the test, denoted b_i. As the name suggests, items with a higher degree of difficulty are harder to answer correctly, and hence call on a higher level of the ability trait θ. The probability of a correct response to item i given a specific ability level θ_j is shown in Equation 1:

\[
P(Y_{ij} = 1 \mid \theta_j) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}} \tag{1}
\]

for i = 1, 2, …, n, where n is the total number of items and Y_{ij} denotes the response of examinee j to item i (Rogers, Swaminathan and Hambleton, 1991). The one-parameter logistic model is also referred to as the Rasch model.

Next, the two-parameter logistic model also involves the item difficulty parameter b_i, but includes a second item parameter dealing with item discrimination, denoted a_i. Item discrimination reflects an item's facility in discriminating between examinees of differing ability levels. The value of the item discrimination parameter is proportional to the slope of the probability function at the location of b_i on the ability axis (Rogers, Swaminathan and Hambleton, 1991). The probability of a correct response to item i given an ability level θ_j, as described by the two-parameter logistic model, is shown in Equation 2:

\[
P(Y_{ij} = 1 \mid \theta_j) = \frac{e^{D a_i (\theta_j - b_i)}}{1 + e^{D a_i (\theta_j - b_i)}} \tag{2}
\]

for i = 1, 2, …, n, where D is a scaling constant, and n and Y_{ij} hold the same meaning as in the previous model (Rogers, Swaminathan and Hambleton, 1991).
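To make Equations 1 and 2 concrete, here is a minimal Python sketch of the two response functions. The parameter names mirror the notation above; the default D = 1.7, which makes the logistic curve closely approximate the normal ogive, is a common convention rather than something fixed by the text.

```python
import math

def p_1pl(theta, b):
    """Equation 1: one-parameter logistic (Rasch) model."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def p_2pl(theta, a, b, D=1.7):
    """Equation 2: two-parameter logistic model, adding the
    discrimination parameter a_i and scaling constant D."""
    z = D * a * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))

# An average-ability examinee (theta = 0) on a slightly difficult item:
print(round(p_1pl(0.0, 0.5), 3))        # 0.378
print(round(p_2pl(0.0, 1.2, 0.5), 3))   # 0.265; a discriminating item is
                                        # harder below its difficulty point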
Finally, the three-parameter logistic model (3PL) includes both the difficulty parameter b_i and the discrimination parameter a_i, but adds a third item parameter, called the pseudo-chance level, denoted c_i (Rogers, Swaminathan and Hambleton, 1991). The pseudo-chance level allows for the case in which the lower asymptote of the probability function is greater than zero; that is to say, examinees have some non-zero probability of responding correctly to the item regardless of ability level. The 3PL probability of a correct response to item i given ability level θ_j is shown in Equation 3:

\[
P(Y_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{e^{D a_i (\theta_j - b_i)}}{1 + e^{D a_i (\theta_j - b_i)}} \tag{3}
\]

for i = 1, 2, …, n, with all variables defined as previously noted (Rogers, Swaminathan and Hambleton, 1991). Notice the similarities among these models, and in particular how each builds on the previous one. In fact, each of the latter two simplifies to the simpler model(s): setting c_i to zero yields the two-parameter logistic model, and additionally setting a_i equal to one yields the one-parameter logistic model.

These are just a few of the possible models available for describing the relationship between item response patterns and the latent individual ability levels and item parameters. The next portion of this work will discuss a different approach to testing, called diagnostic assessment, and will eventually lead to a discussion of how to combine these long-established IRT approaches with new diagnostic assessment techniques.
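Continuing the sketch above (reusing p_1pl and p_2pl), the 3PL of Equation 3 can be written the same way, and the nesting of the three models verified directly: fixing c_i = 0 recovers the 2PL, and additionally fixing a_i = 1 with D = 1 recovers the 1PL.

```python
def p_3pl(theta, a, b, c, D=1.7):
    """Equation 3: three-parameter logistic model, with c_i the
    pseudo-chance (lower-asymptote) parameter."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

theta, a, b = 0.0, 1.2, 0.5
# A guessable item never drops below its pseudo-chance floor:
assert p_3pl(-4.0, a, b, c=0.25) > 0.25
# Nesting: 3PL -> 2PL when c = 0; -> 1PL when additionally a = 1, D = 1.
assert abs(p_3pl(theta, a, b, c=0.0) - p_2pl(theta, a, b)) < 1e-12
assert abs(p_3pl(theta, 1.0, b, c=0.0, D=1.0) - p_1pl(theta, b)) < 1e-12
```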
