
ERIC ED374158: Comparing BILOG and LOGIST Estimates for Normal, Truncated Normal, and Beta Ability Distributions. PDF

31 Pages·1994·0.85 MB·English
by  ERIC


DOCUMENT RESUME

ED 374 158    TM 022 087

AUTHOR: Abdel-fattah, Abdel-fattah A.
TITLE: Comparing BILOG and LOGIST Estimates for Normal, Truncated Normal, and Beta Ability Distributions.
PUB DATE: 94
NOTE: 31p.; Paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, LA, April 4-8, 1994).
PUB TYPE: Reports - Evaluative/Feasibility (142); Speeches/Conference Papers (150)
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: *Ability; *Bayesian Statistics; *Estimation (Mathematics); *Maximum Likelihood Statistics; Monte Carlo Methods; Sample Size; *Statistical Distributions; Test Length
IDENTIFIERS: Ability Parameters; *BILOG Computer Program; Data Truncation; Item Parameters; *LOGIST Computer Program

ABSTRACT

The accuracy of estimation procedures in item response theory was studied using Monte Carlo methods and varying sample size, number of subjects, and distribution of ability parameters for: (1) joint maximum likelihood as implemented in the computer program LOGIST; (2) marginal maximum likelihood; and (3) marginal Bayesian procedures as implemented in the computer program BILOG. Normal ability distributions provided more accurate item parameter estimates for the marginal Bayesian estimation procedure, especially when the number of items and the number of examinees were small. The marginal Bayesian estimation procedure was generally more accurate than the others in estimating a, b, and c parameters. When ability distributions were beta, joint maximum likelihood estimates of the c parameters were the most accurate, or as accurate as the corresponding marginal Bayesian estimates depending on sample size and test length. Guidelines are provided for obtaining accurate estimation for real data. The marginal Bayesian procedure is recommended for short tests and small samples when the ability distribution is normal or truncated normal. Joint maximum likelihood is preferred for large samples when guessing is a concern and the ability distribution is truncated normal. Five tables and 27 figures present analysis results. (Contains 30 references.) (Author/SLD)

Reproductions supplied by EDRS are the best that can be made from the original document.

Comparing BILOG and LOGIST Estimates for Normal, Truncated Normal, and Beta Ability Distributions [1]

Abdel-fattah A. Abdel-fattah
American College Testing, 1994

1. The opinions expressed in this article are those of the author and should not be mistaken as representing the policy of American College Testing.

Abstract

The purpose of this study is to compare the accuracy of three estimation procedures in item response theory: the joint maximum likelihood as implemented in the computer program LOGIST; the marginal maximum likelihood; and the marginal Bayesian procedures as implemented in the computer program BILOG. The comparisons were conducted using data generated by a Monte Carlo simulation based on the three-parameter logistic model. The number of items, the number of subjects, and the distribution of ability parameters varied in each simulation. The ability parameter distribution was the variable of most concern.
Normal ability distributions provided more accurate item parameter estimates for the Marginal Bayesian estimation procedure, especially when the number of items and the number of examinees were small. The Marginal Bayesian estimation procedure was generally more accurate than the other two procedures in estimating a, b, and c parameters. When the ability distribution was beta, the Joint Maximum Likelihood estimates of the c parameters were the most accurate or as accurate as the corresponding Marginal Bayesian estimates depending on sample size and test length.

Guidelines were provided for obtaining accurate estimation using real data and the sample sizes, test lengths, and ability parameter distributions investigated in this study. For example, the Marginal Bayesian procedure is recommended with short tests and small samples for estimating a, b, and c parameters when the ability distribution is normal or truncated normal. The Joint Maximum Likelihood is preferred with large samples when guessing is a concern and the ability distribution is truncated normal.

Comparing BILOG and LOGIST Estimates for Normal, Truncated Normal, and Beta Ability Distributions

Item response theory (IRT) is used in equating, scoring, investigating item bias, and establishing item banks. Consequently, comparing the accuracy of parameter estimates provided by BILOG and LOGIST, the most common IRT programs, is critical to the use of IRT. One feasible way to compare the two programs fairly is to broaden the range of generated values of parameters (Mislevy and Stocking, 1987). This study compares the two programs for three ability distributions using the three-parameter logistic model (3PLM). The 3PLM was chosen for this study because the solution to any problem under this model also applies to the simpler one- and two-parameter logistic models, while some solutions for the one-parameter model do not generalize to the two- or three-parameter models. The choice of the more complex model should not undermine the use of simpler models because in practice the criterion of choice is model fit, not complexity. The 3PLM has the mathematical form

(1)  $P_i(\theta_j) = c_i + \dfrac{1 - c_i}{1 + \exp(f_{ij})}$

where $P_i(\theta_j)$ = the probability of item i being answered correctly by examinee j, $c_i$ = the lower asymptote of the item response curve, and $f_{ij} = -1.702\,a_i(\theta_j - b_i)$ (i = 1, 2, 3, ..., K; j = 1, 2, 3, ..., N), where $a_i$ = discrimination parameter for item i, $b_i$ = difficulty parameter for item i, $\theta_j$ = ability parameter for examinee j, N = number of examinees, and K = number of items.

There are currently four well-founded estimation methods that can be used with all three logistic models: the joint maximum likelihood (JML); the marginal maximum likelihood (MML); the joint Bayesian (JB); and the marginal Bayesian (MB) methods. The JB is only presented for completion and it will not be included in the study due to unavailability of the implementing computer program. The likelihood function for N examinees is the product of N likelihood functions given by the equation

(2)  $L(\theta; a, b, c) = \prod_{j=1}^{N} \prod_{i=1}^{K} [P_i(\theta_j)]^{u_{ij}} [Q_i(\theta_j)]^{1 - u_{ij}}$

where N is the number of examinees, K is the number of items, $P_i(\theta_j)$ is defined by equation (1) for the 3PLM, and $Q_i(\theta_j) = 1 - P_i(\theta_j)$.

Review

Estimation Methods

In this section, the four estimation procedures identified earlier are presented, the literature comparing their accuracy as sample size and test length vary is reviewed, and the literature related to distributional variations in IRT estimation is also reviewed. The JML procedure, the oldest of the four, consists of finding IRT parameter estimates that maximize the likelihood function or its logarithm:

(3)  $\ln[L(\theta, a, b, c)] = \sum_{j=1}^{N} \sum_{i=1}^{K} [u_{ij} \log P_i(\theta_j) + (1 - u_{ij}) \log Q_i(\theta_j)]$

First the initial values of $a_i$, $b_i$, and $c_i$ are used in estimating $\theta_j$, the unknown parameter. This estimated $\theta_j$ is used in the second stage, treating $a_i$, $b_i$, and $c_i$ as unknowns to be estimated. This two-stage process is repeated until the ability and item values converge to the final estimates, when the difference between estimates of successive stages is negligible. The most commonly known implementation of JML is the program LOGIST developed by Lord (1974). LOGIST has been available since 1973 (Wingersky & Lord, 1973) and has undergone major revision (Wingersky, 1983; Wingersky, Barton & Lord, 1982). The main problem with JML is that item and ability parameters are estimated simultaneously, therefore these estimates may not be consistent. Both item and ability parameter estimates can be consistent for the one-parameter model (Haberman, 1975) and the two- and the three-parameter models (Lord, 1975; Swaminathan & Gifford, 1983) when sample size and test length are large enough. JML ability estimates do not exist for examinees with either perfect or zero scores, and neither do JML item parameter estimates for items answered either correctly or incorrectly by all examinees. In LOGIST, the a and c estimates may drift out of bound unless limits are placed on them. For example, Swaminathan and Gifford (1987) placed an upper limit of 2.0 and a lower limit of .06 on the a estimates. However, true JML estimates are not obtained when using restrictions or prior distributions, which is the key to the Bayesian estimation procedure.

In the joint Bayesian (JB) method (Swaminathan & Gifford, 1982, 1985, & 1986) the likelihood in equation (2) is multiplied by a prior distribution for each of the item and ability parameters to obtain the JB function

(4)  $f(\theta; a, b, c) = L(\theta; a, b, c)\, g(\theta)\, g(a)\, g(b)\, g(c)$

The resulting expression is proportional to the joint posterior distribution of these parameters. For example, Swaminathan and Gifford used normal/gamma/normal/beta prior distributions for the $\theta_j$, $a_i$, $b_i$, and $c_i$ parameters. The use of these or other suitable priors tends to prevent estimates from drifting to intuitively unreasonable values. The authors implemented the JB procedure in a computer program that is not currently available for general distribution. A modified JB was implemented in the microcomputer-based program ASCAL (Vale & Gialluca, 1985). The likelihood equations modified for omitted items used in LOGIST were combined with beta/beta/normal Bayesian prior distributions on the a, c, and $\theta$ parameters.

The MML procedure was introduced by Bock and Lieberman (1970). The use of the marginal rather than the likelihood function eliminated the problem of inconsistent item parameter estimates. Multiplying equation (1) by $g(\theta)$, the probability density function for the ability parameters, and integrating with respect to $\theta$ we obtain the marginal probabilities of the response pattern

(5)  $P(u_j) = \int_{-\infty}^{\infty} P(u_j \mid \theta)\, g(\theta)\, d\theta$

Once the data are observed this probability can be interpreted as the marginal likelihood function for a given examinee. The product of these likelihoods for all examinees is the marginal likelihood function for the entire data set, which can be written as

(6)  $L(a, b, c) = \int_{\theta} L(\theta; a, b, c)\, g(\theta)\, d\theta$

The MML estimates are the values of a, b, and c that maximize the likelihood function. Bock and Lieberman (1970) gave a numerical solution to the likelihood equations. The solution was computationally burdensome and only applicable to tests with 10 or fewer items. Bock and Aitkin (1981) refined this solution to avoid computational problems. Mislevy and Bock (1984) implemented this procedure in the program BILOG. In the MML the item parameters are estimated without reference to ability parameters by considering examinees as a random sample from a population and integrating them out of the likelihood function using an approximate ability distribution. For a good approximation of this distribution, a sufficiently large number of examinees is required. Because of this requirement and the integration process, MML involves more computation than does JML. However, MML estimates are more consistent than JML, especially for short tests. Unlike JML, MML has no ability estimates of its own. The maximum likelihood (ML) estimates of abilities can be obtained using MML item parameter estimates and can be abbreviated as ML-MML. The larger the number of items, the better are the ML-MML estimators. As with JML, MML a and c estimates may drift to extreme values. Poor c estimates degrade estimation of other item and ability parameters (Swaminathan & Gifford, 1985). Limits and prior distributions can be used to prevent this drifting. However, these limits and priors produce estimates that are not purely MML. The use of prior distributions introduces the concept of MB estimation.

In the marginal Bayesian (MB) procedure, the likelihood given by equation (6) is multiplied by prior distributions for a, b, and c. The resulting expression is proportional to the posterior density of a, b, and c and can be written as

(7)  $f(a, b, c) = L(a, b, c)\, g(a)\, g(b)\, g(c)$

MB tends to prevent item parameter estimates from drifting to extreme values. Instead, values are pulled towards the center of the prior distribution for item parameters. That center differs slightly from where it would have been without the priors (Mislevy & Bock, 1984). Therefore, in favorable data, it is preferred to avoid priors entirely and use MML. For unfavorable data, MB of BILOG allows the use of updated or fixed prior means at each iteration. When samples are large relative to the number of items, updated prior means should be used, while for small samples, the fixed prior means are preferred. The default priors in BILOG are lognormal, normal, and beta for a, b, and c, respectively.

Related work on the MB procedure can be found in Dempster, Rubin, and Tsutakawa (1981), Rigdon and Tsutakawa (1983, 1987), and Tsutakawa (1984, 1986). The iterative solution introduced by Dempster et al. (1981) was more general than the similar solution by Bock and Aitkin (1981). The latter was limited to random variables with exponential distributions but the former was extended to random variables belonging to non-exponential family distributions. Rigdon and Tsutakawa (1983) derived a marginal maximum likelihood with a fixed b parameter and a random $\theta$ parameter for the one-parameter logistic model by integrating over $\theta$. This is called the maximum likelihood fixed (MLF) procedure. From the MLF, the conditional maximum likelihood fixed (CMLF) was developed by using the posterior mean of each $\theta$ in the estimation process of the priors to approximate the unknown Bayesian priors conditioned upon their posterior means. This approximation reduces the computation required by the conventional MML procedure when used in estimating priors. From CMLF, Rigdon and Tsutakawa (1987) derived two more MB procedures under the one-parameter logistic model. These are called the conditional maximum likelihood random (CMLR) and conditional maximum likelihood uniform (CMLU). The prior distribution of the b parameter was random in the CMLR procedure and uniform in the CMLU procedure. The ability parameters were assumed random with a normal prior distribution. The authors implemented these procedures for the one- and two-parameter logistic models in a computer program unavailable for general distribution.

Sample Size, Test Length, and Estimation Procedure

The JML procedure was found superior to Urry's procedure [2] (e.g., Ree, 1979; Swaminathan & Gifford, 1983). The JML was also found to be superior to the heuristic approximation as implemented in ANCILLES-X (Vale & Gialluca, 1988). Comparing BILOG and LOGIST, Swaminathan and Gifford (1987) concluded that MML for item parameters (or the ML for ability parameters) is generally superior to the JML procedure in estimating a, b, and $\theta$ parameters of the one- and two-parameter logistic models, particularly when small sample sizes and/or short test lengths were used. For the three-parameter model, LOGIST was superior in estimating b, c, and ability parameters, whereas BILOG was superior in estimating the a parameters. LOGIST estimates of ability parameters were superior because in LOGIST the a parameters were constrained to a reasonable range, the inestimable c parameters were set to a common value, and the program works better with the uniform $\theta$ used in the study. The ML ability parameters are based on the unconstrained item parameters estimated by MML. The a estimates greater than 4.0 were excluded upon the calculation of the mean squared deviations (MSDs) for both the LOGIST and BILOG estimates; however, these excluded values were greater in number for LOGIST than they were for BILOG.

2. This is the heuristic approximation procedure as implemented in the early version of the computer program ANCILLES.

Using a broader range of generated data, Yen (1987) employed the MB procedure for item parameter estimation and the expected a posteriori (EAP) as well as the ML procedures (ML-MB) for ability estimation using BILOG. She compared these estimates with the corresponding estimates by LOGIST under the three-parameter logistic model. The $\theta$ estimates of EAP were found to be better than either the MB-ML or the JML estimates. Her study was limited to 20- and 40-item tests with 1,000 examinees. Convergence to the true values was investigated only over the increase in test length from 20 to 40 items. In spite of these limitations, BILOG was not superior to LOGIST for all cases. The superiority of LOGIST in some cases might be attributed to the choice of generated values, or to the way the two programs handle extreme estimates. BILOG pulls extreme values towards the center of the prior distribution so that the center differs a little from where it would be if the priors were not used. In LOGIST, upper and lower limits are placed on the a and c parameter estimates to prevent them from drifting to extreme values. Qualls and Ansley (1985) used a limited range of generated data, with various levels of test lengths and sample sizes, under the three-parameter logistic model. They indicated that with ML $\theta$ estimates, the biweight robustification [3] eliminated the problem of assigning the lower-bound ability to high-scoring examinees who missed an easy item. Thus with ability robustification, ability estimation was more accurate with BILOG than with LOGIST.

3. A technique of robust data analysis that improves the accuracy of scale score estimation in the presence of mixed omitting and guessing by scoring omits as incorrect and by giving reduced weight to unlikely correct responses to suppress the effects of guessing.

Swaminathan and Gifford (1982, 1985, & 1986) have shown that the JB estimates are superior to the JML estimates because they do not drift out of range, and are more accurate, even when the prior distributions differ from the distributions of the generated parameters. The JB estimates of ASCAL were also found to be better than the corresponding estimates of LOGIST (Vale & Gialluca, 1985, & 1988). The JB of ASCAL does not provide estimates of the ability parameters, is only available for micro-computers, and takes a long time running large data sets. The JB of Swaminathan and Gifford is not currently available for general distribution. Consequently it can be concluded that the most important and available procedures for comparisons were the JML of LOGIST and the MB and the MML of BILOG. Precautions were taken so that the data generated were reasonable for the two programs. For example, both small and large sample sizes and test lengths were used in the comparison. The JML converges only as both the number of items and the number of examinees increase. The MML and the MB item parameter estimates converge to their true values as the number of examinees increases. Thus the small and the large sample-size and test-length combinations were found more important and more reasonable than other combinations. The number of items and the number of examinees chosen in this study were defined as small and large in accordance with some of the aforementioned studies (e.g., Swaminathan & Gifford, 1987).

IRT Parameter Distribution and Estimation Procedures

The JML procedure does not incorporate any assumptions about the distributions of item or ability parameters. The MML procedure requires an assumption about the ability distribution. The JB and MB procedures require assumptions about the priors of both item and ability parameter distributions. There is a small body of literature about the impact of different IRT parameter distributions on the efficiency of various estimation procedures. For example, Swaminathan and Gifford (1983) varied the ability parameter distribution and found it had little effect on JML estimation of the ability and b parameters but did affect estimation of a and c parameters. The a and c estimates were less accurate with negatively skewed ability than with the uniform or normal ability distribution. The uniform ability distribution produced more accurate a and c estimates than the normal ability distribution did. Ree (1979) also found the poorest item parameter estimates with the positively skewed ability parameter distribution and the best item parameter estimates with the uniform ability distribution. The two studies did not include the MML or the MB procedures in the comparison. They both reported differences in accuracy of estimation due to the ability parameter distribution and provided some insight about the importance of varying the ability parameter distribution. The JB procedures were also found to be superior to the JML of LOGIST (Swaminathan & Gifford, 1982, 1985, & 1986) because their estimates do not drift out of range. They were more accurate even when the prior distributions were different from the generated values.

The preceding studies used only the correlations of estimates with the true values, except for Yen, who used the mean squared deviations (MSDs) as well. The MSD and its components, variance and bias, provide a means for examining estimates at various levels, while correlations do not. None of the preceding studies provided such comparative measures at several levels of estimates. Swaminathan and Gifford (1987) reported differences of practical interest at several estimate levels but they used only uniform distributions, which worked well with LOGIST, in the three-parameter model. Thus it is important to investigate differences in estimation accuracy across several distributions and to include ability distributions that do not favor one program over another. It is also important to vary the sample size and test length to show convergence across the ability distributions. The two studies that used large and small numbers of items and numbers of examinees were that by Swaminathan and Gifford (1983) and that of Wingersky and Lord (1985). The two studies did not investigate the BILOG procedures. In the latter study LOGIST estimates were used as the true values.

With the exception of Yen's, none of the preceding studies compared the effect of ability distributions on the estimation accuracy of MB, MML, and JML. Yen (1987) varied the ability parameter distributions and included the JML, MB, ML, and EAP procedures. She held the a and c parameters and the number of examinees constant. The ability parameter distributions used were slightly kurtic and slightly skewed. Therefore, varying the distribution of ability parameters had only a slight effect on the accuracy of the procedures investigated by Yen. The distributions of the b parameters were also varied by Yen (1987) and by Rigdon and Tsutakawa (1987). Rigdon and Tsutakawa recommended the CMLR for small sample sizes and non-normal b parameter distributions. Because the CMLR program is not available publicly and is restricted to the one- and two-parameter models, the CMLR was not used in this study. The MB procedure of BILOG was used instead.

Among the non-normal ability distributions used in the literature are the uniform and the beta distributions used by Swaminathan and Gifford (1983); the truncated normal distribution used by Ree (1979); and the skewed and the platykurtic distributions used by Yen (1987). The beta and the truncated normal distributions were selected for the present study because these are realistic distributions that showed a negative impact on estimation in previous studies. The uniform distribution is unrealistic and Yen's distributions apparently did not deviate sufficiently from normality to have an effect on estimation accuracy.

Methodology

Design

The conditions varied in this study were the ability parameter distribution (normal, truncated normal, and beta), the test length (20, 60), and the sample size (250, 1000). For each combination of this design the data were replicated 10 times and the items of each replication were calibrated by three procedures: JML, MML, and MB. The JML ability estimates of LOGIST were compared to the by-product ML ability estimates from the MML (ML-MML) and MB (ML-MB) estimates of BILOG. The total number of data subsets is 120 (2 test lengths x 2 sample sizes x 3 estimation procedures x 10 replications).
Data Generation and Calibration

The data generator used is similar to DATAGEN (Hambleton & Rovinelli, 1973) but is capable of manipulating the IRT parameter distributions as required by this study. The generation process starts with specifying the number of items, the number of examinees, and a suitable seed which produces reasonable ranges of parameter interval. Then, the normal, the beta, and the truncated normal ability parameter distributions are generated, and the beta and truncated normal distributions are standardized [4]. The normal, truncated normal, and beta ability distributions are in the ranges (-3.142, 3.020), (-1.534, 4.210), and (-3.635, 1.484) respectively. The $a_i$, $b_i$, and $c_i$ parameters are generated from lognormal, normal, and beta distributions. The a, b, and c parameters are generated to fall in the ranges (0.363, 2.478), (-2.19, 2.23), and (0.009, 0.343) respectively. Using $a_i$, $b_i$, $c_i$, and the standardized $\theta_j$, the probabilities $P_i(\theta_j)$ are computed with equation (1). The random numbers $X_{ij}$ are generated from a uniform distribution on the closed interval zero to one. The item responses $U_{ij}$ are generated by comparing $X_{ij}$ with $P_i(\theta_j)$. If $X_{ij} \le P_i(\theta_j)$ then $U_{ij} = 1$, otherwise $U_{ij} = 0$. The previous steps are repeated to obtain 10 replications for more accurate and stable results.

4. The mean and standard deviation of the beta distribution were taken from Table II of the incomplete beta distributions by Pearson and Hartley (1956, p. 436). The mean and standard deviation of the truncated distributions were calculated using the formulae $\mu = 1.5\,(2\pi)^{-.5}\, e^{-.5c}$ and $\sigma = \left(\tfrac{3}{4}\,[1 + IG(c/2;\ \alpha = 1.5,\ \beta = 1)] - \mu^2\right)^{.5}$, where c is the square of the cut off score (.053), and IG is the integral of the incomplete gamma function with parameters c/2, $\alpha$, and $\beta$. The integral with parameters c/2, $\alpha$, and $\beta$ was obtained from Table I of the incomplete $\Gamma$-function by Pearson and Hartley (1956, p. 2).

The generated data are then used as the input for LOGIST and BILOG. The options used in the two programs are the default options. For example, for MB in BILOG the default priors of BILOG are used. The default number of iteration cycles was fixed at 30 for the EM-step and at 6 for the Newton-step. The 60-item test with 1000 examinees (60X1000) was calibrated first. Other subsets (i.e., 60X250, 20X250, and 20X1000) are then calibrated by selecting the specified number of items and number of examinees and running each of the two programs.

Common Metrics for Estimates and True Values

The a, b, and $\theta$ estimates from 120 various BILOG and LOGIST runs were rescaled to be comparable to the corresponding generated true values using the chi-square scaling method described by Divgi (1985). Rescaling is necessary to put the a's, the b's, and the ability estimates of the two programs on the same scale as the corresponding true values (Swaminathan & Gifford, 1987). The equations of linear transformations are $a_i^* = a_i / A$, $b_i^* = A\,b_i + B$, and $\theta_j^* = A\,\theta_j + B$, where A is the slope and B is the intercept.

Comparison Indices

The true parameters were compared to the rescaled estimated parameters using the following four criteria of accuracy: the correlation of the estimates with the true parameters, the bias, the variance, and the mean square deviation (MSD) of the estimators. The first order product moment correlation was used to represent the relationship between each estimate and its corresponding true value. These correlations reflect only a linear relationship and they do not reflect the accuracy over replications, therefore the 25th, 50th, and 75th correlation percentiles of the 10 replications were calculated. In addition, the MSD of each estimator from its true value was calculated using the formula

(7)  $\mathrm{MSD} = \dfrac{\sum_{i=1}^{N} (T_i - E_i)^2}{N}$

where $T_i$ is the true parameter value for item (or examinee) i, $E_i$ is the estimated parameter for item (or examinee) i, and N is the number of estimated parameters. MSD is the total variance attributed to random and measurement errors. Random sampling errors relate to the stability of estimation over replications. This component of MSD is called the variance and it can be computed as follows

(8)  $\mathrm{Variance} = \dfrac{\sum_{i=1}^{N} (\bar{E} - E_i)^2}{N}$

where $\bar{E}$ is the mean of the estimated parameters. The remaining component of the total variance is attributed to errors other than those of sampling fluctuation. This component is called the bias and it can be obtained using the following equation: Bias = MSD - Variance.
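The response-generation rule and the comparison indices above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the generator used in the study: the names `generate_responses` and `msd_variance_bias` are hypothetical, the item parameters and abilities in the usage lines are made up, and Python's `random()` draws from the half-open interval [0, 1) rather than the closed interval mentioned in the text.

```python
import math
import random


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response, as in equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.702 * a * (theta - b)))


def generate_responses(thetas, items, rng):
    """Score U_ij = 1 when a uniform draw X_ij <= P_i(theta_j), else 0.

    rng.random() is uniform on [0, 1); the endpoint difference from the
    closed interval in the text has probability zero in practice.
    """
    return [[1 if rng.random() <= p3pl(theta, a, b, c) else 0
             for (a, b, c) in items]
            for theta in thetas]


def msd_variance_bias(true_vals, est_vals):
    """MSD, Variance, and Bias = MSD - Variance, as defined in the text."""
    n = len(true_vals)
    msd = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / n
    mean_e = sum(est_vals) / n
    variance = sum((mean_e - e) ** 2 for e in est_vals) / n
    return msd, variance, msd - variance


# Illustrative usage with made-up values (3 examinees, 2 items).
rng = random.Random(12345)  # hypothetical seed
items = [(1.0, 0.0, 0.2), (1.5, -0.5, 0.15)]
responses = generate_responses([-1.0, 0.0, 1.0], items, rng)
msd, variance, bias = msd_variance_bias([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

In the study the estimates would first be rescaled to the metric of the true values before these indices are computed; that step is omitted from the sketch.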
