Article AppliedPsychologicalMeasurement IRT Parameter Estimation With 34(5)327–347 ªTheAuthor(s)2010 Response Times as Collateral Reprintsandpermission: sagepub.com/journalsPermissions.nav DOI:10.1177/0146621609349800 Information http://apm.sagepub.com Wim J. van der Linden1, Rinke H. Klein Entink2, and Jean-Paul Fox2 Abstract Hierarchical modeling of responses and response times on test items facilitates the use of responsetimesascollateralinformationintheestimationoftheresponseparameters. Inaddi- tion to the regularinformation in the response data, twosources of collateral information are identified:(a)thejointinformationintheresponsesandtheresponsetimessummarizedinthe estimatesofthesecond-levelparametersand(b)theinformationintheposteriordistributionof theresponseparametersgiventheresponsetimes.Thelatterisshowntobeanaturalempirical priordistributionfortheestimationoftheresponseparameters.Unliketraditionalhierarchical itemresponsetheory(IRT)modeling,wherethegaininestimationaccuracyistypicallypaidfor by an increase in bias, use of this posterior predictive distribution improves both the accuracy and the bias of IRT parameter estimates. In an empirical study, the improvements are demon- strated for the estimation of the person and item parameters in a three-parameter response model. Keywords collateral information, empirical Bayes estimation, item response theory (IRT), hierarchical modeling, response time Itemresponsetheory(IRT)modelsbelongtoaclassofmodelsthatexplainthedataforeach unitofobservation(here:eachcombinationofatesttakerandanitem)bydifferentparameters. Oneofthemainadvantagesofahierarchicalapproachtothisclassofmodelsistheborrowingof information on one parameter from the data collected for the units associated with the other parameters. This borrowing is possible through the assumption of a common distribution of theparametersatasecondlevelinthestatisticalmodelforthedata.Theestimatoroftheparam- eterthentypicallycompromisesbetweentheassumeddistributionandthelikelihoodassociated withthedata.Indoingso,ittendstostrikeaprofitablebalancebetweenignoringthedataonthe other parameters (¼case of separate estimates) and the rather stringent assumption of all 1CTB/McGraw-Hill,Monterey,California,USA 2UniversityofTwente,Enschede,TheNetherlands CorrespondingAuthor: WimJ.vanderLinden,CTB/McGraw-Hill,20RyanRanchRoad,Monterey,CA93940 Email:[email protected] Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 328 Applied Psychological Measurement34(5) parametersbeingequal(¼caseofpooledestimates).Theprofittypicallyoccursintheformof ahigherefficiency ofthe inferenceat the costof alessserious increase inbias. Oneofthefirstexamplesintesttheorydemonstratingthisprincipleistheclassicaltrue-score estimate based onKelley’s regressionfunction, EðTjX ¼xÞ¼r xþð1(cid:2)r Þm ; ð1Þ XX0 XX0 T whereX istheobservedscoreofthetesttaker,T thetruescore,m themeantruescoreinthepop- T ulation oftesttakers,and r the reliabilityofthe test (Lord & Novick, 1968, section 3.7). An XX0 estimateofthetesttaker’struescore,^t,isobtainedbysubstitutingestimatesofm andr derived T XX0 fromthemarginaldistributionoftheobservedscoresintoEquation1.Theestimateisacompromise between X ¼x as a direct estimate of t and the estimate of the population parameter m in the T sense ofa linear combination with weightsr and 1(cid:2)r . Butusing a well-known variance XX0 XX0 partition in classical test theory, the estimate can be shown to be also equivalent to the preci- sion-weightedaverageofxandm^ (Novick&Jackson,1974,equation9.5.11). T AsdiscussedextensivelyinNovickandJackson(1974,section9.5),theKelleyestimateisan instanceofthemoreformalproblemofestimatingmultiplemeanssimultaneously.Laterexam- plesofapplicationsofthesameprincipleofborrowinginformationaretheestimationofmultiple regressionsinmgroupsfromnormaldatainNovick,Jackson,Thayer,andCole(1972)andthe estimationofproportionsinmgroupsfrombinomialdatainNovick,Lewis,andJackson(1973). Aninstructive empiricalapplicationoftheestimationofmultipleregressionsistheoften-cited studyof theeffects of coachingschools on SATscores inRubin(1981). Thenatureofthe‘‘distributionoftheparameters’’hasnotyetbeenspecified.Inafrequentist approach, the units of observation are typically assumed to be sampled from a population and hence theirparameters are takentobe random.The interpretationofKelley’s estimate inclas- sical test theory belongs to this approach. From a Bayesian perspective, any density that approximates theempirical distribution ofthe parameters forthe dataset becomes aprofitable commonpriorfortheinferencewithrespecttoeachoftheseparameters.Thedifferencebetween ahierarchicalmodelwithapopulationdistributionatthehighestlevelandthisempiricalBayes approach resides only in their motivation and interpretation; the more formal aspects of both approaches involve the same two-level structure. For an introduction to the empirical Bayes approach, see, forexample, Carlinand Louis(2000, chap. 3). Toemphasizethattheborrowedinformationiscollectedsimultaneouslywiththedirectinfor- mation on the parameters, Novick and Jackson (1974, section 9.5) introduced the notion of collateral information. This term avoidsthe moretemporal connotationinthe Bayesian termi- nologyofpriorinformation,whichsuggeststhattheextrainformationshouldalwaysbepresent before anydata onthe parameters iscollected. Itshouldbenoticedthattheuseoftheterminformationdiffersfromthatelsewhereinscien- tificendeavors,whereitistypicallytakentoimplythatobservationscanbepredictedfromother variables.However,collateralinformationinthehierarchicalsensedoesnotrequirethepresence ofanypredictingvariablesbutisalreadyavailableiftheunitsofobservationcanbeassumedto follow a common distribution. If the assumption holds, as soon as data are collected for the parametersofsomeoftheunits,informationisreceivedonallofthem,forexample,abouttheir typical range ofvalues. Inthisarticle,first-levelmodelsarecombinedfortheresponsesandtheresponsetimes(RTs) bythetesttakersontheitemswithsecond-levelmodelsforthejointempiricaldistributionsof theiritemandpersonparameters.Asaresult,itisnotonlypossibletoborrowinformationonthe responseparameterforoneitemfromtheresponsescollectedfortheotheritemsbutalsofrom theRTscollectedforthesameitem.Thesameborrowingispossiblefortheresponseparameter Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 vander Linden etal. 329 foreachperson.BecauseresponsesandRTsarealwaysrecordedsimultaneously,theadditional informationintheRTsisliterallycollateral.Surprisingly,asshownlaterinthisarticle,thefact thatthecollateralinformationisspecificfortheindividualitemsandpersonsleadstoimprove- mentofboththeaccuracyandthebiasoftheestimates.Indoingso,theinformationthusbreaks the bias–accuracy tradeoff typical ofmoretraditional hierarchical modeling. Theresearchinthisarticlewasmotivatedbythefactthatnowthatcomputer-basedtestinghas becomemorepopular,andRTsinthismodeoftestingareautomatically recorded,itwouldbe imprudenttoignoresuchinformation.Atthesametime,thereisacontinuousneedforitemcal- ibrationfromsmallersamplesandabilityestimationfromshortertests.Thelatterisparticularly helpful to maintain realistic time limits on batteries of multiple tests as they are used, for instance,indiagnostictestingforvocationalguidanceorinstructionalpurposes.Beforeshowing howjointhierarchicalmodelingofresponsesandRTshelpstoexploitsuchinformation,acloser lookistakenfirstattheroleofcollateralinformationinthemoretraditionalproblemofparam- eter estimation inaseparate IRT model. Collateral Information in IRT Theexampleconsideredistheestimationofabilityparametery foratesttakerjinaresponse j model (e.g., the well-known three-parameter logistic model) of which the item parameters are already known. The case is met, for instance, when a test from a calibrated item pool is used tomeasure the abilities ofseveraltest takers. SupposethetesttakersarefromapopulationwithanormaldistributionofabilityNðm ;s2Þ, y y ofwhichthemeanm andvariances2havealreadybeenestimatedwithenoughprecisiontotreat y y themasknown.Estimates ofy thatcapitalize onthisinformationshouldbebased onthepos- j teriordistribution fðyju;m ;s2Þ/fðujyÞfðyjm ;s2Þ; ð2Þ j j y y j j j y y whereu ¼ðu u ÞaretheresponsesbytesttakerjonthenitemsinthetestandfðujyÞisthe j 1j;:::; nj j j probability ofthe observed response vectorby the test taker underthe response model. Themeanofthisposteriordistribution,whichisoftenusedasapointestimateofy,isgen- j erallyknowntohaveasmallermeansquareerrorthanaclassicalestimatebasedontheproba- bilityoftheobserveddata,fðujyÞ,only.Thedecreaseisduetotheinformationinthepopulation j j density fðyjm ;s2Þ in the right-hand side of Equation 2, which shows, for instance, where the j y y ability parameters in the population are concentrated and how much they are dispersed. The decreaseispaidforbyanincreaseinthebiasoftheabilityestimatetowardthemeanofthepop- ulation of test takers or the domain of items. This combination of effects is an example of the well-knownbias–accuracytradeoffinstatistics.However,inhierarchicalmodeling,thetradeoff isexploitedtoworktoouradvantage;theimprovedaccuracyisgenerallyrealizedatalessseri- ous increase in bias. The same principle can be shown to hold for the estimation of the item parameters. Hierarchical Model ToprofitfullyfromtheinformationontheIRTparametersintheRTs,amodelhastobeadopted fortheRTsandthecommondistributionofallpersonanditemparametershastobemodeledas well.TheresultisahierarchicalframeworkwiththeIRTandRTmodelsasfirst-levelcompo- nentsandpopulation anddomain modelsfortheIRTandRTparameters assecond-level com- ponents. Fora graphicalillustration ofthe framework, see Figure1. Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 330 Applied Psychological Measurement34(5) Population Item Domain μP, σP μI, ΣI Item Person Item Person a,b,c θ α, β τ i i i j i i j U T ij ij Figure1 Graphicalrepresentationofhierarchicalmodelingframeworkforresponsesandresponsetimes TheRTmodelinEquation4belowwasproposedinvanderLinden(2006)whereastheexten- sionwiththesecond-levelmodelsbelowwasintroducedinvanderLinden(2007).Thesemodels aretakenonlyasanexampletodemonstratethebenefitsofusingRTsascollateralinformation whenestimatingIRTparameters;forspecificapplications,suchastestswithpolytomousitems orfor misreadings, othermodelsneed tobe substituted. Response and RT Models As the first-level model for the responses of test takers j¼1;:::;N on items i¼1;:::;n; the three-parameter normal-ogive (3PNO) model is used, which gives the probability of a correct response onitem ibyperson jas PðU ¼1;y;a;b;cÞ¼c þð1(cid:2)cÞ(cid:2)ðaðy (cid:2)bÞÞ; ð3Þ ij j i i i i i i j i where(cid:2)ð·Þdenotesthenormaldistributionfunction,y istheabilityparameterfortesttakerj, j and a; b, and c are the discrimination, difficulty, and guessing parameters for item i, i i i respectively. Response–timedistributionsareoftenapproximatedwellbylognormaldistributions.There- fore,analogoustotheIRTmodelinEquation3,theRTsaremodeledasalognormalmodelwith aspeedparametert fortesttakerjandtimeintensityanddiscriminationparametersb anda for j i i itemi,respectively.LetT denotetheRToftesttakerjonitemi.Thelognormalmodelposits ij that(van der Linden,2006,2007) (cid:5) (cid:6) fðt ;t;a;bÞ¼ pai exp (cid:2)1(cid:3)aðlnt (cid:2)ðb (cid:2)tÞÞ(cid:4)2 : ð4Þ ij j i i t ffi2ffiffipffiffiffi 2 i ij i j ij Noticethatexceptforthedifference insign,whichisduetothenegativerelationshipbetween timeandspeed,theinterpretationofthetwoparametersforspeedandtimeintensityinEquation 4areanalogoustothosefortheabilityanditemdifficultyinEquation3.However,unlikeEqua- tion 3, RT distributions have a natural zero and do not involve the estimation of any lower asymptote. Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 vander Linden etal. 331 Population and Domain Models The population model specifies the joint distribution of the person parameters y and t: It is assumed thatthedistribution isbivariatenormal, ðy;tÞ∼MVNðμ ;Σ Þ; ð5Þ P P where μ ¼ðm ;m Þ ð6Þ P y t andcovariance matrix (cid:7) (cid:8) s2 s Σ ¼ y yt : ð7Þ P s s2 yt t Likewise,itemparametersa,b,andc intheresponsemodelanda andb intheRTmodelare i i i i i assumed tohavea multivariatenormal distribution, ða;b;c;a;bÞ∼MVNðμ ;Σ Þ; ð8Þ I I where μ ¼ðm ;m ;m ;m ;m Þ ð9Þ I a b c a b andmatrix Σ hasall variances andcovariances between theitem parameters aselements. I Thishierarchicalmodelisnotyetfullyidentifiable.Inadditiontotheusuallackofidentification forahierarchicalIRTmodel,theparametersb andt intheRTmodelarenotidentified;addition i j ofaconstanttob canbecompensatedbyadditionofthesameconstanttot.Identifiabilityisob- i j tainedifonesetsμ ¼0ands2 ¼1.Thereasons2neednotbefixedtoaknownconstantisthat P y t allparametersintheRTmodelareonatimescalewithadirectobservableunit(e.g.,second). Alternative Component Models Asalreadynoted,alternativeresponseandRTmodelscanbesubstitutedforthethree-parameter response and lognormal RT model in Equations 3 and 4. One set of candidates is the Poisson modelforreadingerrorsandgammamodelforreadingspeedbyRasch(1960/1980).Bothmod- elshaveseparatereadingpersonanditemparametersandthereforeallowforthedistincttypesof populationanditemdomainmodelsintheprecedingsection (vanderLinden,2009).Otherex- amples of response models are the well-known two-parameter logistic model and the Rasch modelormodelsforpolytomousresponses.ForalternativeRTmodelsbasedonaWeibulldis- tribution,seeRouder,Sun,Speckman,Lu,andZhou(2003)andTatsuokaandTatsuoka(1980), whereMaris(1993)shouldbeconsultedforRTmodelsbasedongammadistributions.However, in addition to these distinct models for responses and RTs, hybrid models have been proposed that either model response distributions but have time parameters as well (e.g., Roskam, 1997; Verhelst, Verstralen, & Jansen, 1997; Wang & Hanson, 2005) or RT distributions but includeresponseparameters(e.g.,Thissen,1983).Thesemodelsdonotlendthemselvesforsub- stitution as first-level models in the framework above. For instance, substitution of a response model with RT parameters into (3) would double the presence of this type of parameter in the twofirst-levelmodelsandviolatetheinherentassumptionofconditionalindependencebetween responses and RTs on which the framework is based (see below). A more complete review of Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 332 Applied Psychological Measurement34(5) alternativeRTmodelsisgiveninSchnipkeandScrams(2002);formoredetailsondistinctand hybridmodels for responseandRT distributions, see vander Linden (2009). Bayesian Estimation Intheempiricalexampleslaterinthisarticle,themodelparameterswereestimatedusingBayes- ianestimationwithdataaugmentationandGibbssampling.Thedataaugmentationwasaccord- ing to the method described in Albert (1992; see also Johnson & Albert, 1999), that is, with postulatedcontinuouslatentvariablesunderlyingtheresponses.GibbssamplersareMonteCarlo Markovchain(MCMC)methodsforsamplingfromtheposteriordistributionoftheitemparam- eters(e.g.,Gelfand&Smith,1990;Gelman,Carlin,Stern,&Rubin,2004,chap.11).Theydoso by iteratively sampling from the conditional posterior distributions of one set of parameters giventhepreviousdrawsfromthedistributionsofallotherparameters.Forthecurrentmodeling framework,aconjugatenormal-inverse-Wishartpriorforthemeanvectorsandcovariancematri- cesforthemultivariatemodelsinEquations5and8wasspecified(Gelmanetal.,2004,section 3.6).Thisjointpriorleadstosamplingfromconditionalposteriordistributionswithknownden- sities.Fortechnicaldetailsoftheestimationmethod,seeKleinEntink,Fox,andvanderLinden (2009) and van der Linden (2007). A package of procedures in the statistical language R that implementsthe method isdescribed inFox,Klein Entink,andvander Linden (2007). Different Sources of Information Thesame principle as in Equation 2 is demonstrated but this time for a test taker j with response vector u ¼ðu ;...;u Þ and RT vector t ¼ðt ;...;t Þ. Again, without loss of generality, it is j 1j nj j 1j nj assumedthatthesecond-levelmeans,μ andμ ,andcovariancematricesΣ andΣ ,havealready P I P I beenestimatedduringitemcalibration.Consequently,y andt aretheonlyunknownparameters. j j The complication we are now faced with is an estimation problem under two separate models—a primary model for the responses and another for the RTs. To assess the improve- mentintheestimationofy relativetoEquation2,itisattemptedtofactorizetheposteriordistri- j bution of y into a product with the primary model probability for the observed responses as j a factor. A comparison between the remainder of this product and the prior distribution of y in j Equation2shouldallowassessingtheimprovementintheestimationofy relativetoEquation2. j Theposteriordistributionofy followsfromthejointdistributionofy andt givenallknown j j j quantities Z fðyju;t;μ ;Σ Þ¼ fðy;tju;t;μ ;Σ Þdt: ð10Þ j j j P P j j j j P P j Forthe integral, itholds that Z Z fðy;tju;t;μ ;Σ Þdt / fðu;tjy;tÞfðy;tjμ ;Σ Þdt: ð11Þ j j j j P P j j j j j j j P P j Hence, because oflocal independence between responses andRTsgiven (y;t), j j Z fðyju;t;μ ;Σ Þ/ fðujyÞfðtjtÞfðy;tjμ ;Σ Þdt: ð12Þ j j j P P j j j j j j P P j Observe that this is a different form of local independence than the usual independence of responses to items conditional on y. For an empirical study confirming the plausibility of the assumption,see vander Linden andGlas (2010). Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 vander Linden etal. 333 Factorizing fðy;tjμ ;Σ Þ;the followingisobtained j j P P Z fðyju;t;μ ;Σ Þ/fðujyÞ fðyjt;μ ;Σ ÞfðtjtÞfðtjμ ;Σ Þdt j j j P P j j j j P P j j j P P j Z /fðujyÞ fðyjt;μ ;Σ Þfðtjt;μ ;Σ Þdt; ð13Þ j j j j P P j j P P j where the secondstep followsfromthe definitionofthe posterior distributionof t as j fðtjt;μ ;Σ Þ/fðtjtÞfðtjμ ;Σ Þ: ð14Þ j j P P j j j P P Forthe integral inthesecond lineofEquation 13,it holdsthat Z fðyjt;μ ;Σ ÞfðtjtÞdt /fðyjt;μ ;Σ Þ: ð15Þ j j P P j j j j j P P Noticethattheright-handsideofEquation15istheconditionalposteriordensityofy givent, j j thatis, the probability ofthe test taker’sability y given his or her speed t integrated over the j j posterior distributionoft giventhe response timest. j j Thus,it can beconcludedthat fðyju;t;μ ;Σ Þ/fðujyÞfðyjt;μ ;Σ Þ: ð16Þ j j j P P j j j j P P The result has a simple form that is entirely analogous to Equation 2. It shows that when the RTs are used as collateral information, y is estimated from the probability fðujyÞ of j j j the observed response vector u as when the RTs are ignored but with the original prior j distribution of y in Equation 2 replaced by the conditional posterior distribution of y given j the RTs t for the test taker. j Moregenerally,theresultalsoanswerstheearlierquestionofhowtodealwiththepresenceof two different models in the statistical inference for one kind of parameters in a hierarchical frameworkasinEquations3through9.Thesolutionistokeepthemodelwiththeprimarypa- rametersintactbutabsorbthesecondmodelintheconditionalposteriordensityoftheprimary parameters giventhe information collectedfor the otherparameters. Theresult inEquation16 helpsidentify three different sourcesof informationon y : j 1. The information directly available in u in the first factor of Equation 16, that is, the j regularmodel probability fðujyÞassociated with the observedresponse vector. j j 2. Theinformationsummarizedintheestimatesofμ andΣ inthesecondfactor.This P P informationisderivedfromthevectorsofresponsesandRTsintheentiresampleoftest takers.These estimates generalize the roleofthose ofm ands2 inEquation 2. y y 3. The information in the shape of the conditional posterior distribution of the response parametersgiventheresponsetimes.Unliketheestimatesofthepopulationparameters intheprecedingsourceofinformation,theinformationinthisvectorisuniqueforeach individualtest taker. AnanalogousrolefortheRTscanbeshowntoholdfortheestimationoftheitemparameters ξ ¼ða;b;cÞ.Forthesakeofargument,letitbeassumedthatthepersonparametersareknown. i i i i (Actually,intheGibbssampler,eachtimetheconditionalposteriordistributionofξ issampled, i allotherparametersaretreatedasknown.)Usingthesamederivationasbeforewithμ ;Σ ,y, P P j andt replaced by μ ;Σ , ξ;andt, respectively, theresult is j I I i i Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 334 Applied Psychological Measurement34(5) fðξju;t;μ ;Σ Þ/fðujξÞfðξjt;μ ;Σ Þ; ð17Þ i i i I I i i i i I I wherethedistributionoftheresponsesu givenξ istheoneinEquation3forknownabilityparam- i i etersandtheposteriordistributionofξ givent;μ ;andΣ isobtainedanalogoustoEquation14. i i I I ThetreatmentinEquations10through17wasforthecaseofthesecond-levelparametershav- ingbeenestimatedfromanearliersamplewithenoughprecisiontotreatthemasknown.Butfor thecurrentargument,itdoesnotmatteriftheywereestimatedalongwiththecurrent y andξ j i parameters.Theestimatesofy andξ areconstrainedbythesametypeofconditionalposterior j i distributions fðyjt;μ ;Σ Þ in Equation 16 and fðξjt;μ ;Σ Þ in Equation 17 when the esti- j j P P i i I I mationprocedure fits theestimates ofðμ ;Σ Þand ðμ ;Σ Þ simultaneously tothedata set. P P I I Bias–Accuracy Tradeoff Itisimportanttonoticetheconsequencesofthereplacementofthecommonpriordistributionof y for all test takers in Equation 2 by the individual posterior distribution of y in Equation 16. j This posterior distribution not only has a smaller variance but, because of its conditioning on theRTsforeachindividualtesttaker,alsotendstohavealocationclosertohisorhertrueability level.Theimpactoftheformerisafurtherincreaseoftheposteriorprecisionofy ;theimpactof j thelatterisadecreaseinthebiasoftheposteriormean.Thestrengthofbotheffectsisdependent onthecorrelationbetweenthepersonparametersintheresponseandRTmodels.Obviously,if thecorrelationisperfect,thesameparameterisbeingestimatedtwice.Theindividualposterior distributionofy wouldthentendtoduplicateitsresponse-basedlikelihoodatthesamelocation; j thatis,therewouldbedoubletheinformationandnoconstrainingbyacommonpriorforysat differentlocations. Asalreadyargued,parameterestimationintraditionalhierarchicalIRTmodelingissubjectto the well-known bias–accuracy tradeoff in statistics. For this type of modeling, the addition of a population model for the distribution of the y parameters leads both to an increase in the bias and a decrease in the lack of accuracy of their estimates. However, typically the net result—summarized inthe meansquare errorsofthe parameters—is positive. Ontheotherhand,theposteriordensityofy inEquation16isconditionalontheactualRTs j byeachtesttakerjandthereforeservesasapriorwithanindividuallocationforeachofthem. Unliketheuseofapopulationdistributionasacommonprior,whoselocationnecessarilycom- promises between the true values of the individual ys, and hence produces a bias in their esti- mates, these individual priors avoid the necessity of such a compromise. In addition, they have smallervariances and arethus moreinformative. Empirical Examples Simulationstudieswereconductedtodemonstratetheeffectoftheuseofthecollateralinforma- tioninRTs onthe estimation ofthe IRTmodel parameters. Simulated datawere used because theyallowcomparingtheestimatesoftheparametersagainsttheirtruevalues.Butbeforepre- senting these studies, a small example with empirical data is given to motivate interest in the effects ofRTs. Empirical Data Thedatawerefromanitemcalibrationstudyforapoolofitemsforacomputerizedversionof afiguralmatrixtestbyHornkeandHabon(1986).Thetotaldatasetwasfor30,000testtakers and456dichotomouslyscoreditems,eachwitheightoptions.Eachtesttakeransweredonly12 Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 vander Linden etal. 335 itemsbutthetotaldatamatrixwasconnected.Inanearlierstudy,themodelwasfittedsuccess- fullyandacorrelationofr ¼(cid:2):61wasfound(fordetails,seeKleinEntink,Kuhn,Hornke,& yt Fox,2009). Thefirsttwoblocksof667testtakersand17itemsfromthedatasetweretakenandtheIRT parametersbothwithoutandwiththeRTswereestimated,andthesamemodelasintheearlier study (2PNO model) wasfitted. TheGibbs sampler wasrun for the fullmodel with the earlier normal-inverse-Wishart priors with prior means ðm ;m Þ¼ð0;0Þ (for identification), prior means ðm ;m , m ;m Þ¼ð1;0;2;3Þ; scale matriceys0 Σt0 and Σ both equal to a diagonal a0 b0 a0 b P0 I0 matrix with elements 10, and two degrees of freedom. Observe that the choices of diagonal elementsfor thescale matrices andthedegrees offreedomimplynegligiblepriorinformation. For the run for the IRT model only, the same settings for the remaining prior parameters were used.Bothtimes,thesamplerwasrunfor10,000iterations.Therunningtimewasapproximately 15minutes. Trace plotswith thedraws showed convergence after a burn-inof500 iterations. Figure2showsaplotofthetwosetsofyestimatesfortheconditionswithoutandwithRTs (first panel) as well as a plot with the corresponding two sets of posterior standard deviations (SDs).Thepointsinthefirstplotarearoundtheidentitylinewitharandomvariationduetoesti- mationerrortypicalofatestof17items.ThesecondplotillustratestheeffectofRTsintheform ofatrendtolowerSDsfortheconditionwiththeRTs.Figure3showsitemparameterestimates also close to the identity line. Thus, the two types of estimation produce essentially the same parameter estimates butthe useofRTs tendstoimprovetheir accuracy. Simulated Data Obviously, improvements in the accuracy of the parameter estimates are dependent on the second-levelcorrelationsinthehierarchicalframeworkinEquations3through9.Inthesimula- tionstudies,thecorrelation between thespeedandability parameters,r , wasthefocus.(The yt effects for the other parameters are analogous.) The effects of the use of the RTs on the IRT parameterestimateswerethereforeevaluatedforarangeofalternativesizesofr .Inareview yt ofempiricalstudieswithestimatesofr ,theywerefoundtohavevaluesintherange½(cid:2):65;:30(cid:3) yt for tests as disparate as the Armed Services Vocational Aptitude Battery (ASVAB), Certified Public Accountant Exam, Graduate Management Admission Test, and a test of quantitative and scientific proficiencies for colleges students (van der Linden, 2009). For the correlation between the item difficultyand timeintensity parameter, r , therange was½(cid:2):33;:65(cid:3). bb To assess the improvements of the estimation of the item and person parameters in the response model separately, two studies with a different setup had to be conducted. One study addressedthegaininstatisticalaccuracyoftheabilityestimatesduetothecollateralinformation inRTs.Theotheraddressedthesameissuefortheestimationoftheitemparameters.Forboth studies,thesetupwasconservativeinthatonlythecorrelationbetweentheabilityandspeedpa- rameters was varied. The gains reported therefore do not include any additional impact of the RTs throughthejoint distribution oftheitem parameters inEquations8 and9. The accuracy of the estimates of y depends on the length of the simulated test and its item parameters. The item parameters had a typical range of values (see below). The test length wassetequalton¼30items,whichwasneithertoolongnoroverlyshortrelativetoreal-world tests.Ontheotherhand,whenestimatingtheysbothfromtheresponsesandtheRTs,theeffec- tive number of observations on the parameter is longer. Roughly speaking, the effective test length should then be between 30 (r ¼0Þ and 60 items (r ¼1). Besides, notice that for yt yt a fixed correlation, the relative amount of information on y in the responses and RTs remains identical if the test length increases; each additional item entails one extra response and one Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010 336 Applied Psychological Measurement34(5) 2 1 0 u,t θ| −1 −2 −2 −1 0 1 2 θ|u 0.70 0.65 u,t 0.60 θ)| D( S 0.55 0.50 0.45 0.45 0.50 0.55 0.60 0.65 0.70 SD(θ)|u Figure2 Estimatesofyandtheirposteriorstandarddeviations(SDs)forafiguralmatrixtestwithoutand withresponsetimes RT.Butofcourse,eventuallyestimationerrorvanishes,andalthoughtheirrelativeimportance does notchange,sowill theabsolute sizeofthe contribution ofthe RTs. Likewise,fortheitemcalibrationstudy,itwasonlyneededtofocusonthesizeofthesampleof testtakersanditsdistributionofabilityparameters.Thestandardnormalwaschosenasthesim- ulatedabilitydistribution.ThesamplesizewasN ¼300.Thisisasmallsizeforareal-worlditem calibrationstudybut,analogoustotheeffectivelengthintheprecedingargument,theuseofRTs increasestheeffectivesamplesizetoahigherlevel.Thenumberofitemsdoesnothaveanyimpact onthe accuracy ofthe estimatesofthe parameters ofanindividualitem.Twodifferent versions wereneededofthesecondstudy,eachwithseveralconditionsandmultiplereplications. However, thetotalrunningtimefortheprojecthadtobekeptmanageable(theaveragetimeforonerunwas Downloaded from http://apm.sagepub.com at Universiteit Twente on July 2, 2010
Description: