ERIC EJ844431: An Introduction to the DA-T Gibbs Sampler for the Two-Parameter Logistic (2PL) Model and Beyond
Psicológica (2005), 26, 327-352

An Introduction to the DA-T Gibbs Sampler for the Two-Parameter Logistic (2PL) Model and Beyond

Gunter Maris & Timo M. Bechger
Cito (The Netherlands)

The DA-T Gibbs sampler was proposed by Maris and Maris (2002) as a Bayesian estimation method for a wide variety of Item Response Theory (IRT) models. The present paper provides an expository account of the DA-T Gibbs sampler for the 2PL model. However, the scope is not limited to the 2PL model: it is demonstrated how the DA-T Gibbs sampler for the 2PL may be used, quite easily, to build Gibbs samplers for other IRT models. Furthermore, the paper contains a novel, intuitive derivation of the Gibbs sampler and could serve as reading for a graduate course on sampling.

Introduction

Let $Y_{pi} = 1$ denote the event that person $p$ gives the correct answer to item $i$, and $\theta_p$ his ability. Assume that there exists a latent response variable, $X_{pi}$, such that person $p$ solves item $i$ if $X_{pi}$ is larger than a threshold $\delta_i$. That is,

$$P(Y_{pi} = 1 \mid \theta_p) = P(X_{pi} > \delta_i \mid \theta_p).$$

It is seen that the probability of a correct response depends on the threshold of the item as well as the ability of the respondent. The probability of a correct response as a function of ability is called the Item Response Function (IRF).

Address correspondence to: Gunter Maris, Cito, P.O. Box 1034, NL-6801 MG, Arnhem, The Netherlands. E-mail: gunter.maris[at]citogroep.nl; Tel: +31-026-3521162.

[Figure 1. IRFs for two 2PL items with different parameters: $P(Y_{pi} = 1 \mid \theta)$ plotted against $\theta$.]

Under the Two-Parameter Logistic (2PL) model (Birnbaum, 1968), $X_{pi}$ is assumed to follow a logistic distribution with mean $\alpha_i \theta_p$ and scale parameter $\beta = 1$, so that

$$
P(X_{pi} > \delta_i \mid \theta_p, \alpha_i, \delta_i)
= \int_{-\infty}^{\infty} (x_{pi} > \delta_i) \, f(x_{pi} \mid \theta_p, \alpha_i) \, dx_{pi}
= \int_{-\infty}^{\infty} (x_{pi} > \delta_i) \, \frac{\exp(x_{pi} - \alpha_i \theta_p)}{[1 + \exp(x_{pi} - \alpha_i \theta_p)]^2} \, dx_{pi}
= \frac{\exp(\alpha_i \theta_p - \delta_i)}{1 + \exp(\alpha_i \theta_p - \delta_i)},
$$

where $(x_{pi} > \delta_i)$ denotes an indicator variable that is one if $x_{pi} > \delta_i$, and zero otherwise. The Two-Parameter Normal Ogive (2NO) model (Birnbaum, 1968) is obtained when the distribution of the latent response variables is normal.

The discrimination parameter $\alpha_i$ determines how fast the IRF changes with ability. If $\alpha_i$ is positive (negative), the probability of answering correctly is an increasing (decreasing) function of ability. The Rasch model (Rasch, 1980) is a special case of the 2PL where all items have a discrimination parameter equal to one.

As it stands, the 2PL is unidentifiable. Specifically,

$$P(Y_{pi} = 1 \mid \theta_p, \alpha_i, \delta_i) = \frac{\exp(\alpha_i^* \theta_p^* - \delta_i^*)}{1 + \exp(\alpha_i^* \theta_p^* - \delta_i^*)},$$

where

$$\alpha_i^* = \alpha_i d, \qquad \delta_i^* = \delta_i - \alpha_i c, \qquad \theta_p^* = \frac{\theta_p - c}{d},$$

and $c$ and $d$ are arbitrary constants. To deal with this indeterminacy we arbitrarily set $\alpha_1 = 1$ and $\delta_1 = 0$. This means that the item parameters must be interpreted relative to the first item.

The purpose of this paper is to provide an expository account of Bayesian estimation of the 2PL, focussing on the DA-T Gibbs sampler developed by Maris and Maris (2002). In addition, we offer an intuitive derivation of the Gibbs sampler and demonstrate that the DA-T Gibbs sampler for the 2PL can be used to build Gibbs samplers for many other Item Response Theory (IRT) models. Among others, we consider the Linear Logistic Test Model (LLTM) (Fischer, 1995), the 3PL (Birnbaum, 1968), and the Nedelsky model for multiple-choice items (Bechger, Maris, Verstralen & Verhelst, 2005).
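To make the model and its indeterminacy concrete, here is a minimal sketch (ours, not from the paper; all names are illustrative) that evaluates the 2PL IRF and checks numerically that the $(c, d)$ transformation leaves the response probabilities unchanged:

```python
import numpy as np

def irf_2pl(theta, alpha, delta):
    """2PL item response function: P(Y = 1 | theta, alpha, delta)."""
    z = alpha * theta - delta
    return np.exp(z) / (1.0 + np.exp(z))

theta = np.linspace(-5, 5, 11)

# Two items with different parameters, as in Figure 1.
p1 = irf_2pl(theta, alpha=1.0, delta=0.0)
p2 = irf_2pl(theta, alpha=2.0, delta=1.0)

# The transformation alpha* = alpha*d, delta* = delta - alpha*c,
# theta* = (theta - c)/d reproduces exactly the same probabilities,
# which is why the model is unidentified without constraints such as
# alpha_1 = 1 and delta_1 = 0.
c, d = 0.7, 1.5
p2_star = irf_2pl((theta - c) / d, alpha=2.0 * d, delta=1.0 - 2.0 * c)
assert np.allclose(p2, p2_star)
```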
Gibbs Sampling

Let $\Lambda = (\Lambda_1, \ldots, \Lambda_m)$, $m \ge 2$, denote a vector of parameters.¹ In Bayesian statistics, the unknown parameters are considered random variables. Bayes' theorem states that the posterior density (the posterior, for short) of $\Lambda$ given the observed data $y$ is given by

$$f(\lambda \mid y) = \frac{f(y \mid \lambda) \, f(\lambda)}{f(y)},$$

where $f(y \mid \lambda)$ denotes the likelihood function, and $f(y)$ the marginal likelihood function. The prior density $f(\lambda)$ (the prior, for short) expresses substantive knowledge concerning the parameters prior to data collection. In Bayesian statistics all inferences about the parameters are based upon the posterior.

¹ We use subscripts to distinguish parameter vectors from scalars.

[Figure 2. Schematic representation of two iterations of the Gibbs sampler with two parameters. The plot must be read from upper left to lower right.]

The Gibbs sampler is an iterative procedure to generate parameter values $\lambda^{(0)}, \lambda^{(1)}, \ldots$ from the posterior. The first $n$ generated values are discarded and the rest is considered to be a dependent and identically distributed (did) sample from the posterior. This means that:

1. The distribution of $\Lambda^{(n+j)}$ given the data is the posterior for all $j > 0$.
2. Conditional upon the data, $\Lambda^{(n+j)}$ is not independent of $\Lambda^{(n+i)}$.

In this section we discuss how the Gibbs sampler works, why it works, and when it works. Alternative explanations can be found, for instance, in Casella and George (1992), Tanner (1996), or Ross (2003). The reader is referred to Tierney (1994) for a more rigorous treatment.

How

The procedure starts by choosing an initial value $\lambda^{(0)}$. Then, in each successive iteration, individual parameters are sampled independently from their so-called full conditional distributions. The order in which the parameters are sampled is arbitrary.

The full conditional density (the full conditional, for short) is the density given the observed data and the current value of all other parameters. In the sequel, we will often use the shorthand notation $f(\lambda_k \mid \ldots)$ for the full conditional of parameter $\lambda_k$. To determine, up to a constant, the full conditional of a parameter $\lambda_k$, we write down the density $f(\lambda, y)$ and remove all factors that are unrelated to $\lambda_k$. Specific examples will be given below.

Figure 2 represents two iterations of a fictitious Gibbs sampler, with two parameters being sampled at each iteration. The closed curve represents the support of a two-dimensional posterior. The solid lines indicate the support of the full conditionals, and the crosses denote arbitrary values simulated from the different full conditional distributions. Observe that the Gibbs sampler "travels" through the support of the posterior along horizontal and vertical paths. Note that the support of the posterior must be such that every region can be reached by the Gibbs sampler, irrespective of the point of departure.

With a did sample from the posterior we may use the Monte Carlo method to calculate an unbiased estimate of the posterior expectation of any function $g(\lambda, y)$:

$$\int g(\lambda, y) \, f(\lambda \mid y) \, d\lambda \approx \frac{1}{n_s} \sum_j g(\lambda^{(j)}, y),$$

where $n_s$ denotes the number of sampled values $\lambda^{(j)}$. That is, we approximate the expectation by the sample mean. The posterior probability that a parameter is smaller than or equal to a constant $t$, for example, is estimated by

$$\frac{1}{n_s} \sum_j \left( \lambda_k^{(j)} \le t \right).$$

The variance of the estimator of the posterior expectation can be estimated by the variance over independent replications of the Gibbs sampler.
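As a concrete illustration of these steps, consider a toy example (ours, not the authors') with a two-parameter posterior whose full conditionals are known exactly: a standard bivariate normal with correlation $\rho$. A minimal Gibbs sampler, with burn-in discarded and Monte Carlo estimates computed from the did sample, might look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(rho, n_iter, burn_in):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is N(rho * other, 1 - rho**2), so both
    parameters can be sampled directly at every iteration.
    """
    draws = np.empty((n_iter, 2))
    lam1, lam2 = 0.0, 0.0                   # arbitrary initial value lambda^(0)
    sd = np.sqrt(1.0 - rho**2)
    for t in range(n_iter):
        lam1 = rng.normal(rho * lam2, sd)   # draw from f(lambda_1 | lambda_2, y)
        lam2 = rng.normal(rho * lam1, sd)   # draw from f(lambda_2 | lambda_1, y)
        draws[t] = lam1, lam2
    return draws[burn_in:]                  # discard the first n values

sample = gibbs_bivariate_normal(rho=0.8, n_iter=5000, burn_in=1000)

# Monte Carlo estimates: a posterior expectation and P(lambda_1 <= t).
print(sample.mean(axis=0))           # close to (0, 0)
print(np.mean(sample[:, 0] <= 1.0))  # close to Phi(1), about 0.84
```

Because successive draws are dependent, the naive within-chain variance of such estimates is optimistic; hence the suggestion above to estimate it over independent replications of the sampler.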
[Figure 3. Plot of sampled values against iterations, for independent replications of the Gibbs sampler.]

Unfortunately, there is no established way to determine an appropriate value for $n$. One option is to look at plots of $\lambda^{(1)}, \lambda^{(2)}, \ldots$ against iterations for a number of independent replications. An illustration with four independent replications is given in Figure 3. If, after $n$ iterations, the values appear to fluctuate around a common stationary value, this may be taken as circumstantial evidence that $n$ is large enough. In Figure 3, the plots appear to stabilize after about 1200 iterations. However, there is no way to be sure, since we do not know what will happen after 5000 iterations. Other ad hoc methods to assess the required number of iterations are surveyed by Gelman and Rubin (1992) and Gill (2002, chapter 11).

Why

Let $\{\Lambda^{(n)}, n \ge 0\}$ denote a Markov chain. A Markov chain is a stochastic process such that each value depends only on its immediate predecessor; that is, for all $n > 0$,

$$f(\lambda^{(n)} \mid \lambda^{(n-1)}, \ldots, \lambda^{(0)}) = f(\lambda^{(n)} \mid \lambda^{(n-1)}).$$

The Gibbs sampler is a procedure to simulate a Markov chain such that the marginal distribution of $\Lambda^{(n)}$ converges to the posterior as $n$ increases. Convergence to the posterior is guaranteed if the following conditions are satisfied:

1. The posterior is the invariant distribution.
2. The chain is irreducible.

Invariance means that if $\lambda^{(0)}$ is drawn from the posterior, then all subsequent values are also draws from the posterior. Suppose, for ease of presentation, that there are two parameters.² Their posterior density is

$$f(\lambda_1, \lambda_2 \mid y) = f(\lambda_1 \mid \lambda_2, y) \, f(\lambda_2 \mid y).$$

To sample from this posterior, we draw $\lambda_2^{(1)}$ from the marginal posterior distribution and then $\lambda_1^{(1)}$ from the distribution conditional upon $\lambda_2^{(1)}$. Note that the latter is a full conditional as defined in the previous paragraph.

Suppose we set up a Markov chain to draw $\lambda_2^{(1)}$ from the marginal posterior distribution. Convergence is faster if the dependence between subsequent values is weaker. Thus we aim for a weak degree of dependence. Specifically, we ensure that $\Lambda_2^{(1)}$ and $\Lambda_2^{(0)}$ are independent and identically distributed conditional upon $\Lambda_1^{(0)}$. That is,

$$f(\lambda_2^{(0)}, \lambda_2^{(1)} \mid y) = \int f(\lambda_2^{(0)} \mid \lambda_1^{(0)}, y) \, f(\lambda_2^{(1)} \mid \lambda_1^{(0)}, y) \, f(\lambda_1^{(0)} \mid y) \, d\lambda_1^{(0)}.$$

If we integrate $f(\lambda_2^{(0)}, \lambda_2^{(1)} \mid y)$ with respect to $\lambda_2^{(0)}$ (or $\lambda_2^{(1)}$), we see that $\Lambda_2^{(0)}$ and $\Lambda_2^{(1)}$ have the same marginal distribution. This distribution is the marginal posterior.

² The argument for the general case follows by mathematical induction.

It follows that

$$
f(\lambda_2^{(1)} \mid \lambda_2^{(0)}, y)
= \frac{f(\lambda_2^{(0)}, \lambda_2^{(1)} \mid y)}{f(\lambda_2^{(0)} \mid y)}
= \int f(\lambda_2^{(1)} \mid \lambda_1^{(0)}, y) \, \frac{f(\lambda_2^{(0)} \mid \lambda_1^{(0)}, y) \, f(\lambda_1^{(0)} \mid y)}{f(\lambda_2^{(0)} \mid y)} \, d\lambda_1^{(0)}
= \int f(\lambda_2^{(1)} \mid \lambda_1^{(0)}, y) \, f(\lambda_1^{(0)} \mid \lambda_2^{(0)}, y) \, d\lambda_1^{(0)}.
$$

To produce a value $\lambda_2^{(1)}$ from the posterior distribution we may therefore use the method of composition (Tanner, 1996, section 3.3.2) as follows:

1. Draw $\lambda_2^{(0)}$ from the posterior.
2. Draw $\lambda_1^{(0)}$ from the full conditional $f(\lambda_1 \mid \lambda_2^{(0)}, y)$.
3. Draw $\lambda_2^{(1)}$ from the full conditional $f(\lambda_2 \mid \lambda_1^{(0)}, y)$.

This procedure is a Gibbs sampler starting with a draw from the posterior.

[Figure 4. Schematic picture of the sampling procedure: $\lambda_2^{(0)} \to \lambda_1^{(0)} \to \lambda_2^{(1)} \to \lambda_1^{(1)} \to \lambda_2^{(2)} \to \ldots$ Within the rectangle is the Gibbs sampler.]

With $\lambda_2^{(1)}$ drawn from the marginal posterior, we then draw $\lambda_1^{(1)}$ from the full conditional $f(\lambda_1 \mid \lambda_2^{(1)}, y)$ and repeat the process with $\lambda_2^{(1)}$ replacing $\lambda_2^{(0)}$, and so on. Schematically, the sampling procedure may be depicted as in Figure 4, where the values generated by the Gibbs sampler are drawn inside a rectangle. It can be shown that these values are the realization of a Markov chain whose invariant distribution is, by construction, the posterior. The values outside the rectangle need not be generated, in which case we obtain the Gibbs sampler as described in the previous section.
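In code, the method of composition is a direct transcription of these three steps. The sketch below (again ours, reusing the bivariate normal toy posterior from the earlier example) starts with a draw from the posterior and confirms empirically that the composed draw $\lambda_2^{(1)}$ still has the marginal posterior distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
sd = np.sqrt(1.0 - rho**2)

def composition_step():
    """One pass of the composition method for the bivariate normal toy posterior."""
    lam2_0 = rng.normal(0.0, 1.0)          # 1. draw lambda_2^(0) from the posterior
    lam1_0 = rng.normal(rho * lam2_0, sd)  # 2. draw from f(lambda_1 | lambda_2^(0), y)
    lam2_1 = rng.normal(rho * lam1_0, sd)  # 3. draw from f(lambda_2 | lambda_1^(0), y)
    return lam2_1

draws = np.array([composition_step() for _ in range(20000)])
# lambda_2^(1) is again a draw from the marginal posterior, here N(0, 1):
print(draws.mean(), draws.var())  # approximately 0 and 1
```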
Irreducibility refers to the fact that it must be possible to reach each region in the support of the posterior (see, e.g., Figure 2). This is true for the majority of applications.

When

Gibbs sampling is useful when the full conditionals are tractable. If so, it provides an estimation procedure that can be implemented relatively quickly. We call a distribution tractable if there is a simple and efficient method to generate a sample from it. Methods for stochastic simulation can be found, for instance, in Devroye (1986), Ripley (1997), or Ross (2001).

There are many situations where the full conditionals are not tractable. This is, in fact, the case for the 2PL (Maris & Maris, 2002). In the next section, we will demonstrate that the DA-T Gibbs sampler is a variant of the Gibbs sampler designed to give tractable full conditionals.

The DA-T Gibbs Sampler for the 2PL

The Prior

We conveniently assume that the parameters are a priori independent. That is,

$$f(\theta, \delta, \alpha) = \prod_p f(\theta_p) \prod_i f(\delta_i) f(\alpha_i). \qquad (1)$$

We also assume that all prior distributions are tractable. Note that the priors must be chosen relative to the item whose parameters are arbitrarily fixed to identify the model.

The Full Conditionals

DA stands for Data Augmentation, which entails adding latent data as (auxiliary) parameters (Tanner & Wong, 1987). The principle of DA may be stated as follows:

Augment the observed data with latent data so that the augmented posterior distribution is "simple". (e.g., Tanner, 1996, p. 38)

Here, the continuous latent responses are added as parameters. Our hope is that DA will result in tractable full conditionals. Let's see!

The DA posterior of the 2PL is proportional to

$$f(\theta, \delta, \alpha, x, y) = f(y \mid x, \theta, \delta, \alpha) \, f(x \mid \theta, \delta, \alpha) \, f(\theta, \delta, \alpha) = f(y \mid x, \delta) \, f(x \mid \theta, \alpha) \, f(\theta, \delta, \alpha).$$

Persons are assumed to be independent of one another, so that

$$f(y \mid x, \delta) = \prod_p \prod_i f(y_{pi} \mid x_{pi}, \delta_i), \qquad f(y_{pi} \mid x_{pi}, \delta_i) = \begin{cases} 1 & \text{if } x_{pi} > \delta_i \text{ and } y_{pi} = 1, \\ 1 & \text{if } x_{pi} \le \delta_i \text{ and } y_{pi} = 0, \\ 0 & \text{otherwise,} \end{cases}$$

which gives

$$f(y \mid x, \delta) = \prod_p \prod_i (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \le \delta_i)^{1 - y_{pi}},$$

and

$$f(x \mid \theta, \alpha) = \prod_p \prod_i f(x_{pi} \mid \theta_p, \alpha_i) = \prod_p \prod_i \frac{\exp(x_{pi} - \alpha_i \theta_p)}{[1 + \exp(x_{pi} - \alpha_i \theta_p)]^2}.$$

Thus $f(\theta, \delta, \alpha, x, y)$ equals

$$\prod_p \prod_i (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \le \delta_i)^{1 - y_{pi}} \, \frac{\exp(x_{pi} - \alpha_i \theta_p)}{[1 + \exp(x_{pi} - \alpha_i \theta_p)]^2} \, f(\theta, \delta, \alpha). \qquad (2)$$

The next step is to derive the full conditionals, including those for the latent responses. Let us first consider the full conditional distribution of $\delta_i$,
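One full conditional can already be read off Equation (2): given everything else, each latent response $x_{pi}$ follows a logistic distribution with mean $\alpha_i \theta_p$, truncated above $\delta_i$ when $y_{pi} = 1$ and at or below $\delta_i$ when $y_{pi} = 0$. A minimal sketch of this augmentation step, assuming inverse-CDF sampling (our illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_latent_response(y, theta, alpha, delta):
    """Draw x_pi from its full conditional under Equation (2).

    Given the other parameters, x_pi has a logistic distribution with
    mean alpha * theta (scale 1), truncated to (delta, inf) when y = 1
    and to (-inf, delta] when y = 0. We sample by inverting the
    logistic CDF on the admissible interval.
    """
    mu = alpha * theta
    # Logistic CDF evaluated at the threshold delta.
    f_delta = 1.0 / (1.0 + np.exp(-(delta - mu)))
    if y == 1:
        u = rng.uniform(f_delta, 1.0)    # probability mass above the threshold
    else:
        u = rng.uniform(0.0, f_delta)    # mass at or below the threshold
    return mu + np.log(u / (1.0 - u))    # inverse logistic CDF

# Example: a correct response forces the latent response above delta.
x = draw_latent_response(y=1, theta=0.5, alpha=1.2, delta=0.3)
assert x > 0.3
```

Conditional on latent responses drawn this way, only the indicator factors and the priors in Equation (2) involve the remaining parameters, which is precisely the "simple" augmented posterior the DA principle aims for.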
