Probabilistic Models for Computerized Adaptive Testing: Experiments

Martin Plajner and Jiří Vomlel
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic
Pod vodárenskou věží 4, Prague 8, CZ-18208, Czech Republic

arXiv:1601.07929v2 [cs.AI], 1 Feb 2016

Abstract

This paper follows previous research we have already performed in the area of Bayesian network models for CAT. We present models using Item Response Theory (IRT, the standard CAT method), Bayesian networks, and neural networks. We conducted simulated CAT tests on empirical data. Results of these tests are presented for each model separately and compared.

1 INTRODUCTION

All of us are in touch with different ability and skill checks almost every day. The computerized form of testing is also getting increasing attention with the spread of computers, smartphones, and other devices which allow easy contact with target groups. This paper focuses on Computerized Adaptive Testing (CAT) (van der Linden and Glas, 2000; Almond and Mislevy, 1999; Almond et al., 2015) and it follows a previous research paper (Plajner and Vomlel, 2015).

In the previous paper we explained the concept of CAT. The use of Bayesian networks for CAT was discussed and we constructed different types of Bayesian network models for CAT. These models were tested on empirical data. The results were presented and discussed.

In this paper we present two additional model types for CAT: Item Response Theory (IRT) and neural networks. Moreover, new BN models are proposed. We conducted simulated CAT tests on the same empirical data as in the previous paper. This allows us to compare the two new model types (BN and NN) with the CAT-standard IRT model. Results are presented for each model separately and then they are all compared.

2 CAT PROCEDURE AND MODEL EVALUATION

All models proposed in this paper are supposed to serve for adaptive testing. In this section we briefly outline the process of adaptive testing¹ with the help of these models and the methods for their evaluation. For every model we used similar procedures. The specific details for each model type are discussed in the corresponding sections; at this point we discuss the common aspects.

In every model type we have the following types of variables. For some models they have a different specific name because of an established naming convention of the corresponding method. Nevertheless, the meaning of these variables is the same and we explain the differences for each model type. In this paper we use two types of variables:

• A set of n variables we want to estimate, S = {S_1, ..., S_n}. These variables represent latent skills (abilities, knowledge) of a student. We will call them skills or skill variables. We will use the symbol S to denote the multivariable S = (S_1, ..., S_n) taking states s = (s_{1,i_1}, ..., s_{n,i_n}).

• A set of p questions, X = {X_1, ..., X_p}. We will use the symbol X to denote the multivariable X = (X_1, ..., X_p) taking states x = (x_1, ..., x_p).

Next, we describe our empirical data set. We collected data from paper tests taken by grammar school students. The description of the test and its statistics can be found in (Plajner and Vomlel, 2015). Altogether, we have obtained 281 test results. Experiments were performed with each model of each type described in the following sections. We used the 10-fold cross-validation method: we learned each model from 9/10 of the randomly divided data, and the remaining 1/10 of the data set served as a testing set. This procedure was repeated 10 times to obtain 10 learned student models with the same structure and different parameters. With these learned models we simulate CAT using the test sets.

¹ Additional information about CAT can be found in (Wainer and Dorans, 2015).
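The 10-fold cross-validation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code; the `results` list is a placeholder standing in for the 281 collected test results.

```python
import random

def ten_fold_splits(results, seed=0):
    """Yield 10 (train, test) splits: 9/10 for learning, 1/10 for testing."""
    rng = random.Random(seed)
    shuffled = list(results)
    rng.shuffle(shuffled)
    folds = [shuffled[i::10] for i in range(10)]  # 10 disjoint folds
    for k in range(10):
        test = folds[k]
        train = [r for i, f in enumerate(folds) if i != k for r in f]
        yield train, test

# Placeholder for the 281 test results; each fold's test part is disjoint
# from its train part, and the 10 test parts cover the data exactly once.
results = list(range(281))
splits = list(ten_fold_splits(results))
```

Each of the 10 learned student models would then be trained on one `train` part and the CAT simulation run on the corresponding `test` part.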
For every student in a test set a CAT procedure consists of the following steps:

• The next question to be asked is selected.

• This question is asked and an answer is obtained.

• This answer is inserted into the model.

• The model (which provides estimates of the student's skills) is updated.

• (optional) Answers to all questions are estimated given the current estimates of the student's skills.

This procedure is repeated as long as necessary, i.e., until we reach a termination criterion, which can be, for example, a time restriction, the number of questions, or a confidence interval of the estimated variables. Each of these criteria would lead to a different learning strategy (Vomlel, 2004b), but finding a globally optimal selection with these strategies would be NP-hard (Lín, 2005). We have chosen a heuristic approach based on greedy optimization methods. The methods of question selection differ for each model type and are explained in the respective sections. All of them use the greedy strategy to select questions.

To evaluate the models we performed a simulation of a CAT test for every model and for every student. During testing we first estimated the skill(s) of a student based on his/her answers. Then, based on these estimated skills, we used the model to estimate the answers to all questions X. Let the test be in the step s (s − 1 questions asked). At the end of the step s (after updating the model with a new answer) we compute marginal probability distributions for all skills S. Then we use these to compute estimates of the answers to all questions, where we select the most probable state of each question X_i ∈ X:

    x*_i = argmax_{x_i} P(X_i = x_i | S).

By comparing this value to the real answer x'_i to the i-th question we obtain a success ratio of the response estimates for all questions X_i ∈ X of a test result t (a particular student's result) in the step s:

    SR^t_s = ( Σ_{X_i ∈ X} I(x*_i = x'_i) ) / |X| , where

    I(expr) = 1 if expr is true, 0 otherwise.

The total success ratio of one model in the step s over all test data is defined as

    SR_s = ( Σ_{t=1}^{N} SR^t_s ) / N .

SR_0 is the success rate of the prediction before asking any questions.

3 ITEM RESPONSE THEORY

The beginnings of Item Response Theory (IRT) stem back five decades and there is a large amount of resources available, for example (Lord and Novick, 1968; Rasch, 1960, 1993). IRT allows more specific measurements of certain abilities of an examinee. It expects a student to have an ability (skill) which directly influences his/her chance of answering a question correctly. When we have only one variable², it is common to refer to it as a proficiency variable. This ability is called a latent ability or a latent trait θ. The trait θ corresponds to the general skill S_1 defined in Section 2. Every question of the IRT model has an associated item response function (IRF), which is the probability of a successful answer given θ.

We fitted our data with the 2-parameter IRT model. This means that the item response functions, giving the probability of a correct answer to the i-th question given the ability θ, are computed by the formula

    p_i(θ) = 1 / (1 + e^{−a_i(θ − b_i)}),

where a_i sets the scale of the question (its discrimination ability: a steeper curve better differentiates between students) and b_i is the difficulty of the question (the horizontal position of the curve).

For the question selection step of CAT we use the item information of a question i, given by the formula

    I_i(θ) = (p'_i(θ))² / ( p_i(θ) q_i(θ) ),

where p'_i is the derivative of the item response function p_i and q_i(θ) = 1 − p_i(θ). This item information provides one, and the most straightforward, way of selecting the next question. In every step the selected question X* is the one with the highest item information:

    X*(θ) = argmax_i I_i(θ).

This approach minimizes the standard error of the test procedure (Hambleton et al., 1991) because the standard error of measurement SE_i produced by the i-th item is defined as

    SE_i(θ) = 1 / √( I_i(θ) ).

This means that the better precision we are able to achieve while asking questions, the smaller the error of measurement.

² There are variants of multidimensional IRT models where it is possible to have more than one ability, but in this section we discuss only models with a single one.
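The 2-parameter IRF and the item-information selection rule above can be sketched as follows. This is a minimal illustration with a hypothetical item bank, not the authors' implementation; note that for the 2PL model p'_i = a_i p_i q_i, so the information formula reduces to a_i² p_i q_i.

```python
import math

def irf_2pl(theta, a, b):
    """2-parameter logistic item response function p_i(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """I_i(theta) = (p'_i)^2 / (p_i * q_i); with p'_i = a * p_i * q_i
    for the 2PL model this simplifies to a^2 * p_i * q_i."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, a, b):
    """SE_i(theta) = 1 / sqrt(I_i(theta))."""
    return 1.0 / math.sqrt(item_information(theta, a, b))

def select_question(theta, items, asked):
    """Greedy CAT step: pick the unasked item with maximal information."""
    candidates = [i for i in range(len(items)) if i not in asked]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

# Hypothetical item bank of (a_i, b_i) pairs; near theta = 0.1 the steep
# item whose difficulty matches the ability carries the most information.
items = [(0.8, -1.0), (1.5, 0.0), (1.2, 2.0)]
best = select_question(0.1, items, asked=set())
```

Selecting the highest-information item directly minimizes the standard error of measurement at the current ability estimate.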
The result of the CAT simulation is displayed in Figure 1. We can notice that this model is able to choose the correct questions to ask very quickly and its prediction success rises after asking the first two. After these questions it cannot improve much anymore. This is caused by the simplicity of the model.

Figure 1: Success rates of the IRT model

In this paper we use a modified method for model scoring compared to the method used in our previous research. The current method is described in Section 2. The difference is that in this case we estimate the answers to all questions in the question pool and then compare them to the real answers in every step. In the previous version we estimated answers only to the unanswered questions in every step. This led to a skewed interpretation of the results because the value in the denominator of the success rate

    SR^t_s = ( Σ_{X_i ∈ X} I(x*_i = x'_i) ) / |X|

was decreasing in every step. The modified version compares all questions and because of that the denominator stays the same in every step.

4 BAYESIAN NETWORKS

In this section we use Bayesian networks (BNs) as CAT models. Details about BNs can be found in (Nielsen and Jensen, 2007; Kjærulff and Madsen, 2008). The use of BNs in educational assessment is discussed in (Almond et al., 2015; Culbertson, 2015; Millán et al., 2010). This topic is also discussed, for example, in (Vomlel, 2004a,b).

A Bayesian network is a probabilistic graphical model, a structure representing conditional independence statements. It consists of the following:

• a set of variables (nodes),

• a set of edges,

• a set of conditional probabilities.

From the previous models we selected the models marked as "b3" and "expert". The former has Boolean answer values, one skill variable with 3 states, and no additional information (personal data of students) was used; see Figure 2 for its structure. The latter is an expert model with 7 skill nodes (each having 2 states), Boolean answer values, and no additional information about students; see Figure 3 for its structure.

In this paper we present three new BN models.
The first two are modifications of the "b3" model. They have the same structure and differ only in the number of states of their skill node. We present experiments with 4 and 9 states. We performed experiments with other numbers of states as well, but they do not provide more interesting results. Next, we add a modified expert model. This modified model also has Boolean questions and no additional information. We have added one state to the 7 skill nodes from the previous version (they now have 3 states in total). The reason for this addition is an analysis of the question selection criterion. We select questions by minimizing the expected entropy over the skill nodes. With only two states this means that we are pushing a student to one or the other side of the spectrum (basically, we want him to be either good or bad). With 3 states we allow students to approach a mediocre skill quality as well. Moreover, we realized that the model structure in Figure 3 has only very specialized skills. We introduce a new, 8th skill node which connects the previous 7 skill nodes. It represents an overall mathematical skill combining all the other skills. It allows skills on the lower level to influence each other and to provide evidence between themselves. The final model structure is in Figure 4.

Specific details about the use of BNs for CAT can be found in (Plajner and Vomlel, 2015). The types of nodes in our BNs correspond to the types of variables defined in Section 2. In this paper we use question nodes with only Boolean states, i.e., a question is answered either correctly or incorrectly. Edges are usually defined between skills and questions (we present examples of connections in the figures). Conditional probability values have been learned using the standard EM algorithm for BN learning.

All models are summarized in Table 1.

Table 1: Overview of Bayesian network models

    Model name | Figure | No. of skill nodes | No. of skill states
    simple 3s  |   2    |         1          |         3
    simple 4s  |   2    |         1          |         4
    simple 9s  |   2    |         1          |         9
    expert old |   3    |         7          |         2
    expert new |   4    |        7+1         |         3

Results of the CAT simulation with BN models are displayed in Figure 5. Increasing the number of states of the single skill node improved the prediction accuracy of the model (simple 4s, simple 9s), but only slightly. As we can see, one additional state (4 states in total) is better than more states (9). This confirms our expectation that simply adding node states cannot improve the model quality for long, due to overfitting of the model. Next, we can observe that there is a large difference between the new and the old expert model. The success rate of the new version exceeds all other models. Adding an additional skill node connecting the other skills proved to be a correct step. The possibilities in the model structure are still large and it remains to be explored how to create the best possible structure.
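The expected-entropy selection criterion used for the BN models can be sketched as follows. This is a schematic illustration, not the authors' implementation: `answer_probs` and `posterior` are hypothetical stand-ins for BN inference over the skill node(s).

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def expected_entropy(q, answer_probs, posterior):
    """Expected posterior entropy of the skill node after asking question q,
    weighting each possible answer x by its probability P(X_q = x)."""
    return sum(p_x * entropy(posterior(q, x))
               for x, p_x in answer_probs(q).items())

def select_question(candidates, answer_probs, posterior):
    """Greedy step: ask the question minimizing expected skill entropy."""
    return min(candidates,
               key=lambda q: expected_entropy(q, answer_probs, posterior))

# Toy stand-ins for BN inference: question "a" separates students well,
# question "b" leaves the skill distribution unchanged.
def answer_probs(q):
    return {0: 0.5, 1: 0.5}

def posterior(q, x):
    if q == "a":
        return [0.9, 0.1] if x == 0 else [0.1, 0.9]
    return [0.5, 0.5]

chosen = select_question(["a", "b"], answer_probs, posterior)
```

With more than two skill states the same criterion rewards questions that sharpen the posterior anywhere on the scale, which is the motivation for the 3-state skill nodes described above.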
5 NEURAL NETWORKS

Neural networks (NNs) are models for the approximation of nonlinear functions. For more details about NNs, please refer to (Haykin, 2009; Aleksander and Morton, 1995). There are three different parts of a NN:

1. an input layer,
2. several hidden layers, and
3. an output layer.

We use a NN as the student model. We feed student answers to the input layer. These values are transformed into the hidden layer(s). There is no general rule for choosing the number of hidden layers and their sizes; in our case we performed simple experiments with one hidden layer of different sizes. The hidden layer is then further transformed into the output layer. NNs are not suitable for unsupervised learning. Because of that, we do not estimate an unknown student skill in the output layer: we would not have any target value needed during the learning step of the NN. Instead, we estimate the score (the test result) of a student directly. The score of a student is known for every student at the time of learning. The output layer then provides an estimate of this score. Nevertheless, this score is a variable corresponding to the skill variables described in Section 2.

To select the next question we use the following procedure. We want the selected question to provide as much information as possible about the tested student. That means that a student who answers incorrectly should be as far as possible on the score scale from another student who answers correctly. Let SC|X_{i,x} be the score prediction after answering the i-th question with the state x, and P(X_{i,x}) the probability of the state x being the answer to the question i. P(X_{i,x}) can be obtained, for example, by a statistical analysis of the answers. We select the question X* maximizing the variance of the predicted scores:

    X* = argmax_i Var(SC|X_i)
       = argmax_i Σ_x P(X_{i,x}) ( SC|X_{i,x} − SC̄|X_i )² , where

    SC̄|X_i = Σ_x P(X_{i,x}) SC|X_{i,x}

is the mean value of the predicted scores.

In our experiments we used only one hidden layer with many different numbers of hidden neurons. From these we selected models with 3, 5, and 7 neurons in the hidden layer because they provide the most interesting results. The structure of the network with 5 hidden neurons is in Figure 6. Results of the CAT simulation with NN models are displayed in Figure 7. As we can see in this figure, the quality of the estimates while using NNs increases very slowly. This may be caused by the question selection criterion. If we selected better questions, it is possible that the success rate would increase faster. It remains to be explored which selection criterion would provide such questions. Nevertheless, a better question selection would not change the final prediction power of the model (the maximal success rate would not be exceeded). This prediction power could be increased by using a different version of NNs. More specifically, we will perform experiments with one of the recurrent versions of NNs, i.e., Elman's networks or Jordan's networks.

6 MODEL COMPARISON AND CONCLUSIONS

We present a graphical comparison of all three model types in Figure 8. One model is selected from each type. We can see that the neural network model scored the worst result. This may be further improved by a better NN structure and a better question selection process. The new BN expert model scores the best. Even in this case we believe that further improvements are possible to increase its success rate. We will focus our future research on methods for BN model creation and criteria for their comparison. Especially, we would like to use the concept of local structure in BN models (Díez and Druzdzel, 2007). That would allow us to create more complex models, yet with fewer parameters to be estimated during learning. Both previous models can be compared with the IRT model, which is the standard in the field of CAT.

Acknowledgements

The work on this paper has been supported by GACR project n. 16-12010S.

References

Aleksander, I. and Morton, H. (1995). An Introduction to Neural Computing. Information Systems. International Thomson Computer Press.

Almond, R. G. and Mislevy, R. J. (1999). Graphical Models and Computerized Adaptive Testing. Applied Psychological Measurement, 23(3):223–237.

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., and Williamson, D. M. (2015). Bayesian Networks in Educational Assessment. Statistics for Social and Behavioral Sciences. Springer New York, New York, NY.

Culbertson, M. J. (2015). Bayesian Networks in Educational Assessment: The State of the Field. Applied Psychological Measurement, 40(1):3–21.

Díez, F. J. and Druzdzel, M. J. (2007). Canonical Probabilistic Models for Knowledge Engineering. Technical report, Research Centre on Intelligent Decision-Support Systems.

Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of Item Response Theory, volume 21. SAGE Publications.

Haykin, S. S. (2009). Neural Networks and Learning Machines. Number v. 10 in Neural networks and learning machines. Prentice Hall.

Kjærulff, U. B. and Madsen, A. L. (2008). Bayesian Networks and Influence Diagrams. Springer.

Lín, V. (2005). Complexity of Finding Optimal Observation Strategies for Bayesian Network Models. In Proceedings of the conference Znalosti, Vysoké Tatry.

Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. (Behavioral science: quantitative methods). Addison-Wesley.

Millán, E., Loboda, T., and Pérez-de-la-Cruz, J. L. (2010). Bayesian networks for student model engineering. Computers & Education, 55(4):1663–1683.

Nielsen, T. D. and Jensen, F. V. (2007). Bayesian Networks and Decision Graphs (Information Science and Statistics). Springer.

Plajner, M. and Vomlel, J. (2015). Bayesian Network Models for Adaptive Testing. Technical report, arXiv:1511.08488, http://arxiv.org/abs/1511.08488.

Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Danmarks Paedagogiske Institut.

Rasch, G. (1993). Probabilistic Models for Some Intelligence and Attainment Tests. Expanded Ed. MESA Press.

van der Linden, W. J. and Glas, C. A. W. (2000). Computerized Adaptive Testing: Theory and Practice, volume 13. Kluwer Academic Publishers.

Vomlel, J. (2004a). Bayesian networks in educational testing. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 12(supp01):83–100.

Vomlel, J. (2004b). Building Adaptive Tests Using Bayesian Networks. Kybernetika, 40(3):333–348.

Wainer, H. and Dorans, N. J. (2015). Computerized Adaptive Testing: A Primer. Routledge.

Figure 2: Bayesian network with one hidden variable and personal information about students
Figure 3: Bayesian network with 7 hidden variables (the old expert model)
Figure 4: Bayesian network with 7+1 hidden variables (the new expert model)
Figure 5: Results of the CAT simulation with BNs
Figure 6: Neural network with 5 hidden neurons
Figure 7: Results of the CAT simulation with NNs
Figure 8: CAT simulation results comparison
