
ERIC ED615621: Online Estimation of Student Ability and Item Difficulty with Glicko-2 Rating System on Stratified Data


Online Estimation of Student Ability and Item Difficulty with Glicko-2 Rating System on Stratified Data

Jaesuk Park
Knowre Korea Inc.
[email protected]

Jaesuk Park. "Online Estimation of Student Ability and Item Difficulty with Glicko-2 Rating System on Stratified Data". 2021. In: Proceedings of The 14th International Conference on Educational Data Mining (EDM21). International Educational Data Mining Society, 879-885. https://educationaldatamining.org/edm2021/
EDM '21, June 29 - July 02 2021, Paris, France

ABSTRACT
We propose an adaptation of the Glicko-2 rating system in a K-12 math learning software setting, where variable time intervals between solution attempts and the stratification of student-item pairings by grade levels necessitate modification of the original model. The discrete-time stochastic process underlying the original system has been modified into a continuous-time process to account for the irregularity of intervals between solution attempts. Also, conceptual prerequisite relationships between items were used to provide initial rating estimates that allow for rating values to be meaningfully compared across grade levels. Fitting the model using real student learning data results in rating value distributions successfully exhibiting a gradation with the increase of grade level. A potential area of application in a personalized education setting is also briefly discussed.

Keywords
Item response theory, dynamic paired comparison model, stratified data, educational assessment, stochastic variance model

1. INTRODUCTION
We consider the problem of assigning appropriate curriculum levels in a large-scale K-12 math learning software to students who are substantially ahead or behind their peers. Previous studies have suggested the importance of matching learning content difficulty to a student's ability for positive student learning outcomes [3, 10, 16]. In light of this, students who are much farther ahead (e.g., gifted students) or behind their peers (e.g., students with learning disabilities) can benefit much from receiving more tailored educational feedback, based on learner and skill models that can model their differences more effectively.

With the recent advances in computing devices, various approaches have been sought to harness the power of computing to model learners more accurately in educational contexts, as comprehensively overviewed in [2]. In one particular line of approach [11, 13, 12, 15], dynamic paired comparison models were used to quickly estimate student abilities and item difficulties in a scalable manner. In these adaptations, the players consist of students ("users") and units of learning task (e.g., problem items, assignments), and each solution attempt is conceptualized as a match between a student and a learning task, in which the winner earns 1 point and the loser earns 0 points (with no draw). The primary advantage of such models over traditional IRT methodologies is in their ability to compute ability estimates "on the fly" [11] while retaining a similar mathematical structure to IRT.

The problem occurs, however, when the dataset is stratified, i.e., when student-problem pairings can be grouped into distinct (or largely nonoverlapping) groups such that a problem's rating cannot be adequately adjusted by a student outside the group to which it belongs. In K-12 math learning software, because students are only exposed to problems appropriate for their grade level, grade levels serve as strata. Consequently, we cannot adequately tell how a student would perform outside of their regular grade level just by looking at the student's rating value. See Fig. 1 for an illustration.

[Figure 1 panels: Gr. 2, Gr. 4, Alg. 2. Legend: solid line = Student Ability, dashed line = Problem Difficulty. Each panel is drawn from a record table with columns id, user_id, unit_id, result, modified_at.]
Figure 1: An illustration of the impact of data stratification on the rating interpretability. As a result of stratification, the distributions of rating values can overlap unreasonably much with each other, and the corresponding mean rating values may not align with the actual order of grade levels.

Ideally, we would avoid this problem by gathering enough learning data from a large number of students for 12+ years, during which they would work through all curricula offered by the product in sequence. However, in a commercial educational software context where a user is not bound to use products from just one vendor, this is highly impractical.

Hence we raise a question: is there a way to enforce rating values to reflect the relative positions of the strata, despite the absence of sufficient overlaps in students/items among them? One possible strategy is to initialize the ratings differently for each stratum according to their relative positions, e.g., to initialize first-grade rating values to 100, second-grade rating values to 200, etc., and then let the dynamic paired comparison algorithm do the calibration within each grade level. But then how could we justify that the initial estimation done for all curricula is properly reflective of their actual difficulties relative to one another?

Here, the key insight is that the partial ordering of mathematical concepts due to prerequisite relationships provides a basis for the division of concepts into grade-level curricula, which then in turn stratifies the learning data. In the K-12 math learning software used in our study, each problem item is conceptualized as a particular instantiation of a mathematical concept ("knowledge unit," or just "unit") with specific values. These mathematical concepts have prerequisite relationships defined among them, the collection of which can be represented as a directed graph. We attempt to employ these relationships to obtain statistically interpretable and contextually appropriate estimations.

Specifically, our contribution is twofold: 1) modification of a dynamic paired comparison rating system model to account for imbalance in rating update frequencies between students and items, and 2) use of prerequisite relationships between concepts for rating initialization to achieve rating comparability between curriculum levels. We aim to yield, from a stratified dataset, a set of ratings that can be meaningfully compared across grade levels: where students and items in a lower grade level would generally have lower ratings than those in a higher grade level.

The remainder of this paper is organized as follows. Section 2 presents our particular adaptation of a dynamic paired comparison model, including the details for incorporating the conceptual prerequisite information into rating initialization. Section 3 describes the dataset used for evaluating our model and presents our results. Section 4 discusses the potential for applying our model to assign grade levels for students far ahead or behind their peers, lists some of the limitations of our work, and suggests a few possible directions for further research.

2. MODEL
The Glicko-2 rating system [7] falls under the family of dynamic paired comparison models, along with the Glicko rating system [6] (its predecessor) and the Elo rating system [4] (of which the two Glicko systems are extensions). Improving upon its predecessor, the Glicko-2 rating system models the change in variance of player strength as another stochastic process, thereby accounting for the possibility of sudden changes in strength. More specifically, the algorithm models the change in player strength per unit time with a normal distribution with variance equal to the square of the rating volatility, whose logarithmic change per unit time is itself normally distributed.

2.1 Continuous-time Glicko-2 Model
The original Glicko-2 system presented in [7] assumes the underlying stochastic processes to be discrete-time, where the overall measurement period is discretized into time increments called "rating periods." Within each rating period, the matches are assumed to occur simultaneously. However, because there is too much imbalance in the average number of matches between users and items, [7]'s recommendation of having 5-10 matches per rating period for every player is not feasible to implement in our application context. [15] has successfully worked around this limitation by constraining each rating period to contain only one match, but the workaround did not account for an increase in rating uncertainty due to the passage of time, which is a key feature of the Glicko rating system family. Here, we take the approach of modifying the Glicko-2 model under a continuous-time stochastic process framework, so that the model can account for rating uncertainty increase due to the passage of time without discretizing the measurement period.

Let $\theta_s(t)$ denote the ability estimate of user $s$ at time $t$, and let $\beta_i(t)$ denote the difficulty estimate of unit $i$ at time $t$. Then, as a result of using a continuous-time stochastic process framework, the model equations for latent trait parameters become

$$\theta_s(t) \sim N\big(\mu_s(t),\, \phi_s^2(t)\big) \tag{1}$$

$$\theta_s(t+\Delta t) \mid \theta_s(t),\, \sigma_s^2(t+\Delta t) \sim N\big(\theta_s(t),\, \Delta t\, \sigma_s^2(t+\Delta t)\big) \tag{2}$$

$$\log \sigma_s^2(t+\Delta t) \mid \log \sigma_s^2(t),\, \tau^2 \sim N\big(\log \sigma_s^2(t),\, \tau^2\big) \tag{3}$$

for user ability estimates, and

$$\beta_i(t) \sim N\big(\mu_i(t),\, \phi_i^2(t)\big) \tag{4}$$

for unit difficulty estimates. Here, as in [8], $\mu$ denotes rating, $\phi$ denotes rating deviation (RD), and $\sigma$ denotes rating volatility. Note that the difficulty of a mathematical concept is expected to remain constant over time, so we do not impose any stochastic volatility assumption on $\beta_i(t)$.

As for the correctness probability (i.e., the probability of user $s$ correctly answering an instantiation of unit $i$ at time $t$), the Glicko rating system family differs from the Elo rating system in its incorporation of rating uncertainty to calculate this quantity. We are generally interested in the correctness probability before user $s$ actually attempts unit $i$. However, the time elapsed between the user's last attempt and the current attempt can vary throughout the user's activity history, which also varies the amount of inflation to apply each time on the user's rating uncertainty, $\phi_s$. Hence we apply equation (2) prior to calculating the correctness probability. Let $t_s$ and $t_i$ denote the last time user and unit latent trait estimates, respectively, were updated. Let $Y_{s,i}(t)$ be a Bernoulli random variable denoting user response correctness. Then the correctness probability is given by:

$$\Pr\big(Y_{s,i}(t) = 1\big) = E\big(\hat{\mu}_s(t),\, \hat{\mu}_i(t),\, \hat{\phi}_s^2(t) + \hat{\phi}_i^2(t)\big) \tag{5}$$

where $E(\mu_1, \mu_2, \phi^2) = \left[1 + e^{-g(\phi^2)(\mu_1 - \mu_2)}\right]^{-1}$ is the expected score function that accounts for rating uncertainty [7], and

• $g(\phi^2) = \left[1 + \frac{3\phi^2}{\pi^2}\right]^{-1/2}$,
• $\hat{\phi}_s^2(t) = \phi_s^2(t_s) + (t - t_s)\,\sigma_s^2(t_s)$,
• $\hat{\phi}_i^2(t) = \phi_i^2(t_i)$,
• $\hat{\mu}_s(t) = \mu_s(t_s)$, and
• $\hat{\mu}_i(t) = \mu_i(t_i)$.

Here, we use $\sigma_s^2(t_s)$ in place of $\sigma_s^2(t)$ to estimate $\hat{\phi}_s^2(t)$, although their equivalence only holds in expectation.

After user $s$ finishes a solution attempt for unit $i$ with result $y_{s,i} \in \{0, 1\}$, the update equations for latent trait estimates are given as below, following [7]'s derivation of corresponding equations under the continuous-time framework:

$$\sigma_s^2(t) = \exp\Big(\arg\max_{a(t)}\, p\big(a(t) \mid y_{s,i}\big)\Big) \tag{6}$$

$$\phi_s^2(t) = \min\left\{\phi_s^2(0),\ \left[\frac{1}{\phi_s^2(t_s) + \sigma_s^2(t)} + \frac{1}{v_s^2(t)}\right]^{-1}\right\} \tag{7}$$

$$\phi_i^2(t) = \min\left\{\phi_i^2(0),\ \left[\frac{1}{\phi_i^2(t_i)} + \frac{1}{v_i^2(t)}\right]^{-1}\right\} \tag{8}$$

$$\mu_s(t) = \mu_s(t_s) + \phi_s^2(t) \cdot g\big(\hat{\phi}_i^2(t)\big) \cdot \big(y_{s,i}(t) - E_s(t)\big) \tag{9}$$

$$\mu_i(t) = \mu_i(t_i) + \phi_i^2(t) \cdot g\big(\hat{\phi}_s^2(t)\big) \cdot \big((1 - y_{s,i}(t)) - E_i(t)\big) \tag{10}$$

In these equations, we have

• $E_s(t) = E\big(\hat{\mu}_s(t),\, \hat{\mu}_i(t),\, \hat{\phi}_i^2(t)\big)$,
• $E_i(t) = E\big(\hat{\mu}_i(t),\, \hat{\mu}_s(t),\, \hat{\phi}_s^2(t)\big)$,
• $v_s^2(t) = \left[g\big(\hat{\phi}_i^2(t)\big)^2\, E_s(t)\,\big(1 - E_s(t)\big)\right]^{-1}$, and
• $v_i^2(t) = \left[g\big(\hat{\phi}_s^2(t)\big)^2\, E_i(t)\,\big(1 - E_i(t)\big)\right]^{-1}$.

Also, in equation (6), $p(a(t) \mid y_{s,i})$ is the marginal posterior density function for $a(t) = \log \sigma_s^2(t)$, approximated using the product of the following two normal density functions (here, $\varphi(z; m, \varsigma^2)$ denotes the normal density function with mean $m$ and variance $\varsigma^2$):

1. $\varphi\big(a(t);\, a(t_s),\, \tau^2\big)$, which comes from equation (3), and
2. $\varphi\big(\theta_s^*(t);\, \mu_s(t_s),\, \phi_s^2(t_s) + (t - t_s)\,e^{a(t)} + v_s^2(t)\big)$, which is the normal approximation of the marginal likelihood distribution of $\theta_s(t)$, whose mode is denoted with $\theta_s^*(t)$.

The latter normal density function features the quantity $\big(\theta_s^*(t) - \mu_s(t_s)\big)$, which is approximated in [6] using first-order Taylor expansion.

Finally, note that to prevent a rating deviation from becoming arbitrarily large, the quantity is constrained in equations (7) and (8) to never exceed the value for a brand new user/unit, just like how it was done in [5].

2.2 Initial Parameter Estimation
To address the stratification issue mentioned in the introduction, the user and unit ratings are differentially initialized based on their respective curricula. Instead of setting each curriculum's initial rating value arbitrarily, we want the values to reflect more closely our prior knowledge of the distributions of concepts within each curriculum.

We find this prior knowledge in our proprietary conceptual precedence graph, where units are represented as nodes (vertices) in a directed graph. Each edge $(u, v)$ in the graph is interpreted as: "An instance of unit $u$ is being used as a step in solving an instance of unit $v$." Hence unit $u$ corresponds to a prerequisite concept that a user must have mastered before being able to successfully master unit $v$.

The key idea in our usage of the graph is that a question item (corresponding to a specific knowledge unit) that involves one or more steps to solve must in general be harder than any of the steps themselves. Hence we assign each unit a non-negative integer value, which we call "depth," in such a way that for every edge, the tail node is assigned a lower depth value than the head node. This way, a concept appearing in a higher grade level would in general correspond to a higher depth value (since it would generally incorporate lower-level curriculum concepts as prerequisites), making the depth values roughly signify how "in-depth" the corresponding concepts are. See Fig. 2 for an illustration.

Figure 2: Illustration of assigning depth values to knowledge units in a simple conceptual precedence graph. Knowledge units are represented as nodes (gray ovals). On the right of each oval, a red circle shows the corresponding depth values assigned.

We also seek to differentiate among units with no parents (i.e., concepts with no prerequisites) by imposing that the depth difference between a unit and its successor be as small in magnitude as possible, while still ensuring that every unit has a strictly greater depth value than any of its parents. From a graph theory perspective, the problem of assigning depth values can be formulated as a variant of the layer assignment problem on a directed acyclic graph $G = (V(G), E(G))$ with minimal dummy vertices, formally stated as the following integer linear program (ILP):

$$\begin{aligned} \min \quad & \sum_{(u,v) \in E(G)} \big(d(v) - d(u)\big) \\ \text{s.t.} \quad & d(v) - d(u) \ge 1 \quad \forall (u,v) \in E(G) \\ & d(v) \in \mathbb{Z}_{\ge 0} \quad \forall v \in V(G) \end{aligned} \tag{11}$$

(here, $d(v)$ denotes the depth value assigned to node $v$). For a general overview of the layer assignment problem and its variations, readers are referred to Section 13.3 of [9].

Two challenges arise in initializing rating values through solving the depth assignment problem. The first challenge is that our conceptual precedence graph could contain cycles, such as the ones shown in Fig. 3. To address this challenge, we assign the same depth value to all units in the same strongly connected component (SCC), noting that any directed cycle is strongly connected. Implementationally, this corresponds to solving the ILP given in (11) on the conceptual precedence graph's condensation, which is a directed acyclic graph formed by contracting each SCC into one node.

Figure 3: Three instances of simple cycles in the conceptual precedence graph used in our study, which all belong to one strongly connected component. We found that cycles exist mostly due to the presence of "gateway units" (shown in cyan ovals), whose main role is to select which concept to apply from multiple related concepts.

The second challenge in assigning depths to nodes on the conceptual precedence graph is that the graph (and thus also its condensation) may consist of multiple weakly connected components (WCCs), which are subgraphs whose underlying undirected graphs are connected. The above ILP assigns depth values relative only to other SCCs in the same WCC, so additional steps must be taken to equate the depth value distributions for each curriculum across all WCCs. In particular, we label each SCC with the lowest-level curriculum that features at least one of its constituent units. Next, we take the smallest number of WCCs that together contain all curriculum labels. We call this collection of WCCs the reference WCCs. Afterward, we offset the depth value for each SCC in every non-reference WCC to be at least the minimum depth value of all SCCs in the reference WCCs that are labeled with the same curriculum.

Once the adjusted depth values for all SCCs (and thereby all units) are thus computed, each curriculum's depth value is set to be the average depth value of all units in the curriculum.

Below is a summary of the procedure for assigning depth $d(k)$ for each curriculum $k \in X = \{1, \ldots, K\}$:

1. Let $G = (V(G), E(G))$ be our conceptual precedence graph, which is a directed graph such that each node $v \in V(G)$ is associated with a curriculum $\chi(v) \in X$.
2. Condense $G$ to yield a directed acyclic graph $C = (V(C), E(C))$.
3. Let $W_1, \ldots, W_n$ be WCCs of $C$, from largest to smallest.
4. For each $W_i = (V(W_i), E(W_i))$, solve the ILP given in (11) to yield pre-adjustment depth values $d_{\text{init}}(S)$ for each SCC $S$.
5. Label each SCC $S$ with a curriculum $\chi_{\min}(S) = \min_{v \in V(S)} \chi(v)$.
6. Let $A = \{W_1, \ldots, W_r\}$ be the reference WCCs (defined above), such that $r$ is minimized; i.e., choose no more WCCs than necessary.
7. For each curriculum $k \in X$, let $d_{\min}(k) = \min\{\, d(S) \mid \chi_{\min}(S) = k,\ S \in \bigcup_{i=1}^{r} V(W_i) \,\}$.
8. For each $W_j = W_{r+1}, \ldots, W_n$, adjust depth value $d(S)$ for each SCC $S \in W_j$ to be at least $d_{\min}(\chi_{\min}(S))$. However, do so in a way that the adjusted depth values still satisfy the constraints of the ILP given in (11).
9. We now have the adjusted depth values for every SCC $S \in V(C)$. For each SCC $S$, let $d(v) = d(S)$ for all $v \in S$.
10. For each $k \in X$, let $d(k) = \operatorname{mean}\{\, d(v) \mid v \in V(G),\ \chi(v) = k \,\}$.

We now initialize the rating of each user $s$ or unit $i$ associated with curriculum $k$ as follows:

$$\mu_s(0) = \mu_{\min} + \alpha \cdot d(k) \tag{12}$$

$$\mu_i(0) = \mu_{\min} + \alpha \cdot d(k) \tag{13}$$

where the quantities $\mu_{\min}$ and $\alpha$ are hyperparameters to be optimized.

3. EVALUATION
We evaluate our model using a dataset consisting of student practice records from January 2016 to December 2019 through our adaptive software used in math learning centers located throughout the United States. Students are given problems to practice based on their current grade level and the content areas where they struggle.
The data consists of 5,179,493 records of 10,194 users' combined attempts for problems associated with 7,513 knowledge units, ranging from Grade 2 concepts to Algebra 2 concepts. When a student gets a problem wrong in the first attempt, the student gets to make a second attempt for the same problem after being walked through the steps; in our analysis, however, only the first attempt's result was considered.

For the Glicko-2 model hyperparameters, we used the values suggested in [8]: 350.0 for the initial RD (in Glicko-1 scale; [8] shows how to convert between the two scales) and 0.06 for the initial user volatility. In the case of $\tau$, for which a range of values is suggested, we used 0.5. The time elapsed from one attempt to the next, used in rating uncertainty inflation, is measured in days. Finally, through extensive simulations, we chose $\alpha \approx 0.2303$ and $\mu_{\min} \approx -2.8782$, which, in Glicko-1 scale (on which the values were originally set), are exactly 40.0 and 1000.0, respectively.

Each unit's associated curriculum was based on the information provided in our content management system. For units appearing in multiple curricula, the earliest curriculum in the sequence was used. For users, because exact registration dates were not available for all users at the time of the study, each user's curriculum was set as the curriculum associated with the first unit attempted by the user. The initial parameters for both users and units were then set following the procedure described previously.

Figure 4: Top: Cumulative RMSE values calculated at every 1,000 records. For effective visualization, only results from the first 500,000 records were plotted. Bottom: Reliability diagram with sharpness graph inserted in the lower right.

3.1 Predictive Performance
To assess the predictive performance of our adaptation of the Glicko-2 rating system, we plotted the change in RMSE values for every 1,000 records over time (for the rationale behind the metric choice, see [14]).
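
A minimal sketch of the RMSE tracking described above, under the assumption that "cumulative" means the RMSE over all records seen so far, reported at every 1,000th record. The function name `cumulative_rmse` and its arguments are hypothetical and not taken from the paper; `predictions` stands for the pre-attempt correctness probabilities and `outcomes` for the observed 0/1 results.

```python
import math


def cumulative_rmse(predictions, outcomes, every=1000):
    """Return (n_records, RMSE over records 1..n) at every `every`-th record.

    Assumption: the RMSE is cumulative, i.e. computed over all records
    processed so far rather than over a sliding window.
    """
    checkpoints = []
    squared_error_sum = 0.0
    for n, (p, y) in enumerate(zip(predictions, outcomes), start=1):
        squared_error_sum += (p - y) ** 2
        if n % every == 0:
            checkpoints.append((n, math.sqrt(squared_error_sum / n)))
    return checkpoints
```

Keeping a running sum of squared errors makes each checkpoint cheap to compute, so the metric can be reported while records are streamed in chronological order.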
As the latent trait estimates are calibrated based on student practice records, we expect the RMSE across the entire system to decay over time. We see that this is exactly the case in Fig. 4, where the calibration curve for our model is also reported along with the reliability and resolution values. We also report a convergent pattern in unit rating values and dynamically adjusting user rating values, analogous to the results obtained in [15], in Fig. 5.

Figure 5: Rating values as a function of time for the 5 most frequently attempted units (top) and for the user with the most attempts (bottom).

3.2 Gradation of Unit Rating Distributions
We also plot the distributions of the final unit rating values for each curriculum. We expect that using a conceptual precedence graph to initialize rating values would cause the central tendencies of the rating distributions to show an upward trend as the curriculum level increases. As shown in Fig. 6, the final ratings computed without the graph-based rating initialization fail to show an upward trend in the mean rating values, whereas they do with the graph-based rating initialization. Also noteworthy is the complete disappearance of overlap in IQR between two curricula far apart from each other, such as Grade 2 and Algebra 2, upon using a conceptual precedence graph to initialize ratings.

Figure 6: Final distributions of knowledge unit ratings. Orange bars indicate medians, green dots indicate means. Note that the rating values on the vertical axis are on the Glicko-1 scale.

4. DISCUSSION
We have used conceptual prerequisite relationships to give our model a better prior distribution, one that better reflects the stratified nature of student practice data. The depth values used to calculate the initial rating values, however, are still quite coarse estimates; for example, the difference in difficulty between a unit and one of its prerequisite units may not be even across the conceptual precedence graph. Nevertheless, we see that the distribution of the lowest-level curriculum (Grade 2 in our study) and that of the highest-level one (Algebra 2 in our study) show substantially less overlap compared to when we used the initialization method of the original Glicko-2 system, which suggests that there was still a nontrivial improvement. Note that the unit rating distributions of two adjacent curricula (for example, Grade 2 and Grade 3) are not well separated. This is expected, as we would not expect a huge jump in terms of curriculum difficulty from one school year to the next.

One interesting area of application of this framework is determining the appropriate grade level for students whose mathematical achievement levels are substantially ahead or behind their grade levels. With estimates of item difficulties that account for grade-level hierarchy, we can have a data-based justification that would allow gifted students to be placed at a higher-level curriculum that is neither too hard nor too easy for them. Likewise, we could allow for students lagging behind their peers to be placed at a lower-level curriculum, where they could ensure that their foundational understanding of lower-level mathematical concepts is firm before moving on to the next grade level. For this application, a separate round of validation with external measurements, e.g., standardized test scores, must first take place.

A well-known limitation of using the Glicko rating system family for educational applications is its inability to model multiple-choice item correctness probabilities. This is because the correctness probability of such an item has an infimum strictly greater than 0, making the corresponding probability distribution improper. Hence a natural future direction would be to address this limitation, e.g., by incorporating the particle-based method presented in [12].

Another potential threat to the validity of using the Glicko-2 model for student ability measurement is in its unidimensionality assumption. Part of the challenge of verifying whether the student response data can be modeled with a one-dimensional construct in a learning setting is that unlike in IRT settings, a student's ability is expected to change throughout the data collection period. An interesting future direction would be to investigate whether there is sufficient evidence to suggest that students' mathematical ability is multidimensional, and if so, how a model like the Glicko-2 rating system can be extended to reflect the multidimensionality; the degree to which the extension presented in [1] can be applied also remains to be seen.

Also, when assigning each curriculum a depth value, the average depth value of all constituent units was calculated. In practice, however, as a learning software product continues to expand, units can be added or removed, or their edge connections may change. Our current choice of taking an average makes the algorithm sensitive to changes in the conceptual precedence graph's internal connectivity structure. The median may be a more robust, and thus more practical, choice, though this may come at the risk of decreased differentiability across consecutive curricula.

5. CONCLUSION
We have presented an adaptation of the Glicko-2 rating system in a K-12 math learning software context. The stratified nature of student-item pairings has made effective discrimination of students and problems across grade levels challenging. We have shown evidence that by using the prerequisite relationships between concepts to initialize rating values, we can allow for the gradation of rating distributions from lower-level curricula to higher-level curricula while ensuring that the prediction error for student response correctness still decreases over time. A potential area of application is determining the grade level appropriate for students substantially ahead or behind their peers.

6. ACKNOWLEDGEMENT
I thank my boss Kurt Cho, who gave nudges in productive directions whenever I was stuck, and my colleagues Sunghwan Cho and Seunghun Lee, who worked on developing the data pipeline infrastructure on which the proposed model can be deployed. Also, I thank everyone in my company, who patiently waited in support while the project was in the works.

7. REFERENCES
[1] L. Cai. Potential applications of latent variable modeling for the psychometrics of medical simulation. Military Medicine, 178(suppl 10):115–120, 2013.
[2] M. C. Desmarais and R. S. J. d. Baker. A review of recent advances in learner and skill modeling in intelligent learning environments. User Modeling and User-Adapted Interaction, 22(1-2):9–38, 2012.
[3] J. S. Eccles. Expectancies, values and academic behaviors. Achievement and achievement motives, pages 74–146, 1983.
[4] A. E. Elo. The Rating of Chessplayers, Past and Present. Arco Publishing, 1978.
[5] M. E. Glickman. The Glicko system. http://www.glicko.net/glicko/glicko.pdf.
[6] M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(3):377–394, 1999.
[7] M. E. Glickman. Dynamic paired comparison models with stochastic variances. Journal of Applied Statistics, 28(6):673–689, 2001.
[8] M. E. Glickman. Example of the Glicko-2 system. http://www.glicko.net/glicko/glicko2.pdf, 2013.
[9] P. Healy and N. Nikolov. Hierarchical Drawing Algorithms, pages 409–454. 08 2013.
[10] C. S. Hulleman, K. E. Barron, J. J. Kosovich, and R. A. Lazowski. Student motivation: Current theories, constructs, and interventions within an expectancy-value framework. In Psychosocial Skills and School Systems in the 21st Century, pages 241–278. Springer, 2016.
[11] S. Klinkenberg, M. Straatemeier, and H. L. van der Maas. Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. Computers & Education, 57(2):1813–1824, 2011.
[12] J. Nižnan, R. Pelánek, and J. Řihák. Student models for prior knowledge estimation. International Educational Data Mining Society, 2015.
[13] J. Papousek, R. Pelánek, and V. Stanislav. Adaptive practice of facts in domains with varied prior knowledge. In Educational Data Mining 2014, 2014.
[14] R. Pelánek. Metrics for evaluation of student models. Journal of Educational Data Mining, 7(2):1–19, 2015.
[15] R. Reddick. Using a Glicko-based algorithm to measure in-course learning. International Educational Data Mining Society, 2019.
[16] S. Sampayo-Vargas, C. J. Cope, Z. He, and G. J. Byrne. The effectiveness of adaptive difficulty adjustments on students' motivation and learning in an educational computer game. Computers & Education, 69:452–462, 2013.
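
As a supplementary illustration of Section 2.1, the following is a minimal sketch of the correctness prediction in equation (5) and the rating updates in equations (7)-(10). It is not the author's implementation. One simplifying assumption is made loudly here: the user volatility is held fixed instead of being re-estimated through equation (6), so $\sigma_s^2(t) = \sigma_s^2(t_s)$. The dictionary layout and the function names `predict` and `update` are hypothetical; the constant 173.7178 is the Glicko-1 to Glicko-2 scale factor given in [8].

```python
import math

GLICKO_SCALE = 173.7178  # Glicko-1 <-> Glicko-2 scale factor from [8]


def g(phi_sq):
    """Attenuation factor g(phi^2) = [1 + 3 phi^2 / pi^2]^(-1/2)."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi_sq / math.pi ** 2)


def expected_score(mu_1, mu_2, phi_sq):
    """Expected score E(mu_1, mu_2, phi^2) used in equation (5)."""
    return 1.0 / (1.0 + math.exp(-g(phi_sq) * (mu_1 - mu_2)))


def predict(user, unit, dt):
    """Equation (5): correctness probability, dt days after the user's last update."""
    phi_hat_sq_s = user["phi_sq"] + dt * user["sigma_sq"]  # inflated user uncertainty
    return expected_score(user["mu"], unit["mu"], phi_hat_sq_s + unit["phi_sq"])


def update(user, unit, y, dt, phi_sq_max):
    """Equations (7)-(10) for one attempt with result y in {0, 1}.

    Assumption: volatility is NOT updated (equation (6) is skipped).
    phi_sq_max is the squared RD of a brand new user/unit, used as the cap.
    """
    phi_hat_sq_s = user["phi_sq"] + dt * user["sigma_sq"]
    phi_hat_sq_i = unit["phi_sq"]
    e_s = expected_score(user["mu"], unit["mu"], phi_hat_sq_i)
    e_i = expected_score(unit["mu"], user["mu"], phi_hat_sq_s)
    v_sq_s = 1.0 / (g(phi_hat_sq_i) ** 2 * e_s * (1.0 - e_s))
    v_sq_i = 1.0 / (g(phi_hat_sq_s) ** 2 * e_i * (1.0 - e_i))
    # Equations (7) and (8): posterior variances, capped at the initial value.
    new_phi_sq_s = min(
        phi_sq_max,
        1.0 / (1.0 / (user["phi_sq"] + user["sigma_sq"]) + 1.0 / v_sq_s),
    )
    new_phi_sq_i = min(phi_sq_max, 1.0 / (1.0 / unit["phi_sq"] + 1.0 / v_sq_i))
    # Equations (9) and (10): the unit "wins" when the user answers incorrectly.
    new_user = {
        "mu": user["mu"] + new_phi_sq_s * g(phi_hat_sq_i) * (y - e_s),
        "phi_sq": new_phi_sq_s,
        "sigma_sq": user["sigma_sq"],
    }
    new_unit = {
        "mu": unit["mu"] + new_phi_sq_i * g(phi_hat_sq_s) * ((1 - y) - e_i),
        "phi_sq": new_phi_sq_i,
    }
    return new_user, new_unit
```

Under the conversion in [8], a Glicko-1 rating $r$ maps to $(r - 1500)/173.7178$ and a rating deviation or slope maps to its value divided by 173.7178, so 40.0 and 1000.0 become approximately 0.2303 and -2.8782, matching the reported values of $\alpha$ and $\mu_{\min}$. A new user or unit with initial RD 350.0 would therefore start with `phi_sq = (350.0 / GLICKO_SCALE) ** 2` in this sketch.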
