Mathematical Language Processing:
Automatic Grading and Feedback
for Open Response Mathematical Questions

Andrew S. Lan, Divyanshu Vats, Andrew E. Waters, Richard G. Baraniuk
Rice University
Houston, TX 77005
{mr.lan, dvats, waters, richb}@sparfa.com
ABSTRACT
While computer and communication technologies have provided effective means to scale up many aspects of education, the submission and grading of assessments such as homework assignments and tests remains a weak link. In this paper, we study the problem of automatically grading the kinds of open response mathematical questions that figure prominently in STEM (science, technology, engineering, and mathematics) courses. Our data-driven framework for mathematical language processing (MLP) leverages solution data from a large number of learners to evaluate the correctness of their solutions, assign partial-credit scores, and provide feedback to each learner on the likely locations of any errors. MLP takes inspiration from the success of natural language processing for text data and comprises three main steps. First, we convert each solution to an open response mathematical question into a series of numerical features. Second, we cluster the features from several solutions to uncover the structures of correct, partially correct, and incorrect solutions. We develop two different clustering approaches, one that leverages generic clustering algorithms and one based on Bayesian nonparametrics. Third, we automatically grade the remaining (potentially large number of) solutions based on their assigned cluster and one instructor-provided grade per cluster. As a bonus, we can track the cluster assignment of each step of a multistep solution and determine when it departs from a cluster of correct solutions, which enables us to indicate the likely locations of errors to learners. We test and validate MLP on real-world MOOC data to demonstrate how it can substantially reduce the human effort required in large-scale educational platforms.

Author Keywords
Automatic grading, Machine learning, Clustering, Bayesian nonparametrics, Assessment, Feedback, Mathematical language processing

INTRODUCTION
Large-scale educational platforms have the capability to revolutionize education by providing inexpensive, high-quality learning opportunities for millions of learners worldwide. Examples of such platforms include massive open online courses (MOOCs) [6, 7, 9, 10, 16, 42], intelligent tutoring systems [43], computer-based homework and testing systems [1, 31, 38, 40], and personalized learning systems [24]. While computer and communication technologies have provided effective means to scale up the number of learners viewing lectures (via streaming video), reading the textbook (via the web), interacting with simulations (via a graphical user interface), and engaging in discussions (via online forums), the submission and grading of assessments such as homework assignments and tests remains a weak link.

There is a pressing need to find new ways and means to automate two critical tasks that are typically handled by the instructor or course assistants in a small-scale course: (i) grading of assessments, including allotting partial credit for partially correct solutions, and (ii) providing individualized feedback to learners on the locations and types of their errors.

Substantial progress has been made on automated grading and feedback systems in several restricted domains, including essay evaluation using natural language processing (NLP) [1, 33], computer program evaluation [12, 15, 29, 32, 34], and mathematical proof verification [8, 19, 21].

In this paper, we study the problem of automatically grading the kinds of open response mathematical questions that figure prominently in STEM (science, technology, engineering, and mathematics) education. To the best of our knowledge, there exist no tools to automatically evaluate and allot partial-credit scores to the solutions of such questions. As a result, large-scale education platforms have resorted either to oversimplified multiple choice input and binary grading schemes (correct/incorrect), which are known to convey less information about the learners' knowledge than open response questions [17], or to peer-grading schemes [25, 26], which shift the burden of grading from the course instructor to the learners.1

1 While peer grading appears to have some pedagogical value for learners [30], each learner typically needs to grade several solutions from other learners for each question they solve in order to obtain an accurate grade estimate.
[Figure 1: Example solutions to the question "Find the derivative of $(x^3 + \sin x)/e^x$" that were assigned scores of 3, 2, 1, and 0 out of 3, respectively, by our MLP-B algorithm. (a) A correct solution that receives 3/3 credits. (b) An incorrect solution that receives 2/3 credits due to an error in the last expression. (c) An incorrect solution that receives 1/3 credits due to an error in the second expression. (d) An incorrect solution that receives 0/3 credits.]

[Figure 2: Examples of two different yet correct paths to solve the question "Simplify the expression $(x^2 + x + \sin^2 x + \cos^2 x)(2x - 3)$." (a) A correct solution that makes the simplification $\sin^2 x + \cos^2 x = 1$ in the first expression. (b) A correct solution that makes the simplification $\sin^2 x + \cos^2 x = 1$ in the third expression:
$(x^2 + x + \sin^2 x + \cos^2 x)(2x - 3)$
$= 2x^3 + 2x^2 + 2x\sin^2 x + 2x\cos^2 x - 3x^2 - 3x - 3\sin^2 x - 3\cos^2 x$
$= 2x^3 - x^2 - 3x + 2x(\sin^2 x + \cos^2 x) - 3(\sin^2 x + \cos^2 x)$
$= 2x^3 - x^2 - 3x + 2x(1) - 3(1)$
$= 2x^3 - x^2 - x - 3$]

Main Contributions
In this paper, we develop a data-driven framework for mathematical language processing (MLP) that leverages solution data from a large number of learners to evaluate the correctness of solutions to open response mathematical questions, assign partial-credit scores, and provide feedback to each learner on the likely locations of any errors. The scope of our framework is broad and covers questions whose solution involves one or more mathematical expressions. This includes not just formal proofs but also the kinds of mathematical calculations that figure prominently in science and engineering courses. Examples of solutions to two algebra questions of various levels of correctness are given in Figures 1 and 2. In this regard, our work differs significantly from that of [8], which focuses exclusively on evaluating logical proofs.

Our MLP framework, which is inspired by the success of NLP methods for the analysis of textual solutions (e.g., essays and short answers), comprises three main steps.

First, we convert each solution to an open response mathematical question into a series of numerical features. In deriving these features, we make use of symbolic mathematics to transform mathematical expressions into a canonical form.

Second, we cluster the features from several solutions to uncover the structures of correct, partially correct, and incorrect solutions. We develop two different clustering approaches. MLP-S uses the numerical features to define a similarity score between pairs of solutions and then applies a generic clustering algorithm, such as spectral clustering (SC) [22] or affinity propagation (AP) [11]. We show that MLP-S is also useful for visualizing mathematical solutions. This can help instructors identify groups of learners that make similar errors so that instructors can deliver personalized remediation. MLP-B defines a nonparametric Bayesian model for the solutions and applies a Gibbs sampling algorithm to cluster the solutions.

Third, once a human assigns a grade to at least one solution in each cluster, we automatically grade the remaining (potentially large number of) solutions based on their assigned cluster. As a bonus, in MLP-B, we can track the cluster assignment of each step in a multistep solution and determine when it departs from a cluster of correct solutions, which enables us to indicate the likely locations of errors to learners.

In developing MLP, we tackle three main challenges of analyzing open response mathematical solutions. First, solutions might contain different notations that refer to the same mathematical quantity. For instance, in Figure 1, the learners use both $e^{-x}$ and $1/e^x$ to refer to the same quantity. Second, some questions admit more than one path to the correct/incorrect solution. For instance, in Figure 2 we see two different yet correct solutions to the same question. It is typically infeasible for an instructor to enumerate all of these possibilities to automate the grading and feedback process. Third, numerically verifying the correctness of the solutions does not always apply to mathematical questions, especially when simplifications are required. For example, a question that asks to simplify the expression $\sin^2 x + \cos^2 x + x$ can have both $1 + x$ and $\sin^2 x + \cos^2 x + x$ as numerically correct answers, since both expressions output the same value for all values of $x$. However, the correct answer is $1 + x$, since the question expects the learners to recognize that $\sin^2 x + \cos^2 x = 1$. Thus, methods developed to check the correctness of computer programs and formulae by specifying a range of different inputs and checking for the correct outputs, e.g., [32], cannot always be applied to accurately grade open response mathematical questions.

Related Work
Prior work has led to a number of methods for grading and providing feedback to the solutions of certain kinds of open response questions. A linear regression-based approach has been developed to grade essays using features extracted from a training corpus using natural language processing (NLP) [1, 33]. Unfortunately, such a simple regression-based model does not perform well when applied to the features extracted from mathematical solutions.
Several methods have been developed for the automated analysis of computer programs [15, 32]. However, these methods do not apply to the solutions to open response mathematical questions, since such solutions lack the structure and compilability of computer programs. Several methods have also been developed to check the correctness of the logic in mathematical proofs [8, 19, 21]. However, these methods apply only to mathematical proofs involving logical operations and not the kinds of open-ended mathematical calculations that are often involved in science and engineering courses.

The idea of clustering solutions to open response questions into groups of similar solutions has been used in a number of previous endeavors: [2, 5] use clustering to grade short, textual answers to simple questions; [23] uses clustering to visualize a large collection of computer programs; and [28] uses clustering to grade and provide feedback on computer programs. Although the high-level concept underlying these works is resonant with the MLP framework, the feature building techniques used in MLP are very different, since the structure of mathematical solutions differs significantly from short textual answers and computer programs.

This paper is organized as follows. In the next section, we develop our approach to convert open response mathematical solutions to numerical features that can be processed by machine learning algorithms. We then develop MLP-S and MLP-B and use real-world MOOC data to showcase their ability to accurately grade a large number of solutions based on the instructor's grades for only a small number of solutions, thus substantially reducing the human effort required in large-scale educational platforms. We close with a discussion and perspectives on future research directions.

MLP FEATURE EXTRACTION
The first step in our MLP framework is to transform a collection of solutions to an open response mathematical question into a set of numerical features. In later sections, we show how the numerical features can be used to cluster and grade solutions as well as generate informative learner feedback.

A solution to an open response mathematical question will in general contain a mixture of explanatory text and core mathematical expressions. Since the correctness of a solution depends primarily on the mathematical expressions, we ignore the text when deriving features. However, we recognize that the text is potentially very useful for automatically generating explanations for various mathematical expressions. We leave this avenue for future work.

A workhorse of NLP is the bag-of-words model; it has found tremendous success in text semantic analysis. This model treats a text document as a collection of words and uses the frequencies of the words as numerical features to perform tasks like topic classification and document clustering [4, 5].

A solution to an open response mathematical question consists of a series of mathematical expressions that are chained together by text, punctuation, or mathematical delimiters including $=$, $\le$, $>$, $\propto$, $\approx$, etc. For example, the solution in Figure 1(b) contains the expressions $((x^3 + \sin x)/e^x)'$, $((3x^2 + \cos x)e^x - (x^3 + \sin x)e^x)/e^{2x}$, and $(2x^2 - x^3 + \cos x - \sin x)/e^x$, all separated by the delimiter "$=$".

MLP identifies the unique mathematical expressions contained in the learners' solutions and uses them as features, effectively extending the bag-of-words model to use mathematical expressions as features rather than words. To coin a phrase, MLP uses a novel bag-of-expressions model.

Once the mathematical expressions have been extracted from a solution, we parse them using SymPy, the open-source Python library for symbolic mathematics [36].2 SymPy has powerful capabilities for simplifying expressions. For example, $x^2 + x^2$ can be simplified to $2x^2$, and $e^x x^2 / e^{2x}$ can be simplified to $e^{-x} x^2$. In this way, we can identify the equivalent terms in expressions that refer to the same mathematical quantity, resulting in more accurate features. In practice, however, for some questions it might be necessary to tone down the level of SymPy's simplification. For instance, the key to solving the question in Figure 2 is to simplify the expression using the Pythagorean identity $\sin^2 x + \cos^2 x = 1$. If SymPy is called on to perform such a simplification automatically, then it will not be possible to verify whether a learner has correctly navigated the simplification in their solution. For such problems, it is advisable to perform only arithmetic simplifications.

2 In particular, we use the parse_expr function.
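To make this step concrete, the following is a minimal sketch of parsing and canonicalizing a single expression string with SymPy. The parse_expr function is the one cited in footnote 2; the choice of simplify for full canonicalization versus expand for the "toned down" arithmetic-only mode is our assumption about one way this behavior could be realized.

    from sympy import expand, simplify
    from sympy.parsing.sympy_parser import parse_expr

    def canonicalize(expr_str, arithmetic_only=False):
        """Parse a raw expression string and reduce it to a canonical form.

        With arithmetic_only=True we only expand and collect terms, so
        identities such as sin(x)**2 + cos(x)**2 = 1 are NOT applied and
        a learner's own simplification steps remain visible.
        """
        expr = parse_expr(expr_str)
        return expand(expr) if arithmetic_only else simplify(expr)

    # Full simplification: "exp(x)*x**2/exp(2*x)" -> x**2*exp(-x)
    # Arithmetic only: "sin(x)**2 + cos(x)**2 + x" is left unsimplified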
After extracting the expressions from the solutions, we transform them into numerical features. We assume that $N$ learners submit solutions to a particular mathematical question. Extracting the expressions from each solution using SymPy yields a total of $V$ unique expressions across the $N$ solutions.

We encode the solutions in an integer-valued solution feature matrix $\mathbf{Y} \in \mathbb{N}^{V \times N}$ whose rows correspond to different expressions and whose columns correspond to different solutions; that is, the $(i,j)$th entry of $\mathbf{Y}$ is given by

$Y_{i,j} = $ number of times expression $i$ appears in solution $j$.

Each column of $\mathbf{Y}$ corresponds to a numerical representation of a mathematical solution. Note that we do not consider the ordering of the expressions in this model; such an extension is an interesting avenue for future work. In this paper, we indicate in $\mathbf{Y}$ only the presence and not the frequency of an expression, i.e., $\mathbf{Y} \in \{0,1\}^{V \times N}$ and

$Y_{i,j} = \begin{cases} 1 & \text{if expression } i \text{ appears in solution } j, \\ 0 & \text{otherwise.} \end{cases}$   (1)

The extension to encoding frequencies is straightforward.

To illustrate how the matrix $\mathbf{Y}$ is constructed, consider the solutions in Figures 2(a) and (b). Across both solutions, there are 7 unique expressions. Thus, $\mathbf{Y}$ is a $7 \times 2$ matrix, with each row corresponding to a unique expression. Letting the first four rows of $\mathbf{Y}$ correspond to the four expressions in Figure 2(a) and the remaining three rows to expressions 2-4 in Figure 2(b), we have

$\mathbf{Y} = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}^T.$
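A minimal sketch of this feature extraction step follows, assuming each solution has already been split into its expression strings. The helper name build_solution_matrix and the use of SymPy's srepr strings as canonical dictionary keys are our own illustrative choices, not part of the paper.

    import numpy as np
    from sympy import srepr
    from sympy.parsing.sympy_parser import parse_expr

    def build_solution_matrix(solutions):
        """solutions: list of N solutions, each a list of expression strings.
        Returns the binary V x N feature matrix Y of (1) and the V keys."""
        # A canonical string form makes equivalent parses share one feature.
        keys = [[srepr(parse_expr(s)) for s in sol] for sol in solutions]
        vocab = sorted({k for sol in keys for k in sol})
        index = {k: i for i, k in enumerate(vocab)}
        Y = np.zeros((len(vocab), len(keys)), dtype=int)
        for j, sol in enumerate(keys):
            for k in set(sol):  # presence only, not frequency, as in (1)
                Y[index[k], j] = 1
        return Y, vocab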
We end this section with the crucial observation that, for a wide range of mathematical questions, many expressions will be shared across learners' solutions. This is true, for instance, in Figure 2. This suggests that there are a limited number of types of solutions to a question (both correct and incorrect) and that solutions of the same type tend to be similar to each other. This leads us to the conclusion that the $N$ solutions to a particular question can be effectively clustered into $K \ll N$ clusters. In the next two sections, we develop MLP-S and MLP-B, two algorithms to cluster solutions according to their numerical features.

MLP-S: SIMILARITY-BASED CLUSTERING
In this section, we outline MLP-S, which clusters and then grades solutions using a solution similarity-based approach.

The MLP-S Model
We start by using the solution features in $\mathbf{Y}$ to define a notion of similarity between pairs of solutions. Define the $N \times N$ similarity matrix $\mathbf{S}$ containing the pairwise similarities between all solutions, with its $(i,j)$th entry the similarity between solutions $i$ and $j$:

$S_{i,j} = \frac{\mathbf{y}_i^T \mathbf{y}_j}{\min\{\mathbf{y}_i^T \mathbf{y}_i, \; \mathbf{y}_j^T \mathbf{y}_j\}}.$   (2)

The column vector $\mathbf{y}_i$ denotes the $i$th column of $\mathbf{Y}$ and corresponds to learner $i$'s solution. Informally, $S_{i,j}$ is the number of common expressions between solution $i$ and solution $j$ divided by the minimum of the numbers of expressions in solutions $i$ and $j$. A large/small value of $S_{i,j}$ corresponds to the two solutions being similar/dissimilar. For example, the similarity between the solutions in Figures 1(a) and 1(b) is 1/3, and the similarity between the solutions in Figures 2(a) and 2(b) is 1/2. $\mathbf{S}$ is symmetric, and $0 \le S_{i,j} \le 1$. Equation (2) is just one of many possible solution similarity metrics; we defer the development of other metrics to future work.
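Equation (2) reduces to a few lines of linear algebra on the binary matrix $\mathbf{Y}$. The sketch below is one way to vectorize it; it assumes blank solutions have been filtered out (as described in the experiments section), so the denominators are nonzero.

    import numpy as np

    def similarity_matrix(Y):
        """Compute the N x N similarity matrix S of (2) from binary Y."""
        G = Y.T @ Y                          # G[i, j] = y_i^T y_j
        counts = np.diag(G).astype(float)    # expressions per solution
        return G / np.minimum.outer(counts, counts)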
Clustering Solutions in MLP-S
Having defined the similarity $S_{i,j}$ between two solutions $i$ and $j$, we now cluster the $N$ solutions into $K \ll N$ clusters such that the solutions within each cluster have high similarity scores between them and solutions in different clusters have low similarity scores between them.

Given the similarity matrix $\mathbf{S}$, we can use any of a multitude of standard clustering algorithms to cluster the solutions. Two examples of clustering algorithms are spectral clustering (SC) [22] and affinity propagation (AP) [11]. The SC algorithm requires specifying the number of clusters $K$ as an input parameter, while the AP algorithm does not.
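As one concrete (assumed) realization, the scikit-learn implementations of both algorithms accept a precomputed affinity matrix, so $\mathbf{S}$ of (2) can be passed in directly; the paper does not specify which implementations it uses.

    from sklearn.cluster import AffinityPropagation, SpectralClustering

    def cluster_solutions(S, K=None):
        """Cluster solutions from the precomputed similarity matrix S.
        K given -> spectral clustering with K clusters;
        K None  -> affinity propagation, which chooses K itself."""
        if K is None:
            model = AffinityPropagation(affinity="precomputed")
        else:
            model = SpectralClustering(n_clusters=K, affinity="precomputed")
        return model.fit_predict(S)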
Figure 3 illustrates how AP is able to identify clusters of similar solutions from the solutions to four different mathematical questions. The figures on the top correspond to solutions to the questions in Figures 1 and 2, respectively. The bottom two figures correspond to solutions to two signal processing questions. Each node in the figure corresponds to a solution, and nodes with the same color correspond to solutions that belong to the same cluster. For each figure, we show a sample solution from some of these clusters, with the boxed solutions corresponding to correct solutions. We can make three interesting observations from Figure 3:

• In the top left figure, we cluster a solution with the final answer $(3x^2 + \cos x - (x^3 + \sin x))/e^x$ with a solution whose final answer looks the same. Although the latter solution is incorrect, it contained a typographical error in which 3*x^2 was typed as 3^x^2. MLP-S is able to identify this typographical error, since the expression before the final solution is contained in several other correct solutions.

• In the top right figure, the correct solution requires identifying the trigonometric identity $\sin^2 x + \cos^2 x = 1$. The clustering algorithm is able to identify a subset of the learners who were not able to identify this relationship and hence could not simplify their final expression.

• MLP-S is able to identify solutions that are strongly connected to each other. Such a visualization can be extremely useful for course instructors. For example, an instructor can easily identify a group of learners who lack mastery of a certain skill that results in a common error and adjust their course plan accordingly to help these learners.

[Figure 3: Illustration of the clusters obtained by MLP-S by applying affinity propagation (AP) on the similarity matrix $\mathbf{S}$ corresponding to learners' solutions to four different mathematical questions (see Table 1 for more details about the datasets and the Appendix for the question statements). Each node corresponds to a solution. Nodes with the same color correspond to solutions that are estimated to be in the same cluster. The thickness of the edge between two solutions is proportional to their similarity score. Boxed solutions are correct; all others are in varying degrees of correctness.]

Auto-Grading via MLP-S
Having clustered all solutions into a small number $K$ of clusters, we assign the same grade to all solutions in the same cluster. If a course instructor assigns a grade to one solution from each cluster, then MLP-S can automatically grade the remaining $N - K$ solutions. We construct the index set $\mathcal{I}_S$ of solutions that the course instructor needs to grade as

$\mathcal{I}_S = \left\{ \arg\max_{i \in \mathcal{C}_k} \sum_{j=1}^{N} S_{i,j}, \; k = 1, 2, \ldots, K \right\},$

where $\mathcal{C}_k$ represents the index set of the solutions in cluster $k$. In words, in each cluster, we select the solution having the highest total similarity to the other solutions (ties are broken randomly) to include in $\mathcal{I}_S$. We demonstrate the performance of auto-grading via MLP-S in the experimental results section below.
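The construction of $\mathcal{I}_S$ is a per-cluster argmax over row sums of $\mathbf{S}$; a short sketch, with assumed NumPy array inputs, is:

    import numpy as np

    def select_representatives(S, labels):
        """Index set I_S: in each cluster, the solution with the highest
        total similarity to all other solutions."""
        reps = []
        for k in np.unique(labels):
            members = np.flatnonzero(labels == k)
            totals = S[members].sum(axis=1)  # sum_j S_{i,j} for i in C_k
            reps.append(int(members[np.argmax(totals)]))
        return reps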
MLP-B: BAYESIAN NONPARAMETRIC CLUSTERING
In this section, we outline MLP-B, which clusters and then grades solutions using a Bayesian nonparametrics-based approach. The MLP-B model and algorithm can be interpreted as an extension of the model in [44], where a similar approach is proposed to cluster short text documents.

The MLP-B Model
Following the key observation that the $N$ solutions can be effectively clustered into $K \ll N$ clusters, let $\mathbf{z}$ be the $N \times 1$ cluster assignment vector, with $z_j \in \{1, \ldots, K\}$ denoting the cluster assignment of the $j$th solution, $j \in \{1, \ldots, N\}$. Using this latent variable, we model the probability of all learners' solutions to the question as

$p(\mathbf{Y}) = \prod_{j=1}^{N} \left( \sum_{k=1}^{K} p(\mathbf{y}_j \mid z_j = k) \, p(z_j = k) \right),$

where $\mathbf{y}_j$, the $j$th column of the data matrix $\mathbf{Y}$, corresponds to learner $j$'s solution to the question. Here we have implicitly assumed that the learners' solutions are independent of each other. By analogy to topic models [4, 35], we assume that learner $j$'s solution to the question, $\mathbf{y}_j$, is generated according to a multinomial distribution given the cluster assignments $\mathbf{z}$ as

$p(\mathbf{y}_j \mid z_j = k) = \mathrm{Mult}(\mathbf{y}_j \mid \boldsymbol{\phi}_k) = \frac{\left(\sum_i Y_{i,j}\right)!}{Y_{1,j}! \, Y_{2,j}! \cdots Y_{V,j}!} \, \Phi_{1,k}^{Y_{1,j}} \, \Phi_{2,k}^{Y_{2,j}} \cdots \Phi_{V,k}^{Y_{V,j}},$   (3)

where $\boldsymbol{\Phi} \in [0,1]^{V \times K}$ is a parameter matrix with $\Phi_{v,k}$ denoting its $(v,k)$th entry. $\boldsymbol{\phi}_k \in [0,1]^{V \times 1}$ denotes the $k$th column of $\boldsymbol{\Phi}$ and characterizes the multinomial distribution over all the $V$ features for cluster $k$.

In practice, one often has no information regarding the number of clusters $K$. Therefore, we consider $K$ an unknown parameter and infer it from the solution data. In order to do so, we impose a Chinese restaurant process (CRP) prior on the cluster assignments $\mathbf{z}$, parameterized by a parameter $\alpha$. The CRP characterizes the random partition of data into clusters, in analogy to the seating process of customers in a Chinese restaurant. It is widely used in the Bayesian mixture modeling literature [3, 14]. Under the CRP prior, the cluster (table) assignment of the $j$th solution (customer), conditioned on the cluster assignments of all the other solutions, follows the distribution

$p(z_j = k \mid \mathbf{z}_{\neg j}, \alpha) = \begin{cases} \frac{n_{k,\neg j}}{N-1+\alpha} & \text{if cluster } k \text{ is occupied,} \\ \frac{\alpha}{N-1+\alpha} & \text{if cluster } k \text{ is empty,} \end{cases}$   (4)

where $n_{k,\neg j}$ represents the number of solutions that belong to cluster $k$ excluding the current solution $j$, with $\sum_{k=1}^{K} n_{k,\neg j} = N - 1$. The vector $\mathbf{z}_{\neg j}$ represents the cluster assignments of the other solutions. The flexibility of allowing any solution to start a new cluster of its own enables us to automatically infer $K$ from data. It is known [37] that the expected number of clusters under the CRP prior satisfies $K \sim O(\alpha \log N) \ll N$, so our method scales well as the number of learners $N$ grows large. We also impose a Gamma prior $\alpha \sim \mathrm{Gam}(\alpha_\alpha, \alpha_\beta)$ on $\alpha$ to help us infer its value.

Since the solution feature data $\mathbf{Y}$ is assumed to follow a multinomial distribution parameterized by $\boldsymbol{\Phi}$, we impose a symmetric Dirichlet prior over $\boldsymbol{\Phi}$ as $\boldsymbol{\phi}_k \sim \mathrm{Dir}(\boldsymbol{\phi}_k \mid \beta)$, because of its conjugacy with the multinomial distribution [13].

[Figure 4: Graphical model of the generation process of solutions to mathematical questions. $\alpha_\alpha$, $\alpha_\beta$, and $\beta$ are hyperparameters, $\mathbf{z}$ and $\boldsymbol{\Phi}$ are latent variables to be inferred, and $\mathbf{Y}$ is the observed data defined in (1).]

The graphical model representation of our model is visualized in Figure 4. Our goal next is to estimate the cluster assignments $\mathbf{z}$ for each learner's solution, the parameters $\boldsymbol{\phi}_k$ of each cluster, and the number of clusters $K$, from the binary-valued solution feature data matrix $\mathbf{Y}$.
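For concreteness, the following is a minimal sketch of the CRP conditional (4), written as a function of the other assignments. It is a direct transcription of the displayed formula, not the full sampler.

    import numpy as np

    def crp_conditional(z, j, alpha, K):
        """Evaluate (4) for solution j: a probability vector over the K
        occupied clusters plus one new (empty) cluster."""
        z_rest = np.delete(np.asarray(z), j)
        N = len(z)
        counts = np.array([np.sum(z_rest == k) for k in range(K)],
                          dtype=float)          # n_{k, not j}
        probs = np.append(counts, alpha)        # the empty cluster gets alpha
        return probs / (N - 1 + alpha)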
Clustering Solutions in MLP-B
We use a Gibbs sampling algorithm for posterior inference under the MLP-B model, which automatically groups solutions into clusters. We start by applying a generic clustering algorithm (e.g., K-means, with $K = N/10$) to initialize $\mathbf{z}$, and then initialize $\boldsymbol{\Phi}$ accordingly. Then, in each iteration of MLP-B, we perform the following steps:

1. Sample $\mathbf{z}$: For each solution $j$, we remove it from its current cluster and sample its cluster assignment $z_j$ from the posterior $p(z_j = k \mid \mathbf{z}_{\neg j}, \alpha, \mathbf{Y})$. Using Bayes' rule, we have

$p(z_j = k \mid \mathbf{z}_{\neg j}, \boldsymbol{\Phi}, \alpha, \mathbf{Y}) = p(z_j = k \mid \mathbf{z}_{\neg j}, \boldsymbol{\phi}_k, \alpha, \mathbf{y}_j) \propto p(z_j = k \mid \mathbf{z}_{\neg j}, \alpha) \, p(\mathbf{y}_j \mid z_j = k, \boldsymbol{\phi}_k).$

The prior probability $p(z_j = k \mid \mathbf{z}_{\neg j}, \alpha)$ is given by (4). For non-empty clusters, the observed data likelihood $p(\mathbf{y}_j \mid z_j = k, \boldsymbol{\phi}_k)$ is given by (3). However, this does not apply to new clusters that were previously empty. For a new cluster, we marginalize out $\boldsymbol{\phi}_k$, resulting in

$p(\mathbf{y}_j \mid z_j = k, \beta) = \int_{\boldsymbol{\phi}_k} p(\mathbf{y}_j \mid z_j = k, \boldsymbol{\phi}_k) \, p(\boldsymbol{\phi}_k \mid \beta) = \int_{\boldsymbol{\phi}_k} \mathrm{Mult}(\mathbf{y}_j \mid z_j = k, \boldsymbol{\phi}_k) \, \mathrm{Dir}(\boldsymbol{\phi}_k \mid \beta) = \frac{\Gamma(V\beta)}{\Gamma\!\left(\sum_{i=1}^{V} Y_{i,j} + V\beta\right)} \prod_{i=1}^{V} \frac{\Gamma(Y_{i,j} + \beta)}{\Gamma(\beta)},$

where $\Gamma(\cdot)$ is the Gamma function. If a cluster becomes empty after we remove a solution from its current cluster, then we remove it from our sampling process and erase its corresponding multinomial parameter vector $\boldsymbol{\phi}_k$. If a new cluster is sampled for $z_j$, then we sample its multinomial parameter vector $\boldsymbol{\phi}_k$ immediately according to Step 2 below. Otherwise, we do not change $\boldsymbol{\phi}_k$ until we have finished sampling $\mathbf{z}$ for all solutions.

2. Sample $\boldsymbol{\Phi}$: For each cluster $k$, sample $\boldsymbol{\phi}_k$ from its posterior $\mathrm{Dir}(\boldsymbol{\phi}_k \mid n_{1,k} + \beta, \ldots, n_{V,k} + \beta)$, where $n_{i,k}$ is the number of times feature $i$ occurs in the solutions that belong to cluster $k$.

3. Sample $\alpha$: Sample $\alpha$ using the approach described in [41].

4. Update $\beta$: Update $\beta$ using the fixed-point procedure described in [20].

The output of the Gibbs sampler is a series of samples that correspond to the approximate posterior distribution of the various parameters of interest. To make meaningful inferences for these parameters (such as the posterior mean of a parameter), it is important to appropriately post-process these samples. For our estimate of the true number of clusters, $\widehat{K}$, we simply take the mode of the posterior distribution on the number of clusters $K$. We use only iterations with $K = \widehat{K}$ to estimate the posterior statistics [39].

In mixture models, the issue of "label switching" can cause a model to be unidentifiable, because the cluster labels can be arbitrarily permuted without affecting the data likelihood. In order to overcome this issue, we use an approach reported in [39]. First, we compute the likelihood of the observed data in each iteration as $p(\mathbf{Y} \mid \boldsymbol{\Phi}^{\ell}, \mathbf{z}^{\ell})$, where $\boldsymbol{\Phi}^{\ell}$ and $\mathbf{z}^{\ell}$ represent the samples of these variables at the $\ell$th iteration. After the algorithm terminates, we search for the iteration $\ell_{\max}$ with the largest data likelihood and then permute the labels $\mathbf{z}^{\ell}$ in the other iterations to best match $\boldsymbol{\Phi}^{\ell}$ with $\boldsymbol{\Phi}^{\ell_{\max}}$. We use $\widehat{\boldsymbol{\Phi}}$ (with columns $\widehat{\boldsymbol{\phi}}_k$) to denote the estimate of $\boldsymbol{\Phi}$, which is simply the posterior mean of $\boldsymbol{\Phi}$. Each solution $j$ is assigned to the cluster indexed by the mode of the samples from the posterior of $z_j$, denoted by $\widehat{z}_j$.
Auto-Grading via MLP-B
We now detail how to use MLP-B to automatically grade a large number $N$ of learners' solutions to a mathematical question, using a small number $\widehat{K}$ of instructor-graded solutions.

First, as in MLP-S, we select the set $\mathcal{I}_B$ of "typical solutions" for the instructor to grade. We construct $\mathcal{I}_B$ by selecting one solution from each of the $\widehat{K}$ clusters that is most representative of the solutions in that cluster:

$\mathcal{I}_B = \left\{ \arg\max_j \, p(\mathbf{y}_j \mid \widehat{\boldsymbol{\phi}}_k), \; k = 1, 2, \ldots, \widehat{K} \right\}.$

In words, for each cluster, we select the solution with the largest likelihood of being in that cluster.

The instructor grades the $\widehat{K}$ solutions in $\mathcal{I}_B$ to form the set of instructor grades $\{g_k\}$ for $k \in \mathcal{I}_B$. Using these grades, we assign grades to the other solutions $j \notin \mathcal{I}_B$ according to

$\widehat{g}_j = \frac{\sum_{k=1}^{\widehat{K}} p(\mathbf{y}_j \mid \widehat{\boldsymbol{\phi}}_k) \, g_k}{\sum_{k=1}^{\widehat{K}} p(\mathbf{y}_j \mid \widehat{\boldsymbol{\phi}}_k)}.$   (5)

That is, we grade each solution not in $\mathcal{I}_B$ as the average of the instructor grades weighted by the likelihood that the solution belongs to each cluster. We demonstrate the performance of auto-grading via MLP-B in the experimental results section below.
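Given the per-cluster likelihoods $p(\mathbf{y}_j \mid \widehat{\boldsymbol{\phi}}_k)$ and the instructor grades, (5) is a likelihood-weighted average. A sketch with assumed array inputs:

    import numpy as np

    def auto_grade(likelihoods, grades):
        """Estimate grades via (5).
        likelihoods: N x K array, entry (j, k) = p(y_j | phi_hat_k);
        grades: length-K vector of instructor grades g_k."""
        weights = likelihoods / likelihoods.sum(axis=1, keepdims=True)
        return weights @ np.asarray(grades, dtype=float)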
Feedback Generation via MLP-B
In addition to grading solutions, MLP-B can automatically provide useful feedback to learners on where they made errors in their solutions.

For a particular solution $j$ denoted by its column feature value vector $\mathbf{y}_j$ with $V_j$ total expressions, let $\mathbf{y}_j^{(v)}$ denote the feature value vector that corresponds to the first $v$ expressions of this solution, with $v \in \{1, 2, \ldots, V_j\}$. Under this notation, we evaluate the probability that the first $v$ expressions of solution $j$ belong to each of the $\widehat{K}$ clusters: $p(\mathbf{y}_j^{(v)} \mid \widehat{\boldsymbol{\phi}}_k)$, $k \in \{1, 2, \ldots, \widehat{K}\}$, for all $v$. Using these probabilities, we can also compute the expected credit of solution $j$ after the first $v$ expressions via

$\widehat{g}_j^{(v)} = \frac{\sum_{k=1}^{\widehat{K}} p(\mathbf{y}_j^{(v)} \mid \widehat{\boldsymbol{\phi}}_k) \, g_k}{\sum_{k=1}^{\widehat{K}} p(\mathbf{y}_j^{(v)} \mid \widehat{\boldsymbol{\phi}}_k)},$   (6)

where $\{g_k\}$ is the set of instructor grades as defined above. Using these quantities, it is possible to identify that the learner has likely made an error at the $v$th expression if it is most likely to belong to a cluster with credit $g_k$ less than the full credit or, alternatively, if the expected credit $\widehat{g}_j^{(v)}$ is less than the full credit.

The ability to automatically locate where an error has been made in a particular incorrect solution provides many benefits. For instance, MLP-B can inform instructors of the most common locations of learner errors to help guide their instruction. It can also enable an automated tutoring system to generate feedback to a learner as they make an error in the early steps of a solution, before it propagates to later steps. We demonstrate the efficacy of MLP-B in automatically locating learner errors using real-world educational data in the experiments section below.
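The per-step expected credit (6) applies the same weighted average to the partial feature vectors $\mathbf{y}_j^{(v)}$. The sketch below flags the first expression at which the expected credit drops below full credit; the 3-point full-credit default reflects the grading scale of our dataset, described below.

    import numpy as np

    def first_error_step(step_likelihoods, grades, full_credit=3):
        """step_likelihoods: V_j x K array, row v holding
        p(y_j^{(v)} | phi_hat_k) for the first v+1 expressions.
        Returns the per-step expected credits (6) and the first
        flagged step index (or None if no step is flagged)."""
        g = np.asarray(grades, dtype=float)
        w = step_likelihoods / step_likelihoods.sum(axis=1, keepdims=True)
        g_hat = w @ g
        flagged = np.flatnonzero(g_hat < full_credit)
        return g_hat, (int(flagged[0]) if flagged.size else None)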
EXPERIMENTS
In this section, we demonstrate how MLP-S and MLP-B can be used to accurately estimate the grades of roughly 100 open response solutions to mathematical questions by asking the course instructor to grade only approximately 10 solutions. We also demonstrate how MLP-B can be used to automatically provide feedback to learners on the locations of errors in their solutions.

Auto-Grading via MLP-S and MLP-B

Datasets
Our dataset consists of the solutions of 116 learners to 4 open response mathematical questions in an edX course. The set of questions includes 2 high-school level mathematical questions and 2 college-level signal processing questions (details about the questions can be found in Table 1, and the question statements are given in the Appendix). For each question, we pre-process the solutions to filter out the blank solutions and extract features. Using the features, we represent the solutions by the matrix $\mathbf{Y}$ in (1). Every solution was graded by the course instructor with one of the scores in the set $\{0, 1, 2, 3\}$, with a full credit of 3.

Table 1: Datasets consisting of the solutions of 116 learners to 4 mathematical questions on algebra and signal processing. See the Appendix for the question statements.

                 No. of solutions N    No. of features (unique expressions) V
    Question 1   108                   78
    Question 2   113                   53
    Question 3   90                    100
    Question 4   110                   45

Baseline: Random sub-sampling
We compare the auto-grading performance of MLP-S and MLP-B against a baseline method that does not group the solutions into clusters. In this method, we randomly sub-sample all solutions to form a small set of solutions for the instructor to grade. Then, each ungraded solution is simply assigned the grade of the solution in the set of instructor-graded solutions that is most similar to it, as defined by $\mathbf{S}$ in (2). Since this small set is picked randomly, we run the baseline method 10 times and report the best performance.3

3 Other baseline methods, such as the linear regression-based method used in the edX essay grading system [33], are not listed because they did not perform as well as random sub-sampling in our experiments.

Experimental setup
For each question, we apply four different methods for auto-grading:

• Random sub-sampling (RS) with the number of clusters $K \in \{5, 6, \ldots, 40\}$.

• MLP-S with spectral clustering (SC) with $K \in \{5, 6, \ldots, 40\}$.

• MLP-S with affinity propagation (AP) clustering. This algorithm does not require $K$ as an input.

• MLP-B with hyperparameters set to the non-informative values $\alpha_\alpha = \alpha_\beta = 1$, running the Gibbs sampling algorithm for 10,000 iterations with 2,000 burn-in iterations.

MLP-S with AP and MLP-B both automatically estimate the number of clusters $K$. Once the clusters are selected, we assign one solution from each cluster to be graded by the instructor using the methods described in earlier sections.

Performance metric
We use the mean absolute error (MAE), which measures the average absolute error per auto-graded solution,

$\mathrm{MAE} = \frac{\sum_{j=1}^{N-K} |\widehat{g}_j - g_j|}{N-K},$
as our performance metric. Here, $N - K$ equals the number of solutions that are auto-graded, and $\widehat{g}_j$ and $g_j$ represent the estimated grade (for MLP-B, the estimated grades are rounded to integers) and the actual instructor grade for the $j$th auto-graded solution, respectively.
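The metric is straightforward to compute; per the text, rounding applies only to MLP-B's real-valued estimates. A minimal sketch:

    import numpy as np

    def mean_absolute_error(g_hat, g, round_estimates=False):
        """MAE over the N - K auto-graded solutions."""
        g_hat = np.asarray(g_hat, dtype=float)
        if round_estimates:
            g_hat = np.round(g_hat)  # MLP-B grades are rounded to integers
        return float(np.mean(np.abs(g_hat - np.asarray(g, dtype=float))))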
Results and discussion
In Figure 5, we plot the MAE versus the number of clusters $K$ for Questions 1-4. MLP-S with SC consistently outperforms the random sub-sampling baseline for almost all values of $K$. This performance gain is likely due to the fact that the baseline method does not cluster the solutions and thus does not select a good subset of solutions for the instructor to grade. MLP-B is more accurate than MLP-S with both SC and AP and can automatically estimate the value of $K$, although at the price of significantly higher computational complexity (e.g., clustering and auto-grading one question takes 2 minutes for MLP-B compared to only 5 seconds for MLP-S with AP on a standard laptop computer with a 2.8 GHz CPU and 8 GB memory).

Both MLP-S and MLP-B grade the learners' solutions accurately (e.g., an MAE of 0.04 out of the full grade 3 using only $K = 13$ instructor grades to auto-grade all $N = 113$ solutions to Question 2). Moreover, as we see in Figure 5, the MAE for MLP-S decreases as $K$ increases, and eventually reaches 0 when $K$ is large enough that only solutions that are exactly the same as each other belong to the same cluster. In practice, one can tune the value of $K$ to achieve a balance between maximizing grading accuracy and minimizing human effort. Such a tuning process is not necessary for MLP-B, since it automatically estimates the value of $K$ and achieves such a balance.

[Figure 5: Mean absolute error (MAE) versus the number of instructor-graded solutions (clusters) $K$, for Questions 1-4, respectively (panels (a)-(d)). For example, on Question 1, MLP-S and MLP-B estimate the true grade of each solution with an average error of around 0.1 out of a full credit of 3. "RS" represents the random sub-sampling baseline. Both MLP-S methods and MLP-B outperform the baseline method.]
Feedback Generation via MLP-B

Experimental setup
Since Questions 3-4 require some familiarity with signal processing, we demonstrate the efficacy of MLP-B in providing feedback on mathematical solutions on Questions 1-2. Among the solutions to each question, there are a few types of common errors that more than one learner makes. We take one incorrect solution out of each type and run MLP-B on the other solutions to estimate the parameter $\widehat{\boldsymbol{\phi}}_k$ for each cluster. Using this information and the instructor grades $\{g_k\}$, after each expression $v$ in a solution, we compute the probability that it belongs to a cluster $p(\mathbf{y}_j^{(v)} \mid \widehat{\boldsymbol{\phi}}_k)$ that does not have full credit ($g_k < 3$), together with the expected credit using (6). Once the expected grade is calculated to be less than full credit, we consider that an error has occurred.

Results and discussion
Two sample feedback generation processes are shown in Figure 6. In Figure 6(a), we can provide feedback to the learner on their error as early as Line 2, before it carries over to later lines. Thus, MLP-B can potentially become a powerful tool for generating timely feedback to learners as they are solving mathematical questions, by analyzing the solutions it gathers from other learners.

[Figure 6: Demonstration of real-time feedback generation by MLP-B while learners enter their solutions. After each expression, we compute both the probability that the learner's solution belongs to a cluster that does not have full credit and the learner's expected grade. An alert is generated when the expected credit is less than full credit.

(a) A sample feedback generation process where the learner makes an error in the expression in Line 2 while attempting to solve Question 1:
$((x^3 + \sin x)/e^x)'$
$= (e^x(x^3 + \sin x)' - (x^3 + \sin x)(e^x)')/e^{2x}$   [prob. incorrect = 0.11, exp. grade = 3]
$= (e^x(2x^2 + \cos x) - (x^3 + \sin x)e^x)/e^{2x}$   [prob. incorrect = 0.66, exp. grade = 2]
$= (2x^2 + \cos x - x^3 - \sin x)/e^x$   [prob. incorrect = 0.93, exp. grade = 2]
$= (x^2(2 - x) + \cos x - \sin x)/e^x$   [prob. incorrect = 0.99, exp. grade = 2]

(b) A sample feedback generation process where the learner makes an error in the expression in Line 3 while attempting to solve Question 2:
$(x^2 + x + \sin^2 x + \cos^2 x)(2x - 3)$
$= (x^2 + x + 1)(2x - 3)$   [prob. incorrect = 0.09, exp. grade = 3]
$= 4x^3 + 2x^2 + 2x - 3x^2 - 3x - 3$   [prob. incorrect = 0.82, exp. grade = 2]
$= 4x^3 - x^2 - x - 3$   [prob. incorrect = 0.99, exp. grade = 2]]

CONCLUSIONS
We have developed a framework for mathematical language processing (MLP) that consists of three main steps: (i) converting each solution to an open response mathematical question into a series of numerical features; (ii) clustering the features from several solutions to uncover the structures of correct, partially correct, and incorrect solutions; and (iii) automatically grading the remaining (potentially large number of) solutions based on their assigned cluster and one instructor-provided grade per cluster. As our experiments have indicated, our framework can substantially reduce the human effort required for grading in large-scale courses. As a bonus, MLP-S enables instructors to visualize the clusters of solutions to help them identify common errors and thus groups of learners having the same misconceptions. As a further bonus, MLP-B can track the cluster assignment of each step of a multistep solution and determine when it departs from a cluster of correct solutions, which enables us to indicate the locations of errors to learners in real time. Improved learning outcomes should result from these innovations.

There are several avenues for continued research. We are currently planning more extensive experiments on the edX platform involving tens of thousands of learners. We are also planning to extend the feature extraction step to take into account both the ordering of expressions and ancillary text in a solution. Clustering algorithms that allow a solution to belong to more than one cluster could make MLP more robust to outlier solutions and further reduce the number of solutions that the instructors need to grade. Finally, it would be interesting to explore how the features of solutions could be used to build predictive models, as in the Rasch model [27] or item response theory [18].
APPENDIX: QUESTION STATEMENTS
Question 1: Multiply $(x^2 + x + \sin^2 x + \cos^2 x)(2x - 3)$ and simplify your answer as much as possible.

Question 2: Find the derivative of $\frac{x^3 + \sin x}{e^x}$ and simplify your answer as much as possible.

Question 3: A discrete-time linear time-invariant system has the impulse response shown in the figure (omitted). Calculate $H(e^{j\omega})$, the discrete-time Fourier transform of $h[n]$. Simplify your answer as much as possible until it has no summations.

Question 4: Evaluate the following summation:
$\sum_{k=-\infty}^{\infty} \delta[n - k] \, x[k - n].$

Acknowledgments
Thanks to Heather Seeba for administering the data collection process and Christoph Studer for discussions and insights. Visit our website www.sparfa.com, where you can learn more about our project and purchase t-shirts and other merchandise.

REFERENCES
1. Attali, Y. Construct validity of e-rater in scoring TOEFL essays. Tech. rep., Educational Testing Service, May 2000.
2. Basu, S., Jacobs, C., and Vanderwende, L. Powergrading: A clustering approach to amplify human effort for short answer grading. Trans. Association for Computational Linguistics 1 (Oct. 2013), 391-402.
3. Blei, D., Griffiths, T., and Jordan, M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 2 (Jan. 2010), 7:1-7:30.
4. Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent Dirichlet allocation. J. Machine Learning Research 3 (Jan. 2003), 993-1022.
5. Brooks, M., Basu, S., Jacobs, C., and Vanderwende, L. Divide and correct: Using clusters to grade short answers at scale. In Proc. 1st ACM Conf. on Learning at Scale (Mar. 2014), 89-98.
6. Champaign, J., Colvin, K., Liu, A., Fredericks, C., Seaton, D., and Pritchard, D. Correlating skill and improvement in 2 MOOCs with a student's time on tasks. In Proc. 1st ACM Conf. on Learning at Scale (Mar. 2014), 11-20.
7. Coursera. https://www.coursera.org/, 2014.
8. Cramer, M., Fisseni, B., Koepke, P., Kühlwein, D., Schröder, B., and Veldman, J. The Naproche project – Controlled natural language proof checking of mathematical texts, June 2010.
9. Dijksman, J. A., and Khan, S. Khan Academy: The world's free virtual school. In APS Meeting Abstracts (Mar. 2011).
10. edX. https://www.edx.org/, 2014.
11. Frey, B. J., and Dueck, D. Clustering by passing messages between data points. Science 315, 5814 (2007), 972-976.
12. Galenson, J., Reames, P., Bodik, R., Hartmann, B., and Sen, K. CodeHint: Dynamic and interactive synthesis of code snippets. In Proc. 36th Intl. Conf. on Software Engineering (June 2014), 653-663.
13. Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. Bayesian Data Analysis. CRC Press, 2013.
14. Griffiths, T., and Tenenbaum, J. Hierarchical topic models and the nested Chinese restaurant process. Advances in Neural Information Processing Systems 16 (Dec. 2004), 17-24.
15. Gulwani, S., Radiček, I., and Zuleger, F. Feedback generation for performance problems in introductory programming assignments. In Proc. 22nd ACM SIGSOFT Intl. Symposium on the Foundations of Software Engineering (Nov. 2014, to appear).
16. Guo, P., and Reinecke, K. Demographic differences in how students navigate through MOOCs. In Proc. 1st ACM Conf. on Learning at Scale (Mar. 2014), 21-30.
17. Kang, S., McDermott, K., and Roediger III, H. Test format and corrective feedback modify the effect of testing on long-term retention. European J. Cognitive Psychology 19, 4-5 (July 2007), 528-558.
18. Lord, F. Applications of Item Response Theory to Practical Testing Problems. Erlbaum Associates, 1980.
19. Megill, N. Metamath: A computer language for pure mathematics. Citeseer, 1997.
20. Minka, T. Estimating a Dirichlet distribution. Tech. rep., MIT, Nov. 2000.
21. Naumowicz, A., and Korniłowicz, A. A brief overview of MIZAR. In Theorem Proving in Higher Order Logics, vol. 5674 of Lecture Notes in Computer Science. Aug. 2009, 67-72.
22. Ng, A., Jordan, M., and Weiss, Y. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 2 (Dec. 2002), 849-856.
23. Nguyen, A., Piech, C., Huang, J., and Guibas, L. Codewebs: Scalable homework search for massive open online programming courses. In Proc. 23rd Intl. World Wide Web Conference (Seoul, Korea, Apr. 2014), 491-502.
24. OpenStax Tutor. https://openstaxtutor.org/, 2013.
25. Piech, C., Huang, J., Chen, Z., Do, C., Ng, A., and Koller, D. Tuned models of peer assessment in MOOCs. In Proc. 6th Intl. Conf. on Educational Data Mining (July 2013), 153-160.
26. Raman, K., and Joachims, T. Methods for ordinal peer grading. In Proc. 20th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (Aug. 2014), 1037-1046.
27. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. MESA Press, 1993.
28. Rivers, K., and Koedinger, K. A canonicalizing model for building programming tutors. In Proc. 11th Intl. Conf. on Intelligent Tutoring Systems (June 2012), 591-593.
29. Rivers, K., and Koedinger, K. Automating hint generation with solution space path construction. In Proc. 12th Intl. Conf. on Intelligent Tutoring Systems (June 2014), 329-339.
30. Sadler, P., and Good, E. The impact of self- and peer-grading on student learning. Educational Assessment 11, 1 (June 2006), 1-31.
31. Sapling Learning. http://www.saplinglearning.com/, 2014.
32. Singh, R., Gulwani, S., and Solar-Lezama, A. Automated feedback generation for introductory programming assignments. In Proc. 34th ACM SIGPLAN Conf. on Programming Language Design and Implementation, vol. 48 (June 2013), 15-26.
33. Southavilay, V., Yacef, K., Reimann, P., and Calvo, R. Analysis of collaborative writing processes using revision maps and probabilistic topic models. In Proc. 3rd Intl. Conf. on Learning Analytics and Knowledge (Apr. 2013), 38-47.
34. Srikant, S., and Aggarwal, V. A system to grade computer programming skills using machine learning. In Proc. 20th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (Aug. 2014), 1887-1896.
35. Steyvers, M., and Griffiths, T. Probabilistic topic models. Handbook of Latent Semantic Analysis 427, 7 (2007), 424-440.
36. SymPy Development Team. SymPy: Python library for symbolic mathematics, 2014. http://www.sympy.org.
37. Teh, Y. Dirichlet process. In Encyclopedia of Machine Learning. Springer, 2010, 280-287.
38. Vats, D., Studer, C., Lan, A. S., Carin, L., and Baraniuk, R. G. Test size reduction for concept estimation. In Proc. 6th Intl. Conf. on Educational Data Mining (July 2013), 292-295.
39. Waters, A., Fronczyk, K., Guindani, M., Baraniuk, R., and Vannucci, M. A Bayesian nonparametric approach for the analysis of multiple categorical item responses. J. Statistical Planning and Inference (2014, in press).
40. WebAssign. https://webassign.com/, 2014.
41. West, M. Hyperparameter estimation in Dirichlet process mixture models. Tech. rep., Duke University, 1992.
42. Wilkowski, J., Deutsch, A., and Russell, D. Student skill and goal achievement in the mapping with Google MOOC. In Proc. 1st ACM Conf. on Learning at Scale (Mar. 2014), 3-10.
43. Woolf, B. P. Building Intelligent Interactive Tutors: Student-centered Strategies for Revolutionizing E-learning. Morgan Kaufman Publishers, 2008.
44. Yin, J., and Wang, J. A Dirichlet multinomial mixture model-based approach for short text clustering. In Proc. 20th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (Aug. 2014), 233-242.