Topic Transitions in MOOCs: An Analysis Study∗

Fareedah ALSaad, Thomas Reichel, Yuchen Zeng, Abdussalam Alawini
University of Illinois at Urbana-Champaign
[email protected], [email protected], [email protected], [email protected]

ABSTRACT
With the emergence of MOOCs, it becomes crucial to automate the process of course design to accommodate the diverse learning demands of students. Modeling the relationships among educational topics is a fundamental first step for automating curriculum planning and course design. In this paper, we introduce the Topic Transition Map (TTM), a general structure that models the content of MOOCs at the topic level. TTMs capture the various ways instructors organize topics in their courses by modeling the transitions between topics. We investigate and analyze four different methods that can be exploited to learn the Topic Transition Map: 1) Pairwise Constrained K-Means, 2) Mixture of Unigram Language Models, 3) Hidden Markov Mixture Model, and 4) Structural Topic Model. To evaluate the effectiveness of these methods, we qualitatively compare the topic transition maps generated by each model and investigate how the Topic Transition Map can be used in three sequencing tasks: 1) determining the correct sequence, 2) predicting the next lecture, and 3) predicting the sequence of lectures. Our evaluation revealed that PCK-Means has the highest performance in the first task, HMMULM outperforms the other methods in task 2, and there is no winning method in task 3.

Keywords
Topic Transition Map, Topic Transition, Word Distribution, Mixture Model, Hidden Markov Model, Clusters, Sequencing Tasks.

∗ King Abdul Aziz University, Jeddah, Saudi Arabia.

Fareedah ALSaad, Thomas Reichel, Yuchen Zeng and Abdussalam Alawini. "Topic Transitions in MOOCs: An Analysis Study". 2021. In: Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). International Educational Data Mining Society, 139-149. https://educationaldatamining.org/edm2021/ EDM '21, June 29 - July 02, 2021, Paris, France.

1. INTRODUCTION
For many decades, the process of creating courses has been a manual task that needs to be carefully managed by instructors and experts. However, with the recent advances in technology and the emergence of Massive Open Online Courses (MOOCs), it becomes critical to automate the process of course design to accommodate the heterogeneity of online students and their diverse needs. According to [32], learning on demand is considered one factor that causes the high dropout rate in MOOCs. Learners have different learning demands depending on their motivations and goals. For instance, learners may seek knowledge about an interdisciplinary domain and hence need to learn modules from courses in several areas. This problem requires adopting a model in which MOOCs are used as modularized resources, rather than a set of pre-designed static courses. A crucial first step toward developing such a model is the automation of course plan design by sequencing lectures among different courses.

The main principle in designing the curriculum of any course is to organize course content according to some relations between topics. For instance, to help students learn the materials, instructors carefully organize lectures as a sequence, based on the difficulty levels of topics [10, 27, 1] as well as the dependency relations between topics [11, 21, 23, 1]. The fundamental sequential structure of a course design is to place topics that are easy or prerequisite in earlier lectures while more advanced and dependent topics are taught in later lectures [1]. Consequently, modeling the relatedness among educational topics is a crucial first step for automating curriculum planning and course design.

Modeling the content structure of MOOCs has recently attracted much research. Most of the current research has focused on modeling the prerequisite relationships between courses [29, 15], between lectures or segments of lectures [6, 7], or between concepts discussed within or across courses [3, 14, 17, 29, 15]. Using concepts to model MOOC content can be easily generalized to capture the relations in the concept space. However, because concepts are represented as keywords or phrases, it is hard to capture the different levels of granularity between lectures and courses. In addition, modeling prerequisite relationships between concepts cannot capture the various learning paths accommodated by different courses.
In this paper, we introduce the Topic Transition Map, a general structure that models educational materials at the topic level. We model a course as a set of topics, and each topic is a set of concepts. Modeling content at the topic level is a more natural way to design custom course plans. We can think of a course as a path in the generalized Topic Transition Map. Thus, designing a new course becomes a task of identifying a path in the Topic Transition Map.

Additionally, we investigate four methods that can be leveraged to construct the Topic Transition Map: Pairwise Constrained K-Means (PCK-Means) [2], Mixture of Unigram Language Models (MULM), Hidden Markov Mixture of Unigram Language Models (HMMULM), and the Structural Topic Model (strTM) [24]. We analyze and compare the Topic Transition Maps learned by these methods by studying how to exploit the Topic Transition Map in three sequencing tasks: 1) determining the correct sequence, 2) predicting the next lecture, and 3) predicting the sequence of lectures. To the best of our knowledge, we are the first work to introduce and investigate the use of Topic Transition Maps in modeling MOOC content and sequencing lectures.

To evaluate the effectiveness of all methods, we use real MOOCs from three different domains: Python, Structured Query Language, and Machine Learning clustering algorithms. Our evaluation revealed that while PCK-Means has the highest performance on the task of finding the best sequence from a list of possible sequences, HMMULM achieves the best performance on the task of predicting the next lecture in the sequence. Additionally, all methods perform similarly on the task of predicting the whole sequence, with MULM having the lowest performance as it sometimes cannot predict the whole sequence. In addition to comparing the various models on sequencing tasks, we visualize the Topic Transition Maps generated by the different methods to qualitatively compare the resulting topic transition maps. We found that PCK-Means extracted the most meaningful topics, with the best word distributions that clearly explain each topic.

The rest of the paper is organized as follows. In Section 2, we present some of the related work. Section 3 defines topics and topic transitions and states some applications of topic transition maps. In Section 4, we formally define our problem before describing the four different methods we exploit to construct the topic transition maps in Section 5. Section 6 elaborates on our approach for the evaluation and the analysis of the various models. Finally, we conclude our work in Section 7.
2. RELATED WORK
Most of the work that models the content of MOOCs has focused on capturing prerequisite relationships using different levels of granularity such as courses [29, 15], lectures or segments of lectures [6, 7], or concepts discussed within or across courses [3, 14, 17, 29, 15]. Modeling the relations between courses, lectures, or segments of lectures is restricted to these units and cannot be generalized. While modeling dependency relations between concepts is considered a general structure that captures the required concepts before learning any concept, prerequisite relations cannot model the various learning paths accommodated by different courses. ALSaad and Alawini [2] have addressed this problem by proposing the precedence graph, which captures the similarities and variations of learning paths among different courses. We build on their work and introduce the Topic Transition Map, which maps each lecture to a topic and leverages the sequences of lectures among courses to capture the topic transition patterns and hence the likelihood of each transition. The main difference between the Topic Transition Map and the precedence graph is that the Topic Transition Map models self transitions between a topic and itself and also captures how likely each topic is to be the first topic in courses. While ALSaad and Alawini [2] have investigated the use of PCK-Means in modeling the precedence graph, in this paper we explore three more methods in addition to PCK-Means, namely MULM, HMMULM, and strTM, for modeling Topic Transition Maps. We also examine the impact of the learned topic transition maps on three different sequencing tasks. We believe that we are the first work that examines the use of topic transitions modeled from existing MOOCs to learn how to sequence new courses.

Some research has investigated the use of prerequisite relations between concepts to construct and sequence learning units [1, 16]. Both studies [1, 16] have developed supervised approaches based on feature engineering that extract features from external knowledge such as Wikipedia [1] and DBpedia [16] to infer the prerequisite relations between concepts. Our work is different: instead of modeling the prerequisite relations between concepts using supervised approaches, we model the Topic Transition Map, or the various paths between topics, using unsupervised methods, where a topic is a set of concepts. In addition, our methods rely only on the content of MOOCs without using any external knowledge. While Agrawal et al. [1] used the concept dependency graph to organize concepts into learning units and then sequence the learning units, we use lectures from existing MOOCs and investigate the impact of the learned transitions between topics on sequencing lectures.

The most relevant research to our study is the work by Shen et al. [22], who proposed a method for linking similar courses to construct a map of lectures connected by two types of relations: similar and prerequisite. The constructed map only captures the similarity and prerequisite relations between certain units (lectures), is not generalized to other lectures, and thus cannot be used to predict the sequence of new lectures. In this paper, we map lectures to topics and construct the Topic Transition Map that depicts the precedence relations between topics and hence is not tied to any specific units. Having a generalized Topic Transition Map can help in finding the sequence of lectures or predicting the next lecture in the sequence, as we discuss in Section 6.2.

Another related line of research is the work on structural topic modeling by the Natural Language Processing (NLP) community. In NLP, topic transitions have been used to model latent topical structures inside documents by assuming each sentence is generated from a topic, where topics satisfy the first-order Markov property [12, 25]. While Gruber et al. [12] only modeled the transition between topics as a binary relation (either remain on the current topic or shift to a new topic with a certain probability), Wang et al. [25] developed a Structural Topic Model called strTM to explicitly model the topic transitions as probabilities that capture how likely one transits from a topic to another. Modeling transitions has been used in many NLP applications such as sentence ordering [25], topic segmentation [9], and multi-document summarization [28]. In this paper, we investigate the use of topic transitions in modeling the topical structures of MOOCs by assuming a lecture is generated by one topic and using the sequences of lectures to learn the transitions between topics. We also explore the impact of using the Topic Transition Map to sequence lectures in three different sequencing tasks.
3. TOPIC TRANSITIONS
Before defining topic transitions, it is important to briefly explain our representation of topics in this paper. Similar to the definition of topics in the topic modeling literature [5, 13], we define a topic as a distribution over concepts, where concepts with higher probabilities tend to explain or characterize the topic. Concepts can be represented as words or phrases of words [3, 18, 26]. Each lecture is a composition of concepts and hence can be mapped to some topics. Depending on the length of lectures, lectures can cover one or more topics. Longer lectures usually cover more topics than shorter lectures. For example, traditional university lectures tend to be more elaborate and have a longer duration than MOOC lectures, which are usually concise and short in length. Therefore, the number of topics per lecture discussed in MOOCs is less than that of traditional university lectures. In this paper, since our work focuses on learning the topic transitions from MOOCs, we assume that each lecture is mapped to only one topic. This assumption is reasonable as lectures in MOOCs are concise and short in length. Having this assumption is also very useful as it helps in leveraging the sequences of lectures to learn the relations between topics.

A topic transition captures the precedence relations between topics. In other words, it captures how likely instructors are to move, or transit, from one topic to another in the course delivery. It models the various ways in which instructors dynamically assemble concepts from the concept space in order to construct the study plan of their courses. For instance, some instructors decide to start their Python course by explaining the topics: data types, conditional statements, loops, and then reading and writing from files. Other instructors may choose a different order, such as conditional statements, loops, strings, and then lists. By leveraging the sequences of lectures from multiple courses, we can infer the latent topic of each lecture and hence model the common transition patterns shared by multiple courses as well as the variations of different transitions or paths. To determine the strength, or how common a transition is, each transition is attached with a score or a probability. For example, given the Computer Science programming topics "Conditional Statements", "Loops", and "Arrays", it is more likely that instructors will explain the topic "Conditional Statements" immediately before the topic "Loops", and thus the topic transition score between them would be higher than the transition score between the topic "Conditional Statements" and the topic "Arrays".

Learning the topic transitions can be the initial building block for several useful applications that support modern learning. For instance, we can use the Topic Transition Map to extract the most common paths of topics in a field or to explain the topic space of the current MOOC offerings. Learners can use transition maps to get more insight into the structure of topics in MOOC offerings. On the other hand, instructors can use these maps to improve their course offerings by examining the topic structure of related courses.

One important application of the Topic Transition Map is to support automatic curriculum planning and course design. Since courses consist of topics, learning the relations between topics would be the initial step to understanding how likely instructors are to transit from one topic to another. We can think of a course as a path in the generalized Topic Transition Map. Thus, designing a new course becomes a task of identifying a path in the Topic Transition Map. In this paper, we analyze how we can use the learned topic transitions to sequence new courses.
4. PROBLEM FORMULATION
In this section, we formally formulate our problem. Given a set of courses C = {X_1, X_2, X_3, ..., X_N} from a particular domain, where N is the total number of courses, we assume that the courses in C are similar and hence have some content overlap between them, and also have the same difficulty level (e.g., Beginner, Medium, or Advanced). A course X_i is represented as an ordered list of lectures X_i = [x_{i1}, x_{i2}, ..., x_{i|X_i|}], where |X_i| is the total number of lectures in the course X_i. Each lecture is a composition of concepts represented in some narrative way. In this paper, we assume a concept is a single word and hence lectures are represented using a bag-of-words representation.

Given the number of topics M, our goal is to map each lecture to a topic and leverage the sequences of lectures to learn topic transitions and construct the Topic Transition Map. The Topic Transition Map is represented as a matrix A, where A ∈ R^{M×M}. Each entry a_{ij} of the matrix A represents the likelihood of the transition from topic i to topic j. It reflects how common the precedence relation from topic i to j is in the dataset courses. In addition to the Topic Transition Map, we also aim to learn the probability of each topic being an initial topic in courses. We denote the initial probability of each topic as a vector π, where π ∈ R^M. Along with the Topic Transition Map and the initial probability of each topic, it is important to model the word distribution of each topic, which is represented as a matrix B ∈ R^{M×V}, where V is the vocabulary size.

5. MODELING TOPIC TRANSITIONS
In this section, we explain the four different models we exploit to capture topics and Topic Transition Maps.

5.1 Pairwise Constrained K-Means
The PCK-Means clustering algorithm [4] is a variation of the standard K-Means algorithm. To cluster instances, PCK-Means incorporates the distance between points as well as pairwise constraints to guide the clustering process. Since the purpose of clustering is to capture topic transition patterns across courses, using PCK-Means helps to restrict the clustering process to cluster lectures across courses instead of within courses [2]. To guide the clustering, PCK-Means uses two types of constraints: Must-Link and Cannot-Link. A Must-Link constraint determines lecture pairs that need to be clustered together, while a Cannot-Link constraint specifies pairs that should not be grouped into the same cluster. To find the clusters, PCK-Means uses an objective function that minimizes both 1) the distance between points (lectures) and the cluster centroid, and 2) the penalty costs of violating the constraints. For more information about PCK-Means, please refer to [4].

Similar to ALSaad and Alawini [2], we use PCK-Means to build the Topic Transition Map A. We first construct the list of Must-Link and Cannot-Link constraints to cluster lectures based on their content similarity. We assume that each cluster forms a topic, and hence we need to learn the word distribution of each topic along with the topic transitions. We link clusters by using the precedence relations between adjacent lectures and capture the strength of each transition by accumulating the frequency of transitions. To find the word distribution of each cluster, or topic, in the matrix B, we accumulate the vector representations of the lectures that belong to the same cluster. For more information, please see [2].

In order to estimate the initial probability π for each topic, we simply count the number of times each topic is the first topic in the set of courses C. Then we normalize these counts to obtain the probabilities.
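The construction above reduces to counting once every lecture carries a cluster (topic) label: adjacent lectures within a course contribute one transition count, and the first lecture of each course contributes one initial-topic count, after which the counts are normalized. The sketch below illustrates this counting step under the paper's assumptions (one topic per lecture, courses given as ordered lists of topic labels); the function and variable names are our own illustration, not the authors' code.

```python
import numpy as np

def build_topic_transition_map(course_topic_sequences, num_topics):
    """Estimate the Topic Transition Map A and the initial probabilities pi
    from courses given as ordered lists of per-lecture topic labels."""
    A = np.zeros((num_topics, num_topics))   # transition counts a_ij
    pi = np.zeros(num_topics)                # initial-topic counts

    for topics in course_topic_sequences:    # one course = ordered topic labels
        if len(topics) == 0:
            continue
        pi[topics[0]] += 1                   # first topic of the course
        for prev, curr in zip(topics, topics[1:]):
            A[prev, curr] += 1               # adjacent-lecture transition

    # Normalize counts into probabilities (each row of A sums to 1).
    pi = pi / pi.sum()
    row_sums = A.sum(axis=1, keepdims=True)
    A = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    return A, pi

# Toy example with topics {0: "Conditionals", 1: "Loops", 2: "Files"}:
A, pi = build_topic_transition_map([[0, 1, 1, 2], [0, 1, 2]], num_topics=3)
```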
5.2 Mixture of Unigram Language Models
To capture topics, we use a mixture of M unigram language models (MULM) with a bag-of-words representation. The mixture model is a generative probabilistic model that has been used for document clustering. Thus, it helps in clustering lectures based on their topics, where each lecture belongs to only one cluster, or one topic. In the mixture model, to generate a document, one first chooses the topic of the document according to the probability P(θ_i), where M is the number of topics, and then generates all the words in the document using the probability P(w|θ_i). According to the model, the likelihoods of a document x and the corpus C are calculated as follows:

P(x|λ) = \sum_{i=1}^{M} P(θ_i) \prod_{w∈V} P(w|θ_i)^{c(w,x)}    (1)

P(C|λ) = \prod_{j=1}^{N} P(x_j|λ)    (2)

To estimate the model parameters λ = ({θ_i}, P(θ_i)), where θ_i is the word distribution of topic i from the matrix B, we use the Expectation-Maximization algorithm [8] to find the parameters that maximize the likelihood of the data:

\bar{λ} = \arg\max_{λ} P(C|λ)    (3)

After learning the parameters, we map each lecture to a cluster, or topic, by maximizing the following equation:

c = \arg\max_{z_i} P(x|z_i)    (4)

Using the mixture model can help in clustering lectures according to their topics; however, it will not capture the transition patterns between topics or the initial probability of each topic. Therefore, similar to the PCK-Means method, we leverage the sequences of lectures to calculate the scores of topic transitions and construct the Topic Transition Map A. Likewise, we count the number of times courses start with each topic and normalize the results to model the initial probability π.
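Equations (1)-(4) can be implemented with a standard EM loop over lecture count vectors. The code below is a minimal sketch in log space, assuming lectures are already encoded as a count matrix (lectures x vocabulary); the smoothing constant and the random Dirichlet initialization are our own choices for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

def mulm_em(counts, num_topics, iters=50, smooth=1e-2, seed=0):
    """EM for a mixture of unigram language models.
    counts: (num_lectures, V) bag-of-words count matrix."""
    rng = np.random.default_rng(seed)
    n, V = counts.shape
    log_prior = np.log(np.full(num_topics, 1.0 / num_topics))   # log P(theta_i)
    word_dist = rng.dirichlet(np.ones(V), size=num_topics)      # rows of B

    for _ in range(iters):
        # E-step: responsibilities P(topic | lecture) from eq. (1), in log space.
        log_like = counts @ np.log(word_dist).T + log_prior     # (n, M)
        resp = np.exp(log_like - logsumexp(log_like, axis=1, keepdims=True))

        # M-step: re-estimate topic priors and word distributions.
        log_prior = np.log(resp.sum(axis=0) / n + 1e-12)
        word_dist = resp.T @ counts + smooth                     # (M, V)
        word_dist /= word_dist.sum(axis=1, keepdims=True)

    # Hard assignment of each lecture to its most likely topic, eq. (4).
    log_like = counts @ np.log(word_dist).T + log_prior
    return word_dist, np.exp(log_prior), log_like.argmax(axis=1)
```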
5.3 Hidden Markov Mixture Models
Instead of separately clustering lectures and then learning the transitions between them, using a Hidden Markov Model allows us to jointly learn the word distributions of each topic (B), the transition probabilities between topics (A), and the initial probability of each topic (π).

The Hidden Markov Model (HMM) is a probabilistic graphical model that describes the process of generating a sequence of observable events according to some hidden factors [20]. It simulates how real-world sequence data is generated from hidden states. In particular, it consists of two stochastic processes: 1) an invisible process and 2) a visible process [30]. In an HMM, the invisible process consists of hidden states, whereas the visible process is the observed sequence of symbols drawn from the probability distributions of the hidden states. Figure 1 demonstrates the HMM. As can be seen from Figure 1, each observable event in the sequence is generated from a hidden state, and observations are conditionally independent given the hidden state. The hidden states form a Markov chain in which each hidden state depends only on the previous state; for example, Z_{t+1} depends on Z_t.

Figure 1: The graphical model of HMM.

To control the process of generating the observed sequences from hidden states, an HMM has three parameters: π, A, and B. The first parameter, π = π_1, π_2, ..., π_M, is the initial probability distribution over the hidden states. The parameter π determines the probability of the Markov chain starting at each state and hence controls which state can be chosen as the initial state for the observed sequence. The second parameter, A ∈ R^{M×M}, is the transition probability matrix that specifies how likely the model is to transit from one state to another, denoted by P(Z_{t+1}|Z_t) in Figure 1. The third parameter, B ∈ R^{M×V}, is the emission probability matrix, where V is the total number of distinct symbols. It determines the likelihood of each state producing each symbol, denoted by P(X_{t+1}|Z_{t+1}) in Figure 1. For example, to generate a sentence, a sequence of words would be drawn from the HMM according to the three parameters π, A, and B.

In MOOCs, we only observe courses, where courses are sequences of lectures, while the topics of lectures and the transitions between them are invisible, or latent. Therefore, an HMM is a natural model to simulate the generation process of courses and hence infer the latent states that contribute to the evolution of these lectures. In a standard HMM, each hidden state generates only one symbol or word (see Figure 1). As our goal is to capture topic transitions using sequences of lectures as the observed data, we map each lecture to a topic and assume each hidden state generates a lecture instead of a word. Our revised HMM assumes that each hidden state produces one lecture, where each lecture is a bag of words. We ignore the sequence of words in lectures since the order of the words would not contribute to capturing the topic of each lecture. Figure 2 depicts the HMMULM utilized to capture the content of MOOCs.

Figure 2: The graphical model of HMMULM used to model the content of MOOCs.

In order to capture both the lectures' topics and the transitions between them, we combine the mixture model (MULM) with the HMM, and we call the new model the Hidden Markov Mixture of Unigram Language Models (HMMULM). To do that, we adopt the Markovian assumption between topics: in the generation process, the choice of the next topic depends only on the current topic. Even though the choice of a topic in the course delivery depends on all the previous topics discussed so far, this simplified assumption makes sense due to the locality-of-reference property [1] of course design. Based on this property, when an instructor designs a course, a dependent lecture should appear as soon as possible after the prerequisite lecture to reduce students' comprehension burden. Therefore, assuming the dependency between adjacent lectures not only simplifies the model but also aids in capturing the transitions between highly related topics. By combining the HMM with the mixture model, the likelihood of generating a course is as follows:

P(X|λ) = \sum_{\text{all } Z} P(Z|λ) P(X|Z,λ)
       = \sum_{\text{all } Z} P(z_1) P(x_1|z_1) \prod_{t=2}^{T} P(z_t|z_{t-1}) P(x_t|z_t)
       = \sum_{\text{all } Z} P(z_1) \prod_{w∈V} P(w|z_1)^{c(w,x_1)} \prod_{t=2}^{T} P(z_t|z_{t-1}) \prod_{w∈V} P(w|z_t)^{c(w,x_t)}
       = \sum_{i=1}^{M} \sum_{j=1}^{M} π(z_1 = s_i) \prod_{w∈V} B(z_1 = s_i, w)^{c(w,x_1)} \prod_{t=2}^{T} A(z_{t-1} = s_i, z_t = s_j) \prod_{w∈V} B(z_t = s_j, w)^{c(w,x_t)}    (5)
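The only change relative to a standard HMM is the emission term: a hidden topic emits a whole lecture, so P(x_t|z_t) in Equation (5) is a product over the vocabulary. In practice this term is computed in log space. The helper below is our own illustration of that building block (it is also what the forward-backward and Viterbi computations mentioned later would reuse), not the authors' code.

```python
import numpy as np

def log_emission(counts, log_B):
    """Log-probability of each lecture under each topic.

    counts: (T, V) bag-of-words counts, one row per lecture x_t.
    log_B:  (M, V) log word distributions, one row per topic.
    Returns a (T, M) matrix whose entry [t, j] is
        log P(x_t | z_t = s_j) = sum_w c(w, x_t) * log B(s_j, w).
    """
    return counts @ log_B.T

# Example: the first step of a forward (alpha) recursion for one course would be
# log_alpha_1 = np.log(pi) + log_emission(counts[:1], np.log(B))[0]
```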
To estimate the HMMULM parameters λ = (π, A, B), we use a modified version of the Baum-Welch algorithm in order to model the observation sequences as multidimensional categorical events. Following the work in [19], we derived the equations of the E-step and M-step to train the model and infer the transition probabilities between topics. In the E-step, we use the equations:

γ_t(i) = P(z_t = s_i | X, λ) = \frac{α_t(i) β_t(i)}{\sum_{j=1}^{M} α_t(j) β_t(j)}    (6)

ξ_t(i,j) = P(z_t = s_i, z_{t+1} = s_j | X, λ) = \frac{α_t(i) A_{ij} β_{t+1}(j) \prod_{w∈V} B_j(w)^{c(w,x_{t+1})}}{\sum_{i=1}^{M} \sum_{j=1}^{M} α_t(i) A_{ij} β_{t+1}(j) \prod_{w∈V} B_j(w)^{c(w,x_{t+1})}}    (7)

In the M-step, the following equations are used to choose the parameters that maximize the likelihood of the observed sequence of lectures:

π(i) = γ_1(i)    (8)

A_{ij} = \frac{\sum_{t=1}^{T-1} ξ_t(i,j)}{\sum_{t=1}^{T-1} γ_t(i)}    (9)

B_i(w) = \frac{\sum_{t=1}^{T} γ_t(i) c(w, x_t)}{\sum_{v=1}^{V} \sum_{t=1}^{T} γ_t(i) c(v, x_t)}    (10)

For more information about the Baum-Welch algorithm, please see [20]. It is clear that the E-step and M-step equations are very similar to those of the standard HMM, except that instead of emitting one symbol, HMMULM emits one lecture represented as a bag of words.
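Given the posteriors γ and ξ from the E-step, the M-step updates in Equations (8)-(10) are direct ratios of expected counts. The following is a minimal numpy rendering for a single course; the notation is ours, and in practice the expected counts would be accumulated over all training courses before normalizing.

```python
import numpy as np

def m_step(gamma, xi, counts):
    """Re-estimate (pi, A, B) from E-step posteriors for one course.

    gamma:  (T, M)      gamma[t, i] = P(z_t = s_i | X)
    xi:     (T-1, M, M) xi[t, i, j] = P(z_t = s_i, z_{t+1} = s_j | X)
    counts: (T, V)      bag-of-words counts of the T lectures.
    """
    pi = gamma[0]                                            # eq. (8)
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]     # eq. (9)
    B = gamma.T @ counts                                     # expected word counts
    B = B / B.sum(axis=1, keepdims=True)                     # eq. (10)
    return pi, A, B
```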
5.4 Structural Topic Model
The Structural Topic Model (strTM) [25] is another probabilistic graphical model that functions very similarly to HMMULM. It has been used to model the latent topical structures inside documents. Like HMMULM, it models topics and their transitions as hidden states that emit lectures as bags of words. Unlike HMMULM, strTM treats each lecture as a mixture of content topics and a functional topic. The functional topic, denoted by z_B, is used to filter out document-independent words that model the corpus background (or general terms) [31]. Each word in the lecture is generated either by one of the content topics or by the functional topic:

w ∼ θ P(w|β, z_i) + (1−θ) P(w|β, z_B)    (11)

where θ is the controlling parameter. According to strTM, the probability of lecture x_j being generated by some topic z_i is:

P(x_j|z_i) = \prod_{w∈V} [θ P(w|β, z_i) + (1−θ) P(w|β, z_B)]^{c(w,x_j)}    (12)

Another difference between strTM and HMMULM is that strTM assumes the transition probabilities A and the emission probabilities B are drawn from Multinomial distributions and uses the conjugate Dirichlet distribution to impose a prior on the Multinomial distributions:

α_z ∼ Dir(η)    (13)

β_z ∼ Dir(γ)    (14)

where η and γ are the concentration hyperparameters that control the sparsity of α_z and β_z, respectively. To estimate the parameters of strTM, we use the expectation-maximization algorithm as described by [25]. For more information about strTM, please refer to [25].
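The practical difference from HMMULM is the emission term in Equation (12), which mixes each content topic with the shared background topic before raising it to the word counts. A small sketch of that computation in log space follows; θ, the content-topic rows, and the background row are assumed to be already estimated, and the names are our own illustration.

```python
import numpy as np

def strtm_log_emission(counts, B_topics, b_background, theta):
    """Log P(x_j | z_i) under strTM, eq. (12).

    counts:       (T, V) bag-of-words counts per lecture.
    B_topics:     (M, V) content-topic word distributions P(w | beta, z_i).
    b_background: (V,)   functional/background word distribution P(w | beta, z_B).
    theta:        mixing weight between content and background words.
    """
    mixed = theta * B_topics + (1.0 - theta) * b_background   # (M, V)
    return counts @ np.log(mixed).T                           # (T, M)
```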
Undir−bigram #of correct{a,b}ingroundtruth (16) Incontrast,task2canbeappliedtorecommendthenextlectureto learnerstocustomizetheirlearningbasedonthehistoryoflectures theyalreadywatched.Intheevaluation,thepurposeofeachtaskisto 6.2.2 Task1:FindingTheCorrectSequence comparedifferentmethodsandevaluatetheabilityoftheparameters Tofindthecorrectsequenceoflectures,wefollowthepermutation (A,B,andπ)ofeachmodeltofindthecorrectsequenceinthethree methodutilizedby[25]. Withcoursesthathavelargenumberof differenttasks. lectures,itisinfeasibletofindalltheorderingsoflectures.Therefore, whenthenumberofthepermutationsexceeds500,werandomly 6.2.1 EvaluationMeasures permutated500possibleorderingsoflecturesascandidates.Weran To compare different models,we use the sequences of lectures theexperiment20timesforeachmethodandrecordedtheaverage from courses in the test set as the ground truth sequences and results. exploit different measures to do the evaluation. First,we follow Wang et al. [25] and use kendall’s τ(σ). Kendall’s τ(σ) is an Inordertoselecttheoptimalsequencefromthelistofpermutations informationretrievalmeasurethatcapturesthecorrelationbetween instrTMandHMMULM,wefollowWangetal.[25]andchoosethe tworankedlist. Itindicateshowthepredictedorderdiffersfrom sequencethathasthehighestgenerationprobabilitycalculatedas: the ground truth where 1 means perfect match,−1 means total (cid:88) σ¯(m)=argmax P(x ,x ,...,x ,Z|λ) (17) mismatch, and 0 indicates that the two orders are independent. σ[0] σ[1] σ[m] Second,we use Levenshtein normalized similarity which is the σ(m) Z opposite of Levenshtein normalized distance that measures the minimum numberofedits (insertions,deletions orsubstitutions) TochoosethebestsequenceforMULM,wefirstfindthebesttopic requiredtotransformthepredictedsequencetothegroundtruth cthatgenerateseachlecturesinthetestsetaccordingtoequation sequence.ThegoalistofindthesequencethathastheLevenshtein 4.Afterthat,weselectthesequencethathasthehighestlikelihood 144 Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021) 0.7 0.50 arity0.6 0.25 mil Si0.5 Kendall's Taus 000...520050 nshtein Normalized 000...234 0.75 Leve0.1 1.00 0.0 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 Courses Courses 1.0 CMoUsLinMe 0.6 n HPCMKMMULM m Precision00..45 am Precisio00..68 strTM d Bigra0.3 ed Bigr0.4 Directe0.2 ndirect U0.2 0.1 0.0 0.0 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 Courses Courses Figure3:TheperformanceofdifferentmethodsinTask3:Predictingthewholesequence.Allmethodshavecomparableperformance. basedontheequation: Table3:TheperformanceofTask2:Predictingthenext lecture.ItisclearthatHMMULMachienesthehighest σ¯(m)=argmaxP(C)P(X|C) performance. σ(m) |σ(m)| (18) (cid:88) Accuracy =P(c )P(x |c ) P(c |c )P(x |c ) 1 1 1 i i−1 i i Method Python SQL ML i=2 Cosine-Similarity 0.46 0.56 0.42 SincePCK-Meansisaclusteringmethodthatminimizesthedistance PCK-Means 0.45 0.49 0.47 betweenlecturesandclusters’centroids,weassignlecturesxofthe MULM 0.41 0.34 0.37 testsettotheclosestclustersziusingEuclideandistancesasshown HMMULM(Viterbi) 0.52 0.56 0.60 inequation19.Then,weselectthesequencethatmaximizesthetopic StrTM(Viterbi) 0.39 0.27 0.43 transitionsbetweenlecturesinthesequenceaswellasminimizesthe distancebetweenadjacentlectures(seeequation20).Theintuition behindthatistoensurethetopiccoherencebetweenadjacentlectures groundtruthsequencesandneedtheminimaleditstobetransformed andalsoreducesthegapsbyminimizingthedistancebetweenthem. 
6.2.2 Task 1: Finding the Correct Sequence
To find the correct sequence of lectures, we follow the permutation method utilized by [25]. For courses that have a large number of lectures, it is infeasible to enumerate all orderings of the lectures. Therefore, when the number of permutations exceeds 500, we randomly sample 500 possible orderings of lectures as candidates. We ran the experiment 20 times for each method and recorded the average results.

In order to select the optimal sequence from the list of permutations with strTM and HMMULM, we follow Wang et al. [25] and choose the sequence that has the highest generation probability, calculated as:

\bar{σ}(m) = \arg\max_{σ(m)} \sum_{Z} P(x_{σ[0]}, x_{σ[1]}, ..., x_{σ[m]}, Z | λ)    (17)

To choose the best sequence for MULM, we first find the best topic c that generates each lecture in the test set according to Equation 4. After that, we select the sequence that has the highest likelihood based on the equation:

\bar{σ}(m) = \arg\max_{σ(m)} P(C) P(X|C) = P(c_1) P(x_1|c_1) \sum_{i=2}^{|σ(m)|} P(c_i|c_{i−1}) P(x_i|c_i)    (18)

Since PCK-Means is a clustering method that minimizes the distance between lectures and the clusters' centroids, we assign each lecture x of the test set to the closest cluster z_i using the Euclidean distance, as shown in Equation 19. Then, we select the sequence that maximizes the topic transitions between lectures in the sequence as well as minimizes the distance between adjacent lectures (see Equation 20). The intuition behind that is to ensure topic coherence between adjacent lectures and also to reduce the gaps by minimizing the distance between them.

c(x) = \arg\min_{z_i} \|x − μ_{z_i}\|^2    (19)

\bar{σ}(m) = \arg\max_{σ(m)} π(c(x_1)) \sum_{i=2}^{|σ(m)|} A(x_{i−1}, x_i) − \|x_i − x_{i−1}\|^2    (20)

As a baseline, we accumulate the cosine similarity between adjacent lectures in the sequence and select the sequence in the permutations that has the highest similarity score as the optimal sequence.
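For PCK-Means, each candidate permutation is therefore scored with Equation (20): the initial-topic probability of its first lecture plus, for each adjacent pair, the learned transition score minus the squared Euclidean distance between the lecture vectors; the cosine baseline simply accumulates adjacent similarities. The sketch below is our own rendering of this scoring under those definitions (variable names are ours).

```python
import numpy as np

def score_sequence_pckmeans(seq, vectors, topic_of, A, pi):
    """Eq. (20): score one candidate ordering of lecture ids."""
    score = pi[topic_of[seq[0]]]
    for prev, curr in zip(seq, seq[1:]):
        transition = A[topic_of[prev], topic_of[curr]]
        distance = np.sum((vectors[curr] - vectors[prev]) ** 2)
        score += transition - distance
    return score

def score_sequence_cosine(seq, vectors):
    """Baseline: accumulated cosine similarity between adjacent lectures."""
    score = 0.0
    for prev, curr in zip(seq, seq[1:]):
        a, b = vectors[prev], vectors[curr]
        score += (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return score

# The selected ordering is the highest-scoring candidate permutation, e.g.
# best = max(candidates, key=lambda s: score_sequence_pckmeans(s, V, topic_of, A, pi))
```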
In addition, thetopictransitionslearnedbyHMMULMhelpingreedilypick First,inmostcasesalmostallthemethodspreferselftransitionwhen the nextlecture in the sequence. While StrTM also uses Viterbi theypickthenextlectureinthesequence.Forexample,asshownin algorithmsimilartoHMMULM,itsaccuracyscoreswerefarless Figure4(a),MULM,HMMULM,andPCK-Meansselectthenext than HMMULM. We think the main reason for that due to the lecturethathasthesametopicasthecurrentlecture. performanceofthelearnedtopictransitionsasweexplaininsection 6.3. Second,allmethodscannotsequencelecturesthatbelongtothesame topic. InMOOCs,duetotheshortlengthoflectures,instructors 6.2.4 Task3:PredictingTheSequence sometimesexplainthesametopicusingmultiplelectures.Asaresult, Task3isverysimilartotask2exceptthateachmethodneedstofind itishardtofindthecorrectsequenceoflecturesthatcoverthesame thewholesequenceofgivenlectureswherethefirstlectureinthe topic.Forexample,asshowninFigure4(b),thelastfourlectures sequenceisgiven.Figure3depictstheresultsofTask3. ofthecourseexplainthe“PrincipalComponentAnalysisalgorithm” andhencestrTM,HMMULM,andPCK-Meanscannotpredictthe Asthistaskisconsideredthemostchallengingtask,itisclearthat correctsequenceoftheselectures.Inthiscase,weneedtouseother thereisnowiningmethod.However,fromtheupperleftgraphthat 146 Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021) 0.37 0.44 Topics Terms 0.38 Topics Terms 0.13 CLoonodpit i&o n File 0.19 String sqmfutureniontchtgteioo,, d nidn.,od pueabxrl,ae qm, useouttbeesrsst,r, i snfutgrni,n ccgthisoa,n rcsah,c aterar,c eterircs,, 0.38CStoantedmitieonntasl 0.12 File 0.05Lis0t. 5&0 CSV File cqcsouvlou,t meq,un cosot.elusm, finle, ,d soturibnlge,, ftoarbmlea, tc, oromwm, a, 0.3F9unction 0.08 0.06 0.07 CSV File 0.6 FunFcilteion agfirelrleoga,bud fiam, llef,eo senl,dr troesop,r r,er slei,nn td,ue edr,n filr,,ie n spcecatsoor.apryme, .peyte, ri,n fsotramll,a tle, xt, 0.12 0.06 0.1 0.120.41 String LFoiolep fislsiotlemeeocr,apa ofl,tlun ieolndasncsrtg.,t, i ebofisrlneet,,s a ow,k pr,he esialnetad,, t,lie innomfiebnsejei,nt celtit,,n, blecol,laoo ahpsr,ese t,.ar u, e, 0.08 Str0in.0g80.008.00.812 0T.1u3plLeiss t 00..1326 0.103.D0i6ction0.a1r3y 0.65 DCCiLocSonLtViodios piFntt iia&oler nyccleidcdleonissooliilfiosccetvlu,nuettpm, i.n roim,,pta e tneetshnnnra,q,uo gs trekusen,ey el,a,,eid, f y llf.kseo,iavt e,elos alesyfitmplta,elue, et, k,e,e s, ee c,femr oylooweqsrenwmumo,nd ,erd etaidct,nini t,coogc, t tvlne,rtiu oaae,,lmi nblasbiutauttneleesorl,mas,ri ce ,t,ri kas n,sbm ,,b,dm o wslapeoeetphaxrl,ne i,piinlr uaeisgn.n,,s g ,. , 0.58 0L.o1o3p TDyapteas0 .16000..2.0085 CSV 0FD.i0ilce8ti00o..n025a1ry0.1400..160SL4.oi2sr1tt 0.21 CSDDtLoaiacintstatedti o miTStnyieoopannrrettyassl flidteudcsplnoroooexsirutcooa,peurcet ptgittirrne,omy,,en r t dfnapeslsaaao,ame,sx mll ssr.ripskaomtyee,e,r,n n e,, iflyl kned,bseov egosgslalaoiys,mirfl ot,e,couas n,kkbaen,c ,s,eot.,ce e oeiyoulxnirrslsl,mpstte ,eitser n,audeg t,gwanl iseacc,t.os rettos,iiir ooomnodflnnndro,e ta,siat, tnu ,rlitla otiiipce,snnn olstgeg,,n, ,, ue vptvaqyeaaguprilrteaue,s ,le,,, s. 
6.2.4 Task 3: Predicting the Sequence
Task 3 is very similar to task 2 except that each method needs to find the whole sequence of given lectures, where only the first lecture in the sequence is given. Figure 3 depicts the results of Task 3.

Figure 3: The performance of the different methods on Task 3: Predicting the whole sequence. All methods have comparable performance.

As this task is considered the most challenging task, it is clear that there is no winning method. However, from the upper-left graph of Figure 3, which captures the Kendall's tau scores, we can notice that HMMULM achieved a tau score ≥ 0.50 in four courses, PCK-Means achieved the same score in only two courses, MULM and strTM in only one course, and the Cosine method in no courses. For the Levenshtein normalized similarity, it is clear that all methods have comparable results. For the directed and undirected bigram precision, all methods also have comparable results except MULM. The reason is that MULM sometimes cannot complete the whole sequence because it only uses the greedy method, which cannot complete the sequence in the absence of the topic transitions required to sequence the courses in the test set. The other methods always find the whole sequence, either because of the Viterbi algorithm used by HMMULM and strTM or due to the similarity or distance measures utilized by PCK-Means and the Cosine method.

In addition to quantitatively comparing the methods, we qualitatively evaluate the results by examining the sequences generated by each method. In general, we found two common behaviors shared by all methods.

First, in most cases almost all the methods prefer self transitions when they pick the next lecture in the sequence. For example, as shown in Figure 4(a), MULM, HMMULM, and PCK-Means select the next lecture that has the same topic as the current lecture.

Second, none of the methods can sequence lectures that belong to the same topic. In MOOCs, due to the short length of lectures, instructors sometimes explain the same topic using multiple lectures. As a result, it is hard to find the correct sequence of lectures that cover the same topic. For example, as shown in Figure 4(b), the last four lectures of the course explain the "Principal Component Analysis algorithm", and hence strTM, HMMULM, and PCK-Means cannot predict the correct sequence of these lectures. In this case, we need to use other techniques to predict the sequence. One naive solution is to assume that all adjacent lectures that belong to the same topic form one atomic unit, so that we only need to sequence lectures that have different topics. Further investigation of the problem of sequencing lectures that belong to the same topic is left for future work.

Figure 4: Qualitative analysis of sequencing Task 3. (a) Examples of preferring the self-transition behaviour when selecting the next lecture in the sequence. (b) Examples of the problem of sequencing adjacent lectures that cover the same topics.
6.3 Topic Transitions Examples
In this section, we present examples of topics and topic transitions learned by the different methods. Due to space constraints, we present the examples using the Python dataset. We analyze the words with the highest probabilities in the word distributions of the topics learned by each method and manually map them to topic words or phrases. For instance, if the word distribution has the words list, range, items, index, and append, then it is clear that this word distribution captures the topic "List". The word distributions with the topic phrases of each topic learned by the PCK-Means, MULM, HMMULM, and strTM methods on the Python dataset are depicted in Figures 5, 6, 7, and 8, respectively. Since we have 13 topics in the Python dataset, we only visualize the topic transitions of a subset of these topics and depict the transitions that have scores ≥ 0.05.

Figure 5: Python topics and their transitions using the PCK-Means method. The table represents the top terms in each topic.

Figure 6: Python topics and their transitions using the MULM method. The table represents the top terms in each topic.

Figure 7: Python topics and their transitions using the HMMULM method. The table represents the top terms in each topic.

Figure 8: Python topics and their transitions using the strTM method. The table represents the top terms in each topic.

It is clear from the figures that all models extract some useful topics where the top terms of each topic clearly explain the topic. However, PCK-Means has the best word distributions that clearly explain each topic, followed by HMMULM and then MULM, while strTM has the lowest performance. We also notice from the figures that PCK-Means extracted 11 useful topics, with two topics that have unclear word distributions and cannot be mapped to any useful topics. In contrast, MULM modeled 10 meaningful topics, with three topics that form noise and cannot be mapped to any topics. On the other hand, HMMULM and strTM capture 9 topics, with four unclear topics that cannot be mapped to any phrase. In general, this finding indicates that PCK-Means has the best performance in modeling the topics of the courses in the Python dataset as it models more useful topics with clear word distributions. The results also indicate that strTM achieves the lowest performance because, even though it captures the same number of meaningful topics as HMMULM, it has the lowest performance in terms of the clarity of the word distributions.
As shown in Figures 5, 6, 7, and 8, the meaningful topics extracted by all methods are very similar, with some variations. For example, while PCK-Means and MULM separate "Files" and "CSV Files", HMMULM and strTM combine them into one topic. In addition, while PCK-Means and HMMULM combine "Loops & Condition", MULM differentiates between them. StrTM, on the other hand, has both "Loops & Condition" and "Conditional Statements".

Regarding the topic transitions, it is clear that all models capture self transitions between a topic and itself. This indicates that, in MOOCs, instructors use multiple lectures to explain the same topic. However, HMMULM gave higher probability to self transitions compared to the other methods. We can also notice from the figures that there is some consensus among all methods on some transitions between topics, such as "List" → "Dictionary" and "String" → "File". There are also some variations in the topic transitions between the different models. For instance, while PCK-Means, HMMULM, and strTM have a transition "String" → "List", MULM combines these two topics into one topic, or cluster. Another variation is that PCK-Means, strTM, and MULM have a transition "Loop & Condition" → "String", whereas HMMULM misses this transition.

In general, all methods capture useful topics with clear word distributions. Regarding the topic transitions, all methods capture self transitions and also have some consensus on some transitions. There are also some variations between methods, and these differences are due to how each method identifies the topics of each lecture. Improving the modeling of topics and the mapping between lectures and topics would clearly improve the quality of the topic transition maps.

7. CONCLUSION
In this paper, we introduce the Topic Transition Map, which is a general structure that models the content of MOOCs as topics, where each lecture is mapped to a topic, and captures the transitions between topics. It models the various ways in which instructors organize topics in order to construct the study plan of their courses. We investigate four different methods to construct the Topic Transition Map: PCK-Means, MULM, HMMULM, and strTM. PCK-Means and MULM separately cluster lectures into topics and then learn the transitions between topics by leveraging the sequences of lectures in different courses. In contrast, HMMULM and strTM assume a first-order Markov property among the latent topics and hence jointly learn the topics and their transitions. While three of the models, MULM, HMMULM, and strTM, are probabilistic models, PCK-Means is a distance-based clustering algorithm that incorporates constraints to guide the clustering process.

We evaluated the topic transitions generated by the various methods using three different tasks: 1) determining the correct sequence, 2) predicting the next lecture, and 3) predicting the sequence of lectures. Our evaluation revealed that PCK-Means achieves the highest performance in determining the correct sequence, while HMMULM outperforms the other methods in the task of predicting the next lecture. Since the task of predicting the whole sequence of lectures is considered the most challenging task, there was no winning method and all methods have comparable performance, with MULM having the lowest performance as it sometimes fails to predict the whole sequence. We also visualize the Topic Transition Maps generated by the different methods to qualitatively evaluate the resulting maps. We found that PCK-Means extracted more meaningful topics with the best word distributions that clearly explain each topic.

In the future, we plan to explore incorporating the Topic Transition Map with concept dependency relations and to examine whether this can solve the problem of sequencing lectures that belong to the same topic. Further, we aim to combine different methods such as PCK-Means and HMMULM in order to improve the accuracy of the Topic Transition Map and hence improve the performance of the sequencing tasks. Finally, we plan to apply our work to other domains such as traditional university courses or educational books. To do that, we need to investigate how to divide long lectures or book sections into segments where each segment is mapped to one topic.

8. REFERENCES
[1] R. Agrawal, B. Golshan, and E. Papalexakis. Toward data-driven design of educational courses: A feasibility study. JEDM - Journal of Educational Data Mining, 8(1):1-21, 2016.
[2] F. ALSaad and A. Alawini. Unsupervised approach for modeling content structures of MOOCs. In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), pages 18-28, 2020.
[3] F. ALSaad, A. Boughoula, C. Geigle, H. Sundaram, and C. Zhai. Mining MOOC lecture transcripts to construct concept dependency graphs. In Proceedings of the 11th International Conference on Educational Data Mining, pages 467-473. EDM, 2018.
[4] S. Basu, A. Banerjee, and R. J. Mooney. Active semi-supervision for pairwise constrained clustering. In Proceedings of the 2004 SIAM International Conference on Data Mining, pages 333-344. SIAM, 2004.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.
[6] D. Chaplot and K. R. Koedinger. Data-driven automated induction of prerequisite structure graphs. In Proceedings of the 9th International Conference on Educational Data Mining, pages 318-323. EDM, 2016.
[7] W. Chen, A. S. Lan, D. Cao, C. Brinton, and M. Chiang. Behavioral analysis at scale: Learning course prerequisite structures from learner clickstreams. In Proceedings of the 11th International Conference on Educational Data Mining, pages 66-75. EDM, 2018.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1-22, 1977.
[9] L. Du, J. K. Pate, and M. Johnson. Topic models with topic ordering regularities for topic segmentation. In 2014 IEEE International Conference on Data Mining, pages 803-808. IEEE, 2014.
[10] L. D. Fink. Creating significant learning experiences: An integrated approach to designing college courses. John Wiley & Sons, 2013.
[11] R. M. Gagne and L. J. Briggs. Principles of instructional design. Holt, Rinehart & Winston, 1974.
[12] A. Gruber, Y. Weiss, and M. Rosen-Zvi. Hidden topic Markov models. In Artificial Intelligence and Statistics, pages 163-170. PMLR, 2007.
[13] T. Hofmann. Probabilistic latent semantic analysis. arXiv preprint arXiv:1301.6705, 2013.
[14] C. Liang, J. Ye, Z. Wu, B. Pursel, and C. L. Giles. Recovering concept prerequisite relations from university course dependencies. 2017.
[15] H. Liu, W. Ma, Y. Yang, and J. Carbonell. Learning concept
