Personalization of Learning Paths in Online Communities of Creators Mingxuan Sun∗ Seungwon Yang DivisionofComputerScienceandEngineering SchoolofLibraryandInformationScience LouisianaStateUniversity CenterforComputationandTechnology [email protected] LouisianaStateUniversity [email protected] ABSTRACT “changecolor”,and“doIf”. Eachblockiscategorizedinacer- Inmassive onlinecommunitiesof creators(OCOCs), one of tain Computational Thinking (CT) concept [6]. Users are the core challenges is to encourage users to learn to create expected to learn CT concepts such as“motion”by manip- original contents using basic components. Recommending ulating blocks such as“goto”,“bounce”, and“turn”. Users the right learning components at the right time is criti- mayfollowdifferentlearningpathsovertime. Basedonpro- cal for improving user engagement and has not been fully grammingblocksthateachuserhasusedinhis/herprevious studied due to the unstructured nature of online commu- projects, we can recommend particular blocks, concepts, or nities. To address the problem, we propose in this paper projects tailored to the individual. For instance, for users a novel recommendation model which integrates Cox’s sur- who are interested in animation projects with some basic vival analysis and collaborative filtering. Our model can motion blocks such as“goto”, the system can recommend incorporate factors such as user learning history and social projects that have more advanced motion blocks such as engagements, which provides us insights in improving the “bounce”. personalizedservice. Weapplyourmethodtotheuserdata from Scratch online platform and demonstrate the perfor- In addition to what to recommend, when is a good time mance of the model. torecommendisanotherimportantfactortoconsidersince suggesting blocks to users at the right time may influence 1. INTRODUCTION learningeffectivenessandefficiency. Forexample,ifauseris stillstrugglingwithbasicmotiontechniquessuchas“goto”, In recent years, the number of online learning communi- it may not be a good idea to introduce a project or a more ties (OCOCs) has increased exponentially as evidenced by successful platforms such as Scratch online1. These online advancedprogrammingconceptsuchas“turn”or“direction”. Our goal is to alleviate the high dropout rates in the early communitiesofferflexiblelearningenvironmentwhereusers stage through personalization of the learning path. cancreateprojects(e.g.,games,artdesigns),shareprojects, andengagewithlike-mindedusersinthecommunity. Oneof In this paper, we propose a model to learn the probability thegoalsistofosterlearningprogrammingconceptsthrough of a user’s exposure to a certain learning component at a developingandsharingprojectsamongitsusersbasedonin- particular time. The probability of exposure is estimated teractions in the community [11]. Previous studies [7] have basedonacollaborativefilteringmodel,whichrecommends found that creating and sharing projects is the gateway to the user the items favored by the like-minded. The condi- other online social activities including commenting and fol- tionalprobabilityofauserbeingexposedtoagivenitemat lowing. However,onlyabout29%ofScratchuserswouldlike aparticulartimeismodeledbytheCoxproportionalhazard tosharetheirprojectsandabouthalfofthemcontributeno model from survival analysis. more than one project. Onewaytoimproveuserengagementistotrackusers’learn- 2. RELATEDWORK ing history and recommend contents tailored to each indi- EarlystudiesonlearningbehavioranalysisforOCOCshave vidual. For example, Scratch users learn to create projects beenbasedoncase-studiesevaluatinglearningprocessqual- by manipulating basic programming blocks such as“goto”, itatively [5,12]. Otherattempts[3,7]havefocusedonclus- tering user behaviors based on types and volumes of users’ ∗Corresponding author. 1https://scratch.mit.edu/ onlineactivities. ArecentworkbyYangetal.[14]modeled informal learning trajectories quantitatively as the growth of cumulative usage of programming blocks by each user. Personalizationapproachesthatarebasedonuserbehaviors have been widely studied in different types of Web services such as e-commerce. In e-commerce, most personalization approachesfocusonrecommendinguserstheitemsthathave been favored by like-minded users based on their purchase history. Traditional recommendation algorithms are mem- ory based methods including vector similarity and correla- Proceedings of the 9th International Conference on Educational Data Mining 513 occursattimet orthevaluezeroifeventdoesnotoccurtill ! ! k " "#$ 5 timetbytheendofobservationwindow. Theparametersβ +!’,)-.&’/*012 andthebaselinehazardλ canbeestimatedbymaximizing 0 %&’()* the log partial likelihood with Breslow’s approximation [4]. 5 +!’,)-34!42 Wefurtherestimatetheprobabilityp(u,i)ofauserfavoring a particular item (e.g., block) by adopting collaborative fil- tering (CF) recommendation algorithms. User interactions Figure 1: Time-aware recommendation. The occur- containsubstantialinformationtoimproverecommendation rence time of user-item interaction is modeled using accuracy. For example, in Scratch, users play with a set of survivalanalysis. Ourgoalistopredictthemostde- programming blocks to develop a project. Therefore, the sired learning item i at a particular time t for each frequency of each type of block may indicate their prefer- individual user u. ences. Based on the previous learning history, the system can predict interesting blocks tailored to individual taste. tion[2]. Thestate-of-the-artmethodsincludingtheonethat Collaborativefilteringmethodsfocusondetectinguserswith won the Netflix competition [9] are based on matrix factor- similar preferences and recommending items favored by the ization. The time factor in personalization services largely like-minded. Algorithms range from similarity based CF affectstheusersatisfactionoftheservice[13,10]. Ourcon- methods[2]tomatrixfactorizationbasedCFmethodspop- tribution in this paper lies in that we incorporate both the ularized by the Netflix Prize Competition [9]. Cox model and collaborative filtering to provide personal- ized recommendation for online learners. Let rui denote the observed preference of user u for item i, where u = 1,2,...,m and i = 1,2,...,n. The pairs (u, i) are stored in the set O = (u,i) r is observed . Since 3. METHOD { | ui } the observed ratings or event frequencies are very sparse, InOCOCs,userscreateandshareprojectsconsistingofba- matrix factorization is used to learn latent features of both sicitemssuchasprogrammingblocksinScratch. Eachitem users and items in a lower dimensional space such that the belongs to a certain category. Based on user-item interac- product of each user-item pair can best approximate the tionhistories,wewouldliketosuggestitemstailoredtoeach ratings. Specifically, let θ and v denote latent features u i user at a particular time. To achieve this goal, we propose for user u and items i, where θ and v are k-dimensional u i to estimate the joint probability p(u,i,t) = p(tu,i)p(u,i), | vectors. Thelatentfeaturescanbeestimatedbyminimizing where p(u,i) is the probability of user u interacting with apredictionlossfunctionbetweenthepredictedratingsand item i and p(tu,i) is the conditional probability of user u | true ratings of users. That is, interacting with item i at time t. Wemodeltheoccurrencetimetoftheeventthatuseruin- mΘ,iVn (rui−θu>vi)2, (3) teractswithitemiusingtheCoxmodelinsurvivalanalysis. (uX,i)∈O Survival analysis is used to estimate the probability of the where Θ = [θ1,θ2,...,θm] is a k m matrix and V = × occurrence of an event p(event in [t,t+∆t]) such as when [v1,v2,...,vn] is a k n matrix. A gradient descent based × a patient fails to survive. In the online learning context, method [9] can be used to estimate latent features. The our task is to estimate the probability of the occurrence of probabilityofuserfavoringanitemp(u,i)canbegenerated exposing to a specific learning block for each user, which is using a softmax function: p(t(u,i)). As shown in Figure 1, in the observed sequences of u|ser-item interactions, a user builds a project with a set p(u,i)= exp(rui) , (4) n exp(r ) of items (e.g.,“isequal”and“goto”) at time t . Then item j=1 uj k i is used again in another project of the same user at time P 4. EXPERIMENTALRESULTS t . Letx bethecovariatesassociatedwithuseruattime k+1 k t . We are interested in predicting the time gap t t . Weevaluatethemodelperformancethroughtwosteps: time- k k+1 k − to-return prediction and time-aware recommendation. In Letλ(t)denotetheinstantaneousrateofeventhappeningat the first step, for every user-item interaction (u,i), we esti- time t following the last event given the covariates x , that mate the probability of the next occurrence at time t and k isλ(t) = P(T = t T t). TheCoxmodelassumesthat use the expected value of the time as the predicted time to the covariates only aff|ect≥the magnitude of each individual return. In the second step, for each user u at a particular hazard rates. Formally, for an individual observation with timet,werankeachitemibythejointprobabilityp(u,i,t) covariates x , the hazard at time t is: and recommend top-K items. We present the experimen- k taldetailsincludingdatacollection,evaluationmetrics,and λ(t)=λ0(t)∗exp(xTkβ), (1) competing baselines. whereλ isthenon-parametricbaselinehazardfunction,x 0 k isthecovariates,andβ istheregressioncoefficient. Thelog 4.1 DataCollection likelihood of observing the occurrences is: We apply our method to user data which was released in spring of 2014 from Scratch online2. Users can create a logL= K d logλ(t ) tkλ(τ)dτ , (2) projectbyprogrammingwithbasiccomponentscalledblocks. k k kX=1(cid:26) −Z0 (cid:27) EachblockcanbecategorizedintooneormoreCTconcepts. whered isacensorindicator,takingthevalueoneifevent 2https://llk.media.mit.edu/scratch-data/ k Proceedings of the 9th International Conference on Educational Data Mining 514 Table 1: Covariate analysis for CT concept“condi- Table 2: Covariate analysis for CT concept“data”. tionals”. ***p<0.001, **p<0.01, *p<0.05, . p<0.1 *** p<0.001, ** p<0.01, * p<0.05, . p<0.1 CovariateName Coefficient P-Value CovariateName Coefficient P-Value is.remix 0.190556 0.000593*** is.remix 0.15007 0.018629* is.self.remix -0.140119 0.062772. is.self.remix -0.18239 0.037235* is.remixed 0.447668 2.44e-15*** is.remixed 0.52571 3.33e-16*** like2ormore 0.226432 0.001440** like2ormore 0.23287 0.005221** follow2ormore 0.262599 0.000346*** follow2ormore 0.20475 0.023510* comments2ormore 0.478668 <2e-16*** comment2ormore 0.54465 <2e-16*** conditionalsexperience 0.332191 1.14e-08*** conditionalsexperience 0.03648 0.605257 operatorsexperience -0.074161 0.236036 operatorsexperience 0.14616 0.041800* dataexperience -0.157914 0.010259* dataexperience 0.12105 0.076548. WeadoptthethemappingtablefromblockstoCTconcepts able implies a higher hazard if the value of the variable is as suggested in [6]. Users are encouraged to share their high. Both tables show that the regression coefficients for projects and interact with others by commenting projects, the variable“is.remixed.bool”are positive. It indicates that favoring projects, or following other users. For each user, if a user’s project is remixed by others, the hazard rate of the dataset includes the project details including block us- observing the user’s next event will increase by a factor of age and timestamps. It also maintains tables of different exp(0.190556) 1 compared with the baseline hazard. On types of social interactions including user follower-followed − thecontrary,anegativeregressioncoefficientimpliesalower relationshipandcomments. Theuserhistorydatacollected hazard,whichmeanstheprobabilityofuserinteractingwith from December 2011 to March 2012 are used to create the the blocks belonging to that concept will be smaller. The training and the testing datasets. Possible spam users who valueofthecoefficientisstatisticallysignificantatdifferent createmorethan100projectsinadayarefilteredout. The significancelevels. Weonlyshowthecovariateswithhighest remainingdatacontains22415usersand170learningblocks significant levels. with 6 CT concepts. All user records observed during De- cember 2011 to February 2012 are used to train the model Asshowninthetables,forbothCTconcepts“conditionals” through cross-validation and all user records during March and“data”, for users who share projects later remixed by 2012 are used for testing. others,itismorelikelythattheseuserswillbebackcreating projectsinthefuture. Interestingly,userswhoremixothers’ ThefollowingcovariatesareusedtoestimatetheCoxmodel. projectswillbemorelikelytocreateprojectsthanthosewho Covariates related to user activity history include the num- remixtheirownprojects. Inaddition,userswholiketwoor ber of days since registration and the gap since last lo- more projects, who follow two or more friends, and who gin. User social interaction covariates include the number have two or more comments are more likely to create and of projects liked, the number of friends followed, and the share projects in the future than those who have no social number of comments on projects. User project details in- interactions. Thisimpliesthatsocialinteractionshelpusers clude the number of projects created, the number of types to learn and share. In addition, we can see that users who of blocks, and the number of concepts. We collect user co- havebuiltblocksintheconcept“conditional”aremorelikely variates on a daily basis and predict the days till the user’s to build blocks falling into the same concept. Interestingly, next event. Users who had not been exposed to the event users who have built blocks in the concepts“operator”and by the end of the time window were censored. “data”aremorelikelytobuildblocksintheconcept“data”. 4.2 PerformanceEvaluation Wethenusetheestimatedmodeltopredictthetimetothe In the first step“time-to-return prediction”, for every block next event in each CT concept. Table 3 displays the root pair(u,i),weestimatetheprobabilityofthenextoccurrence meansquareerror(RMSE)forthereturntimepredictionus- at time t and treat the expected value of the time as the ingtheCoxmodelandbaselines,respectively. Forconcepts predictedtimetoreturn. Sincethedataaresparse,adirect “loops”,“conditionals”,“operators”, and“data”, the hazard estimation of a survival model for each block will be noisy. basedapproachoutperformsalltheotherbaselines. Forcon- Instead, we train a Cox model for each CT concept using cept“event”,thehazardbasedapproachperformsveryclose the interactions events of blocks belonging to that concept. to linear regression and both of them perform better than To evaluate the performance, we predict the expected time the others. All the baselines do not model the underlying from the learned density function and compute the Rooted temporal patterns in the observed sequences. Mean Square Error (RMSE) with respect to the true time. We compare the Cox model against the baselines including For the final step “time-aware recommendation”, suppose linearregressionanddecisiontreeregression. SmallerRMSE thetestingeventofuseruoccursattimet,wecomputethe values indicate better performance. probability p(u,i,t) of the user favoring an item i at time t foreachitemiandrankamongallitemsbyprobability. Ide- The importance of covariates for predicting each individual ally,theobserveditemsthattheuseractuallyinteractswith user’s exposure to CT concepts “conditionals” and “data” should appear on top positions. In information retrieval, are shown in Tables 1 and 2. Both tables show the co- we focus on the evaluation accuracy on top positions us- variates’ names, the regression coefficients and the signif- ingseveralstandardmetricsincludingprecisionatk(P@k), icance scores. A positive regression coefficient for a vari- MeanAveragePrecision(MAP)andNormalizedDiscounted Proceedings of the 9th International Conference on Educational Data Mining 515 ofthesixthACMSIGKDDinternationalconferenceon Table 3: RMSE comparison for user return time pre- Knowledge discovery and data mining, pages 280–284. diction. Smaller values indicate better performance. ACM, 2000. Loops Events Conditionals Operators Data Linear 9.13 9.20 8.94 8.79 8.68 [4] D.R.Cox.Regressionmodelsandlife-tables.Journalof Regres- theRoyalStatisticalSociety.SeriesB(Methodological), sion pages 187–220, 1972. Decision 9.33 9.41 9.13 9.00 8.80 Tree Re- [5] A. Dahotre, Y. Zhang, and C. Scaffidi. A qualitative gression study of animation programming in the wild. In Pro- Cox 9.04 9.25 8.63 7.97 7.62 model ceedings of the 2010 ACM-IEEE International Sympo- sium on Empirical Software Engineering and Measure- ment, ESEM ’10, pages 29:1–29:10, New York, NY, Table 4: Comparison of recommendation accuracy. USA, 2010. ACM. P@1 P@3 P@5 MAP@20 NDCG@20 [6] S. Dasgupta, W. Hale, A. Monroy-Hern´andez, and NMF 0.78 0.70 0.64 0.71 0.67 SurvMF 0.84 0.72 0.64 0.72 0.68 B. M. Hill. Remixing as a pathway to computational thinking. In ACM Conference on Computer-Supported Cooperative Work and Social Computing, pages 1438– Cumulative Gain at k(NDCG) [8]. We compare with the 1449. ACM Press, 2016. state-of-the-art baseline non-negative matrix factorization [7] D.Fields,M.Giang,andY.Kafai. Understandingcol- (NMF) [1]. We follow the standard procedure in collabora- laborative practices in the scratch online community: tive filtering to estimate the model using the user data in Patterns of participation among youth designers. To thetrainingsetandevaluatetheperformanceofthepredic- see the world and a grain of sand: Learning across lev- tion in the test set. Specifically, the user records observed els of space, time, and scale: CSCL 2013 Conference beforeMarch2012areusedtotrainandtheuserrecordsin Proceedings, 2013. March 2012 are used to test. The data contains the rating ofeachuser-blockpair,wheretheratingcorrespondstothe [8] K. J¨arvelin and J. Kek¨al¨ainen. Cumulated gain-based categorizationofeventoccurrences. Themaximumratingis evaluation of ir techniques. ACM Transactions on In- 6forsixormoreeventoccurrences. Atthetimeofthetest- formation Systems, 20(4):422–446, 2002. ingevent,wecomparetherankedlistwithgroundtruth. As showninTable4,sinceourmethod(SurvMF)integratesthe [9] Y. Koren. Factor in the neighbors: Scalable and ac- survival model into the matrix factorization to capture the curate collaborative filtering. ACM Transactions on temporal dynamics of user-item interaction, it can achieve Knowledge Discovery from Data, 4(1):1–24, 2010. better performance. [10] J. Lehmann, M. Lalmas, E. Yom-Tov, and G. Dupret. Models of user engagement. In User Modeling, Adap- 5. CONCLUSIONSANDFUTUREWORK tation, and Personalization, pages 164–175. Springer, In this work, we have focused on personalization of learn- 2012. ing path in massive online communities of creators. One of the main challenges in online learning is high dropout [11] M. Resnick, J. Maloney, A. Monroy-Herna´ndez, ratesintheearlystageduetocognitiveoverload. Toallevi- N. Rusk, E. Eastmond, K. Brennan, A. Millner, ate the problem, we propose a novel model integrating the E. Rosenbaum, J. Silver, and Y. K. B. Silverman. Coxmodelandmatrixfactorizationtorecommendtheright Scratch: programming for all. Communications of the learning contents at the right time. The model can incor- ACM, 52(11):60–67, 2009. porate factors such as user learning history and social en- [12] C. Scaffidi and C. Chambers. Skill progression demon- gagements. Inaddition, the latent features learnedthrough stratedbyusersinthescratchanimationenvironment. matrix factorization further improves the recommendation Int.J.Hum.Comput.Interaction,28(6):383–398,2012. accuracy. Empirical evaluations on real world data demon- strate the performance of our model. [13] J. Wang and Y. Zhang. Opportunity model for e- commerce recommendation: right product; right time. References In Proceedings of the 36th international ACM SIGIR [1] M.W.Berry,M.Browne,A.N.Langville,V.P.Pauca, conference on Research and development in informa- and R. J. Plemmons. Algorithms and applications for tion retrieval, pages 303–312. ACM, 2013. approximatenonnegativematrixfactorization. Compu- [14] S. Yang, C. Domeniconi, M. Revelle, M. Sweeney, tationalstatistics&dataanalysis,52(1):155–173,2007. B. Gelman, C. Beckley, and A. Johri. Uncovering tra- [2] J.Breese,D.Heckerman,andC.Kadie.Empiricalanal- jectories of informal learning in large online communi- ysis of predictive algorithms for collaborative filtering. ties of creators. In Proceedings of the Second (2015) In Proc. of the 14th Conference on Uncertainty in Ar- ACM Conference on Learning@ Scale, pages 131–140. tificial Intelligence, 1998. ACM, 2015. [3] I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White. Visualization of navigation patterns on a web site using model-based clustering. In Proceedings Proceedings of the 9th International Conference on Educational Data Mining 516