Extracting Features from Ratings: The Role of Factor Models

Joachim Selke and Wolf-Tilo Balke
Institut für Informationssysteme, Technische Universität Braunschweig, Germany

Abstract. Performing effective preference-based data retrieval requires detailed and preferentially meaningful structured information about the current user as well as the items under consideration. A common problem is that representations of items often only consist of mere technical attributes, which do not resemble human perception. This is particularly true for integral items such as movies or songs. It is often claimed that meaningful item features could be extracted from collaborative rating data, which is becoming available through social networking services. However, there is only anecdotal evidence supporting this claim; but if it is true, the extracted information could be very valuable for preference-based data retrieval. In this paper, we propose a methodology to systematically check this common claim. We perform a preliminary investigation on a large collection of movie ratings and present initial evidence.

1 INTRODUCTION

Recommender systems [1, 17] are one of the most prominent applications of preference handling technology [6] and a highly active area of research. In particular, fueled by the Netflix competition and its one million dollar prize money [2], research on collaborative recommendation techniques [21] has recently made significant advances, most notably through the introduction of factor models [16, 22].

In collaborative recommender systems, users repeatedly express their preferences for items, which usually is done by giving explicit ratings on some predefined numerical scale. This data can be modeled using a rating matrix, whose rows correspond to items, columns to users, and entries to ratings. Typically, rating matrices are very sparse; that is, only a small fraction of all possible ratings have actually been observed. Personalized recommendations are generated by predicting unobserved ratings from the available data and, for each user, selecting those items considered to be most appealing.

Most state-of-the-art collaborative recommendation methods, including the winner of the Netflix Prize, are based on factor models, which are known to yield much more accurate predictions than traditional neighborhood-based methods [14, 15, 22, 23, 24]. In factor models, each user and each item is represented by a vector in some shared real coordinate space. The vectors are chosen such that each observed rating is closely approximated by the dot product of the corresponding item and user vectors. The selection of coordinates usually is formalized as an optimization problem. Predictions for unobserved ratings are generated by computing the respective scalar products. Equivalently, this approach can be seen as a factorization of the rating matrix into the product of an item matrix (whose rows are the item vectors) and a user matrix (whose columns are the user vectors).

The success of factor models is usually attributed to the intuition that the coordinate space used to represent items and users actually is a latent feature space. That is, its dimensions capture the items' perceptual properties as well as the users' preference judgments regarding these properties. For example, when items are movies, the individual dimensions are generally thought to measure (more or less) "obvious" features such as horror vs. romance, the level of sophistication, or orientation towards adults. For users, each coordinate is thought to describe the relative degree of importance attached to the respective dimension. This understanding of factor models can be found throughout the literature, for example, in [2, 15, 16, 18, 23].

Although it is intuitively appealing, to our knowledge, the correspondence to features has never been systematically proven, but is only reported anecdotally. For example, Koren et al. [16] performed a factorization on the Netflix movie data set and manually interpreted the first two coordinates for selected movies as follows:

  Someone familiar with the movies shown can see clear meaning in the latent factors. The first factor has on one side lowbrow comedies and horror movies, aimed at a male or adolescent audience, while the other side contains drama or comedy with serious undertones and strong female leads. The second factorization axis has independent, critically acclaimed, quirky films on the top, and on the bottom, mainstream formulaic films.

Further evidence has been provided by Takács et al. [23]. After performing a factorization of the Netflix data set, they manually assigned labels to individual dimensions of their coordinate space, such as Legendary, Typical for men, Romantic, and NOT Monty Python.
In this paper, we propose a systematic method for studying the coordinate spaces derived from factor models and apply it to the MovieLens 10M data set, a large real-world collection of movie ratings. The main contribution of our work consists in laying important groundwork on which further research in recommender systems and preference handling can be built. In particular, we see two concrete directions for future work:

• First, knowing what kind of semantic information is extracted by factor models, and how it is represented in coordinate spaces, will enable a deeper understanding of these methods. Ultimately, these findings may lead to a more systematic development and refinement of recommender systems. In particular, a systematic assessment of semantic structures provides an additional way of evaluating the effectiveness of factor-based recommenders. This would perfectly complement traditional evaluation methods [11], which focus on predictive accuracy.

• Second, we believe that factor models might be a powerful tool for automatically extracting meaningful descriptions of otherwise hard-to-describe items such as movies or songs; in particular, essential features of movies cannot be characterized at all by purely technical features such as runtime, language, or release date. (A complementary approach to closing this semantic gap is content-based image and video retrieval [8].) But given a coordinate representation of movies that matches human perception, the full machinery developed in preference handling research can be applied [6, 9]. For example, clustering techniques can give users an initial high-level impression of the available items, item rankings can be learnt from ordinal preference statements [10] or utilities [5], and the best items can be retrieved by means of top-k algorithms [12].

Since our primary research interest lies in applying preference-based retrieval techniques to item collections, in this paper we will concentrate on evaluating the semantic structures contained in the item matrix A. Performing a similar analysis of the user matrix B may require entirely different methods.

The paper is structured as follows: After introducing notation and reviewing the most important factor models, we develop general guidelines on how to evaluate coordinate spaces for semantic information. Then, we illustrate how to apply these guidelines to the evaluation of factor spaces generated from movie rating data and perform experiments on the MovieLens 10M data set.
2 PRELIMINARIES

In the following, we use the variables i and j to identify items, whereas u and v denote users. We are dealing with ratings given to I items by U users. Let R = (r_{i,u}) ∈ (ℝ ∪ {∅})^{I×U} be the corresponding rating matrix, where r_{i,u} = ∅ if item i has not been rated by user u; otherwise, r_{i,u} expresses the strength of user u's preference for item i. Ratings are usually limited to a fixed integer scale (for example, one to ten stars). Moreover, let \mathcal{R} = { (i,u) | r_{i,u} ≠ ∅ } be the set of all item–user pairs for which ratings are known, and let n be the total number of ratings observed (the cardinality of \mathcal{R}). Typically, n is very small compared to the number of possible ratings I·U (for example, in the Netflix data set we have n/(I·U) ≈ 1.4%).

Given some target dimensionality d, the basic idea underlying factor models is to find matrices A = (a_{i,r}) ∈ ℝ^{I×d} and B = (b_{r,u}) ∈ ℝ^{d×U} such that their product R̂ = A·B closely resembles R on all known entries. To quantify this notion of "close resemblance," the sum of squared errors (SSE) is popularly chosen. The SSE difference between the rating matrix R and its estimation R̂ = (r̂_{i,u}) is defined as

  \mathrm{SSE}(R, \hat{R}) = \sum_{(i,u) \in \mathcal{R}} (r_{i,u} - \hat{r}_{i,u})^2 .

Factor models are typically formulated as optimization problems over A and B, in which the SSE (or some other measure) is to be minimized.

Probably the most popular factor model is Brandyn Webb's regularized SVD model [16, 18], in which A and B are defined as the solution of the least squares problem

  \min_{A,B} \; \mathrm{SSE}(R, A \cdot B) + \lambda \sum_{(i,u) \in \mathcal{R}} \sum_{r=1}^{d} \left( a_{i,r}^2 + b_{r,u}^2 \right) .

Here, λ ≥ 0 is a regularization constant used to avoid overfitting.

More advanced versions of the SVD model exclude systematic rating deviations from the factorization and model them explicitly using new variables. Bell and Koren [3] propose to estimate rating r_{i,u} by

  \hat{r}_{i,u} = \mu + \delta_i + \delta_u + \sum_{r=1}^{d} a_{i,r} b_{r,u} ,

where the constant μ denotes the mean of all observed ratings; the δ_i and δ_u are I + U new model parameters expressing systematic item and user deviations from μ. Again, the parameters are chosen according to a regularized least squares problem:

  \min_{A,B,\delta_{\ast}} \; \mathrm{SSE}(R, \hat{R}) + \lambda \sum_{(i,u) \in \mathcal{R}} \left( \sum_{r=1}^{d} \left( a_{i,r}^2 + b_{r,u}^2 \right) + \delta_i^2 + \delta_u^2 \right) .

The rationale underlying this approach, which we refer to as δ-SVD in the following, is that removing item- and user-specific general trends from the factorization makes it possible to focus on more sophisticated rating patterns.

The third basic factor model relevant to our work performs a non-negative factorization of the rating matrix [23]. It is identical to the regularized SVD model up to the additional constraint that all entries of A and B must be non-negative. Extending this model by explicit item and user deviations is not reasonable, since this would require negative entries in A and B to approximate R closely enough. The non-negative matrix factorization model aims at creating a coordinate space in which the effects of different dimensions on the estimated ratings cannot cancel each other out. Henceforth, we refer to this model as NNMF.
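For concreteness, the following is a minimal sketch of how such a regularized factorization can be fitted by stochastic gradient descent over the observed ratings (Python/NumPy; the function name, learning rate, and epoch count are illustrative assumptions and do not reproduce the MATLAB implementation described in Section 4.1):

```python
import numpy as np

def fit_regularized_svd(ratings, num_items, num_users, d=10, lam=0.04,
                        lr=0.01, epochs=30, seed=0):
    """Illustrative SGD for the regularized SVD objective:
    minimize SSE(R, A*B) + lam * sum over observed (i,u) of sum_r (a_ir^2 + b_ru^2).

    ratings: iterable of (item, user, rating) triples for the observed entries.
    Returns the item matrix A (num_items x d) and user matrix B (d x num_users).
    """
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((num_items, d))
    B = 0.1 * rng.standard_normal((d, num_users))
    for _ in range(epochs):
        for i, u, r in ratings:
            err = r - A[i] @ B[:, u]                  # residual of the current prediction
            a_i, b_u = A[i].copy(), B[:, u].copy()
            A[i]    += lr * (err * b_u - lam * a_i)   # step along the negative gradient
            B[:, u] += lr * (err * a_i - lam * b_u)
    return A, B

# A prediction for an unobserved rating is simply the dot product A[i] @ B[:, u];
# the delta-SVD variant would additionally maintain mu, delta_i, and delta_u terms.
```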
3 EVALUATING COORDINATE SPACES

Given an item–feature matrix A ∈ ℝ^{I×d} generated by some factor model, how can we determine whether the items' coordinates in this d-dimensional space resemble a "semantically meaningful" pattern? The most straightforward approach consists in extending and systematizing the casual investigations described in the introduction. This could easily be done by presenting the item coordinate space to a number of different people and asking them to label its dimensions. The correspondence between the generated item coordinates and human perception could then be assessed, for example, by measuring the degree of consensus among people or the average time needed to come up with adequate labels.

Although this kind of investigation seems very reasonable, it contains some severe flaws, which cannot be fixed by careful study design:

1. The dimensionality chosen in most applications of factor models typically ranges between d = 10 and d = 100. A comprehensive analysis of the resulting data sets would require the users to comprehend high-dimensional spaces, which is impossible even when using advanced visualization techniques.

2. Due to hindsight bias, given enough time, users will be able to assign a fitting label to almost any dimension of the coordinate space. Chances are good that this effect accounts for rather questionable labels such as NOT Monty Python.

3. By using free association to name dimensions, the collection of resulting labels tends to show a high variability and to reflect individual differences between users. To produce statistically significant results, either the sample size must be extended (which requires more study participants and results in higher costs), or the variability must be reduced, for example, by training participants to use an established domain-specific vocabulary to articulate the semantic properties they recognize in the data (which also increases time and effort).

4. Typically, there are many near-optimal solutions to the above-mentioned optimization problems, which can be transformed into one another by rotation of the coordinate axes. This is because, for any invertible matrix M ∈ ℝ^{d×d}, the solution pairs (A, B) and (AM, M⁻¹B) produce the same SSE. Although regularization usually enforces the theoretical existence of a unique optimal solution pair, in practice the enormous problem size often allows finding only one of the many near-optimal solutions. Consequently, the direction of the coordinate axes is completely arbitrary, which makes the task of assigning labels a hopeless undertaking.

3.1 Some Guidelines

In this section, we devise a set of guidelines on which to base more appropriate approaches to the analysis of coordinate spaces.

• In view of problems (1) and (4), we recommend avoiding any direct human interaction with item coordinates. Instead, human input should concentrate on describing item properties, which in turn are related to coordinates as well as compared by algorithmic means.

• The only effective way to eliminate hindsight bias (2) is collecting feedback on items before generating and presenting any information extracted by the factor models under consideration.

• To resolve problem (3), we primarily recommend adapting a domain-specific vocabulary to allow a structured description of items. For example, to characterize music, the rich vocabulary developed by allmusic (http://www.allmusic.com) seems appropriate; amongst others, it includes very detailed information about genres, styles, moods, and connections between artists. Since this kind of semantic information can be (or already has been) provided by a small number of experts and usually is little prone to debate, it is easy to assemble and work with. In later stages of analysis, unrestricted user feedback may be included to reveal the position and extent of more fine-grained and rather subjective concepts in the coordinate space.

We also propose to apply a standardization procedure to the generated coordinate space. This is for the following reasons: First, recall that, for any invertible matrix M ∈ ℝ^{d×d}, the solution pairs (A, B) and (AM, M⁻¹B) are equivalent; to enable comparisons between different factor models and even different runs of the same optimization algorithm, we need to define one solution pair as the standard representation. Second, to enable a better separation of different effects in the data, the axes of the item (and user) coordinate space should be chosen to be orthogonal. Moreover, axes should be ordered according to their relative importance (measured by the variance of the data along each axis); that is, the first dimension should be assigned to the most important axis.

The perfect tool for matching these requirements is the singular value decomposition, a well-known matrix factorization technique from linear algebra, which inspired the SVD factor model. It is based on the fact that, for any rank-d matrix X ∈ ℝ^{I×U}, there is a column-orthonormal matrix U ∈ ℝ^{I×d}, a diagonal matrix S ∈ ℝ^{d×d}, and a row-orthonormal matrix V ∈ ℝ^{d×U} such that X = USV. By reordering rows and columns, S can be chosen such that its diagonal elements are ordered by decreasing magnitude. Moreover, the diagonal matrix S can be eliminated from this factorization by setting X = U′V′, where U′ = US^{1/2} and V′ = S^{1/2}V. The matrices U′ and V′ are unique if all diagonal elements of S are mutually different.

In our setting, we will apply the singular value decomposition to transform the product X = A·B into a new product A′·B′ as just described. Since rating data tends to be very "noisy," we can safely assume that (A′, B′) is a unique representation of (A, B); we did not encounter any counterexamples during our experiments on large real-world rating data. Moreover, any equivalent pair (AM, M⁻¹B) also gets transformed into (A′, B′), which we define as the corresponding standard representation. It can be computed efficiently using the product decomposition algorithm proposed in [7, Sec. 3].
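A small sketch of this standardization step is given below (Python/NumPy; the function name and the QR-based route to the singular value decomposition of the product are our own illustrative choices, standing in for the product decomposition algorithm of [7]):

```python
import numpy as np

def standard_representation(A, B):
    """Given any factorization X = A @ B (A: I x d, B: d x U), return (A2, B2)
    with A2 @ B2 equal to A @ B, mutually orthogonal axes, and dimensions ordered
    by decreasing singular value.  Two thin QR factorizations reduce the problem
    to a d x d SVD, so the full I x U product is never formed explicitly."""
    Qa, Ra = np.linalg.qr(A)                 # A = Qa @ Ra, Qa column-orthonormal
    Qb, Rb = np.linalg.qr(B.T)               # B = Rb.T @ Qb.T, Qb column-orthonormal
    Um, S, Vmt = np.linalg.svd(Ra @ Rb.T)    # small d x d SVD, singular values descending
    U = Qa @ Um                              # column-orthonormal, I x d
    Vt = Vmt @ Qb.T                          # row-orthonormal, d x U
    root = np.sqrt(S)
    return U * root, root[:, None] * Vt      # A' = U S^(1/2), B' = S^(1/2) V
```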
3.2 Use Case: Movie Ratings

Based on these guidelines, we now present a concrete method for performing a basic evaluation of coordinate spaces generated from movie ratings. Our focus rests on immediate applicability, so we relate the item coordinates to reference data that is already available.

The reference source for all kinds of movie-related information is IMDb, the Internet Movie Database (http://www.imdb.com), which currently covers about 1.6 million titles. Most of IMDb's data has been created with the help of its users. Therefore, a large proportion of the available content can freely be downloaded and used for non-commercial purposes (http://www.imdb.com/interfaces#plain). Based on this comprehensive data, one should be able to cross-reference any collection of movie ratings with IMDb.

For the semantic evaluations we are going to perform, the following attributes of titles may prove helpful: genres, certifications (e.g., USA:PG for "parental guidance suggested"), year of release, and plot keywords. To illustrate the general procedure, we will only exploit genre information in this paper. Extending our method to other types of semantic information is straightforward. Checking the correspondence between genres and item coordinates also makes a good first test of whether at least some basic semantic properties of movies are represented in coordinate spaces, which is exactly the purpose of the current work.

IMDb recognizes 28 different genres, from Action to Western, where each movie may belong to multiple genres. The assignment of genres is done by IMDb's expert staff in cooperation with IMDb users. To enforce consistency, this process is based upon a collection of publicly available guidelines (http://www.imdb.com/updates/guide/genres). Therefore, this data source matches the requirements developed in the previous section.

To analyze whether the distribution of genres in coordinate space displays any significant pattern, we turn to established classification algorithms, which have explicitly been designed to exploit any relevant patterns in the data, if there are any. In particular, we propose to measure the degree of adherence to a pattern by the classification accuracy shown by these algorithms when predicting the genres of movies based on their coordinates. In essence, we transform our analysis into a sequence of binary classification problems (one for each genre), which enables us to build on solid grounds. Following the common methodology, we use cross-validation; that is, accuracy is measured on a data set that is independent of the one used to train the classifier. By applying proven techniques to counter overfitting, our approach also overcomes any possible problems related to hindsight bias.
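As a sketch of this evaluation loop, the following fragment estimates, for each genre, the cross-validated accuracy of a classifier that sees only the item coordinates (Python with scikit-learn assumed; function and variable names are illustrative, plain k-fold cross-validation is used instead of the 20 random splits described in Section 4.2, and the SVM parameters are the ones reported there):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def genre_adherence(A, genre_labels, n_splits=5):
    """For every genre, train a binary classifier on the item coordinates A
    (I x d) and report its mean cross-validated accuracy; genre_labels maps a
    genre name to a boolean vector of length I (movie has the genre or not)."""
    scores = {}
    for genre, y in genre_labels.items():
        clf = SVC(kernel="rbf", C=4.0, gamma=0.1)   # cf. the parameters in Section 4.2
        acc = cross_val_score(clf, A, y.astype(int), cv=n_splits, scoring="accuracy")
        scores[genre] = acc.mean()
    return scores
```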
For a start, we selected two popular classification algorithms, which are able to detect different kinds of patterns in the data: support vector machines and kNN-classifiers.

Support vector machines will be used in two different flavors: first, using a linear kernel (referred to as SVM-lin), and second, using a Gaussian radial basis function kernel (SVM-RBF). Linear support vector machines will show a high classification accuracy if most movies of the respective genre are grouped at one side of the data set, which can be separated from all remaining movies by a hyperplane. For example, this can be used to disprove the hypothesis that there exists a direction in the coordinate space along which, say, the amount of action increases monotonically. In contrast, the SVM-RBF classifier detects whether groups of movies with the same genre tend to be located in close vicinity.

kNN-classifiers perform well if the distance between movies having the same genre typically is smaller than the distance to movies not having this genre. Therefore, they can be used to check whether genres form spatially separated patterns in coordinate space. Since factor models are not based on a notion of proximity, it is not clear what measure of distance suits factor models best. We will try out the following four measures: Euclidean distance, standardized Euclidean distance (where, to ensure equally weighted dimensions, coordinate values are divided by the standard deviation of the data with respect to each dimension), negative scalar product (which essentially adapts the method of rating prediction to measure distance), and cosine similarity (which is monotonically related to the angle between two vectors).

To evaluate the true benefit of coordinate spaces generated from factor models, we propose the following baseline, which is derived from traditional neighborhood-based recommendation methods [20] and constructed as follows: First, for any items i and j, we compute their Pearson correlation coefficient

  \varrho_{i,j} = \frac{\sum_{u \in \mathcal{R}_{i,j}} (r_{i,u} - \mu_{i,j})(r_{j,u} - \mu_{j,i})}{\sqrt{\sum_{u \in \mathcal{R}_{i,j}} (r_{i,u} - \mu_{i,j})^2} \, \sqrt{\sum_{u \in \mathcal{R}_{i,j}} (r_{j,u} - \mu_{j,i})^2}} ,

where \mathcal{R}_{i,j} is the set of all users who rated both i and j, and μ_{i,j} is the mean rating given to item i by users who rated both i and j. If \mathcal{R}_{i,j} is empty, then ϱ_{i,j} is undefined. The Pearson correlation coefficient ϱ_{i,j} measures the tendency of users to rate items i and j similarly. To avoid biased estimates in cases where n_{i,j} = |\mathcal{R}_{i,j}| is very small, we derive a new measure of similarity

  s_{i,j} = \frac{n_{i,j}}{n_{i,j} + \lambda} \cdot \varrho_{i,j}

from ϱ_{i,j} by shrinking it towards zero [15]. Here, λ ≥ 0 is a regularization parameter. Finally, we carry these similarities over into distances by applying a logarithmic transformation:

  d_{i,j} = -\ln\left( \frac{1 + s_{i,j}}{2} \right) .

To derive a d-dimensional coordinate space in which items i and j approximately have distance d_{i,j}, we use metric multidimensional scaling [4]. Since neighborhood-based recommendation methods are usually outperformed by factor models, we expect our baseline coordinate space to be far inferior to those constructed using factor models. We refer to our baseline model as MDS.
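The item–item distances for this baseline can be computed as in the following sketch (Python/NumPy; missing ratings are coded as NaN, the function name is our own, and the default shrinkage parameter is the value λ = 20 used in Section 4.1):

```python
import numpy as np

def baseline_distance(r_i, r_j, lam=20.0):
    """Distance between two items for the MDS baseline, following the formulas
    above: shrunken Pearson correlation mapped to a distance via -ln((1+s)/2).
    r_i, r_j: rating vectors of length U with np.nan marking missing ratings."""
    both = ~np.isnan(r_i) & ~np.isnan(r_j)        # users who rated both items
    n = both.sum()
    if n == 0:
        return np.nan                             # correlation undefined, treat as missing
    x = r_i[both] - r_i[both].mean()              # center on the co-raters' mean ratings
    y = r_j[both] - r_j[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    rho = (x * y).sum() / denom if denom > 0 else 0.0   # pragmatic choice for constant ratings
    s = n / (n + lam) * rho                       # shrink towards zero for few co-raters
    return -np.log((1.0 + s) / 2.0)
```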
4 EXPERIMENTS ON MOVIELENS 10M

We applied our approach to the MovieLens 10M data set (http://www.grouplens.org/node/73), which consists of about 10 million ratings collected by the online movie recommender service MovieLens (http://www.movielens.org). After postprocessing the original data (removing one non-existing movie, merging several duplicate movie entries, and removing movies that received less than 20 ratings), our new data set consists of 9,984,419 ratings of 8938 movies provided by 69878 users. The ratings use a 10-point scale from 0.5 (worst) to 5 (best). Each user contributed at least 14 ratings.

Our analysis requires the genre information maintained by IMDb, so we had to map each movie in the data set to its corresponding IMDb entry. This task has been simplified a lot by the fact that all items in the MovieLens 10M data set are relatively well-known movies produced for cinema. (This is the reason why we did not consider the Netflix data set: it consists of all kinds of DVD titles, which often lack a clear correspondence in IMDb.) We mapped about 8000 movies automatically by comparing titles and release years; the remaining movies have been assigned manually or semi-automatically.

To avoid the problem of learning from very small samples for now, we did not use all 28 genres distinguished by IMDb. Instead, we take only those genres into consideration that have been assigned to at least 5% of all movies in our data set. Table 1 lists all remaining 13 genres and their relative frequencies. On average, 2.3 genres have been assigned to each movie.

  Genre      %      Genre      %
  Action     16.0   Horror     10.1
  Adventure  12.7   Mystery     9.1
  Comedy     38.2   Romance    25.2
  Crime      16.6   Sci-Fi      8.6
  Drama      54.6   Thriller   24.2
  Family      8.4   War         5.2
  Fantasy     8.3

  Table 1. Relative frequencies of genres.

4.1 Generating Coordinate Spaces

We implemented each of the four coordinate extraction methods in MATLAB and executed them on our rating data.

For SVD, δ-SVD, and NNMF, we followed the literature and used an optimization procedure based on gradient descent; to reduce computation time, we applied the Hessian speedup proposed in [19]. Adapting the common methodology, we chose the regularization parameter λ by cross-validation such that the SSE is minimized on randomly chosen test sets. We ended up with a value of λ = 0.04 for each of the three algorithms.

Since optimization by gradient descent is known to get stuck in local extrema of the function to be minimized, we ran the three procedures at least three times, each with different initial coordinates, which have been chosen randomly. For each result, we computed the standardized solution pair as described in the previous section. We found that the solutions generated by each extractor do not differ significantly after standardization. This indicates that our coordinate spaces match the unique solution of each optimization problem.

For our MDS procedure, we used the regularization constant λ = 20, which we determined by adapting the recommendation Koren gave for the Netflix data set [15]. The coordinates have been generated by MATLAB's mdscale function using the metric stress criterion. Since in our data set about 14 percent of all movie–movie pairs had no raters in common, we treated the respective entries of the distance matrix as missing data.

To measure the effect of dimensionality, we generated three different coordinate spaces with each extractor by varying the parameter d. We chose d = 10, d = 50, and d = 100.
4.2 Applying the Classifiers

In total, we used 14 different classifiers to evaluate each of the 12 coordinate spaces with respect to each of the 13 genres.

We implemented the two support vector machine classifiers as soft-margin SVMs with parameters C = 4 and (for SVM-RBF) γ = 0.1, which have been determined by cross-validation to maximize classification accuracy.

Each of the four different kNN-classifiers will be applied to the data sets with three different choices of k. To measure whether movies of the same genre tend to occur in larger groups, we chose k = 1, k = 3, and k = 9. In the following, we will refer to these 12 classifiers as kNN-Eucl, kNN-sEucl, kNN-scal, and kNN-cos.

To enable comparisons among classifiers and data sets, we generated 20 pairs of training and test sets, each by randomly choosing 40% of all movies for training and 10% (of the remaining movies) for testing. For each of the resulting 2184 combinations of coordinate spaces, classifiers, and genres, we use the same 20 pairs of item sets for training and testing. In each case, we measured the classification accuracy. All results reported below are averages over the 20 runs.

4.3 Results

Probably the most popular way of assessing a classifier's performance is measuring its accuracy, that is, the fraction of test items which have been classified correctly. However, in our setting, this measure is not very helpful. To see this, recall that the relative frequency of genres is very different in our data set. For example, over half of all movies belong to the genre Drama, but there are only about 5% War movies. While attaining an accuracy of 95% would be significant for the genre Drama, it can easily be achieved for the genre War just by classifying any movie as non-War. To enable comparisons across genres, we propose to use a modified version of Cohen's kappa measure.

Any result of a binary classification task can be described by four numbers, which sum up to 1: the fraction of true positives (α_tp), the fraction of false positives (α_fp), the fraction of false negatives (α_fn), and the fraction of true negatives (α_tn). Accuracy is defined as acc = α_tp + α_tn. Moreover, the accuracy of a static majority-based classifier (which always returns the label of the more frequent class) is acc_maj = max{α_tp + α_fn, α_fp + α_tn}. We propose to use this kind of naive classifier for normalizing the accuracy and define κ = (acc − acc_maj) / (1 − acc_maj). This measure expresses a classifier's relative performance with respect to the majority-based classifier. If acc = 1, then κ = 1; if acc > acc_maj, then κ > 0; if acc = acc_maj, then κ = 0; and if acc < acc_maj, then κ < 0.
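A direct transcription of this normalization (illustrative Python; the helper name is ours):

```python
def kappa(acc, positive_rate):
    """Accuracy normalized against the static majority-based classifier;
    positive_rate is the relative frequency of the genre (alpha_tp + alpha_fn)."""
    acc_maj = max(positive_rate, 1.0 - positive_rate)
    return (acc - acc_maj) / (1.0 - acc_maj)

# Example: always predicting "non-War" on a data set with 5.2% War movies yields
# acc = 0.948, hence kappa(0.948, 0.052) == 0.0 despite the seemingly high accuracy.
```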
By measuring accuracy in terms of κ, we can average classification performance over different genres. Tables 2–5 report the mean κs over all 260 classification results obtained for each combination of coordinate space and classifier type. All entries larger than 0.10 have been marked in boldface.

             SVD-10   SVD-50   SVD-100
  SVM-lin      0.08     0.18     0.20
  SVM-RBF      0.15     0.23     0.25
  1NN-Eucl    -0.24    -0.21    -0.19
  3NN-Eucl     0.01     0.05     0.04
  9NN-Eucl     0.12     0.16     0.14
  1NN-sEucl   -0.25    -0.27    -0.31
  3NN-sEucl    0.01     0.00    -0.06
  9NN-sEucl    0.12     0.12     0.04
  1NN-scal    -0.42    -0.30    -0.30
  3NN-scal    -0.16    -0.03    -0.03
  9NN-scal     0.01     0.11     0.12
  1NN-cos     -0.25    -0.18    -0.16
  3NN-cos      0.00     0.06     0.06
  9NN-cos      0.12     0.17     0.16

  Table 2. Kappas for coordinates generated by SVD.

             δ-SVD-10  δ-SVD-50  δ-SVD-100
  SVM-lin      0.07      0.16      0.18
  SVM-RBF      0.13      0.20      0.23
  1NN-Eucl    -0.26     -0.26     -0.26
  3NN-Eucl    -0.01      0.01     -0.02
  9NN-Eucl     0.11      0.12      0.08
  1NN-sEucl   -0.26     -0.29     -0.36
  3NN-sEucl    0.00     -0.03     -0.11
  9NN-sEucl    0.11      0.09     -0.01
  1NN-scal    -0.41     -0.28     -0.22
  3NN-scal    -0.06      0.02      0.06
  9NN-scal     0.05      0.13      0.16
  1NN-cos     -0.26     -0.19     -0.16
  3NN-cos      0.00      0.07      0.09
  9NN-cos      0.12      0.18      0.19

  Table 3. Kappas for coordinates generated by δ-SVD.

             NNMF-10  NNMF-50  NNMF-100
  SVM-lin      0.02     0.05     0.11
  SVM-RBF      0.02     0.09     0.14
  1NN-Eucl    -0.56    -0.47    -0.41
  3NN-Eucl    -0.20    -0.16    -0.13
  9NN-Eucl    -0.02     0.01     0.02
  1NN-sEucl   -0.56    -0.47    -0.45
  3NN-sEucl   -0.20    -0.16    -0.16
  9NN-sEucl   -0.02     0.01     0.00
  1NN-scal    -0.37    -0.34    -0.34
  3NN-scal    -0.11    -0.10    -0.09
  9NN-scal    -0.02     0.00     0.02
  1NN-cos     -0.56    -0.45    -0.41
  3NN-cos     -0.20    -0.15    -0.13
  9NN-cos     -0.03     0.02     0.03

  Table 4. Kappas for coordinates generated by NNMF.

             MDS-10   MDS-50   MDS-100
  SVM-lin     -0.16     0.15     0.19
  SVM-RBF      0.03     0.16     0.17
  1NN-Eucl    -0.29    -0.19    -0.18
  3NN-Eucl    -0.01     0.06     0.06
  9NN-Eucl     0.13     0.18     0.18
  1NN-sEucl   -0.29    -0.23    -0.29
  3NN-sEucl   -0.01     0.05    -0.01
  9NN-sEucl    0.13     0.17     0.12
  1NN-scal    -0.29    -0.19    -0.18
  3NN-scal    -0.01     0.07     0.08
  9NN-scal     0.12     0.18     0.18
  1NN-cos     -0.28    -0.18    -0.16
  3NN-cos      0.00     0.07     0.08
  9NN-cos      0.13     0.19     0.19

  Table 5. Kappas for coordinates generated by MDS.

We can observe the following:

• The coordinate space derived by NNMF does not contain much helpful information about genres that can be exploited by our classifiers. The performance in all other spaces is significantly better.

• Except for NN-sEucl, classification performance generally improves with increasing dimensionality. However, the difference in performance between d = 10 and d = 50 is much larger than the one between d = 50 and d = 100. This indicates that our ordering of dimensions during standardization indeed captures some notion of relative importance. This is probably also the reason for NN-sEucl's decreasing performance with growing d: treating all dimensions equally seems to overweight information from dimensions at the end of the list.

• The SVM-RBF classifier slightly outperforms SVM-lin, but is comparable in performance to 9NN-Eucl, 9NN-scal, and 9NN-cos. This indicates that genres indeed tend to cluster in coordinate spaces, even with respect to different measures of distance.

• The NN-classifiers display bad performance for k = 1 and k = 3, which indicates that, although movies of the same genre roughly occur in clusters, each cluster usually also contains movies that have not been assigned the respective genre.

• In contrast to our expectations, the performance in coordinate spaces generated by factor models is comparable to the performance shown on our baseline coordinate space MDS.

Moreover, the results suggest that the performance of kNN-classifiers might increase even further for larger values of k. To check this, we performed some preliminary tests with k ≈ 20, but have not been able to confirm this conjecture.

We also investigated the influence of individual genres on classification performance; as an example, the results for SVM-RBF are reported in Table 6.
             SVD-100  δ-SVD-100  NNMF-100  MDS-100
  Action       0.34     0.31       0.22      0.22
  Adventure    0.13     0.12       0.08      0.00
  Comedy       0.45     0.42       0.25      0.42
  Crime        0.08     0.06      -0.01      0.00
  Drama        0.47     0.43       0.37      0.44
  Family       0.43     0.46       0.31      0.34
  Fantasy      0.03     0.05       0.01      0.00
  Horror       0.56     0.54       0.31      0.61
  Mystery      0.06     0.04      -0.00      0.00
  Romance      0.11     0.10      -0.00      0.00
  Sci-Fi       0.23     0.20       0.09      0.00
  Thriller     0.31     0.27       0.14      0.15
  War          0.05     0.06      -0.00      0.00

  Table 6. Kappas for SVM-RBF by genre.

Entries larger than 0.20 have been marked in boldface. We can see that some genres, such as Horror and Drama, can clearly be identified by the classifier, while others cannot. We had expected much better performance on clear-cut genres such as War.

In summary, these preliminary experiments suggest that the coordinate spaces derived by SVD, δ-SVD, and MDS indeed contain some significant semantic information about the represented movies. However, the situation is by far not as clear as claimed in the literature.

5 CONCLUSION AND OUTLOOK

In the current paper, we presented a general methodology for systematically analyzing whether coordinate spaces generated from factor models contain semantic information, as is commonly claimed. We applied our approach to the MovieLens 10M data set and found initial evidence for this claim.

Our results encourage us to follow this line of research in several ways. First, we would like to investigate whether our results also carry over to more advanced and complex factor models, which have been proposed very recently [13, 15]. It would also be interesting to see what more traditional methods such as multidimensional scaling can contribute to the problem of feature extraction from rating data, since our results indicate that these methods can successfully be modified for use in our new setting.

References

[1] Gediminas Adomavicius and Alexander Tuzhilin, 'Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions', IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749, (2005).
[2] Robert M. Bell, Jim Bennett, Yehuda Koren, and Chris Volinsky, 'The million dollar programming prize', IEEE Spectrum, 46(5), 28–33, (2009).
[3] Robert M. Bell and Yehuda Koren, 'Scalable collaborative filtering with jointly derived neighborhood interpolation weights', in Proceedings of ICDM 2007, pp. 43–52. IEEE Computer Society, (2007).
[4] Ingwer Borg and Patrick J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer, second edn., 2005.
[5] Craig Boutilier, Kevin Regan, and Paolo Viappiani, 'Preference elicitation with subjective features', in Proceedings of RecSys 2009, pp. 341–344. ACM, (2009).
[6] Ronen I. Brafman and Carmel Domshlak, 'Preference handling: An introductory tutorial', AI Magazine, 30(1), 58–86, (2009).
[7] Zlatko Drmač, 'Accurate computation of the product-induced singular value decomposition with applications', SIAM Journal on Numerical Analysis, 35(5), 1969–1994, (1998).
[8] Peter Enser and Christine Sandom, 'Towards a comprehensive survey of the semantic gap in visual image retrieval', in Proceedings of CIVR 2003, volume 2728 of LNCS, pp. 291–299. Springer, (2003).
[9] Johannes Fürnkranz and Eyke Hüllermeier, 'Preference learning', Künstliche Intelligenz, 2005(1), 60–61, (2005).
[10] Ralf Herbrich, Thore Graepel, and Klaus Obermayer, 'Large margin rank boundaries for ordinal regression', in Advances in Large Margin Classifiers, 115–132, MIT Press, (2000).
[11] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl, 'Evaluating collaborative filtering recommender systems', ACM Transactions on Information Systems, 22(1), 5–53, (2004).
[12] Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman, 'A survey of top-k query processing techniques in relational database systems', ACM Computing Surveys, 40(4), (2008).
[13] Yehuda Koren, 'Factorization meets the neighborhood: A multifaceted collaborative filtering model', in Proceedings of KDD 2008, pp. 426–434. ACM Press, (2008).
[14] Yehuda Koren, 'Collaborative filtering with temporal dynamics', Communications of the ACM, 53(4), 89–97, (2010).
[15] Yehuda Koren, 'Factor in the neighbors: Scalable and accurate collaborative filtering', ACM Transactions on Knowledge Discovery from Data, 4(1), (2010).
[16] Yehuda Koren, Robert Bell, and Chris Volinsky, 'Matrix factorization techniques for recommender systems', IEEE Computer, 42(8), 30–37, (2009).
[17] Don Monroe, 'Just for you', Communications of the ACM, 52(8), 15–17, (2009).
[18] Gregory Piatetsky-Shapiro, 'Interview with Simon Funk', ACM SIGKDD Explorations Newsletter, 9(1), 38–40, (2007).
[19] Tapani Raiko, Alexander Ilin, and Juha Karhunen, 'Principal component analysis for large scale problems with lots of missing values', in Proceedings of ECML 2007, volume 4701 of LNAI, pp. 691–698. Springer, (2007).
[20] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, 'Item-based collaborative filtering recommendation algorithms', in Proceedings of WWW 2001, pp. 285–295. ACM Press, (2001).
[21] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen, 'Collaborative filtering recommender systems', in The Adaptive Web: Methods and Strategies of Web Personalization, volume 4321 of LNCS, 291–324, Springer, (2007).
[22] Ajit P. Singh and Geoffrey J. Gordon, 'A unified view of matrix factorization models', in Proceedings of ECML PKDD 2008: Part II, volume 5212 of LNCS, pp. 358–373. Springer, (2008).
[23] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk, 'Scalable collaborative filtering approaches for large recommender systems', Journal of Machine Learning Research, 10, 623–656, (2009).
[24] Markus Weimer, Alexandros Karatzoglou, and Alex Smola, 'Improving maximum margin matrix factorization', Machine Learning, 72(3), 263–276, (2008).
