Extracting Features from Ratings: The Role of Factor Models

Joachim Selke and Wolf-Tilo Balke
Institut für Informationssysteme, Technische Universität Braunschweig, Germany

Abstract. Performing effective preference-based data retrieval requires detailed and preferentially meaningful structured information about the current user as well as the items under consideration. A common problem is that representations of items often only consist of mere technical attributes, which do not resemble human perception. This is particularly true for integral items such as movies or songs. It is often claimed that meaningful item features could be extracted from collaborative rating data, which is becoming available through social networking services. However, there is only anecdotal evidence supporting this claim; but if it is true, the extracted information could be very valuable for preference-based data retrieval. In this paper, we propose a methodology to systematically check this common claim. We perform a preliminary investigation on a large collection of movie ratings and present initial evidence.

1 INTRODUCTION

Recommender systems [1, 17] are one of the most prominent applications of preference handling technology [6] and a highly active area of research. In particular, fueled by the Netflix competition and its one million dollar prize money [2], research on collaborative recommendation techniques [21] has recently made significant advances, most notably through the introduction of factor models [16, 22].

In collaborative recommender systems, users repeatedly express their preferences for items, which usually is done by giving explicit ratings on some predefined numerical scale. This data can be modeled using a rating matrix, whose rows correspond to items, columns to users, and entries to ratings. Typically, rating matrices are very sparse; that is, only a small fraction of all possible ratings have actually been observed. Personalized recommendations are generated by predicting unobserved ratings from the available data and, for each user, selecting those items considered to be most appealing.

Most state-of-the-art collaborative recommendation methods, including the winner of the Netflix Prize, are based on factor models, which are known to yield much more accurate predictions than traditional neighborhood-based methods [14, 15, 22, 23, 24]. In factor models, each user and each item is represented by a vector in some shared real coordinate space. The vectors are chosen such that each observed rating is closely approximated by the dot product of the corresponding item and user vectors. The selection of coordinates usually is formalized as an optimization problem. Predictions for unobserved ratings are generated by computing the respective scalar products. Equivalently, this approach can be seen as a factorization of the rating matrix into the product of an item matrix (whose rows are the item vectors) and a user matrix (whose columns are the user vectors).

The success of factor models is usually attributed to the intuition that the coordinate space used to represent items and users actually is a latent feature space. That is, its dimensions capture the items' perceptual properties as well as the users' preference judgments regarding these properties. For example, when items are movies, the individual dimensions are generally thought to measure (more or less) "obvious" features such as horror vs. romance, the level of sophistication, or orientation towards adults. For users, each coordinate is thought to describe the relative degree of importance attached to the respective dimension. This understanding of factor models can be found throughout the literature, for example, in [2, 15, 16, 18, 23].

Although it is intuitively appealing, to our knowledge, the correspondence to features has never been systematically proven, but is only reported anecdotally. For example, Koren et al. [16] performed a factorization on the Netflix movie data set and manually interpreted the first two coordinates for selected movies as follows:

  Someone familiar with the movies shown can see clear meaning in the latent factors. The first factor has on one side lowbrow comedies and horror movies, aimed at a male or adolescent audience, while the other side contains drama or comedy with serious undertones and strong female leads. The second factorization axis has independent, critically acclaimed, quirky films on the top, and on the bottom, mainstream formulaic films.

Further evidence has been provided by Takács et al. [23]. After performing a factorization of the Netflix data set, they manually assigned labels to individual dimensions of their coordinate space, such as Legendary, Typical for men, Romantic, and NOT Monty Python.
In this paper, we propose a systematic method for studying the coordinate spaces derived from factor models and apply it to the MovieLens 10M data set, a large real-world collection of movie ratings. The main contribution of our work consists in laying important groundwork on which further research in recommender systems and preference handling can be built. In particular, we see two concrete directions for future work:

• First, knowing what kind of semantic information is extracted by factor models, and how it is represented in coordinate spaces, will enable a deeper understanding of these methods. Ultimately, these findings may lead to a more systematic development and refinement of recommender systems. In particular, a systematic assessment of semantic structures provides an additional way of evaluating the effectiveness of factor-based recommenders. This would perfectly complement traditional evaluation methods [11], which focus on predictive accuracy.

• Second, we believe that factor models might be a powerful tool for automatically extracting meaningful descriptions of otherwise hard-to-describe items such as movies or songs; in particular, essential features of movies cannot be characterized at all by purely technical features such as runtime, language, or release date. (A complementary approach to closing this semantic gap is content-based image and video retrieval [8].) But given a coordinate representation of movies that matches human perception, the full machinery developed in preference handling research can be applied [6, 9]. For example, clustering techniques can give users an initial high-level impression of the available items, item rankings can be learnt from ordinal preference statements [10] or utilities [5], and the best items can be retrieved by means of top-k algorithms [12].

Since our primary research interest lies in applying preference-based retrieval techniques to item collections, in this paper we will concentrate on evaluating the semantic structures contained in the item matrix A. Performing a similar analysis of the user matrix B may require entirely different methods.

The paper is structured as follows: After introducing notation and reviewing the most important factor models, we develop general guidelines on how to evaluate coordinate spaces for semantic information. Then, we illustrate how to apply these guidelines to the evaluation of factor spaces generated from movie rating data and perform experiments on the MovieLens 10M data set.
2 PRELIMINARIES

In the following, we use the variables i and j to identify items, whereas u and v denote users. We are dealing with ratings given to I items by U users. Let R = (r_{i,u}) ∈ (ℝ ∪ {∅})^{I×U} be the corresponding rating matrix, where r_{i,u} = ∅ if item i has not been rated by user u; otherwise, r_{i,u} expresses the strength of user u's preference for item i. Ratings are usually limited to a fixed integer scale (for example, one to ten stars). Moreover, let \mathcal{R} = { (i,u) | r_{i,u} ≠ ∅ } be the set of all item–user pairs for which ratings are known, and let n be the total number of ratings observed (the cardinality of \mathcal{R}). Typically, n is very small compared to the number of possible ratings I·U (for example, in the Netflix data set we have n/(I·U) ≈ 1.4%).

Given some target dimensionality d, the basic idea underlying factor models is to find matrices A = (a_{i,r}) ∈ ℝ^{I×d} and B = (b_{r,u}) ∈ ℝ^{d×U} such that their product R̂ = A·B closely resembles R on all known entries. To quantify this notion of "close resemblance," the sum of squared errors (SSE) is popularly chosen. The SSE difference between the rating matrix R and its estimation R̂ = (r̂_{i,u}) is defined as

  \mathrm{SSE}(R, \hat{R}) = \sum_{(i,u) \in \mathcal{R}} (r_{i,u} - \hat{r}_{i,u})^2 .

Factor models are typically formulated as optimization problems over A and B, in which the SSE (or some other measure) is to be minimized.

Probably the most popular factor model is Brandyn Webb's regularized SVD model [16, 18], in which A and B are defined as the solution of the least squares problem

  \min_{A,B} \; \mathrm{SSE}(R, A \cdot B) + \lambda \sum_{(i,u) \in \mathcal{R}} \sum_{r=1}^{d} \left( a_{i,r}^2 + b_{r,u}^2 \right) .

Here, λ ≥ 0 is a regularization constant used to avoid overfitting.

More advanced versions of the SVD model exclude systematic rating deviations from the factorization and model them explicitly using new variables. Bell and Koren [3] propose to estimate rating r_{i,u} by

  \hat{r}_{i,u} = \mu + \delta_i + \delta_u + \sum_{r=1}^{d} a_{i,r} b_{r,u} ,

where the constant μ denotes the mean of all observed ratings; the δ_i and δ_u are I + U new model parameters expressing systematic item and user deviations from μ. Again, the parameters are chosen according to a regularized least squares problem:

  \min_{A,B,\delta_{\ast}} \; \mathrm{SSE}(R, \hat{R}) + \lambda \sum_{(i,u) \in \mathcal{R}} \left( \sum_{r=1}^{d} \left( a_{i,r}^2 + b_{r,u}^2 \right) + \delta_i^2 + \delta_u^2 \right) .

The rationale underlying this approach, which we refer to as δ-SVD in the following, is that removing item- and user-specific general trends from the factorization makes it possible to focus on more sophisticated rating patterns.

The third basic factor model relevant to our work performs a non-negative factorization of the rating matrix [23]. It is identical to the regularized SVD model up to the additional constraint that all entries of A and B must be non-negative. Extending this model by explicit item and user deviations is not reasonable, since this would require negative entries in A and B to approximate R closely enough. The non-negative matrix factorization model aims at creating a coordinate space in which the effects of different dimensions on the estimated ratings cannot cancel each other out. Henceforth, we refer to this model as NNMF.
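For concreteness, the following is a minimal sketch of how such a regularized factorization can be fitted by stochastic gradient descent over the observed ratings (Python/NumPy; the function name, learning rate, and epoch count are illustrative assumptions and do not reproduce the MATLAB implementation described in Section 4.1):

```python
import numpy as np

def fit_regularized_svd(ratings, num_items, num_users, d=10, lam=0.04,
                        lr=0.01, epochs=30, seed=0):
    """Illustrative SGD for the regularized SVD objective:
    minimize SSE(R, A*B) + lam * sum over observed (i,u) of sum_r (a_ir^2 + b_ru^2).

    ratings: iterable of (item, user, rating) triples for the observed entries.
    Returns the item matrix A (num_items x d) and user matrix B (d x num_users).
    """
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((num_items, d))
    B = 0.1 * rng.standard_normal((d, num_users))
    for _ in range(epochs):
        for i, u, r in ratings:
            err = r - A[i] @ B[:, u]                  # residual of the current prediction
            a_i, b_u = A[i].copy(), B[:, u].copy()
            A[i]    += lr * (err * b_u - lam * a_i)   # step along the negative gradient
            B[:, u] += lr * (err * a_i - lam * b_u)
    return A, B

# A prediction for an unobserved rating is simply the dot product A[i] @ B[:, u];
# the delta-SVD variant would additionally maintain mu, delta_i, and delta_u terms.
```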
3 EVALUATING COORDINATE SPACES

Given an item–feature matrix A ∈ ℝ^{I×d} generated by some factor model, how can we determine whether the items' coordinates in this d-dimensional space resemble a "semantically meaningful" pattern? The most straightforward approach consists in extending and systematizing the casual investigations described in the introduction. This could easily be done by presenting the item coordinate space to a number of different people and asking them to label its dimensions. The correspondence between the generated item coordinates and human perception could then be assessed, for example, by measuring the degree of consensus among people or the average time needed to come up with adequate labels.

Although this kind of investigation seems very reasonable, it contains some severe flaws, which cannot be fixed by careful study design:

1. The dimensionality chosen in most applications of factor models typically ranges between d = 10 and d = 100. A comprehensive analysis of the resulting data sets would require the users to comprehend high-dimensional spaces, which is impossible even when using advanced visualization techniques.

2. Due to hindsight bias, given enough time, users will be able to assign a fitting label to almost any dimension of the coordinate space. Chances are good that this effect accounts for rather questionable labels such as NOT Monty Python.

3. By using free association to name dimensions, the collection of resulting labels tends to show a high variability and to reflect individual differences between users. To produce statistically significant results, either the sample size must be extended (which requires more study participants and results in higher costs), or the variability must be reduced, for example, by training participants to use an established domain-specific vocabulary to articulate the semantic properties they recognize in the data (which also increases time and effort).

4. Typically, there are many near-optimal solutions to the above-mentioned optimization problems, which can be transformed into one another by rotation of the coordinate axes. This is because, for any invertible matrix M ∈ ℝ^{d×d}, the solution pairs (A, B) and (AM, M⁻¹B) produce the same SSE. Although regularization usually enforces the theoretical existence of a unique optimal solution pair, in practice the enormous problem size often allows finding only one of the many near-optimal solutions. Consequently, the direction of the coordinate axes is completely arbitrary, which makes the task of assigning labels a hopeless undertaking.

3.1 Some Guidelines

In this section, we devise a set of guidelines on which to base more appropriate approaches to the analysis of coordinate spaces.

• In view of problems (1) and (4), we recommend avoiding any direct human interaction with item coordinates. Instead, human input should concentrate on describing item properties, which in turn are related to coordinates as well as compared by algorithmic means.

• The only effective way to eliminate hindsight bias (2) is collecting feedback on items before generating and presenting any information extracted by the factor models under consideration.

• To resolve problem (3), we primarily recommend adapting a domain-specific vocabulary to allow a structured description of items. For example, to characterize music, the rich vocabulary developed by allmusic (http://www.allmusic.com) seems appropriate; amongst others, it includes very detailed information about genres, styles, moods, and connections between artists. Since this kind of semantic information can be (or already has been) provided by a small number of experts and usually is little prone to debate, it is easy to assemble and work with. In later stages of analysis, unrestricted user feedback may be included to reveal the position and extent of more fine-grained and rather subjective concepts in the coordinate space.

We also propose to apply a standardization procedure to the generated coordinate space. This is for the following reasons: First, recall that, for any invertible matrix M ∈ ℝ^{d×d}, the solution pairs (A, B) and (AM, M⁻¹B) are equivalent; to enable comparisons between different factor models and even different runs of the same optimization algorithm, we need to define one solution pair as the standard representation. Second, to enable a better separation of different effects in the data, the axes of the item (and user) coordinate space should be chosen to be orthogonal. Moreover, axes should be ordered according to their relative importance (measured by the variance of the data along each axis); that is, the first dimension should be assigned to the most important axis.

The perfect tool for matching these requirements is the singular value decomposition, a well-known matrix factorization technique from linear algebra, which inspired the SVD factor model. It is based on the fact that, for any rank-d matrix X ∈ ℝ^{I×U}, there is a column-orthonormal matrix U ∈ ℝ^{I×d}, a diagonal matrix S ∈ ℝ^{d×d}, and a row-orthonormal matrix V ∈ ℝ^{d×U} such that X = USV. By reordering rows and columns, S can be chosen such that its diagonal elements are ordered by decreasing magnitude. Moreover, the diagonal matrix S can be eliminated from this factorization by setting X = U′V′, where U′ = US^{1/2} and V′ = S^{1/2}V. The matrices U′ and V′ are unique if all diagonal elements of S are mutually different.

In our setting, we will apply the singular value decomposition to transform the product X = A·B into a new product A′·B′ as just described. Since rating data tends to be very "noisy," we can safely assume that (A′, B′) is a unique representation of (A, B); we did not encounter any counterexamples during our experiments on large real-world rating data. Moreover, any equivalent pair (AM, M⁻¹B) also gets transformed into (A′, B′), which we define as the corresponding standard representation. It can be computed efficiently using the product decomposition algorithm proposed in [7, Sec. 3].
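A small sketch of this standardization step is given below (Python/NumPy; the function name and the QR-based route to the singular value decomposition of the product are our own illustrative choices, standing in for the product decomposition algorithm of [7]):

```python
import numpy as np

def standard_representation(A, B):
    """Given any factorization X = A @ B (A: I x d, B: d x U), return (A2, B2)
    with A2 @ B2 equal to A @ B, mutually orthogonal axes, and dimensions ordered
    by decreasing singular value.  Two thin QR factorizations reduce the problem
    to a d x d SVD, so the full I x U product is never formed explicitly."""
    Qa, Ra = np.linalg.qr(A)                 # A = Qa @ Ra, Qa column-orthonormal
    Qb, Rb = np.linalg.qr(B.T)               # B = Rb.T @ Qb.T, Qb column-orthonormal
    Um, S, Vmt = np.linalg.svd(Ra @ Rb.T)    # small d x d SVD, singular values descending
    U = Qa @ Um                              # column-orthonormal, I x d
    Vt = Vmt @ Qb.T                          # row-orthonormal, d x U
    root = np.sqrt(S)
    return U * root, root[:, None] * Vt      # A' = U S^(1/2), B' = S^(1/2) V
```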
3.2 Use Case: Movie Ratings

Based on these guidelines, we now present a concrete method for performing a basic evaluation of coordinate spaces generated from movie ratings. Our focus rests on immediate applicability, so we relate the item coordinates to reference data that is already available.

The reference source for all kinds of movie-related information is IMDb, the Internet Movie Database (http://www.imdb.com), which currently covers about 1.6 million titles. Most of IMDb's data has been created with the help of its users. Therefore, a large proportion of the available content can freely be downloaded and used for non-commercial purposes (http://www.imdb.com/interfaces#plain). Based on this comprehensive data, one should be able to cross-reference any collection of movie ratings with IMDb.

For the semantic evaluations we are going to perform, the following attributes of titles may prove helpful: genres, certifications (e.g., USA:PG for "parental guidance suggested"), year of release, and plot keywords. To illustrate the general procedure, we will only exploit genre information in this paper. Extending our method to other types of semantic information is straightforward. Checking the correspondence between genres and item coordinates also makes a good first test of whether at least some basic semantic properties of movies are represented in coordinate spaces, which is exactly the purpose of the current work.

IMDb recognizes 28 different genres, from Action to Western, where each movie may belong to multiple genres. The assignment of genres is done by IMDb's expert staff in cooperation with IMDb users. To enforce consistency, this process is based upon a collection of publicly available guidelines (http://www.imdb.com/updates/guide/genres). Therefore, this data source matches the requirements developed in the previous section.

To analyze whether the distribution of genres in coordinate space displays any significant pattern, we turn to established classification algorithms, which have explicitly been designed to exploit any relevant patterns in the data, if there are any. In particular, we propose to measure the degree of adherence to a pattern by the classification accuracy shown by these algorithms when predicting the genres of movies based on their coordinates. In essence, we transform our analysis into a sequence of binary classification problems (one for each genre), which enables us to build on solid grounds. Following the common methodology, we use cross-validation; that is, accuracy is measured on a data set that is independent of the one used to train the classifier. By applying proven techniques to counter overfitting, our approach also overcomes any possible problems related to hindsight bias.
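As a sketch of this evaluation loop, the following fragment estimates, for each genre, the cross-validated accuracy of a classifier that sees only the item coordinates (Python with scikit-learn assumed; function and variable names are illustrative, plain k-fold cross-validation is used instead of the 20 random splits described in Section 4.2, and the SVM parameters are the ones reported there):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def genre_adherence(A, genre_labels, n_splits=5):
    """For every genre, train a binary classifier on the item coordinates A
    (I x d) and report its mean cross-validated accuracy; genre_labels maps a
    genre name to a boolean vector of length I (movie has the genre or not)."""
    scores = {}
    for genre, y in genre_labels.items():
        clf = SVC(kernel="rbf", C=4.0, gamma=0.1)   # cf. the parameters in Section 4.2
        acc = cross_val_score(clf, A, y.astype(int), cv=n_splits, scoring="accuracy")
        scores[genre] = acc.mean()
    return scores
```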
For a start, we selected two popular classification algorithms, which are able to detect different kinds of patterns in the data: support vector machines and kNN-classifiers.

Support vector machines will be used in two different flavors: first, using a linear kernel (referred to as SVM-lin), and second, using a Gaussian radial basis function kernel (SVM-RBF). Linear support vector machines will show a high classification accuracy if most movies of the respective genre are grouped at one side of the data set, which can be separated from all remaining movies by a hyperplane. For example, this can be used to disprove the hypothesis that there exists a direction in the coordinate space along which, say, the amount of action increases monotonically. In contrast, the SVM-RBF classifier detects whether groups of movies with the same genre tend to be located in close vicinity.

kNN-classifiers perform well if the distance between movies having the same genre typically is smaller than the distance to movies not having this genre. Therefore, they can be used to check whether genres form spatially separated patterns in coordinate space. Since factor models are not based on a notion of proximity, it is not clear what measure of distance suits factor models best. We will try out the following four measures: Euclidean distance, standardized Euclidean distance (where, to ensure equally weighted dimensions, coordinate values are divided by the standard deviation of the data with respect to each dimension), negative scalar product (which essentially adapts the method of rating prediction to measure distance), and cosine similarity (which is monotonically related to the angle between two vectors).

To evaluate the true benefit of coordinate spaces generated from factor models, we propose the following baseline, which is derived from traditional neighborhood-based recommendation methods [20] and constructed as follows: First, for any items i and j, we compute their Pearson correlation coefficient

  \varrho_{i,j} = \frac{\sum_{u \in \mathcal{R}_{i,j}} (r_{i,u} - \mu_{i,j})(r_{j,u} - \mu_{j,i})}{\sqrt{\sum_{u \in \mathcal{R}_{i,j}} (r_{i,u} - \mu_{i,j})^2} \, \sqrt{\sum_{u \in \mathcal{R}_{i,j}} (r_{j,u} - \mu_{j,i})^2}} ,

where \mathcal{R}_{i,j} is the set of all users who rated both i and j, and μ_{i,j} is the mean rating given to item i by users who rated both i and j. If \mathcal{R}_{i,j} is empty, then ϱ_{i,j} is undefined. The Pearson correlation coefficient ϱ_{i,j} measures the tendency of users to rate items i and j similarly. To avoid biased estimates in cases where n_{i,j} = |\mathcal{R}_{i,j}| is very small, we derive a new measure of similarity

  s_{i,j} = \frac{n_{i,j}}{n_{i,j} + \lambda} \cdot \varrho_{i,j}

from ϱ_{i,j} by shrinking it towards zero [15]. Here, λ ≥ 0 is a regularization parameter. Finally, we carry these similarities over into distances by applying a logarithmic transformation:

  d_{i,j} = -\ln\left( \frac{1 + s_{i,j}}{2} \right) .

To derive a d-dimensional coordinate space in which items i and j approximately have distance d_{i,j}, we use metric multidimensional scaling [4]. Since neighborhood-based recommendation methods are usually outperformed by factor models, we expect our baseline coordinate space to be far inferior to those constructed using factor models. We refer to our baseline model as MDS.
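The item–item distances for this baseline can be computed as in the following sketch (Python/NumPy; missing ratings are coded as NaN, the function name is our own, and the default shrinkage parameter is the value λ = 20 used in Section 4.1):

```python
import numpy as np

def baseline_distance(r_i, r_j, lam=20.0):
    """Distance between two items for the MDS baseline, following the formulas
    above: shrunken Pearson correlation mapped to a distance via -ln((1+s)/2).
    r_i, r_j: rating vectors of length U with np.nan marking missing ratings."""
    both = ~np.isnan(r_i) & ~np.isnan(r_j)        # users who rated both items
    n = both.sum()
    if n == 0:
        return np.nan                             # correlation undefined, treat as missing
    x = r_i[both] - r_i[both].mean()              # center on the co-raters' mean ratings
    y = r_j[both] - r_j[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    rho = (x * y).sum() / denom if denom > 0 else 0.0   # pragmatic choice for constant ratings
    s = n / (n + lam) * rho                       # shrink towards zero for few co-raters
    return -np.log((1.0 + s) / 2.0)
```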
4 EXPERIMENTS ON MOVIELENS 10M

We applied our approach to the MovieLens 10M data set (http://www.grouplens.org/node/73), which consists of about 10 million ratings collected by the online movie recommender service MovieLens (http://www.movielens.org). After postprocessing the original data (removing one non-existing movie, merging several duplicate movie entries, and removing movies that received less than 20 ratings), our new data set consists of 9,984,419 ratings of 8938 movies provided by 69878 users. The ratings use a 10-point scale from 0.5 (worst) to 5 (best). Each user contributed at least 14 ratings.

Our analysis requires the genre information maintained by IMDb, so we had to map each movie in the data set to its corresponding IMDb entry. This task has been simplified a lot by the fact that all items in the MovieLens 10M data set are relatively well-known movies produced for cinema. (This is the reason why we did not consider the Netflix data set: it consists of all kinds of DVD titles, which often lack a clear correspondence in IMDb.) We mapped about 8000 movies automatically by comparing titles and release years; the remaining movies have been assigned manually or semi-automatically.

To avoid the problem of learning from very small samples for now, we did not use all 28 genres distinguished by IMDb. Instead, we take only those genres into consideration that have been assigned to at least 5% of all movies in our data set. Table 1 lists all remaining 13 genres and their relative frequencies. On average, 2.3 genres have been assigned to each movie.

  Genre      %      Genre      %
  Action     16.0   Horror     10.1
  Adventure  12.7   Mystery     9.1
  Comedy     38.2   Romance    25.2
  Crime      16.6   Sci-Fi      8.6
  Drama      54.6   Thriller   24.2
  Family      8.4   War         5.2
  Fantasy     8.3

  Table 1. Relative frequencies of genres.

4.1 Generating Coordinate Spaces

We implemented each of the four coordinate extraction methods in MATLAB and executed them on our rating data.

For SVD, δ-SVD, and NNMF, we followed the literature and used an optimization procedure based on gradient descent; to reduce computation time, we applied the Hessian speedup proposed in [19]. Adapting the common methodology, we chose the regularization parameter λ by cross-validation such that the SSE is minimized on randomly chosen test sets. We ended up with a value of λ = 0.04 for each of the three algorithms.

Since optimization by gradient descent is known to get stuck in local extrema of the function to be minimized, we ran the three procedures at least three times, each with different initial coordinates, which have been chosen randomly. For each result, we computed the standardized solution pair as described in the previous section. We found that the solutions generated by each extractor do not differ significantly after standardization. This indicates that our coordinate spaces match the unique solution of each optimization problem.

For our MDS procedure, we used the regularization constant λ = 20, which we determined by adapting the recommendation Koren gave for the Netflix data set [15]. The coordinates have been generated by MATLAB's mdscale function using the metric stress criterion. Since in our data set about 14 percent of all movie–movie pairs had no raters in common, we treated the respective entries of the distance matrix as missing data.

To measure the effect of dimensionality, we generated three different coordinate spaces with each extractor by varying the parameter d. We chose d = 10, d = 50, and d = 100.
4.2 Applying the Classifiers

In total, we used 14 different classifiers to evaluate each of the 12 coordinate spaces with respect to each of the 13 genres.

We implemented the two support vector machine classifiers as soft-margin SVMs with parameters C = 4 and (for SVM-RBF) γ = 0.1, which have been determined by cross-validation to maximize classification accuracy.

Each of the four different kNN-classifiers will be applied to the data sets with three different choices of k. To measure whether movies of the same genre tend to occur in larger groups, we chose k = 1, k = 3, and k = 9. In the following, we will refer to these 12 classifiers as kNN-Eucl, kNN-sEucl, kNN-scal, and kNN-cos.

To enable comparisons among classifiers and data sets, we generated 20 pairs of training and test sets, each by randomly choosing 40% of all movies for training and 10% (of the remaining movies) for testing. For each of the resulting 2184 combinations of coordinate spaces, classifiers, and genres, we use the same 20 pairs of item sets for training and testing. In each case, we measured the classification accuracy. All results reported below are averages over the 20 runs.

4.3 Results

Probably the most popular way of assessing a classifier's performance is measuring its accuracy, that is, the fraction of test items which have been classified correctly. However, in our setting, this measure is not very helpful. To see this, recall that the relative frequency of genres is very different in our data set. For example, over half of all movies belong to the genre Drama, but there are only about 5% War movies. While attaining an accuracy of 95% would be significant for the genre Drama, it can easily be achieved for the genre War just by classifying any movie as non-War. To enable comparisons across genres, we propose to use a modified version of Cohen's kappa measure.

Any result of a binary classification task can be described by four numbers, which sum up to 1: the fraction of true positives (α_tp), the fraction of false positives (α_fp), the fraction of false negatives (α_fn), and the fraction of true negatives (α_tn). Accuracy is defined as acc = α_tp + α_tn. Moreover, the accuracy of a static majority-based classifier (which always returns the label of the more frequent class) is acc_maj = max{α_tp + α_fn, α_fp + α_tn}. We propose to use this kind of naive classifier for normalizing the accuracy and define κ = (acc − acc_maj) / (1 − acc_maj). This measure expresses a classifier's relative performance with respect to the majority-based classifier. If acc = 1, then κ = 1; if acc > acc_maj, then κ > 0; if acc = acc_maj, then κ = 0; and if acc < acc_maj, then κ < 0.
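A direct transcription of this normalization (illustrative Python; the helper name is ours):

```python
def kappa(acc, positive_rate):
    """Accuracy normalized against the static majority-based classifier;
    positive_rate is the relative frequency of the genre (alpha_tp + alpha_fn)."""
    acc_maj = max(positive_rate, 1.0 - positive_rate)
    return (acc - acc_maj) / (1.0 - acc_maj)

# Example: always predicting "non-War" on a data set with 5.2% War movies yields
# acc = 0.948, hence kappa(0.948, 0.052) == 0.0 despite the seemingly high accuracy.
```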
By measuring accuracy in terms of κ, we can average classification performance over different genres. Tables 2–5 report the mean κs over all 260 classification results obtained for each combination of coordinate space and classifier type. All entries larger than 0.10 have been marked in boldface.

             SVD-10   SVD-50   SVD-100
  SVM-lin      0.08     0.18     0.20
  SVM-RBF      0.15     0.23     0.25
  1NN-Eucl    -0.24    -0.21    -0.19
  3NN-Eucl     0.01     0.05     0.04
  9NN-Eucl     0.12     0.16     0.14
  1NN-sEucl   -0.25    -0.27    -0.31
  3NN-sEucl    0.01     0.00    -0.06
  9NN-sEucl    0.12     0.12     0.04
  1NN-scal    -0.42    -0.30    -0.30
  3NN-scal    -0.16    -0.03    -0.03
  9NN-scal     0.01     0.11     0.12
  1NN-cos     -0.25    -0.18    -0.16
  3NN-cos      0.00     0.06     0.06
  9NN-cos      0.12     0.17     0.16

  Table 2. Kappas for coordinates generated by SVD.

             δ-SVD-10  δ-SVD-50  δ-SVD-100
  SVM-lin      0.07      0.16      0.18
  SVM-RBF      0.13      0.20      0.23
  1NN-Eucl    -0.26     -0.26     -0.26
  3NN-Eucl    -0.01      0.01     -0.02
  9NN-Eucl     0.11      0.12      0.08
  1NN-sEucl   -0.26     -0.29     -0.36
  3NN-sEucl    0.00     -0.03     -0.11
  9NN-sEucl    0.11      0.09     -0.01
  1NN-scal    -0.41     -0.28     -0.22
  3NN-scal    -0.06      0.02      0.06
  9NN-scal     0.05      0.13      0.16
  1NN-cos     -0.26     -0.19     -0.16
  3NN-cos      0.00      0.07      0.09
  9NN-cos      0.12      0.18      0.19

  Table 3. Kappas for coordinates generated by δ-SVD.

             NNMF-10  NNMF-50  NNMF-100
  SVM-lin      0.02     0.05     0.11
  SVM-RBF      0.02     0.09     0.14
  1NN-Eucl    -0.56    -0.47    -0.41
  3NN-Eucl    -0.20    -0.16    -0.13
  9NN-Eucl    -0.02     0.01     0.02
  1NN-sEucl   -0.56    -0.47    -0.45
  3NN-sEucl   -0.20    -0.16    -0.16
  9NN-sEucl   -0.02     0.01     0.00
  1NN-scal    -0.37    -0.34    -0.34
  3NN-scal    -0.11    -0.10    -0.09
  9NN-scal    -0.02     0.00     0.02
  1NN-cos     -0.56    -0.45    -0.41
  3NN-cos     -0.20    -0.15    -0.13
  9NN-cos     -0.03     0.02     0.03

  Table 4. Kappas for coordinates generated by NNMF.

             MDS-10   MDS-50   MDS-100
  SVM-lin     -0.16     0.15     0.19
  SVM-RBF      0.03     0.16     0.17
  1NN-Eucl    -0.29    -0.19    -0.18
  3NN-Eucl    -0.01     0.06     0.06
  9NN-Eucl     0.13     0.18     0.18
  1NN-sEucl   -0.29    -0.23    -0.29
  3NN-sEucl   -0.01     0.05    -0.01
  9NN-sEucl    0.13     0.17     0.12
  1NN-scal    -0.29    -0.19    -0.18
  3NN-scal    -0.01     0.07     0.08
  9NN-scal     0.12     0.18     0.18
  1NN-cos     -0.28    -0.18    -0.16
  3NN-cos      0.00     0.07     0.08
  9NN-cos      0.13     0.19     0.19

  Table 5. Kappas for coordinates generated by MDS.

We can observe the following:

• The coordinate space derived by NNMF does not contain much helpful information about genres that can be exploited by our classifiers. The performance in all other spaces is significantly better.

• Except for NN-sEucl, classification performance generally improves with increasing dimensionality. However, the difference in performance between d = 10 and d = 50 is much larger than the one between d = 50 and d = 100. This indicates that our ordering of dimensions during standardization indeed captures some notion of relative importance. This is probably also the reason for NN-sEucl's decreasing performance with growing d: treating all dimensions equally seems to overweight information from dimensions at the end of the list.

• The SVM-RBF classifier slightly outperforms SVM-lin, but is comparable in performance to 9NN-Eucl, 9NN-scal, and 9NN-cos. This indicates that genres indeed tend to cluster in coordinate spaces, even with respect to different measures of distance.

• The NN-classifiers display bad performance for k = 1 and k = 3, which indicates that, although movies of the same genre roughly occur in clusters, each cluster usually also contains movies that have not been assigned the respective genre.

• In contrast to our expectations, the performance in coordinate spaces generated by factor models is comparable to the performance shown on our baseline coordinate space MDS.

Moreover, the results suggest that the performance of kNN-classifiers might increase even further for larger values of k. To check this, we performed some preliminary tests with k ≈ 20, but have not been able to confirm this conjecture.

We also investigated the influence of individual genres on classification performance; as an example, the results for SVM-RBF are reported in Table 6.
             SVD-100  δ-SVD-100  NNMF-100  MDS-100
  Action       0.34     0.31       0.22      0.22
  Adventure    0.13     0.12       0.08      0.00
  Comedy       0.45     0.42       0.25      0.42
  Crime        0.08     0.06      -0.01      0.00
  Drama        0.47     0.43       0.37      0.44
  Family       0.43     0.46       0.31      0.34
  Fantasy      0.03     0.05       0.01      0.00
  Horror       0.56     0.54       0.31      0.61
  Mystery      0.06     0.04      -0.00      0.00
  Romance      0.11     0.10      -0.00      0.00
  Sci-Fi       0.23     0.20       0.09      0.00
  Thriller     0.31     0.27       0.14      0.15
  War          0.05     0.06      -0.00      0.00

  Table 6. Kappas for SVM-RBF by genre.

Entries larger than 0.20 have been marked in boldface. We can see that some genres, such as Horror and Drama, can clearly be identified by the classifier, while others cannot. We had expected much better performance on clear-cut genres such as War.

In summary, these preliminary experiments suggest that the coordinate spaces derived by SVD, δ-SVD, and MDS indeed contain some significant semantic information about the represented movies. However, the situation is by far not as clear as claimed in the literature.

5 CONCLUSION AND OUTLOOK

In the current paper, we presented a general methodology for systematically analyzing whether coordinate spaces generated from factor models contain semantic information, as is commonly claimed. We applied our approach to the MovieLens 10M data set and found initial evidence for this claim.

Our results encourage us to follow this line of research in several ways. First, we would like to investigate whether our results also carry over to more advanced and complex factor models, which have been proposed very recently [13, 15]. It would also be interesting to see what more traditional methods such as multidimensional scaling can contribute to the problem of feature extraction from rating data, since our results indicate that these methods can successfully be modified for use in our new setting.

References

[1] Gediminas Adomavicius and Alexander Tuzhilin, 'Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions', IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749, (2005).
[2] Robert M. Bell, Jim Bennett, Yehuda Koren, and Chris Volinsky, 'The million dollar programming prize', IEEE Spectrum, 46(5), 28–33, (2009).
[3] Robert M. Bell and Yehuda Koren, 'Scalable collaborative filtering with jointly derived neighborhood interpolation weights', in Proceedings of ICDM 2007, pp. 43–52. IEEE Computer Society, (2007).
[4] Ingwer Borg and Patrick J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer, second edn., 2005.
[5] Craig Boutilier, Kevin Regan, and Paolo Viappiani, 'Preference elicitation with subjective features', in Proceedings of RecSys 2009, pp. 341–344. ACM, (2009).
[6] Ronen I. Brafman and Carmel Domshlak, 'Preference handling: An introductory tutorial', AI Magazine, 30(1), 58–86, (2009).
[7] Zlatko Drmač, 'Accurate computation of the product-induced singular value decomposition with applications', SIAM Journal on Numerical Analysis, 35(5), 1969–1994, (1998).
[8] Peter Enser and Christine Sandom, 'Towards a comprehensive survey of the semantic gap in visual image retrieval', in Proceedings of CIVR 2003, volume 2728 of LNCS, pp. 291–299. Springer, (2003).
[9] Johannes Fürnkranz and Eyke Hüllermeier, 'Preference learning', Künstliche Intelligenz, 2005(1), 60–61, (2005).
[10] Ralf Herbrich, Thore Graepel, and Klaus Obermayer, 'Large margin rank boundaries for ordinal regression', in Advances in Large Margin Classifiers, 115–132, MIT Press, (2000).
[11] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl, 'Evaluating collaborative filtering recommender systems', ACM Transactions on Information Systems, 22(1), 5–53, (2004).
[12] Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman, 'A survey of top-k query processing techniques in relational database systems', ACM Computing Surveys, 40(4), (2008).
[13] Yehuda Koren, 'Factorization meets the neighborhood: A multifaceted collaborative filtering model', in Proceedings of KDD 2008, pp. 426–434. ACM Press, (2008).
[14] Yehuda Koren, 'Collaborative filtering with temporal dynamics', Communications of the ACM, 53(4), 89–97, (2010).
[15] Yehuda Koren, 'Factor in the neighbors: Scalable and accurate collaborative filtering', ACM Transactions on Knowledge Discovery from Data, 4(1), (2010).
[16] Yehuda Koren, Robert Bell, and Chris Volinsky, 'Matrix factorization techniques for recommender systems', IEEE Computer, 42(8), 30–37, (2009).
[17] Don Monroe, 'Just for you', Communications of the ACM, 52(8), 15–17, (2009).
[18] Gregory Piatetsky-Shapiro, 'Interview with Simon Funk', ACM SIGKDD Explorations Newsletter, 9(1), 38–40, (2007).
[19] Tapani Raiko, Alexander Ilin, and Juha Karhunen, 'Principal component analysis for large scale problems with lots of missing values', in Proceedings of ECML 2007, volume 4701 of LNAI, pp. 691–698. Springer, (2007).
[20] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, 'Item-based collaborative filtering recommendation algorithms', in Proceedings of WWW 2001, pp. 285–295. ACM Press, (2001).
[21] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen, 'Collaborative filtering recommender systems', in The Adaptive Web: Methods and Strategies of Web Personalization, volume 4321 of LNCS, 291–324, Springer, (2007).
[22] Ajit P. Singh and Geoffrey J. Gordon, 'A unified view of matrix factorization models', in Proceedings of ECML PKDD 2008: Part II, volume 5212 of LNCS, pp. 358–373. Springer, (2008).
[23] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk, 'Scalable collaborative filtering approaches for large recommender systems', Journal of Machine Learning Research, 10, 623–656, (2009).
[24] Markus Weimer, Alexandros Karatzoglou, and Alex Smola, 'Improving maximum margin matrix factorization', Machine Learning, 72(3), 263–276, (2008).
