Learning User Preferences to Incentivize Exploration in the Sharing Economy

Christoph Hirnschall, ETH Zurich, Zurich, Switzerland
Adish Singla, MPI-SWS, Saarbrücken, Germany
Sebastian Tschiatschek, Microsoft Research*, Cambridge, United Kingdom
Andreas Krause, ETH Zurich, Zurich, Switzerland

* Work performed while at ETH Zurich.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We study platforms in the sharing economy and discuss the need for incentivizing users to explore options that otherwise would not be chosen. For instance, rental platforms such as Airbnb typically rely on customer reviews to provide users with relevant information about different options. Yet, often a large fraction of options does not have any reviews available. Such options are frequently neglected as viable choices, and in turn are unlikely to be evaluated, creating a vicious cycle. Platforms can engage users to deviate from their preferred choice by offering monetary incentives for choosing a different option instead. To efficiently learn the optimal incentives to offer, we consider structural information in user preferences and introduce a novel algorithm, Coordinated Online Learning (CoOL), for learning with structural information modeled as convex constraints. We provide formal guarantees on the performance of our algorithm and test the viability of our approach in a user study with data of apartments on Airbnb. Our findings suggest that our approach is well-suited to learn appropriate incentives and increase exploration on the investigated platform.

Introduction

In recent years, numerous sharing economy platforms with a variety of goods and services have emerged. These platforms are shaped by users that primarily act in their own interest to maximize their utility. However, such behavior might interfere with the usefulness of the platforms. For example, users of mobility sharing systems typically prefer to drop off rentals at the location in closest proximity, while a more balanced distribution would allow the mobility sharing service to operate more efficiently.

Undesirable user behavior in the sharing economy is in many cases even self-reinforcing. For example, users in the apartment rental marketplace Airbnb are less likely to select infrequently reviewed apartments and are therefore unlikely to provide reviews for these apartments (Fradkin 2014). This is also reflected in the distribution of reviews, where in many cities 20% of apartments account for more than 80% of customer reviews.¹

¹ Data from insideairbnb.com.

Such dynamics create a need for platforms in the sharing economy to actively engage users to shape demand and improve efficiency. Several previous papers have proposed the idea of using monetary incentives to encourage desirable behavior in such systems. One example is (Frazier et al. 2014), who studied the problem in a multi-armed bandit setting, where a principal (e.g. a marketplace) attempts to maximize utility by incentivizing agents to explore arms other than the myopically preferred one. In their setting, the optimal amount is known to the system, and the main goal is to quantify the required payments to achieve an optimal policy with myopic agents. The idea of shaping demand through monetary incentives in the sharing economy has also been tested in practice. For example, (Singla et al. 2015) use monetary incentives to encourage users of bike sharing systems to return bikes at beneficial locations, making automatic offers through the bike sharing app.

In this context, an important question is what amount a platform should offer to maximize its utility. (Singla et al. 2015) introduce a simple protocol for learning optimal incentives in the bike sharing system to make users switch from the preferred station to a more beneficial one, ignoring information about specific switches and additional context. Extending these ideas, we explore a general online learning protocol for efficiently learning optimal incentives.

Our Contributions

We provide the following main contributions in this paper:

• Structural information: We consider structural information in user preferences to speed up learning of incentives, and provide a general framework to model structure across tasks via convex constraints. Our algorithm, Coordinated Online Learning (CoOL), is also of interest for related multi-task learning problems.

• Computational efficiency: We introduce two novel ideas of sporadic and approximate projections to increase the computational efficiency of our algorithm. We derive formal guarantees on the performance of the CoOL algorithm and achieve no-regret bounds in this setting.

• User study on Airbnb: We collect a unique dataset through a user study with apartments on Airbnb and test the viability and benefit of the CoOL algorithm on this dataset.

Preliminaries

In the following, we introduce the general problem setting of this paper.

Platform. We investigate a general platform in the sharing economy, such as the apartment rental marketplace Airbnb. On this platform, users can choose from $n$ goods and services, denoted as items. A user that arrives at time $t$ chooses an item $i^t \in [n]$. If the user chooses to buy item $i^t$, the platform gains utility $u^t_i$.

Incentivizing exploration. The initial choice, item $i^t$, might not maximize the platform's utility, and the platform might be interested in offering a different item $j^t$ with utility $u^t_j > u^t_i$ instead. For example, $j$ could represent an infrequently reviewed item that the platform wants to explore. To motivate the user to select item $j^t$ instead, the platform can offer an incentive $p^t$, for example in the form of a monetary discount on that item. The user can either accept or reject the offer $p^t$ depending on the private cost $c^t$, where the user accepts the offer if $p^t \ge c^t$ and rejects the offer otherwise. If the user accepts the offer, the utility gain of the platform is $u^t_j - u^t_i - p^t$.

Objective. In this setting, two tasks need to be optimized to achieve a high utility gain: finding good switches $i \to j$, and finding good incentives $p^t$. Good switches $i \to j$ are those in which the achievable utility gain is positive, i.e. $u^t_j - u^t_i - c^t > 0$. To realize a positive utility gain, the offer $p^t$ needs to be greater than or equal to $c^t$, since otherwise the offer would be rejected.

In this paper, we focus on learning optimal incentives $p^t$ over time, while the platform chooses relevant switches $i \to j$ independently.

Methodology

In this section, we present our methodology for learning optimal incentives $p^t_{i,j}$ and start with a single pair of items $(i, j)$. We allow for natural constraints on $p^t_{i,j}$, such that $p^t_{i,j} \in S_{i,j}$, where $S_{i,j}$ is convex and non-empty. For example, $S_{i,j}$ might be lower-bounded by 0 and upper-bounded by the maximum discount that the platform is willing to offer.

Single Pair of Items

We consider the popular algorithmic framework of online convex programming (OCP) (Zinkevich 2003) to learn optimal incentives $p^t_{i,j}$ for a single pair of items. The OCP algorithm is a gradient-descent style algorithm that updates with an adaptive learning rate and performs a projection after every gradient step to maintain feasibility within the constraints $S_{i,j}$. We use $\tau^t_{i,j}$ to denote the number of times a pair of items $i, j$ has been observed and $\eta$ to denote the learning rate constant. To measure the performance of the algorithm, we use the loss $l^t(p^t_{i,j})$, which is the difference between the optimal prediction and the prediction provided by the algorithm, such that

\[ l^t(p^t) = \mathbf{1}_{\{p^t \ge c^t\}} \cdot (p^t - c^t) + \mathbf{1}_{\{p^t < c^t\}} \cdot (u - c^t) \quad \text{for } u \ge c^t, \]

and $l^t(p^t) = 0$ for $u < c^t$.²

² Note that this loss function is non-convex. A convex version is presented in the case study.

Algorithm 1: OL – Online Learning
  1  Input:
       • Learning rate constant: $\eta > 0$
  2  Initialize: $p^0_{i,j} \in S$, $\tau^0_{i,j} = 0$
  3  for $t = 1, 2, \ldots, T$ do
  4      Suffer loss $l^t(p^t_{i,j})$
  5      Calculate gradient $g^t_{i,j}$
  6      Set $\tau^t_{i,j} = \tau^{t-1}_{i,j} + 1$
  7      Update $p^{t+1}_{i,j} = p^t_{i,j} - \frac{\eta}{\sqrt{\tau^t_{i,j}}}\, g^t_{i,j}$
     end

The algorithm, Online Learning (OL), for a single pair of items is shown in Algorithm 1. The regret, which measures the difference in losses against any (constant) competing weight vector $p_{i,j} \in S$ after $T$ rounds, is defined as

\[ \mathrm{Regret}_{\mathrm{OL}_{i,j}}(T, p_{i,j}) = \sum_{t=1}^{T} \Big( l^t(p^t_{i,j}) - l^t(p_{i,j}) \Big). \tag{1} \]

Multiple Pairs of Items

We now relax the assumption of a fixed pair of items and return to our original problem of learning optimal incentives for multiple pairs of items, i.e. the algorithm receives specific items $i^t$ and $j^t$ as input for each user. If we consider all $n$ items on the platform, the total number of pairs is $n^2 - n$.

For learning the optimal incentive for each pair of items, the algorithm maintains a specific learning rate proportional to $1/\sqrt{\tau^t_{i,j}}$ for each pair of items and performs one gradient update step using Algorithm 1. We refer to this straightforward adaptation of the OL algorithm as Independent Online Learning (IOL) and use this algorithm as a baseline for our analysis. Using regret bounds of (Zinkevich 2003) and denoting the number of pairs of items as $K$, we can upper bound the regret of IOL as

\[ \mathrm{Regret}_{\mathrm{IOL}}(T) \le \frac{3}{2} \sqrt{TK}\, \|S_{\max}\|\, \|\mathbf{g}_{\max}\|. \tag{2} \]

Structural Information

In a real-world setting, incentives for different pairs of items typically are not independent, and in some cases, certain structural information may help to speed up learning of optimal incentives. In the following, we discuss several relevant types of structural information.

Independent learning. In this baseline setting each pair of items is learned individually. Thus, the number of incentives that need to be learned grows quadratically with the number of items on a platform. While applicable for a small number of items, this approach is not favorable on typical platforms in the sharing economy.

Shared learning. Another commonly studied setting is shared learning. In this setting, all pairs of items are considered equivalent, and only one global incentive is learned. While allowing the platform to learn about many pairs of items at the same time, this approach fails to consider natural asymmetries in the problem. For example, the required incentive for switching from $i$ to $j$ is often different from the required incentive for switching from $j$ to $i$, as can also be observed in the case study of this paper.
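To make the contrast between independent and shared learning concrete, the following is a minimal sketch of the update in Algorithm 1, run either with one incentive per pair (IOL) or with a single global incentive. It is an illustration under stated assumptions, not the authors' implementation: the learning rate constant `eta = 5`, the upper bound `p_max = 40`, the two private costs, and the alternating stream are hypothetical, and the gradient is that of the convex piecewise-linear surrogate loss from the case study (+1 on acceptance, $-u/\Delta$ on rejection), because the non-convex loss above does not yield a useful gradient. Clipping to $[0, p_{\max}]$ stands in for the projection onto $S_{i,j}$.

```python
import math

def clip(x, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def surrogate_gradient(accepted, u=40.0, delta=20.0):
    """Subgradient of the surrogate loss: +1 if the offer was accepted
    (the platform may have overpaid), -u/delta if it was rejected."""
    return 1.0 if accepted else -u / delta

def run_learner(stream, shared, eta=5.0, p_max=40.0):
    """Apply the OL update to a stream of ((i, j), private_cost) observations.

    shared=False keeps one incentive and one counter per pair (IOL);
    shared=True keeps a single global incentive for all pairs.
    """
    p, tau = {}, {}
    for pair, cost in stream:
        key = "all" if shared else pair
        p.setdefault(key, 0.0)
        tau[key] = tau.get(key, 0) + 1
        accepted = p[key] >= cost            # only binary feedback is used
        g = surrogate_gradient(accepted)
        p[key] = clip(p[key] - eta / math.sqrt(tau[key]) * g, 0.0, p_max)
    return p

# Hypothetical asymmetric costs: switching 0 -> 1 is expensive, 1 -> 0 is cheap.
stream = [((0, 1), 30.0), ((1, 0), 10.0)] * 200

independent = run_learner(stream, shared=False)
shared = run_learner(stream, shared=True)
print(independent)   # one incentive per pair, each close to its own cost
print(shared)        # one incentive for both pairs
```

In this toy setting the independent learner settles near 30 and 10 for the two directions, while the shared learner settles near the larger cost and therefore overpays for the cheap switch, which is the asymmetry problem described above.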
Metric/hemimetric structure. Assuming that the required incentives are related to the dissimilarity of items $i$ and $j$, metrics are a natural choice to model structural dependencies, as they capture the property of triangle inequalities in dissimilarity functions. However, incentives for pairs of items are not necessarily required to be symmetric. For example, the required incentive for switching from a highly reviewed apartment on Airbnb to one without reviews is likely higher than vice versa. Therefore, we use hemimetrics, which are a relaxed form of metrics that satisfy only non-negativity constraints and triangle inequalities, capturing asymmetries in preferences (cf. (Singla, Tschiatschek, and Krause 2016)). The usefulness of the hemimetric structure for learning optimal incentives is demonstrated in the experiments.

In the following section, we introduce a general-purpose algorithm for learning with structural information, where the structure is defined by convex constraints on the solution space. The key idea of our algorithm is to coordinate between individual pairs of items by projecting onto the resulting convex set. We generalize our approach for contextual learning, where additional features, such as information about users, may be available. Since projecting onto convex sets may be computationally expensive, we further extend our analysis to allow projections to be sporadic (i.e. only after certain gradient steps) and approximate (i.e. with some error compared to the optimal projection).

Learning with Structural Information

We begin this section by introducing a general framework for specifying structural information via convex constraints. We denote each pair of items as a distinct problem $z \in [K]$, where $K$ is the total number of pairs of items. Each problem $z$ may be associated with additional features, for example with information about the current user. As is common in online learning, we consider a $d$-dimensional weight vector $\mathbf{w}_z$ for each problem $z \in [K]$ for learning optimal incentives. The prediction $p_z$ is equal to the inner product between $\mathbf{w}_z$ and the $d$-dimensional feature vector. In the previous section, we described the special case with $d = 1$ and a unit feature vector, such that $\mathbf{w}_z$ is equivalent to the prediction $p_z$.

Specifying Structure via Convex Constraints

Similar to constraints on $p_z$, we allow for convex constraints on $\mathbf{w}_z$, such that $\mathbf{w}_z \in S_z \subseteq \mathbb{R}^d$. We assume $S_z$ is a convex, non-empty, and compact set, where $\|S_z\|$ is the Euclidean norm of the solution space.³ Further, we assume $\|S_z\| \le \|S_{\max}\|$ for some constant $\|S_{\max}\|$. We denote the joint solution space of the $K$ problems as $S = S_1 \times \cdots \times S_z \times \cdots \times S_K \subseteq \mathbb{R}^{d \cdot K}$ and define $\mathbf{w}^t \in S$ as the concatenation of the problem-specific weight vectors at time $t$, i.e.

\[ \mathbf{w}^t = \big[ (\mathbf{w}^t_1)' \cdots (\mathbf{w}^t_z)' \cdots (\mathbf{w}^t_K)' \big]'. \]

³ Euclidean norm is used throughout, unless otherwise specified.

The available structural information is modelled by a set of convex constraints, such that the joint competing weight vector $\mathbf{w}^*$, against which the loss at each round is measured, lies in a convex, non-empty, and closed set $S^* \subseteq S$, representing a restricted joint solution space, i.e. $\mathbf{w}^* \in S^*$. In the following, we provide several practical examples of how $S^*$ can be defined.

Independent learning. $S^* \equiv S$ models the setting where the problems are unrelated/independent.

Shared learning. A shared parameter setting can be modeled as

\[ S^* = \{ \mathbf{w}^* \in S \mid \mathbf{w}^*_1 = \cdots = \mathbf{w}^*_z = \cdots = \mathbf{w}^*_K \}. \]

Instead of sharing all parameters, another common scenario is to share only a few parameters. For a given $d' \le d$, sharing $d'$ parameters across the problems can be modeled as

\[ S^* = \{ \mathbf{w}^* \in S \mid \mathbf{w}^*_1[1:d'] = \cdots = \mathbf{w}^*_z[1:d'] = \cdots = \mathbf{w}^*_K[1:d'] \}, \]

where $\mathbf{w}^*_z[1:d']$ denotes the first $d'$ entries in $\mathbf{w}^*_z$. This approach is useful for sharing certain parameters that do not depend on the specific problem. For example, in the case of apartments on Airbnb, a shared feature could be the distance between apartments.

Hemimetric structure. To model dissimilarities between items for learning optimal incentives, we use the hemimetric set. Specifically, we use $r$-bounded hemimetrics, which, in addition to non-negativity constraints and triangle inequalities, also include an upper bound $r$ on every entry. For $d = 1$, the convex set representing $r$-bounded hemimetrics is given by

\[ S^* = \{ \mathbf{w}^* \in S \mid \mathbf{w}^*_{i,j} \in [0, r],\ \mathbf{w}^*_{i,j} \le \mathbf{w}^*_{i,k} + \mathbf{w}^*_{k,j}\ \ \forall i, j, k \in [n] \}. \]

Our Algorithm

In the following, we introduce our algorithm, Coordinated Online Learning (CoOL).

Exploiting Structure via Weighted Projections. The CoOL algorithm exploits structural information in a principled way by performing weighted projections onto $S^*$, with weights for a problem $z$ proportional to $\sqrt{\tau^t_z}$. Intuitively, the weights allow us to learn about problems that have been observed infrequently while avoiding "unlearning" problems that have been observed more frequently. A formal justification for using weighted projections is provided in the extended version of this paper (Hirnschall et al. 2018).

We define $\mathbf{Q}^t$ as a square diagonal matrix of size $dK$ with each $\sqrt{\tau^t_z}$ represented $d$ times. In the one-dimensional case ($d = 1$), we can write $\mathbf{Q}^t$ as

\[ \mathbf{Q}^t = \begin{bmatrix} \sqrt{\tau^t_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{\tau^t_K} \end{bmatrix}. \tag{3} \]

Using $\tilde{\mathbf{w}}$ to jointly represent the current weight vectors of all the learners at time $t$ (cf. Line 7 in Algorithm 2), we compute the new joint weight vector $\mathbf{w}^{t+1}$ (cf. Line 10 in Algorithm 2) by projecting onto $S^*$, using

\[ \mathbf{w}^{t+1} = \arg\min_{\mathbf{w} \in S^*} (\mathbf{w} - \tilde{\mathbf{w}})' \mathbf{Q}^t (\mathbf{w} - \tilde{\mathbf{w}}). \tag{4} \]

We refer to this as the weighted projection onto $S^*$. Since $S^*$ is convex and the projection is a special case of the Bregman projection, the projection onto $S^*$ is unique (cf. (Cesa-Bianchi and Lugosi 2006; Rakhlin and Tewari 2009)).

Algorithm 2: CoOL – Coordinated Online Learning
  1  Input:
       • Projection steps: $(\xi^t)_{t \in [T]}$ where $\xi^t \in \{0, 1\}$
       • Projection accuracy: $(\delta^t)_{t \in [T]}$ where $\delta^t \ge 0$
       • Learning rate constant: $\eta > 0$
  2  Initialize: $\mathbf{w}^1_z \in S_z$, $\tau^0_z = 0$
  3  for $t = 1, 2, \ldots, T$ do
  4      Suffer loss $l^t(\mathbf{w}^t_z)$
  5      Calculate (sub-)gradient $\mathbf{g}^t_z$
  6      Set $\tau^t_z = \tau^{t-1}_z + 1$
  7      Update $\tilde{\mathbf{w}}^{t+1} = \mathbf{w}^t$; $\tilde{\mathbf{w}}^{t+1}_z = \mathbf{w}^t_z - \frac{\eta}{\sqrt{\tau^t_z}}\, \mathbf{g}^t_z$
  8      if $\xi^t = 1$ then
  9          Define $\mathbf{Q}^t$ as per Equation (3)
 10          Compute $\mathbf{w}^{t+1} = \mathrm{AProj}(\tilde{\mathbf{w}}^{t+1}, \delta^t, \mathbf{Q}^t)$
         else
 11          $\mathbf{w}^{t+1}_z = \arg\min_{\mathbf{w} \in S_z} \| \mathbf{w} - \tilde{\mathbf{w}}^{t+1}_z \|^2$
         end
     end

Function 3: AProj – Approximate Projection
  1  Input: $\tilde{\mathbf{w}}$, $\delta^t$, $\mathbf{Q}^t$
  2  Define $f^t(\mathbf{w}) = (\mathbf{w} - \tilde{\mathbf{w}})' \mathbf{Q}^t (\mathbf{w} - \tilde{\mathbf{w}})$ for $\mathbf{w} \in S$
  3  Choose $\mathbf{w}^{t+1} \in \{ \mathbf{w} \in S^* : f^t(\mathbf{w}) - \min_{\mathbf{w}' \in S^*} f^t(\mathbf{w}') \le \delta^t \}$
  4  Return: $\mathbf{w}^{t+1}$

Sporadic and Approximate Projections. For large scale applications (i.e. large $K$ or large $d$), projecting at every step could be computationally very expensive: a projection onto a generic convex set $S^*$ would require solving a quadratic program of dimension $dK$. To allow for computationally efficient updates, we introduce two novel algorithmic ideas: sporadic and approximate projections, defined by the above-mentioned sequences $(\xi^t)_{t \in [T]}$ and $(\delta^t)_{t \in [T]}$. Here, $\delta^t$ denotes the desired accuracy at time $t$ and is given as input to Function 3, AProj, for computing approximate projections. This way, the accuracy can be efficiently controlled using the duality gap of the projections. As we shall see in our experimental results, these two algorithmic ideas of sporadic and approximate projections allow us to speed up the algorithm by an order of magnitude while retaining the improvements obtained through the projections.

Algorithm 2, when invoked with $\xi^t = 1$, $\delta^t = 0\ \forall t \in [T]$, corresponds to a variant of our algorithm with exact projections at every time step. When invoked with $\xi^t = 0\ \forall t \in [T]$, our algorithm corresponds to the IOL baseline.

Relation to existing approaches. A related algorithm is the AdaGrad algorithm (Duchi, Hazan, and Singer 2011), which uses the sum of the magnitudes of past gradients to determine the learning rate at each time $t$, where larger past gradients correspond to smaller learning rates. A key difference to the CoOL algorithm is that the AdaGrad algorithm enforces exact projections after every iteration. This is particularly problematic for large, complex structures, since projections onto these structures often rely on numerical approximations that are not guaranteed to converge to the exact solution in finite time.

Performance Guarantees and Analysis

In this section, we analyze worst-case regret bounds of the CoOL algorithm against a competing weight vector $\mathbf{w}^* \in S^*$. The proofs are provided in the extended version of this paper (Hirnschall et al. 2018).

General Bounds

We begin with a general result, without assumptions on the projection accuracy and rate.

Theorem 1. The regret of the CoOL algorithm is bounded by

\begin{align}
\mathrm{Regret}_{\mathrm{CoOL}}(T) \le{} & \frac{1}{2\eta} \|S_{\max}\|^2 \sqrt{TK} + 2\eta \|\mathbf{g}_{\max}\|^2 \sqrt{TK} \tag{R1} \\
& + \sum_{t=1}^{T} \mathbf{1}_{\{(\neg \xi^{t-1}) \wedge (\xi^t)\}} \|S_{\max}\| \|\mathbf{g}_{\max}\| \tag{R2} \\
& + \frac{1}{\eta} \sum_{t=1}^{T} \mathbf{1}_{\{\xi^t\}} \Big( \delta^t + \sqrt{2\delta^t}\, (tK)^{1/4} \|S_{\max}\| \Big) \tag{R3} \\
& + \frac{1}{2\eta} \|S_{\max}\|^2 - 2\eta \|\mathbf{g}_{\max}\|^2 K. \tag{R4}
\end{align}

The regret in Theorem 1 has four components. R1 comes from the standard regret analysis in the OCP framework, R2 comes from sporadic projections, R3 comes from the allowed error in the projections, and R4 is a constant.

Note that when $\xi^t = 0$ for all $t$ (i.e. no projections are performed) and $\eta$ is proportional to $1/\sqrt{\tau^t_{i,j}}$, we get the same regret bounds proportional to $\sqrt{T}$ as for the IOL algorithm. This also reveals the worst-case nature of the regret bounds of Theorem 1, i.e. the proven bounds for CoOL are agnostic to the specific structure $S^*$ and the order of task instances.

Sporadic/Approximate Projection Bounds

To provide specific bounds for the practically useful setting of sporadic and approximate projections, we introduce $\alpha$ and $\beta$ and the user-chosen parameters $c_\alpha$ and $c_\beta$ to control the frequency and accuracy of the projections.

Corollary 1. Set $\eta = \frac{1}{2} \frac{\|S_{\max}\|}{\|\mathbf{g}_{\max}\|}$. $\forall t \in [T]$, define

\[ \xi^t \sim \mathrm{Bernoulli}(\alpha) \ \text{ with } \ \alpha = \frac{c_\alpha}{\sqrt{T}}, \qquad \delta^t = c_\beta (1 - \beta)^2 \frac{\sqrt{K}}{\sqrt{t}} \|S_{\max}\|^2, \]

where constants $c_\alpha \in [0, \sqrt{T}]$, $c_\beta \ge 0$, and $\beta \in [0, 1]$. The expected regret (w.r.t. $(\xi^t)_{t \in [T]}$) of the CoOL algorithm is bounded by

\[ \mathbb{E}\big[\mathrm{Regret}_{\mathrm{CoOL}}(T)\big] \le 2 \sqrt{TK}\, \|S_{\max}\| \|\mathbf{g}_{\max}\| \cdot \bigg( 1 + \frac{c_\alpha}{2\sqrt{K}} \Big( 1 - \frac{c_\alpha}{\sqrt{T}} \Big) + c_\alpha \big( c_\beta + \sqrt{2 c_\beta} \big) (1 - \beta) \bigg). \]

As shown in Corollary 1, projections are required to be more accurate for higher values of $t$. Intuitively, this is required so that already learned weights are not unlearned through inaccurate projections. Using the definitions under Corollary 1, we can prove worst-case regret bounds proportional to $\sqrt{T}$ for this setting.

Performance Analysis for Hemimetric Structure

We now test the performance of the CoOL algorithm on synthetic data with an underlying hemimetric structure.

Hemimetric projection. To be able to perform weighted projections onto the hemimetric polytope, we use the metric nearness algorithm (Sra, Tropp, and Dhillon 2004) as a starting point. For our purposes, three modifications of the algorithm are required: First, we lift the requirement of symmetry to generalize from metrics to hemimetrics. Second, the metric nearness algorithm does not guarantee a solution in the metric set in finite time. However, to calculate the duality gap, the solution is required to be feasible. Thus, we apply the Floyd-Warshall algorithm (Floyd 1962) after every iteration to obtain a solution in the hemimetric set. Third, we add weights to the triangle inequalities to allow for weighted projections and further add upper and lower bound constraints.

Data structure. To empirically test the performance of the CoOL algorithm on the hemimetric set, we synthetically generate data with $d = 1$ and model the underlying structure $S^*$ as a set of $r$-bounded hemimetrics with $n = 10$, resulting in $K = 90$ problems. We use a simple underlying ground-truth hemimetric $\mathbf{w}^*$, where the $n$ items belong to two equal-sized clusters, with $p^*_{i,j} = 1$ if $i$ and $j$ are from the same cluster and $p^*_{i,j} = 9$ otherwise. The results of our experiment in Figure 1 illustrate the potential runtime improvement using sporadic/approximate projections.

[Figure 1: Simulation results for learning hemimetrics. Panels (a) Random problem order, (b) Batches of problems, and (c) Single problem plot regret over time and compare the performance of CoOL against IOL (and uw-CoOL in (c)) for different orders of problem instances. Panels (d) Varying α, (e) Varying β, and (f) Projection runtime varying β (relative runtime) show the speed/performance tradeoff using sporadic and approximate projections.]
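The synthetic ground truth and the role of the Floyd-Warshall step described above can be illustrated with a short sketch. It assumes a bound of $r = 10$ (the text does not state the value used for the synthetic data) and a hypothetical perturbation of one entry; the helper names are illustrative. The shortest-path step only restores feasibility, as needed for evaluating the duality gap, and is not by itself the weighted projection of Equation (4).

```python
from itertools import product

def ground_truth(n=10, same=1.0, other=9.0):
    """Two equal-sized clusters: `same` within a cluster, `other` across clusters."""
    half = n // 2
    return [[0.0 if i == j else (same if (i < half) == (j < half) else other)
             for j in range(n)] for i in range(n)]

def is_r_bounded_hemimetric(w, r, tol=1e-9):
    """Check 0 <= w[i][j] <= r and w[i][j] <= w[i][k] + w[k][j] for all i, j, k."""
    n = len(w)
    in_box = all(-tol <= w[i][j] <= r + tol for i in range(n) for j in range(n))
    triangles = all(w[i][j] <= w[i][k] + w[k][j] + tol
                    for i, j, k in product(range(n), repeat=3))
    return in_box and triangles

def floyd_warshall_repair(w, r):
    """Clip entries to [0, r], then replace each entry by the shortest-path
    distance so that all triangle inequalities hold; asymmetry is preserved."""
    n = len(w)
    d = [[0.0 if i == j else min(max(w[i][j], 0.0), r) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

w_star = ground_truth()
K = sum(1 for i in range(10) for j in range(10) if i != j)
print(K, is_r_bounded_hemimetric(w_star, r=10.0))     # 90 True

# Hypothetical perturbation: 0 -> 1 now costs more than the detour 0 -> 2 -> 1.
w_bad = [row[:] for row in w_star]
w_bad[0][1] = 8.0
print(is_r_bounded_hemimetric(w_bad, r=10.0))          # False

w_fixed = floyd_warshall_repair(w_bad, r=10.0)
print(is_r_bounded_hemimetric(w_fixed, r=10.0), w_fixed[0][1], w_fixed[1][0])  # True 2.0 1.0
```

After the repair the two directions between items 0 and 1 differ (2.0 versus 1.0), which a metric would not allow but a hemimetric does.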
Random order of problems. Problem instances $z^t$ are chosen uniformly at random at every time step. The CoOL algorithm achieves a significantly lower regret than the IOL algorithm, benefiting from the weighted projections onto $S^*$. At $T = 500$, the regret of CoOL is less than half of that of IOL, cf. Figure 1(a).

Batches of problems. In the batch setting, a problem instance is chosen uniformly at random, then it is repeated five times before choosing a new problem instance. Compared to the above-mentioned random order, the IOL algorithm suffers a lower regret because of a higher probability that problems are repeatedly shown. Furthermore, the benefit of the projections onto $S^*$ for the CoOL algorithm is reduced, cf. Figure 1(b), showing that the benefit of the projections depends on the specific order of the problem instances for a given structure.

Single-problem setting. A single problem $z$ is repeated in every round. As illustrated, in this case the IOL algorithm and the CoOL algorithm have the same regret, cf. Figure 1(c). In order to get a better understanding of using weights $\mathbf{Q}^t$ for the weighted projection, we also show a variant uw-CoOL using the identity matrix as $\mathbf{Q}^t$. Unweighted projection or using the wrong weights can hinder the convergence of the learners, as shown in Figure 1(c) for this extreme case of a single-problem setting.

Varying the rate of projection (α). The regret of the CoOL algorithm monotonically increases as $\alpha$ decreases, and is equivalent to the regret of the IOL algorithm at $\alpha = 0$, cf. Figure 1(d). In the range of $\alpha$ values between 1 and 0.1, the regret of the CoOL algorithm is relatively constant and increases strongly only as $\alpha$ approaches 0. With $\alpha$ as low as 0.1, the regret of the CoOL algorithm in this setting is still almost half of that of the IOL algorithm.

Varying the accuracy of projection (β). The regret of the CoOL algorithm monotonically increases as $\beta$ decreases, and exceeds that of the IOL algorithm for values smaller than 0.65 because of high errors in the projections, cf. Figure 1(e). In the range of $\beta$ values between 1 and 0.85, the regret of the CoOL algorithm is relatively constant and less than half of that of the IOL algorithm.

Runtime vs. approximate projections. As expected, the runtime of the projection monotonically decreases as $\beta$ decreases, cf. Figure 1(f). For values of $\beta$ smaller than 0.95, the runtime of the projection is less than 10% of that of the exact projection. Thus, with $\beta$ values in the range of 0.85 to 0.95, the CoOL algorithm achieves the best of both worlds: the regret is significantly smaller than that of IOL, with an order of magnitude speed-up in the runtime compared to exact projections.

Airbnb Case Study

To test the viability and benefit of the CoOL algorithm in a realistic setting, we conducted a user study with data from the marketplace Airbnb.

Experimental Setup

We use the following setup in our user study:

Airbnb dataset. Using data of Airbnb apartments from insideairbnb.com, we created a dataset of 20 apartments as follows: we chose apartments from 4 types in New York City by location (Manhattan or Brooklyn) and number of reviews (high, ≥ 20, or low, ≤ 2). From each type we chose 5 apartments, resulting in a total sample of $n = 20$ apartments.

Survey study on MTurk platform. In order to obtain real-world distributions of the users' private costs, we collected data from Amazon's Mechanical Turk marketplace. After several introductory questions about their preferences and familiarity with travel accommodations, participants were shown two randomly chosen apartments from the Airbnb dataset. To choose between the apartments, participants were given the price, location, picture, number of reviews, and rating of each apartment, as shown in Figure 2. Participants were first asked to select their preferred choice between the two apartments. Next, they were asked to specify their private cost for choosing the other, less preferred apartment instead. The collected data from the responses consists of tuples $((i, j), c)$, where $i$ is the preferred choice, $j$ is the suggested alternative, and $c$ is the private cost of the user.

[Figure 2: Snapshot of the user survey study on MTurk.⁴ The example shows Apartment 1 (price per night: 80$, location: Manhattan, 0 reviews, 1.2 miles to Times Square) and Apartment 2 (price per night: 80$, location: Brooklyn, 23 reviews, 4.4 miles to Times Square).]

⁴ Real pictures from Airbnb replaced with illustrative examples.

Sample. In total, we received 943 responses, as summarized in Table 1. The sample for the performance analysis of the CoOL algorithm consists of 323 responses, in which $i$ was a frequently reviewed apartment, $j$ an infrequently reviewed apartment, and participants were willing to explore the infrequently reviewed apartment for a discount (i.e. they did not select NA).

Utility gain. The utility gain $u$ for getting a review for infrequently reviewed apartments is set to $u = 40$ in our experiments, based on referral discounts given by Airbnb in the past.

Loss function. As introduced in the methodology section, we require a convex version of the true loss function for our online learning framework, ideally acting as a surrogate of the true loss. Additionally, the gradient of the loss function needs to be calculated from the binary feedback of acceptance/rejection of the offers. However, in the analyzed model with binary feedback, a loss function that satisfies both requirements cannot be constructed. Instead, we consider a simplified piecewise-linear convex loss function given by

\[ l^t(p^t) = \mathbf{1}_{\{p^t \ge c^t\}} \cdot (p^t - c^t) + \mathbf{1}_{\{p^t < c^t\}} \cdot \frac{u}{\Delta} \cdot (c^t - p^t), \]

where $\frac{u}{\Delta}$ denotes the magnitude of the gradient when a user rejects the offer. For the experiment, we use $\Delta = 20$. Due to this transformation, we use the utility gain rather than the loss as a useful measure of the performance of the CoOL algorithm.

Structure. Due to the small number of apartments, we consider a non-contextual setting with $d = 1$ and use an $r$-bounded hemimetric structure to model the relationship of the tasks, where $r$ is set to 40 to avoid recommending incentives $p^t > u$. Using a setting with $d > 1$ would allow for real-world applications with additional context.

[Figure 3: Results of experiments with Airbnb dataset. (a) Dataset of 20 apartments (high review count, low review count); (b) Cumulative utility gain over time for IOL and CoOL; (c) Average utility gain over time for IOL and CoOL.]

Main Results

We now present and discuss the results of the user study.

Descriptive statistics. Out of all responses, 758 (80.4%) respondents were willing to accept an offer for their less preferred apartment, given a certain discount per night. Out of these respondents, the average required discount for accepting the alternative apartment was 27.9 USD per night. The average required discount for switching from a frequently reviewed apartment to an infrequently reviewed apartment was 6% higher, at 29.5 USD (cf. Table 1).

Out of the respondents who could choose between a frequently and an infrequently reviewed apartment, 83.9% chose the frequently reviewed apartment, while only 16.1% chose the infrequently reviewed apartment.

In the responses to an open question about the factors respondents considered to decide on the discount, we captured the frequency at which different factors were mentioned by defining several keywords for each factor. The number of times each factor was mentioned is shown in Table 2.

Algorithm performance. We use the cumulative utility gain to measure the performance of the IOL and the CoOL algorithm. The utility gains of both algorithms after 323 responses are shown in Figures 3(b) and 3(c). The cumulative utility gain in Figure 3(b) is almost 50% higher for the CoOL algorithm than for the IOL algorithm. Figure 3(c) reveals that this gain is mainly achieved due to a significant speed-up in learning over the first 50 problems.
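The surrogate loss and the utility-gain measure used in this case study can be written down in a few lines. The sketch below uses $u = 40$ and $\Delta = 20$ from the experimental setup; the private cost of 25 USD and the three offers are hypothetical, and counting a rejected offer as zero gain is an assumption, since the text only defines the gain for accepted offers.

```python
def surrogate_loss(p, c, u=40.0, delta=20.0):
    """Piecewise-linear convex loss: the overpayment p - c if the offer is
    accepted, and (u / delta) times the shortfall c - p if it is rejected."""
    return (p - c) if p >= c else (u / delta) * (c - p)

def utility_gain(p, c, u=40.0):
    """Platform gain of offering p: u - p if the user accepts (p >= c).
    A rejected offer is counted as zero gain (assumption)."""
    return u - p if p >= c else 0.0

def cumulative_utility_gain(offers_and_costs, u=40.0):
    """Running sum of utility gains, i.e. the kind of curve shown in Figure 3(b)."""
    total, curve = 0.0, []
    for p, c in offers_and_costs:
        total += utility_gain(p, c, u)
        curve.append(total)
    return curve

# Hypothetical offers of 20, 25, and 30 USD to a user with private cost 25 USD.
history = [(20.0, 25.0), (25.0, 25.0), (30.0, 25.0)]
print([surrogate_loss(p, c) for p, c in history])   # [10.0, 0.0, 5.0]
print(cumulative_utility_gain(history))             # [0.0, 15.0, 25.0]
```

The offer that exactly matches the private cost has zero surrogate loss and the largest gain among accepted offers, which is why minimizing the surrogate loss is a reasonable proxy for maximizing utility gain in this sketch.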
Table 1: Responses for different apartment types.

  i      j      Responses   Accepted   Avg. Discount
  High   Low    416         77.6%      29.5$
  Low    Low    228         83.3%      28.1$
  High   High   219         82.2%      25.4$
  Low    High   80          81.3%      25.9$

Table 2: Mentioned categories for deciding on a discount.

  Category   Example keywords          Mentions
  Location   neighborhood, distance    477
  Reviews    rating, star              309
  Price      expensive, cheap          182
  Picture    image, photo              169

Discussion. The results of the user study confirm several findings of (Fradkin 2014), who studied the booking behavior on Airbnb. Similar to this study, we find that apartments with a high number of reviews are significantly more likely to be selected. We also find that the average required discount per night is higher when the alternative choice is an infrequently reviewed apartment. This also points toward a difference in willingness to pay between frequently and infrequently reviewed apartments. Similar results have been found in earlier studies on other marketplaces (Resnick et al. 2006; Ye, Law, and Gu 2009; Luca 2011).

The user study also confirms that incentives influence buying behavior and can help increase exploration on online marketplaces (Avery, Resnick, and Zeckhauser 1999; Robinson, Nightingale, and Mongrain 2012); when respondents chose a frequently reviewed apartment and were asked to instead choose an infrequently reviewed apartment, 77.6% of respondents were willing to accept a sufficiently large offer. More than 10% of those respondents were willing to accept a discount of 10 USD per night or less.

The performance of the IOL and CoOL algorithms in Figures 3(b) and 3(c) suggests that incentives can be learned via online learning, and that structural information can be used to significantly speed up the learning. Further, the speed-up in learning directly increases the marketplace's utility gain from suggesting alternative items. To reduce the problem size in a real-world application such as Airbnb, items could be grouped by features such as location or number of reviews. Further, problem-specific features, such as the distance between apartments, could be added to increase the accuracy of the prediction.

Related Work

Multi-armed bandit / Bayesian games. A related line of research is multi-armed bandits and Bayesian games, where a principal attempts to coordinate agents to maximize its utility. Research in this area mainly focuses on changing the behavior of agents through the way information is disclosed, rather than through provision of payments. (Kremer, Mansour, and Perry 2014) provide optimal information disclosure policies for deterministic utilities and only two possible actions. (Mansour, Slivkins, and Syrgkanis 2015) generalize the results for stochastic utilities and a constant number of actions. Further, (Mansour et al. 2016) consider the interaction of multiple agents, and (Chakraborty et al. 2017) analyze multi-armed bandits in the presence of communication costs. Our problem is different from previous research in that utilities are not required to be stochastic, and additional structural information is available to the principal.

Recommender systems. A different approach to encouraging exploration in online marketplaces is recommender systems, which are known to influence buyers' purchasing decisions and can be used to encourage exploration (Resnick and Varian 1997; Senecal and Nantel 2004). For example, $\epsilon$-greedy recommender systems recommend a product closest to the buyer's preferences with probability $(1 - \epsilon)$ and a random product with probability $\epsilon$ (Ten Hagen, Van Someren, and Hollink 2003). Such recommender systems can be extended using ideas studied in this paper.

Online/distributed multi-task learning. Multi-task learning has been increasingly studied in online and distributed settings recently. Inspired by wearable computing, a recent work by (Jin et al. 2015) studied online multi-task learning in a distributed setting. They considered a setup where tasks arrive asynchronously, and the relatedness among the tasks is maintained via a correlation matrix. However, there is no theoretical analysis of the regret bounds for the proposed algorithms. (Wang, Kolar, and Srebro 2016) recently studied multi-task learning for distributed LASSO with shared support. Their work is different from ours: we consider general convex constraints to model task relationships and consider the adversarial online regret minimization framework.

Conclusions and Future Work

We highlighted the need in the sharing economy to actively shape demand by incentivizing users to deviate from their preferred choices and explore different options instead. To learn the incentives users require to choose different items, we developed a novel algorithm, CoOL, which uses structural information in user preferences to speed up learning. The key idea of our algorithm is to exploit structural information in a computationally efficient way by performing sporadic and approximate projections. We formally derived no-regret bounds for the CoOL algorithm and provided evidence for the increase in performance over the IOL baseline through several experiments. In a user study with apartments from the rental marketplace Airbnb, we demonstrated the practical applicability of our approach in a real-world setting. To conclude, we discuss several additional considerations for offering incentives in a sharing economy platform.

Safety/individual consumer loss. Generally, exploration in the sharing economy may be risky, and individuals can face severe losses while exploring. For example, new hosts might not be trustworthy, and new drivers in ridesharing systems might not be reliable. In our approach, the items to be explored are controlled by the platform, and appropriate preconditions would need to be implemented to minimize risks.

Reliability/Consistency. In order for platforms to implement an algorithmic provision of monetary incentives, it is important that incentives are reliable and consistent over time. Ideally, similar users should receive similar incentives, and offers should be consistent with the user's preferences. Using the CoOL algorithm, consistency can be controlled through appropriate convex constraints.

Strategy-proofness. Providing monetary incentives based on user preferences creates possibilities for opportunistic behavior. For example, users could attempt to repeatedly decline offers to receive higher offers in the future or browse certain items hoping to receive offers for similar items. To control for such behavior, markets need to be large enough so that the behavior of individuals does not affect overall learning. Further, platforms can control the number and frequency with which individual users receive offers to minimize opportunistic possibilities.

Acknowledgments

This work was supported in part by the Swiss National Science Foundation, and Nano-Tera.ch program as part of the Opensense II project, ERC StG 307036, and a Microsoft Research Faculty Fellowship. Adish Singla acknowledges support by a Facebook Graduate Fellowship.

References

Avery, C.; Resnick, P.; and Zeckhauser, R. 1999. The market for evaluations. American Economic Review 564–584.

Beckenbach, E. F., and Bellman, R. 2012. Inequalities, volume 30. Springer Science & Business Media.

Senecal, S., and Nantel, J. 2004. The influence of online product recommendations on consumers' online choices. Journal of Retailing 80(2):159–169.

Singla, A.; Santoni, M.; Bartók, G.; Mukerji, P.; Meenen, M.; and Krause, A. 2015. Incentivizing users for balancing bike sharing systems. In AAAI.

Cesa-Bianchi, N., and Lugosi, G. 2006.
Prediction,learn- Singla,A.;Tschiatschek,S.;andKrause,A. 2016. Actively ing,andgames. Cambridgeuniversitypress. learninghemimetricswithapplicationstoelicitinguserpref- Chakraborty,M.;Chua,K.Y.P.;Das,S.;andJuba,B. 2017. erences. InICML. Coordinatedversusdecentralizedexplorationinmulti-agent Sra, S.; Tropp, J.; and Dhillon, I. S. 2004. Triangle fixing multi-armed bandits. In Proceedings of the Twenty-Sixth algorithmsforthemetricnearnessproblem. InAdvancesin International Joint Conference on Artificial Intelligence, NeuralInformationProcessingSystems,361–368. IJCAI-17,164–170. Ten Hagen, S.; Van Someren, M.; and Hollink, V. 2003. Duchi,J.;Hazan,E.;andSinger,Y. 2011. Adaptivesubgra- Exploration/exploitationinadaptiverecommendersystems. dient methods for online learning and stochastic optimiza- ProceedingsofEunite2003. tion. JournalofMachineLearningResearch12:2121–2159. Wang,J.;Kolar,M.;andSrerbo,N.2016.Distributedmulti- Floyd,R.W. 1962. Algorithm97:shortestpath. Communi- tasklearning. InAISTATS. cationsoftheACM5(6):345. Ye, Q.; Law, R.; and Gu, B. 2009. The impact of online Fradkin,A. 2014. Searchfrictionsandthedesignofonline user reviews on hotel room sales. International Journal of marketplaces. NBERWorkingPaper. HospitalityManagement28(1):180–182. Frazier, P.; Kempe, D.; Kleinberg, J.; and Kleinberg, R. Zinkevich,M. 2003. Onlineconvexprogrammingandgen- 2014. Incentivizing exploration. In Proceedings of the fif- eralizedinfinitesimalgradientascent. InICML. teenthACMconferenceonEconomicsandcomputation,5– 22. ACM. Hirnschall,C.;Singla,A.;Tschiatschek,S.;andKrause,A. 2018. Learning user preferences to incentivize exploration inthesharingeconomy(extendedversion). Jin,X.;Luo,P.;Zhuang,F.;He,J.;andHe,Q. 2015. Col- laboratingbetweenlocalandgloballearningfordistributed onlinemultipletasks. InCIKM. Kremer, I.; Mansour, Y.; and Perry, M. 2014. Implement- ingthewisdomofthecrowd. JournalofPoliticalEconomy 122(5):988–1012. Luca,M. 2011. 
Reviews,reputation,andrevenue:Thecase ofyelp.com. HarvardBusinessSchoolNOMUnitWorking Paper. Mansour,Y.;Slivkins,A.;Syrgkanis,V.;andWu,Z.S.2016. Bayesianexploration:Incentivizingexplorationinbayesian games.InProceedingsofthe2016ACMConferenceonEco- nomicsandComputation,EC’16,661–661. NewYork,NY, USA:ACM. Mansour,Y.;Slivkins,A.;andSyrgkanis,V.2015.Bayesian incentive-compatiblebanditexploration. InProceedingsof the Sixteenth ACM Conference on Economics and Compu- tation,EC’15,565–582. NewYork,NY,USA:ACM. Rakhlin,A.,andTewari,A. 2009. Lecturenotesononline learning. Draft,April. Resnick,P.,andVarian,H.R.1997.Recommendersystems. CommunicationsoftheACM40(3):56–58. Resnick, P.; Zeckhauser, R.; Swanson, J.; and Lockwood, K. 2006. The value of reputation on ebay: A controlled experiment. Experimentaleconomics9(2):79–101. Robinson, J. G.; Nightingale, T. R.; and Mongrain, S. A. 2012. Methodsandsystemsforobtainingreviewsforitems lackingreviews. USPatent8,108,255. OutineoftheSupplement WestartthesupplementbyintroducingpropertiesoftheBregmandivergenceandadditionalnotationrequiredfortheproofs of the regret bounds. We further introduce two basic propositions and several lemmas. We then provide formal justification forusingweightedprojectionintheCoOLalgorithm,cf.Equation(4).Lastly,weprovidetheproofoftheregretboundofthe CoOLalgorithminTheorem1andCorollary1. Preleminaries BregmanDivergence ForanystrictlyconvexfunctionRt : Rd → R,theBregmandivergenceD betweenaaa,bbb ∈ Rd isdefinedasthedifference Rt betweenthevalueofRtataaa,andthefirst-orderTaylorexpansionofRtaroundbbbevaluatedataaa,i.e. D (aaa,bbb)=Rt(aaa)−Rt(bbb)−∇Rt(bbb)·(aaa−bbb), Rt WeusethefollowingpropertiesoftheBregmandivergence,cf.(RakhlinandTewari2009): • TheBregmandivergencesisnon-negative. • TheBregmanprojection (cid:98)bbb=argminDRt(aaa,bbb) aaa∈S ontoaconvexsetS existsandisunique. 

• For $\hat{\mathbf{b}}$ defined as in the Bregman projection above and $\mathbf{u} \in S$, by the generalized Pythagorean theorem, cf. (Cesa-Bianchi and Lugosi 2006), the Bregman divergence satisfies
\[ D_{R^t}(\mathbf{u}, \mathbf{b}) \ge D_{R^t}(\mathbf{u}, \hat{\mathbf{b}}) + D_{R^t}(\hat{\mathbf{b}}, \mathbf{b}). \]

• The three-point equality
\[ D_{R^t}(\mathbf{a}, \mathbf{b}) + D_{R^t}(\mathbf{b}, \mathbf{c}) = D_{R^t}(\mathbf{a}, \mathbf{c}) + (\mathbf{a} - \mathbf{b}) \cdot \left( \nabla R^t(\mathbf{c}) - \nabla R^t(\mathbf{b}) \right) \]
follows directly from the definition of the Bregman divergence.

Notation

Throughout the supplement we use $\eta_z^t = \eta / \sqrt{\tau_z^t}$ and $\mathbf{Q}^t$ as per Equation (3). Similar to the definition of $\mathbf{w}^t$, we also define $\mathbf{x}^t$ and $\mathbf{g}^t$ as the concatenation of the task-specific feature and gradient vectors, i.e.

\[ \mathbf{x}^t = \left[ (\mathbf{x}_1^t)' \cdots (\mathbf{x}_z^t)' \cdots (\mathbf{x}_K^t)' \right]', \qquad \mathbf{g}^t = \left[ (\mathbf{g}_1^t)' \cdots (\mathbf{g}_z^t)' \cdots (\mathbf{g}_K^t)' \right]', \]

where for all $t$, $\mathbf{x}^t$ and $\mathbf{g}^t$ are $0$ in all positions that do not correspond to task $z^t$. We also use $\tilde{\mathbf{w}}^{t+1}$ to refer to the concatenation of the updated task-specific weights, before any coordination, such that

\[ \tilde{\mathbf{w}}^{t+1} = \left[ (\mathbf{w}_1)' \cdots (\mathbf{w}_z)' \cdots (\mathbf{w}_K)' \right]', \]

where $\mathbf{w}_z = \mathbf{w}_z^t$ for $z \ne z^t$ and $\mathbf{w}_z = \tilde{\mathbf{w}}_z^t$ otherwise.

Propositions

In the following we introduce two basic propositions that we need for the proof of Theorem 1.

Proposition 1. If $\tau_z \in \mathbb{R}_+$ for all $z \in \{1 \ldots K\}$, and $\sum_{z=1}^{K} \tau_z = T$, then

\[ \sum_{z=1}^{K} \sqrt{\tau_z} \le \sqrt{TK}. \]

Proof. Extending and applying the Cauchy-Schwarz inequality, we get

\[ \sum_{z=1}^{K} \sqrt{\tau_z} \le \sqrt{\sum_{z=1}^{K} 1} \, \sqrt{\sum_{z=1}^{K} \tau_z} = \sqrt{K} \sqrt{T} = \sqrt{TK}. \]
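
As a quick sanity check of Proposition 1, the following Python sketch compares both sides of the inequality for given per-task counts $\tau_z$. It is an illustration only and not part of the original supplement; the function names `sqrt_sum`, `proposition_bound`, and `satisfies_proposition` are hypothetical, and the small tolerance is an assumption to absorb floating-point rounding.

```python
import math


def sqrt_sum(taus):
    """Left-hand side of Proposition 1: sum over tasks of sqrt(tau_z)."""
    return sum(math.sqrt(tau) for tau in taus)


def proposition_bound(taus):
    """Right-hand side of Proposition 1: sqrt(T * K), with T = sum of tau_z."""
    num_tasks = len(taus)   # K in the text
    total = sum(taus)       # T in the text
    return math.sqrt(total * num_tasks)


def satisfies_proposition(taus, tol=1e-9):
    """Check the inequality numerically; tol is an assumed rounding slack."""
    return sqrt_sum(taus) <= proposition_bound(taus) + tol


if __name__ == "__main__":
    balanced = [25, 25, 25, 25]   # equal counts: Cauchy-Schwarz holds with equality
    skewed = [97, 1, 1, 1]        # one dominant task: the bound is loose
    for taus in (balanced, skewed):
        print(taus, sqrt_sum(taus), proposition_bound(taus))
```

With equal counts the two sides coincide, which is the equality case of Cauchy-Schwarz; the more unevenly the rounds are spread over tasks, the more slack the bound has.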
