ebook img

Does Weather Matter? Causal Analysis of TV Logs PDF

0.19 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Does Weather Matter? Causal Analysis of TV Logs

Does Weather Matter? Causal Analysis of TV Logs Shi Zong† Branislav Kveton‡ Shlomo Berkovsky§ Azin Ashkan(cid:93) Nikos Vlassis‡ Zheng Wen‡ †TheOhioStateUniversity,Columbus,OH,USA ‡AdobeResearch,SanJose,CA,USA §CSIRO,Epping,NSW,Australia (cid:93)GoogleInc.,MountainView,CA,USA [email protected], {kveton, vlassis, zwen}@adobe.com, [email protected], [email protected] 7 ABSTRACT unitiiscontrolandT = 1iftheunitistreated. Thenunitihas i 1 twopotentialoutcomes,Y (1)iftheunitistreatedandY (0)other- Weatheraffectsourmoodandbehaviors,andmanyaspectsofour i i 0 wise. Theunit-levelcausaleffectofthetreatmentisthedifference life. Whenitissunny, mostpeoplebecomehappier; butwhenit 2 inpotentialoutcomes,τ =Y (1)−Y (0),andtheaveragetreat- rains, some people get depressed. Despite this evidence and the i i i r abundanceofdata,weatherhasmostlybeenoverlookedinthema- ment effect on treated (ATT) is Ei:Ti=1[τi] = Ei:Ti=1[Yi(1)]− a E [Y (0)].NotethatE [τ ]cannotbedirectlycomputed, M chine learning and data science research. This work presents a i:Ti=1 i i:Ti=1 i becauseY (0)isunobservedintreatedunits{i:T =1}. causalanalysisofhowweatheraffectsTVwatchingpatterns. We i i Sincetheassignmenttotreatmentandcontrolgroupsisusually show that some weather attributes, such as pressure and precipi- 4 notrandom,E [Y (0)]isapoorestimateofE [Y (0)].A tation,causemajorchangesinTVwatchingpatterns. Tothebest i:Ti=0 i i:Ti=1 i 2 keychallengeincausalanalysisistoeliminatetheresultingimbal- of our knowledge, this is the first large-scale causal study of the impactofweatheronTVwatchingpatterns. ancebetweenthedistributionsoftreatedandcontrolunits.Apopu- ] larapproachtobalancingthetwodistributionsisnearest-neighbor Y matching(NNM)[4].Inthiswork,wematcheachtreatedunittoits 1. INTRODUCTION C nearestcontrolunitbasedontheircovariates,andthentheresponse . Weather affects our mood, and thus human behaviors. One of ofthematchedunitservesasacounterfactualforthetreatedunit. s c thepronouncedexamplesistheseasonalaffectivedisorder–pro- Inparticular,theATTisestimatedas [ longed lack of sunlight that can depress people [2]. Weather in- 2 dchiraescitnlygbafefheacvtsiovras,riaonudsmasopreec.tsInofthoiusrwloivreks,:wweosrektaonudttsotuedxya,mpiunre- ATT≈ n1T (cid:80)i:Ti=1(Yi(1)−Yπ(i)(0)), (1) v theeffectsofweatheronTVwatchingpatterns. Inparticular,we wherenT = (cid:80)ni=1Ti isthenumberoftreatedunits,Yi(1)isthe 6 explorewhetherpeoplewatchdifferentgenresofprogramsindif- observedresponseoftreateduniti,andYπ(i)(0)istheobservedre- 1 ferentweatherconditions. sponseofthematchedcontrolunitπ(i).Thecovariateofuniti,xi, 7 Thisknowledgecanhaveseveralimportantimplications. First, shouldbechosensuchthatitspotentialoutcomesarestatistically 8 marketersmaybewillingtoadjustthecontentandratioofthead- independentofTi. Inthiscase,theestimatein(1)resemblesthat 0 vertisementstothetargetaudienceitwillbeexposedto[1]. Sec- ofarandomizedexperiment. . 1 ond,TVandvideorecommendersystemsmayleveragethisknowl- 0 edgeandadapttheirrecommendationsaccordingly[5]. 3. EXPERIMENTS 7 Tothebestofourknowledge,ourworkisthefirsttoapplycausal 1 analysistoanation-scaledatasetofTVconsumptionlogscontain- In this work, we use a dataset gathered by an Australian na- : ingabout10Mwatchingeventsofmorethan40kusers. Wepro- tional IPTV provider. We obtained the complete Australia-wide v i poseseveralpracticalwaysforestimatingthecausalityofweather logsforaperiodof26weeks, fromFebruarytoSeptember2012 X on TV watching behavior, and observe high correspondence be- [5]. Thedatasetcontains10Mviewingeventsofabout40kusers, r tweenourfindings. whowatchedmorethan11kuniqueprogramsin14TVgenres.We a also obtained the geographic locations of the users and matched 2. CAUSALANALYSIS themwiththeirweather. Wefurtherextractedeightattributesthat characterizevariousaspectsofweather(Table1). Theproblemofestimatingcausaleffectsfromobservationaldata iscentraltonumerousdisciplines[3]. Itcanbeformalizedasfol- 3.1 ExperimentalSetup lows. Let{1,...,n}beasetofnunitsi,suchasindividuals. Let WeapplythecausalityframeworkfromSection2toourdataset. T ∈ {0,1} indicate the treatment of unit i. That is, T = 0 if i i Asweexplainoursetup,weillustrateitonanexamplequery“does hightemperaturecausewatchingmoreDramas?”. Theunitsiare TVwatchingeventsandweestimatethecausaleffectofweather (cid:13)c2017InternationalWorldWideWebConferenceCommittee attheseevents. Thetreatment T indicatesthetreatmentweather (IW3C2),publishedunderCreativeCommonsCCBY4.0License. i ateventi. Inourexample, T = 1{hightemperatureateventi}. WWW’17Companion,April3–7,2017,Perth,Australia. i ACM978-1-4503-4914-7/17/04. Wedefineonetreatmentvariableforeachweatherattribute,andget http://dx.doi.org/10.1145/3041021.3054221 eightweather-attributetreatments.Foreachtreatment,wetreatthe eventsinthetailofthedistributionofthatattribute. Thatis,ifthe tailofthedistributionisontheleft(right),weassign20%ofthe . eventswiththelowest(highest)valuestothetreatmentgroup. We 15 Temperature (H) 10 Temperature (H) 10 10 Feels like temp (H) Feels like temp (H) malized ATT 05 CWloinuPddr s ecpsoesvueerdre (( (HHL))) 04 CWloinuPddr s ecpsoesvueerdre (( (HHL))) 04 Nor Humidity (L) -4 Humidity (L) -4 -5 Visibility (L) Visibility (L) -10 Preciptation (H) -10 Preciptation (H) -10 DramaSchool KidsComedyDocu Life PanelNews DramaSchool Kids ComedyDocu Life Panel News DramaSchool Kids ComedyDocu Life Panel News TV genres TV genres TV genres (a) NormalizedATTsofpressure. (b) StatisticallysignificantchangesinnormalizedATTsforallweather-genrecombinations. Figure1: (a)NormalizedATTsofpressureon8mostpopulargenreswithTV-genre(blue)andlatent(red)userprofiles. (b)Sig- nificantchangesinnormalizedATTsofallweathertreatmentson8mostpopulargenres. Significantincreasesarered,significant decreasesareblue,andinsignificanteffectsaregray.WereportresultsforbothTVgenre(left)andlatent(right)userprofiles. Weatherattribute Treated Weatherattribute Treated thatmanysignificantchangesinFig.1(b)arestableacrossourtwo Temperature High(H) Pressure Low(L) profilerepresentations.Thecorrelationcoefficientbetweentheval- Feels-liketemperature High(H) Humidity Low(L) ues in the two plots is as high as 0.949, which indicates that the Windspeed High(H) Visibility Low(L) Cloudcover High(H) Precipitation High(H) causaldependenciesarealignedregardlessoftheprofilerepresen- tation.Thisreaffirmsthevalidityofourresults. Table1:Ourweatherattributeswiththeirtreatmentgroups. Fig.1(b)revealssomeinsightfultrends. Forexample,whenthe feels-liketemperatureishigh,weobserveanincreaseinNewsand listourtreatmentgroupsinTable1.Thevalueof20%ischosenso adecreaseinKidsprograms. Oneplausibleexplanationforthisis thatthenumberoftreatedeventsissufficientlylarge. that warm days are suitable for outdoor activities and kids watch Thepotentialoutcomesateventiundercontrolandtreatment, less TV. Therefore, the volume of the genres watched by adults Y (0)andY (1), aretheindicatorsofwatchedTVcontentunder increases.Whentheprecipitationishigh,weobserveadecreasein i i controlandtreatment,respectively.Inourexample, DramasandanincreaseinPanels. Oneplausibleexplanationfor thisisthatrainydaysmakepeoplelesshappy,andtheywatchless Y (0)=1{Dramawatchedateventiifthetemperatureislow}, i Dramasthatareunlikelytocheerthemup. Y (1)=1{Dramawatchedateventiifthetemperatureishigh}. Wealsovalidatethequalityofourmatching. TheaverageEu- i clideandistancebetweenthecovariatesofrandomlymatchedevents Then the ATT in (1) is the change in the frequency of watching is0.72. Thedistanceofourmatchedpairsis0.13. This82%im- Dramasduetotemperature. IftheATTissignificantlyabove(be- provement indicates that our matching is very good, and that the low)zero, wesaythathightemperatureincreases(decreases)the estimatedeffectsmaybecausal. frequencyofwatchingDramas. Anear-zeroATTmeansthatthere is no effect on Dramas. We define potential outcomes and ATTs 4. CONCLUSIONS forallcombinationsofTVgenresandtreatments. Becausesome genresareinfrequent,wereportATTsdividedbythefrequencyof In this work, we conduct a causal analysis of weather on TV theirgenres. Thisnormalizationisonlyforvisualizationpurposes watchingpatterns. Weusetwouserprofilerepresentationsthatre- andhasnoimpactonthestatisticalsignificanceofourfindings. vealsimilardependencies.Tothebestofourknowledge,thisisthe Thecovariatex istheprofileoftheuserateventiandthetime firstcausalanalysisofalarge-scaledatathatlooksattheinterplay i oftheevent. Bothstronglyaffect(Y (0),Y (1)),andthereforeare betweenweatherandwatchingTV. i i goodcandidatesforguaranteeingthat(Y (0),Y (1))andT arein- Itis worthnoting thatour workisdomain-agnostic. However, i i i dependent. We experiment with two kinds of user profiles. The someattributes,suchashumidityandprecipitation,areclearlycor- first is a vector of the frequencies of watched TV genres. This related. We posit that some domain knowledge could help us to profile captures high-level genre preferences. The second profile refinethesetofinfluentialattributesandsubstantiallyimproveour isestimatedfrommatrixM ∈ {0,1}nu×np thatindicateswhich results. Webelievethattheuncoveredweatherdependenciescan programsarewatchedbywhichusers,wheren isthenumberof beincorporatedintoacontext-awarerecommendertoenhanceits u usersandn isthenumberofprograms. LetU ∈ Rnu×d bethe recommendations,andweleavethisforfuturework. p userlaterfactorsinrank-dtruncatedSVDofM. Thentheprofile ofuseruistheu-throwofU.Wechoosed=16.Theeigenvalues References ofM aresmallafterwards,andhencethefirst16singularvectors [1] A.FarahatandM.C.Bailey. Howeffectiveistargetedadver- ofM capturemostofitsstructure. tising? InProceedingsofWWW,pages111–120,2012. 3.2 Results [2] T.PartonenandJ.Lönnqvist. Seasonalaffectivedisorder. The Fig.1(a)showsthenormalizedATTs(1)ofpressureon8popular Lancet,352(9137):1369–1374,1998. genres.Weobservethatwhenthepressureislow,thefrequencyof [3] J.Pearl. Causality. CambridgeUniversityPress,2009. watching Dramas decreases by 5%. The same effect is observed forbothuserprofiles,whichstrengthensthevalidityofourresults. [4] D.B.Rubin.Matchingtoremovebiasinobservationalstudies. Fig.1(b)showthesignificanteffectsforallweather-genrecom- Biometrics,pages159–183,1973. binations. WemeasurethesignificanceoftheATTbyitsestimate dividedbyitsstandarderror,whichisitsstandarddeviation.There- [5] M. Xu, S. Berkovsky, S. Ardon, S. Triukose, A. Mahanti, fore,thevalueof4(−4)meansthattheestimateis4standarddevi- and I. Koprinska. Catch-up TV recommendations: show old ationabove(below)zero,andsignificantatp≈10−4.Weobserve favouritesandfindnewones. InProceedingsofRecSys,2013.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.