Understanding the detection of fake view fraud in Video Content Portals

14 pages · 2015 · English

Understanding the detection of fake view fraud in Video Content Portals

Miriam Marciel (NEC Labs Europe, Universidad Carlos III de Madrid), Ruben Cuevas (Universidad Carlos III de Madrid), Albert Banchs (IMDEA Networks Institute, Universidad Carlos III de Madrid), Roberto Gonzalez (NEC Labs Europe), Stefano Traverso (Politecnico di Torino), Mohamed Ahmed (NEC Labs Europe), Arturo Azcorra (IMDEA Networks Institute)

ABSTRACT

While substantial effort has been devoted to understanding fraudulent activity in traditional online advertising (search and banner), more recent forms such as video ads have received little attention. The understanding and identification of fraudulent activity (i.e., fake views on videos) in video ads is complicated for advertisers, as they rely exclusively on the detection mechanisms deployed by video hosting portals. In this context, independent tools able to monitor and audit the fidelity of these systems are missing today, and are needed by both industry and regulators. In this paper we present a first set of tools to serve this purpose. Using our tools, we evaluate the performance of the fake view detection systems of five major online video portals. Our results reveal that YouTube's detection system significantly outperforms all the others. Despite this, a systematic evaluation indicates that it may still be susceptible to simple attacks. Furthermore, we find that YouTube penalizes its users' public and monetized view counters differently, with the former being more aggressive. In practice, this means that views identified as fake and discounted from the public view-counter are still monetized. We speculate that even though YouTube's policy puts a lot of effort into compensating users after an attack is discovered, this practice places the burden of the risk on the advertisers, who pay to get their ads displayed.

Keywords
YouTube, advertising fraud, fake views, detection mechanism

1. INTRODUCTION

Online advertising sustains a large fraction of Internet businesses. It was recently reported by the Interactive Advertising Bureau (IAB) to have generated a revenue of $49B in 2014, in the U.S. alone. This figure corresponds to a 15.6% increase in revenue with respect to 2013. Following the trends in technology, mobile and video advertising have become an important and growing fraction of the overall online advertising market. Of particular interest to this work is video advertising. Surveys indicate that in 2013, 93% of online marketers used video to advertise their products [1], and of these, 65% used YouTube specifically to deliver adverts. This activity is estimated to have generated $3.3B in 2014, in the U.S., representing approximately 7% of the total revenue generated by online advertising [2]. This figure has risen from $2.2B in 2012, and is expected to grow to $8B by 2016 [2, 3].

Given its size, it is no surprise that online advertising attracts fraud. Current estimations indicate that 15-30% of ad impressions are fraudulent [4, 5, 6], leading to losses in the order of billions of dollars for advertisers [7]. With respect to video ads, recent media and industry reports indicate that fraud is especially endemic in online video, cf. [8, 9, 10]. The U.S.-based marketing community body, the Association of National Advertisers (ANA), reported in Dec. 2014 that (on average) 23% of video ad views across different studies are fraudulent [11].

Researchers and practitioners have previously analyzed and proposed solutions to address click fraud in search and display ads. Notable examples include [12, 13, 14, 15]. While the goal of click fraud is in general to inflate user activity counters at a particular target, such as a webpage, online video ads offer new motivations, attack paths and revenue streams. First, the prestige associated with owning or producing popular videos is now commonly commoditized; for example, it was reported that YouTube removed more than 2B suspected fraudulent views from accounts associated with the music industry [16, 17]. Second, online video advertisers are forced to delegate the detection of fraud to the online video portals that host their content, and have to rely on the high-level statistics these offer. In contrast, search and banner advertisers can collect partial information from their click/impression sources to identify fraudulent activity. For instance, they can register data from the delivered users (e.g., session times), ad-location, etc. Third, while banner and search ads are sold at either Cost-per-Impression (CPI) or Cost-per-Click (CPC), video ads are typically sold at Cost-per-View (CPV),1 which is on average more expensive than the equivalent CPIs (sold in sets of 1000, and referred to as Cost Per Mille (CPM)) [18]. This is an obvious motivation for fraudsters to prefer video ads.

Footnote 1: see https://support.google.com/adwords/answer/2472735?hl=en

The common attack in online video ad fraud consists of inflating the view count of one (or more) video(s), using botnets or viewers recruited from cheap-labour countries [19]. In fact, it is easy to find paid services that offer to generate tens of thousands of views to videos hosted on popular platforms (YouTube, Dailymotion or Vimeo) at a low price.2 If the goal of the attacker is to simply increase the popularity and visibility, this is enough. In the case of monetized views, the attacker translates the view inflation to revenue by attempting to have ads served to its fake viewers.

Footnote 2: e.g., http://www.viewbros.com

In response to the scale of video-ad fraud, the online advertising market has consistently published the need for solutions in the media [20], and the IAB has recently formed a working group to address the issue [15]. It has so far published a white-paper report on anti-fraud principles and proposed a reference taxonomy [19]. At the same time, online video platforms have acted to forestall the damage by strengthening their fake-view detection methodologies [21, 22], and publicizing their activity.

As things stand today, we lack standard methods to audit or monitor the performance of the fraud detection algorithms used by video platforms. This is reflected in the IAB working group white-paper, which states that "[Supply sources] are challenged by a lack of consistent and independently measurable principles on how they each should identify and expunge fraudulent traffic" [19].

In this paper, we first propose a measurement methodology to aid in filling this gap. Employing a modular active probe, we evaluate the performance of the fraud detection mechanism (for public and/or monetized views) of 5 online video portals, namely YouTube, Dailymotion, Vimeo, Myvideo.de, and TV UOL.

As YouTube is the only portal deploying a sufficiently discriminative fake view detection mechanism, we deepen our analysis to study its key parameters. We look at the impact of manipulating only parameters that are directly accessible to viewers, including the behavior of an IP address, such as varying the number of visited videos per day, the views per video, and the duration per view. We also look at the impact of changing the browser profile of viewers, such as whether or not cookies are enabled, and the impact of mixing viewer activity in NATed traffic.

Our main findings can be summarized as follows:

(1) Of the 5 portals we test, YouTube is the only portal to deploy a significantly discriminative fake view detection mechanism. All other portals do not sufficiently discount view counters under the simplest fake view generation configurations.

(2) A deeper analysis reveals that the view detection mechanism of YouTube's public view counter is susceptible to simple fake view generation strategies such as: using multiple values in the HTTP connection attributes (e.g., User-agent or Referrer), distributing views across multiple IP addresses, or routing views through NATs.

(3) We find a consistent and significant discrepancy between the counters reported for the same content by public and monetized view counters in YouTube, whereby monetized view counters report at least 75% more fake views than public view counters.

Organization of the paper. The rest of the paper is organized as follows. Sec. 2 presents the basic background on the business models and statistic reporting tools for the five online video platforms considered in this study. In Sec. 3 we present the measurement tools and the performance metrics used in this study. Sec. 4 compares the performance of the fake view detection algorithms of the online video platforms under analysis. Sec. 5 and Sec. 6 present more detailed analyses of the fake view detection algorithms used by YouTube to discount views from the public view-counter and the monetized view-counter, respectively. Finally, Sec. 7 discusses the related work and Sec. 8 concludes the paper.

2. BACKGROUND

In this work, we focus on user-generated video platforms, the most widely used and, therefore, the most susceptible to video advertising fraud. Table 1 summarizes the online-video market share of the portals considered in this study. Since YouTube is reported by all sources to be the dominant player of the online video market, it will serve as the reference portal in our study.

Source            YouTube   Vimeo   Dailymotion   Myvideo.de       TV UOL
SYSOMOS [23]      81.9%     8.8%    4%            -                -
DATANYZE [24]     65.1%     11.1%   0.6%          -                -
NIELSEN [25]      84.2%     -       1.16%         -                -
Statista [26]     73.6%     0.9%    1.6%          -                -
Alexa rank        3         145     84            3236 (DE: 153)   101 (BR: 5)
Views/day (x1M)   1200      1       60            7                6

Table 1: Market share and rank of the studied portals as reported by different public sources. Views per-day estimates are collected from Wikipedia [27].

Most user-generated video platforms monetize the content uploaded by their users through advertising. YouTube, Dailymotion, Myvideo.de and TV UOL all deliver adverts to the videos streamed to the viewers. YouTube is the only platform that allows its users to explicitly indicate whether or not to enroll their videos into the monetization programme.
Users who decide to participate share the revenue from monetized views. Dailymotion implements a different revenue sharing scheme. It allows third-party web-masters to enroll in the "Dailymotion Publisher Network", so that they can monetize views to videos hosted by Dailymotion and embedded in their sites.3 While YouTube and Dailymotion share the advertising revenue with their users, to the best of our knowledge, Myvideo.de and TV UOL do not. Vimeo is still different, as it runs a graded subscription-based model, whereby only users paying for the 'Plus' or 'Pro' accounts are able to generate revenue from their uploads; either by using a "Tip Jar" service that enables other viewers to tip the user (available for 'Plus' and 'Pro' accounts) or a "Pay-To-View" service in which viewers pay to watch (only available for 'Pro' accounts).

Footnote 3: Note that users cannot monetize their videos directly. However, web-masters can embed any video hosted by Dailymotion in their website and monetize the views attracted.

Under these revenue models, malicious users have incentives to inflate their view counters. When there is revenue sharing based on view counts, as in the case of YouTube and Dailymotion, view inflation is typically linked to the generation of fraudulent revenue. However, user view inflation is not limited to defrauding the ad system; as mentioned, numerous examples have been documented showing that users can, and do, trade on the popularity of their uploads, cf. [16].

To enable their users to understand how viewers interact with their content, video platforms record and report several statistics. More specifically:

- YouTube provides two main sources of data on user activity (counted views): public statistics (public view counter, number of comments, likes, dislikes, number of subscribers), which are available on the video webpage, and private statistics, which include the number of counted and monetized views and are only available to the video uploader. The public view-counter is updated in real-time while the video has less than 301 views. Once this threshold is reached, the view-counter is paused and a background verification process starts. After this initial verification process successfully ends, public view-counters are updated once every 30-120 minutes [28], and only verified views are counted.

YouTube provides separate statistics for counters on monetized content. Users have to create an AdSense account and enrol their YouTube channel to monetize their content. Users can then view monetization statistics in both their YouTube Analytics and AdSense accounts. In this paper we use the monetization statistics from the YouTube Analytics service, which is claimed to provide an error of less than ±2% with respect to the actual number of monetized views [29]. In particular, the monetization statistics offered by YouTube Analytics are: (i) the estimated number of monetized views, i.e., the number of viewers who watched an associated video-ad, (ii) the estimated revenue based on the Cost per Mille (CPM), and (iii) the total gross revenue the video generated. In order to enable users to better target their content, these metrics are available by country, date and type of ad.

Finally, the YouTube Analytics service collects and reports several finer grained statistics on viewer behaviour, including the number of views grouped by day, country, viewer age and gender, playback location (if the video is embedded in third-party websites), etc. YouTube Analytics also provides users with summary reports on their channel subscribers, including their likes and dislikes, comments, etc. These statistics are updated once a day [30]. Based on our experiments, Analytics counts only validated views.

- Dailymotion provides public view counts on each video's webpage, and users can access similar statistics to those offered by YouTube Analytics: number of views filtered by country, and playback location, over a selected time window. Web-masters registered with the Dailymotion monetization service can access monetization statistics: the number of impressions, estimated revenue and CPM. However, these statistics are aggregated across all videos associated to a web-master's account. Hence, statistics per-video are not available.

- Myvideo.de and TV UOL provide public view counters only. This data can be accessed through either the website associated to the video or the user account.

- Vimeo offers public statistics for each video, including the number of views, likes and comments, as well as their weekly evolution. Vimeo Basic accounts have only access to the public statistics, whereas Vimeo Plus and Pro accounts have access to advanced statistics [31], including geographical information about the views, information about users commenting on or liking the video, etc.

In summary, different platforms offer statistics with different levels of detail on their counted and monetized views. Overall, YouTube is the one offering the most extensive statistics, whereas Myvideo.de and TV UOL provide very limited statistics.

3. MEASUREMENT TOOLS, PERFORMANCE ASSESSMENT METHODOLOGY AND DATA PROCESSING

In this section we present the tools that we have implemented to generate views and fetch statistics automatically, as well as our methodology to evaluate the effectiveness of video portals in detecting fraudulent views. We note that these tools and methods are a valuable contribution in themselves, and can be easily adapted to evaluate new video portals beyond the ones considered in this paper.

3.1 Measurement Tools

To evaluate the performance of the fake-view detection mechanisms deployed by the different online video portals, we implement probes that generate automatic views under well defined constraints, and log the results of their activity. In addition, we build tailored web crawlers to collect the number of views counted and monetized by the different video portals.
Automatic Views Generation: We implement a Selenium [32] based modular probe to simulate the actions of viewers on the different portals. This tool is able to load a given video-page, and we can easily configure it to perform certain viewer-like actions, such as interacting with the objects in the page or varying the duration of video views. Viewer-like behaviors are implemented as dynamically configured modules, which enable us to flexibly explore the parameter space that might be used to categorize the actions of viewers. Table 2 summarizes the list of available modules and their default settings.

Module                    Description                                                      Default value
User-agent                Set the user-agent for a session (e.g., Firefox or Chrome).      Linux/Firefox
Referrer                  Set the referrer for a session. Options are Facebook, Twitter,   Direct Link
                          YouTube Search (specific for the case of YouTube), and
                          Direct Link.
Cookies                   When enabled, all the views have the same cookies.               Disabled
View duration             Duration of a video view (in seconds). Options are i) fixed     Fixed (40 secs.)
                          time or ii) samples from an exponentially distributed random
                          variable with mean the duration of the video.
Wait time between views   Vary the view inter-arrival time (seconds). Options are a       Constant factor of the
                          Poisson process, or a constant. Zero indicates a burst.          number of daily views

Table 2: Description of our software modules and their default settings.

In order to accurately monitor and isolate the impact of our experiments on the portals tested, each probe generates views only to videos that we upload for our experiments. So as to reduce background noise resulting from real users stumbling upon the videos, the names and descriptions of all experiment videos are set to random hashes, and all external links to them are removed. We measure the scale of the background noise resulting from our approach by uploading 209 videos to YouTube. We observe that they only attract 21 views from external users over a three-month measurement period.

To scale our experiments, where specified, we utilize transparent Squid [33] proxies set up on ∼100 public IP addresses in Spain and Germany, as well as on 300 PlanetLab nodes [34]. These proxies relay views generated by probes running in our local datacenter. From our experiments, we determine that YouTube treats direct and transparently proxied requests equally.

In the rest of the paper, we will indicate the specific configuration of the probe used to conduct each reported experiment.

Fetching Statistics from Video Portals: In order to retrieve the statistics reported by each portal, we implement tailored web crawlers. These allow us to: (i) collect the information from the videos' public view counters, and (ii) log in to the user (i.e., uploader) account and retrieve the number of counted views for the video, as well as the number of monetized views (if available).4 In particular, for Myvideo.de, TV UOL and Vimeo we collect information on the number of counted views, whereas for YouTube and Dailymotion we also retrieve the reported statistics on the number of monetized views.

Footnote 4: The crawler also logs in to the web-master account in the case of Dailymotion.

3.2 Performance Assessment Methodology

To establish the reliability of the view counters, and compare the different portals, we measure their accuracy in detecting fake views and report their false positive rates. For some specific portals we also report their false negative rates.

False positives: From the perspective of a view counter detection mechanism, a false positive is defined to be a 'fake view' that is misclassified and counted in the view counter (public or monetized). To measure the rate of false positives for a given portal, a probe generates views to a given video on a given portal and retrieves the number of counted views from the statistics offered by the portal. The false positive rate (R_FP) for the given platform is then defined as:

    R_FP = #counted views / #'probe' generated views

False negatives: Again from the perspective of a view counter detection mechanism, a false negative is defined as a 'real user' view labelled as fake by the system, and not counted in the view counters. To measure the rate of false negatives for a given portal, we embed videos into webpages that we can control and monitor. We then request collaboration from real users to visit these pages and watch the videos. The experiment tracks the number of visitors accessing each webpage, and the number of videos they watch. The false negative rate (R_FN) for the given platform is then defined as:

    R_FN = 1 - #counted views / #'real-user' generated views
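The two rates above, together with the per-day median aggregation the paper later uses (Sec. 3.3) to filter transient events, can be sketched as follows. This is an illustrative sketch of ours, not the authors' tooling; the function names are our own.

```python
from statistics import median

def false_positive_rate(counted_views: int, probe_views: int) -> float:
    """R_FP: fraction of probe-generated fake views the portal counted as valid."""
    return counted_views / probe_views

def false_negative_rate(counted_views: int, real_views: int) -> float:
    """R_FN: fraction of real-user views the portal discarded as fake."""
    return 1 - counted_views / real_views

def median_daily_rfp(daily_counted, daily_generated) -> float:
    """Median of the per-day R_FP values; robust to single-day transients."""
    return median(c / g for c, g in zip(daily_counted, daily_generated))
```

For example, with the numbers later reported in Table 4 for YouTube's social-media experiment (322 of 330 real views counted), `false_negative_rate(322, 330)` gives approximately 0.024, i.e., the 2.4% R_FN reported.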
3.3 Data Processing

The results obtained from measuring systems in the wild typically show a high level of variability. Our case is not an exception. The sources of variability in our study may come from our measuring architecture (e.g., a failing proxy) or from temporal transient behaviors of the detection mechanisms of video portals. While identifying failures in our probes and removing them from the data is (relatively) easy, identifying temporal transients in the behaviour of the detection mechanism and filtering them is challenging. Indeed, our experiments show evidence of the presence of systematic and random temporal transients in the detection mechanism of some portals. For instance, when our probe generates automatic views to a YouTube (our reference video portal) video, we observe that there is a trend (very clear if the configured rate of views per day is high) for YouTube to count more views in the first few days of the experiment, and then to later adjust to a lower, stable false positive ratio for the rest of the experiment. Moreover, we also observe some experiments with a low false positive ratio throughout the whole execution that show a peak in the false positive ratio on a single day.

In order to identify the standard behavior of the fake view detection mechanisms of video portals and get rid of the most common transient events such as those described above, we compute for each experiment the daily false positive ratio and calculate the median across the days of the experiment. Note that the median is robust to outliers, which in our case are represented by temporal transient events. For simplicity we refer to this metric as R_FP across the paper. Finally, we take care of repeating all experiments numerous times to provide statistical confidence in the results reported.

Trace   Period              Length     IP addresses   Views    Videos
YT-1    01/03/13-30/04/13   2 months   28071          3.94M    1.37M
YT-2    01/05/13-30/11/13   7 months   16781          15.9M    3.95M

Table 3: The measurement traces containing YouTube video sessions we obtain from a residential network of a European ISP, and their main characteristics.

[Figure 1: Comparison of the false positive ratio among the studied portals, for 100, 400 and 500 views per day.]

4. COMPARISON OF FAKE VIEW DETECTION MECHANISMS OF MAJOR PORTALS

In this section, we investigate how views are counted in the different portals. We first compare the fake view penalization methods employed by the different portals in their public view counters. We then study how the two portals that share their revenue (YouTube and Dailymotion) count views for monetized content.

4.1 Counting views in public view counters

Rate of false positives: We start by looking at the rate of false positives for public view counters.
We set up a simple experiment whereby a single probe, with default parameters, from a fixed IP address, varies only the per-day number of views it generates to a given video on a given portal. In particular, for each of the five portals, we generate 100, 400 and 500 views per day to a targeted video. Each experiment runs for eight days and we repeat it three times.

In order to understand whether the number of views that we generate corresponds to normal user behavior, we collect information from a large number of YouTube sessions to give us a baseline for normal user behavior in the network. To do this, we replicate the methodology described in [35] and collect two independent datasets (from the residential network of an ISP) that contain millions of YouTube sessions. We refer to these datasets as YT-1 and YT-2 and we summarize their main characteristics in Table 3. Our traces indicate that none of the IP addresses in YT-1 and YT-2 perform more than 100 daily views to a single video. Since YouTube is the most popular among the studied portals, we assume that the configured number of views per day represents an aggressive setup for all the portals.

The results of this experiment are reported in Figure 1. Main bars represent the average R_FP value from the three experiments and error bars report the max and min R_FP, while the different colors correspond to the different daily view rates. Our results indicate that YouTube operates the most discriminative detection mechanism, and that it is significantly more effective than the other portals: it penalizes all the views from the probes. In contrast, Dailymotion counts as valid almost all the views when the daily rate is 100, and 93% (85%) of them when the rate is 400 (500) views per day. Myvideo.de, TV UOL and Vimeo deploy detection mechanisms that capture <5% of the probes' fake views, even for the most aggressive configuration.

In summary, we observe that YouTube implements the most discriminative fake view detection mechanism, and is able to easily detect obviously aggressive behaviors. Surprisingly, the other portals have serious problems discriminating fake views even under the considered aggressive setups.

Rate of false negatives: To evaluate the R_FN for the different portals, we embed videos from the different portals into webpages we control, and track the number of real users accessing the pages and watching the embedded videos, as well as the duration of their views. Further, we compare the impact of sourcing users via social media and online crowd-sourcing platforms. In the case of social media, we announce the URL of the webpage on Facebook and Twitter and request collaboration from our contacts and friends to visit our pages and watch the embedded videos. In the second case, we use a crowd-sourcing website to recruit viewers. Finally, given the cost of recruiting large numbers of users, and the results of the previous experiment indicating that only YouTube and Dailymotion are significantly discriminative in updating public view counters, we evaluate the R_FN only for these two portals.

The results of this experiment are summarized in Table 4. We find that the R_FN is reasonably small for the two portals under both user sourcing approaches (<12%). Hence, we conclude that the fake view detection algorithms of YouTube and Dailymotion are fairly effective at identifying views generated by real users. However, we note that Dailymotion shows a larger R_FN for the first experiment.

Platform      Experiment       #performed real views   #counted views   R_FN
YouTube       Social Media     330                     322              2.4%
YouTube       Crowd-sourcing   599                     537              10.3%
Dailymotion   Social Media     325                     290              10.9%
Dailymotion   Crowd-sourcing   587                     515              12.2%

Table 4: False negative ratio for the two conducted experiments for YouTube and Dailymotion.

Finally, it is worth noting that the data provided by YouTube and Dailymotion in both experiments shows a typical spatially localized distribution of viewer visits. Viewers are mostly localized in a specific geographical region: for the social media experiment, most of the views come from Spain, whereas for the experiment that uses crowd-sourcing, most of the views come from India and Bangladesh. Interestingly, we observe that the social media experiment for both portals presents a smaller R_FN. This difference may be due to the fact that many of the views come from Spain, a country that is reported to present a very low volume of fraudulent activity in online advertising [36, 37].

4.2 Counting views in monetized view counters

Having established a baseline for how views are discounted in public view counters for (i) the actions of real users and (ii) obviously aggressive fake view patterns, we now look at how this discounting is reflected on monetized content. More specifically, we focus on content we enrol in the YouTube and Dailymotion monetization programmes, and we look at how actively it is monetized. Note that a given view is considered monetized only if a video advert is shown in the session and the viewer spends a given time watching the advert. Thus, to evaluate the effectiveness in counting monetized views, we count the views we generate that actually encounter an advert, and then, using the analytics tool provided by each portal, we compare this number against the views that the portal has monetized.

[Figure 2: Comparison of the false positive ratio for the number of views in the public and monetized view counters for YouTube and Dailymotion.]

To carry out monetization experiments we register several accounts and their associated videos in the monetization program of each portal. We remark that Dailymotion only monetizes videos embedded in external webpages. Therefore, we create external webpages to embed our videos. The web-master accounts for these pages are then associated with uploader accounts, and the associated videos are then enrolled into the monetization programme. For YouTube, the process is more straightforward: to monetize videos, the uploader must register her channel to AdSense, Google's monetization platform, and indicate which videos to enrol. In the case of YouTube, our software generates views to the URL of the video webpage. For Dailymotion, views are directed to the external webpages we build to embed our videos. Based on preliminary experiments, we observe that Dailymotion does not show an advert to every automatically generated view. Thus, to properly log the number of generated views subject to be monetized, we have developed a tool, based on optical character recognition (OCR), to detect whether the view received an advert.5

Footnote 5: As ads display some text indicating the remaining time of the ad, by detecting the presence of such text we can identify that the ad has been displayed.
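The overlay check described in footnote 5 could be approximated as below. The OCR step itself (e.g., extracting text from a player screenshot with a library such as pytesseract) is omitted; the function name and the countdown patterns are our own illustrative assumptions, not the paper's actual tool.

```python
import re

# Heuristic patterns for ad-countdown text that portals typically overlay
# on the player, e.g. "Ad 0:07" or "Your video will play in 5".
AD_PATTERNS = [
    re.compile(r"\bad\b.*\d", re.IGNORECASE | re.DOTALL),
    re.compile(r"video will play (in|after)", re.IGNORECASE),
]

def ad_overlay_present(ocr_text: str) -> bool:
    """Return True if OCR'd frame text suggests an ad-countdown overlay."""
    return any(p.search(ocr_text) for p in AD_PATTERNS)
```

In a measurement pipeline, a view would be logged as "subject to monetization" only when this check fires on at least one frame of the session.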
We therefore consider that our setup Having established a baseline for how views are dis- is aggressive and expect our fake views to be easily identi- counted in public view counters for (i) the actions of real fied. Moreover, as monetized fake views translate to direct users,and(ii)obviouslyaggressivefakeviewpatterns.Now, costs to advertisers, we expect both portals to be strict in we look at how this discounting is reflected on monetized theidentificationoffraudulentviewsformonetizedcontent. content. Morespecifically,wefocusoncontentweenrolin We check this assumption by comparing the R between FP YouTube and Dailymotion monetization programmes, and YouTubeandDailymotion. Notethatforeachtrail,wecon- we look at how actively it is monetized. Note that a given figure the probe to view the video and any video advert it viewisconsideredmonetizedonlyifavideoadvertisshown encounterscompletely. inthesessionandtheviewerspendsagiventimewatching Figure2comparestheR inthenumberoffakeviews FP the advert. Thus, to evaluate the effectiveness in counting countedbythepublicandmonetizedviewcounters,ineach monetizedviews,wecounttheviewswegeneratethatactu- experiment,forbothYouTubeandDailymotion. Again,the allyencounteranadvert, and, then, usingtheanalyticstool main bar depicts the average value across the experiments, providedbyeachportal,wecomparethisnumberagainstthe andtheerrorbarsgivethemin-maxvalueoftheR across FP viewsthattheportalhasmonetized. theexperiments. Tocarryoutmonetizationexperimentsweregisterseveral Dailymotion shows the expected behavior, i.e., it dis- accountsandtheirassociatedvideosinthemonetizationpro- countsalargernumberoffakeviewsfromthemonetization gramofeachportal. WeremarkthatDailymotiononlymon- viewcounter(avg. R =72%)ascomparedtothepublic FP etizes videos embedded in external webpages. Therefore, view counter (avg. R = 97%). Despite this improve- FP wecreateexternalwebpagestoembedourvideos. 
Theweb- ment, the detection mechanism for monetized fake views masteraccountsforthesepagesarethenassociatedwithup- stillpresentsapoorperformancesinceroughly3outofeach loaderaccounts,andtheassociatedvideosarethenenrolled 4 fake views are monetized even under the aggressive con- intothemonetizationprogramme. ForYouTube,theprocess figurationofourexperiment. Surprisingly,YouTuberesults is more straightforward. To monetize videos, the uploader must register her channel to AdSense, Google’s monetiza- 5Asadsdisplaysometextindicatingtheremainingtimeofthead, tionplatform,andindicatewhichvideostoenrol.Inthecase bydetectingthepresenceofsuchtextwecanidentifythatthead ofYouTube,oursoftwaregeneratesviewstotheURLofthe hasbeendisplayed. 6 are in contradiction with our initial expectations. Indeed, 5.1 Parametersusedinthedetection YouTube applies a weak control to remove fake monetized Weleveragethemodularityoftheprobetodefinedifferent views(avg. R =82%)despitehavinginplacearelatively FP configurations to isolate the impact of different parameters efficient mechanism to remove fake views from the public onthefakeviewdetectionmechanism. Inourexperiments, viewcounter(avg. R =7%). Thisunexpectedresulthas FP each probe instance uses a single public IP address chosen beenreportedpreviouslybyYouTubeusers.6 TheYouTube from our pool and performs 20 views per day to the same support team has explained that discrepancies may be due video for 8 consecutive days. Based on our results in Sec- to users watching the video advert but not the video, and, tion4.2,weexpectthatthisbehaviourisaggressiveenough when this happens, the view is monetized but not counted totriggerthefakeviewdetectionmechanismofYouTube. by the public counter. However, this is not our case, as we Next we describe the probe configurations we use in our instrument the software in these experiments to load both experiments. Notethatunlessspecified, wesetallparame- the video advert and the video completely. 
Another poten- terstotheirdefaultvaluesgiveninTable2. tialreasoncouldbethatYouTubeperformspartofitsview -Deterministic(D):Thegoalofthisconfigurationistode- validation procedure in a post-hoc manner, rather than in fineasimple,deterministic,and,thus,atypicalviewpattern. real-time [38]. However, in our experience, 6 months have The configuration eliminates any randomness by setting to passedsincethetimeatwhichweruntheexperiments,and constant values the view time (40 secs.) and the time be- wehavenotobservedanycompensationinthestatisticspro- tweenviews(72mins.). Allotherparameterstaketheirde- vided by YouTube. Furthermore, while such an approach faultvaluesfromTable2. Weexpectthisconfigurationtobe wouldprovidetheflexibilityofenablingYouTubetoretroac- easilyidentified. tivelycompensateforattacksthatarediscovered,wespecu- - Vary view burst (B): Having observed the impact of latethatitwouldplacetheburdenoftheriskontheadvertis- changingthenumberofviewsthatavideoreceives,wede- ers,whowouldhavetobidtogettheiradvertsdisplayedto signthisconfigurationtolookattheimpactofmakingviews customerswithoutrelyingonstableandvalidatedstatistics. in bursts. In particular, the probe runs the Deterministic From the above experiments, we would like to highlight configuration zeroing the time between consecutive views, that: First, we have not received any money while running therebygeneratingaburstof20consecutiveviewseveryday. these experiments, and all the statistics we report are those The time between bursts is configured to 24 hours. Since we retrieve from the YouTube Analytics channel page and viewsinburstsfromagivenIPaddressareatypical,weex- the Dailymotion Publisher page; Second, we have reported pect this configuration to show a false positive rate lower ourfindingstoYouTubeandDailymotionandplantopresent thanthedeterministicconfiguration. theirfeedbackandexplanations,oncewereceivethem. 
- Vary inter-view wait time (P): The goal of this config- Insummary,theanalysisconductedinthissectionshows uration is to measure the impact of varying the time be- that, among the studied online video portals, YouTube is tween views over a day. In particular, this probe runs the theonlyoneimplementingasufficientlydiscriminativefake Deterministic configuration, but varies the time between viewdetectionmechanism. However,ourresultsrevealthat two consecutive views to make it follow a Poisson process YouTubeonlyappliesthismechanismtodiscountfakeviews (λ = 20). In essence, this configuration aims to determine from the public view counter and not from the monetized whethersimplyaddingnoisetotheviewsperdayrateofthe view counter. In the rest of the paper, we present advanced Deterministicconfigurationhasimpact. methodologiestofurtherunderstandthefunctionalityofthe -ShortViews(SV):Thegoalofthisconfigurationistomea- fake view detection algorithm of YouTube for public (Sec- suretheimpactofmakingveryshortviewstovideos. This tion5)andmonetized(Section6)view.counters. isabehaviourthatisknowntobeutilizedbybotsandpaid- for mass view services. In particular, this probe runs the 5. YOUTUBE’S FAKE VIEW DETECTION Deterministic configuration, but sets the duration of video MECHANISMFORCOUNTEDVIEWS viewsto1sec. Sinceconsecutiveshortviewsisatypicalof users, we expect to see this configuration to be heavily pe- In this section we explore different aspects we believe nalized. YouTube may employ to detect fake views in order to un- -Cookies(CK):Thegoalofthisconfigurationistomeasure veil some fundamental components of the fake view detec- towhatextentthedetectionsystemsrelyoncookiestoiden- tionsystem. Weacknowledgethatthespectrumofparame- tify users. Therefore this configuration performs all views tersandtheirinteractionswithinYouTube’sdetectionmech- usingthesamecookie. anismmaybetoolargetomakeafullexploratoryexercise. 
-Complete(C):Thegoalofthisconfigurationisprovidea Instead,weanalyzetheimpactofasubsetofmeaningfulpa- baselinebyemulatingsomereal-userlikefeatures. Assuch, rameters,thatotherdetectionmechanismscanmonitor,and it enables all the modules in Table 2, but cookies. Specifi- areeasytoconfigureforauser. cally,theviewduration(λ=durationofthetargetedvideo) 6https://plus.google.com/100368302890592068600/posts/ and wait time between views (λ = 20) are selected from 1sEuu94EjuV Poissonprocesses. TheRefererandUser-agentfieldsarese- 7 0.6 1.0 0.5 RFP RFP Conservative(IP-1) Conservative(IP-2) 0.4 ws1 RFP000...123 RFP0.5 R2=0.9992 #ofcountedvie Astgagrtressive Aengdgressive 0.0 0.0 0 C P D SV CK B 0 810 20 30 40 50 60 0 5 10 15 20 25 30 Probeconfiguration #ofviewsperday(W) Time(days) Figure3: Falsepositiverateobtainedfor Figure 4: The false positive rate and its Figure 5: Number of views counted by each one of our experiment configura- fitting model when increasing the num- YouTube for both IP-1 (running only a tions. berofdailyviewsperformedbyasingle conservative probe) and IP-2 (running IPaddresstoasinglevideo. a conservative probe and an aggressive probe). lected randomly. Given the variation in the parameters, we of the strongest online users identifiers [38] and one of the expectthisconfigurationtobetheleastpenalized. key parameters many security online services rely on [13, Foreachconfigurationweruntheexperiment5timesand 39,40],leadsustoconcludethatthevideoviewingpattern evaluate the R (again from the perspective of the view from an IP address is a key element for the fake view de- FP detection mechanism). The results are given in Figure 3; tectionmechanismofYouTube. Weanalyzethishypothesis the main bar and the error bars represent the average, and furtherinthenextsubsection. max/min R . Each bar represents a different configura- FP 5.2 Influence of Video Viewing Pattern in the tion. 
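The timing behaviour of the configurations above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' probe code; the numeric parameters are the ones quoted in the text, while the function name and everything else are assumptions.

```python
import random

def daily_schedule(config, views_per_day=20):
    """One day of (start_time_seconds, view_duration_seconds) pairs.

    Illustrative sketch of the D, B, P and SV probe configurations; the
    Complete (C) configuration would additionally randomize the view
    duration, the Referer and the User-agent.
    """
    schedule, t = [], 0.0
    for _ in range(views_per_day):
        if config == "D":        # Deterministic: fixed 72 min gaps, 40 s views
            gap, duration = 72 * 60, 40
        elif config == "B":      # Burst: 20 back-to-back views, once per day
            gap, duration = 0, 40
        elif config == "P":      # Poisson inter-view wait, mean 20 views/day
            gap, duration = random.expovariate(20 / 86400.0), 40
        elif config == "SV":     # Short views: 1 s per view
            gap, duration = 72 * 60, 1
        else:
            raise ValueError("unknown configuration: %s" % config)
        t += gap
        schedule.append((t, duration))
    return schedule
```

Note that with the Deterministic parameters, 20 views × 72 minutes add up to exactly 24 hours, so the pattern repeats identically every day — precisely the regularity this configuration is designed to expose.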
As expected, the Complete configuration yields the highest false positive rate (≈40%), which is on average more than 4 times larger than that of the other configurations, whose average R_FP is below 10%. This indicates that adding randomness to basic HTTP connection parameters such as the User-agent or the Referer makes it significantly harder for YouTube to detect fake views.

Looking at the impact of varying the wait time between views (P, D and B), we observe some differences. Specifically, the detection mechanism penalises bursty behavior the most heavily, discounting 98% of the views from the Burst configuration. Moreover, comparing the Deterministic and the Short Views configurations, we observe that the detection mechanism ignores the view duration, despite our expectation that very short view durations would be an obvious candidate for the identification of fake or (at least) not worthy views. Finally, the detection mechanism does not leverage browser cookies in the detection of fake views, since the differences in R_FP between configurations with (CK) and without (D, SV, etc.) cookies are negligible.

5.2 Influence of Video Viewing Pattern in the Detection

In this section we analyze the response of YouTube's detection mechanism to the fake view patterns of an IP address. We first look at the impact of fake view patterns aimed at a single video, then explore the case of a single IP viewing multiple videos, and finally a single video receiving views from multiple IP addresses.

One video, One IP address

We start our analysis by examining how YouTube discounts the views a single IP address generates to a single video. In particular, we are interested in understanding how the view discounting threshold(s) are triggered when varying the number of views per day. We conduct a simple experiment, in which our probe generates W = [1, 4, 7, 8, 9, 10, 20, 30, 40, 50, 60] views per day to a given video for 8 days. We use the previously defined Deterministic configuration for this experiment.

The results of this experiment are presented in Figure 4.⁷ It reports the R_FP for the different values of W.
We observe that the detection system counts all the fake views up to a rate of 8 per day. From 9 views on, the R_FP decays exponentially and zeroes for more than 30 daily views. We can model the R_FP with respect to the views per day (W) as an exponential decay function with the following parameters, with an R² = 0.999:

    R_FP(W) = 1                        if W ≤ 8,
    R_FP(W) = e^(−0.454517 (W − 8))    otherwise.

In summary, the results in this subsection reveal that attacks performed using a static configuration of HTTP connection parameters are easily identified by the fake view detection mechanism of YouTube, which effectively removes more than 90% of fake views in this case. Instead, adding some variability to these parameters would increase the effectiveness of the attack by ∼30%. While these results explain the false positive rate difference between the considered configurations and our benchmark, they do not explain the 60% of discounted fake views common to all our configurations. The only aspect which is common to all our configurations, and which may be responsible for such a large penalization, is that they perform their views from a unique public IP address. This, along with the fact that IP addresses are one of the strongest online user identifiers [38] and one of the key parameters many online security services rely on [13, 39, 40], leads us to conclude that the video viewing pattern from an IP address is a key element for the fake view detection mechanism of YouTube. We analyze this hypothesis further in the next subsection.

In the above experiment, all videos are new, uploaded for the experiment. In order to understand whether this has any impact on the results obtained, we look at the response of the detection mechanism when the videos are already known by the system and moderately popular. We repeat the experiment using two videos,⁸ with roughly 12K (100 in the last month) and 300K (5K in the last month) registered views over a few years. To differentiate the activity of our probe, we configure it to use very rare user-agents (specifically, Bada, HitTop, MeeGo and Nintendo 3DS). Before starting the experiment, we validate (using YouTube Analytics) that the targeted videos have not received views from the selected user-agents in the previous 6 months.

⁷ We repeat this experiment 3 times from IP addresses located in Spain and Germany, obtaining similar results.
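The fitted model can be evaluated numerically as sketched below. The shift by 8 in the exponent is our reading of the fit (the decay starts once the 8-views-per-day threshold is exceeded); the published parameterization may differ.

```python
import math

def r_fp(w, rate=0.454517, threshold=8):
    """Fitted fraction of fake views that survive detection (R_FP) for a
    single IP performing w views per day to a single video.
    The (w - threshold) shift in the exponent is an assumption."""
    if w <= threshold:
        return 1.0
    return math.exp(-rate * (w - threshold))

def counted_per_day(w):
    """Expected number of daily views that reach the public counter."""
    return w * r_fp(w)
```

Under this model, pushing past the threshold is counterproductive: the exponential discount shrinks faster than the raw view count grows, so an attacker counting on a single IP gains nothing by exceeding 8 views per day.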
Repeating the above experiment with W = [8, 9, 10, 20] views per day and new IP addresses, we find that the fake view detection mechanism again starts discounting views from 8 views per day for a given IP address. This suggests that the threshold chosen by the fake view detection algorithm of YouTube is fixed regardless of the popularity of the video.

Figure 6: R_FP for several combinations of the number of views W and the number of watched videos D.

Multiple videos, One IP address

Having observed how the views from a single IP address to a single video are penalized, we now look at the response of the fake view detection system when a single IP address distributes its views over several videos. Given the previous result, we expect that aggressive IP addresses will be heavily penalized, independently of the number of videos they target. In the following, we first validate this hypothesis with a simple experiment. We then present the results of a large-scale measurement study we run to characterize the R_FP as a function of the overall behavior of an IP address, with respect to the number of visited videos and the views performed.

Observing that the behaviour of an IP address is tracked across video views, we now look at characterizing the penalization factor YouTube applies to an IP address when it varies the number of videos viewed. We conduct a large-scale experiment in which we perform W = [1, 3, 5, 7, 10, 15, 20] views per day, uniformly distributed across D = [1, 3, 5, 7, 10, 15, 20] videos, over a period of 7 days.⁹ In total, we ran 28 trials combining the different numbers of views and of watched videos. For each experiment we use a different PlanetLab node as a proxy and use the Deterministic configuration of the probe. Figure 6 reports the R_FP for each one of the 28 considered combinations.
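The count of 28 trials follows directly from the W ≥ D constraint stated in footnote 9; a quick enumeration confirms it:

```python
# Enumerate the (views/day, videos) pairs actually run: only W >= D,
# since W daily views cannot be spread uniformly over more than W videos.
levels = [1, 3, 5, 7, 10, 15, 20]
trials = [(w, d) for w in levels for d in levels if w >= d]
```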
Looking at the evolution of R_FP for a fixed number of videos, we observe the exponential decay found earlier in every case. However, we notice a variation in the threshold on the number of overall daily views that defines the start of the exponential decay. It seems to be set to (at least) 10 views for D > 1, whereas we know from our previous results that it is 8 views for D = 1. If we now consider the evolution of R_FP for a fixed number of daily views, we observe that when we concentrate all views in a single video, the punishment is much more severe than when they are spread across two or more videos. Moreover, the differences in R_FP for the cases of 2 or more videos are relatively small (≤7%).

We start by defining two types of simple probe behaviors, with respect to the number of videos and views per day, which we refer to as the conservative probe and the aggressive probe. The conservative probe performs 1 daily view to a given video from a single IP address, whereas the aggressive probe performs 30 views per day. We set up the following experiment: on two IP addresses, IP-1 and IP-2, we launch an instance of the conservative probe to a different video for 34 days. Moreover, on IP-2, we also launch an instance of the aggressive probe, starting at day 8 and stopping at day 24, while there is no aggressive probe on IP-1.

Figure 5 gives the number of views the detection mechanism counts over time for the conservative probes on the two IP addresses. Since these probes perform just 1 view per day to the video, we expect to see either 1, if it is counted, or 0, if it is penalized. We find that the detection mechanism counts all the views from the conservative probe on IP-1, while it penalizes the conservative probe on IP-2 from day 9 to day 26.

One video, Multiple IP addresses

Having established that an IP address is tracked across the views it makes, we now look at the response of the fake view detection mechanism when distributing the views to a given video across several IP addresses. To this end, we use 70 different PlanetLab nodes, and divide them into 3 independent sets of different size N = [10, 20, 40].
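The Figure 5 traces described above can be sketched as a pair of 0/1 daily series. The penalization window for IP-2 (days 9–26) is taken from the text; treating the discounting as all-or-nothing per day is a simplifying assumption.

```python
def conservative_counts(penalized_days=()):
    """Daily counted views of a conservative probe (1 view/day, 34 days):
    1 if that day's view reaches the public counter, 0 if discounted."""
    return [0 if day in penalized_days else 1 for day in range(1, 35)]

ip1 = conservative_counts()              # never penalized
ip2 = conservative_counts(range(9, 27))  # discounted on days 9-26
```

The sketch makes the per-IP (rather than per-video) nature of the penalty explicit: IP-2's conservative probe targets a different video than its aggressive probe, yet it is the one being discounted.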
We observe that view penalization starts 24 hours after we launch the aggressive probe on IP-2, and ends two days after we stop the probe. Repeating the experiment twice more, we obtain the same results. From this, we conclude that YouTube's fake view detection mechanism penalizes IP addresses globally for their behavior, rather than for individual videos.

We assign each set of nodes a different video on YouTube, and configure each PlanetLab node to generate views to its corresponding video. We again utilize the Deterministic configuration of the probe, and report the results with each PlanetLab node set to generate 3 views per day. Overall, our experiment generates 30, 60 and 120 views per day to a video, which should result in R_FP = 0 if coming from a single IP address.

⁸ We obtain video uploader permissions to conduct this experiment.
⁹ Note that we only run experiments for W ≥ D. For instance, in the case of W = 5 we run experiments for D = 1, 3, 5.

Figure 7: Number of views in the public view counter for three different experiments using 10, 20 and 40 PlanetLab nodes.

Figure 8: Distribution of daily R_FP for working days and days off for Institution 2.

    Experiment | W (views/day) | #U (users behind the NAT) | U/W   | R_FP
    Inst. 1    | 20            | ~50                       | ~2.5  | 0.9
    Inst. 2    | 75            | ~100                      | ~1.33 | 0.82
    Inst. 3    | 100           | ~50                       | ~0.5  | 0.36

Table 5: R_FP and information about the four scenarios for the experiments we conduct from NATed IP addresses.

Table 5 gives the R_FP for each experiment along with some information about the different NATed scenarios. Note that, although our probe generates views rather aggressively, the R_FP is surprisingly large in all cases. This suggests that YouTube's fake view detection mechanism has problems to properly identify malicious activity coming from NATed networks.

Figure 7 gives the temporal evolution of the aggregate number of views counted by the YouTube public view counter for each of the three experiments. As shown, the growth in the number of views over time is linear for all configurations, and we observe an overall R_FP > 73% in the three experiments.
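The derived U/W column of Table 5 is simply the ratio of (approximate) concurrent NAT users to the probe's daily view rate, and can be checked directly (values copied from the table):

```python
# Reproduce the derived U/W column of Table 5 from its raw columns
# (W = probe views/day, U = approximate users behind the NAT).
scenarios = {
    "Inst.1": {"W": 20,  "U": 50,  "R_FP": 0.90},
    "Inst.2": {"W": 75,  "U": 100, "R_FP": 0.82},
    "Inst.3": {"W": 100, "U": 50,  "R_FP": 0.36},
}
for s in scenarios.values():
    s["U/W"] = s["U"] / s["W"]
```

Ordering the scenarios by U/W also orders them by R_FP: the more legitimate user traffic there is per fake view, the more fake views survive detection.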
This indicates that distributing activity across multiple IP addresses results in a substantial increase in the R_FP, enabling attackers to inflate view counters easily. This experiment suggests that YouTube is vulnerable to attacks that employ many IP addresses (like, e.g., the ones performed using botnets), as such attacks can apparently achieve an arbitrarily large number of views. Indeed, it is easy to find paid services that offer to inflate the view counter of YouTube (and other video platforms) videos up to tens of thousands of views in a short period of time and at a low price (e.g., http://www.viewbros.com).

5.3 Impact of NATed IP addresses on the detection mechanism

As NAT devices aggregate traffic, they typically contain the video viewing activity coming from multiple, usually private, IP addresses. In large NATed networks such as campus networks, corporate networks and, in some cases, ISP networks, this activity may be significantly large.

To confirm this finding, we separately analyze working days and days off (i.e., weekends and holidays) in Institution 2, running the experiment for 194 days. Note that working days register a high volume of traffic in the NATed network, whereas in days off the traffic is typically low. Figure 8 shows the distribution of the daily false positive rate for working days and days off using boxplots. The results confirm that YouTube discounts almost all views during days off, i.e., when our traffic is more exposed, but has serious problems to discount views (median R_FP = 60%) during workdays, i.e., when our views are hidden by larger volumes of traffic. Hence, this suggests a fraudster can dramatically increase the efficiency of its activity by gaining access to machines located behind large (active) NATed networks, e.g., a public campus network.

In summary, the fake view detection mechanism of YouTube implements an exponential discount factor on the number of views performed from a single IP address that increases with the rate of views.
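The working-days/days-off split used for the Figure 8 analysis can be sketched as below; the start date and holiday set in the example are placeholders, since the paper does not give the exact calendar.

```python
from datetime import date, timedelta

def split_by_day_type(start, daily_rfp, holidays=frozenset()):
    """Split a series of daily R_FP values (one per day, starting at
    `start`) into working days and days off (weekends and holidays)."""
    working, off = [], []
    for i, r in enumerate(daily_rfp):
        d = start + timedelta(days=i)
        (off if d.weekday() >= 5 or d in holidays else working).append(r)
    return working, off
```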
However, our results show that some simple modifications in a fraudster's strategy can increase the false positive rate very substantially: i) adding some randomness to the HTTP connection attributes, such as the User-agent or the Referer, ii) distributing the malicious activity across multiple IP addresses, or iii) performing fake views from NATed networks.

Therefore, in these sets of experiments, we investigate how the fake view detection mechanism of YouTube penalises the views coming from NATed networks. To this end, we install our probe on three machines accessing the Internet from NATed networks located at three different institutions, and we configure them to perform 20 (Institution 1), 75 (Institution 2) and 100 (Institution 3) daily views for a period of 8 days. We remark that we use the Deterministic configuration, in which we disable the usage of cookies to ensure that the fake view detection system has no means (under our control) to identify the viewing sessions our probes generate.

6. YOUTUBE'S FAKE VIEW DETECTION MECHANISM FOR MONETIZED VIEWS
