Graffiti Networks: A Subversive, Internet-Scale File Sharing Model

Andrew Pavlo, Brown University, [email protected]
Ning Shi, Brown University, [email protected]

arXiv:1101.0350v1 [cs.NI] 1 Jan 2011

Abstract

The proliferation of peer-to-peer (P2P) file sharing protocols is due to their efficient and scalable methods for data dissemination to numerous users. But many of these networks have no provisions to provide users with long-term access to files after the initial interest has diminished, nor are they able to guarantee protection for users from malicious clients that wish to implicate them in incriminating activities. As such, users may turn to supplementary measures for storing and transferring data in P2P systems. We present a new file sharing paradigm, called a Graffiti Network, which allows peers to harness the potentially unlimited storage of the Internet as a third-party intermediary. Our key contributions in this paper are (1) an overview of a distributed system based on this new threat model and (2) a measurement of its viability through a one-year deployment study using a popular web-publishing platform. The results of this experiment motivate a discussion about the challenges of mitigating this type of file sharing in a hostile network environment and how web site operators can protect their resources.

1 Introduction

In just a few years since its inception, the BitTorrent protocol and similar systems have become the predominant P2P file sharing model [11]. But the recent activities of those seeking to take down P2P infrastructures have forced the file sharing community to adapt to a hostile environment [15]. Operators of global BitTorrent trackers now take two notable measures in order to indemnify themselves from legal action: (1) the trackers are located in countries that are not party to international copyright treaties, and (2) access to trackers is controlled by private, invite-only communities with strict membership requirements [9]. The former allows operators to ignore legal threats to shut down their services that a law-abiding ISP would normally have to comply with. But this approach can be both prohibitively expensive and difficult to arrange. Additionally, restricting access to privileged users only temporarily protects a site that has been made private; it takes only a single seditious user to undermine the network and provide damaging evidence to the right parties.

The aforementioned measures may protect tracker operators but they provide little protection to the average file sharing user. This is because the fundamental principle of the BitTorrent protocol is that users download and upload data directly with other untrusted users, rather than download from a single, central source [11]. Although some P2P clients employ communication encryption and protocol obfuscation enhancements, such measures do not protect a user from malicious clients that harvest file sharing activity information for future litigation. Furthermore, it has been shown that while it may not be possible to easily view encrypted packet contents, a third-party observer can still deduce that file sharing is occurring by identifying network pairs based on a tracker's public peer list [7, 15].

Another limitation of current BitTorrent-like models is that the networks rely on altruistic users to keep files available for others. This is problematic in an environment where users want to limit their exposure to any traffic logging clients, and thus it is in their interest to disconnect immediately once they have successfully downloaded their desired files. Content in these networks is unavailable once all of the peers that have the complete file depart. Newly arriving clients may be able to download and share some fraction of the data (if any is available), but they must wait and hope that a client returns to the network with the rest of the file. Enhancements to private trackers, such as upload/download ratios, provide incentives for clients to continue to seed files [9], but these economic models are difficult to initiate and do little to maintain less popular older files.

In response to the lack of user anonymity and long-term data persistence in existing P2P systems, some users may seek an alternative. But because traditional data hosting solutions are not a viable option for sharing certain content that may have legal consequences, these users must use more questionable means for sharing data. Motivated by this, we developed the Graffiti Network distributed file sharing protocol that uses multiple third-party storage sites as a data replication and transfer medium between clients.

The Graffiti approach is to use publicly available web sites to store multiple copies of shared content. We use the term graffiti for our work since we are storing data in a way that non-network participants may regard as unsightly or unwanted vandalism. Our approach presents several new security challenges over other existing P2P systems where clients transmit data directly with each other: (1) a newly arriving peer can still download files even if all other peers have long disconnected, (2) a peer does not need to know about the existence of other peers, and (3) a tracker does not need multiple peers to enforce tit-for-tat policies [11].

The layout of this paper is as follows. First, we provide an overview of the Graffiti Network file sharing model. We then discuss our experimental prototype of the Graffiti Network model that is integrated with a BitTorrent system. The results from our one-year study on the efficacy of our prototype in a real-world deployment show that the use of public storage sites in a file sharing system is possible. We then conclude with a discussion about how both administrators and software developers can guard against such a threat.

2 Related Work

We motivate our work by first discussing the related background research and literature.

2.1 BitTorrent

The BitTorrent protocol defines the operations of a P2P network that facilitates the efficient sharing of files in a distributed manner [11]. Our model inherits many of the features of BitTorrent, but employs third-party storage sites as an intermediary for data transfers, rather than allowing clients to directly download files from each other. This indirection makes it difficult to discover the identities of users that are participating in a Graffiti Network.

The overall efficiency and throughput of BitTorrent systems has been shown to scale gracefully to accommodate many users arriving at the same time to download new and popular files [27]. But while the model works well in the short term, it does not ensure the long-term availability of esoteric content or files that become less popular over time. This problem is especially prevalent for content that is released in "episodes": new content is shared profusely when it is released, but the number of peers decreases as the file becomes older and newer episodes are released. In a five-month study of BitTorrent network activity, it was shown that the average time that a client stays in the network to continue sharing a file after it has received the entire file set was only seven hours [19]. These results, however, are based on the sharing activity of copyright-free files, and therefore the clients do not have a vested interest in disconnecting immediately. In contrast, a study [25] explicitly focused on illegal file sharing activity showed that the departure rate of peers is much faster than previously assumed in [27]. The results in [16] show that the average availability of a torrent is less than nine days and that most swarms completely die out in only 13 days. Thus, without the incentives for sharing found in private communities [9], most BitTorrent content becomes unavailable after just a short amount of time. To overcome the capricious nature of users, Graffiti Networks use storage sites that have the potential to always be available, and thus the shared files are still accessible after the initial interest in the content has subsided. With enough replication, enforced by a strict asynchronous tit-for-tat model, we believe that a Graffiti Network could provide clients with access to files months or years after they were first introduced to the Internet.

2.2 Peer-to-Peer Storage Systems

Much of the previous work on developing P2P storage systems that provide block storage across multiple nodes is based on distributed hash tables [12, 22, 29]. These approaches have the same deficiencies as the BitTorrent model: peers download file blocks directly from other peers, thereby losing anonymity, and the systems do not provide mechanisms to ensure long-term availability for less popular files after peers disconnect from the network. Other systems are focused on providing anonymous and secure P2P data storage [32]. The POTSHARDS system provides secure long-term data storage when the content originator no longer exists, using secret splitting and data reconstruction techniques to handle partial losses [30]; their approach assumes multiple, semi-reliable storage backends that are willing to host a client's data. The Freenet anonymous storage system uses key-based routing to locate files stored on remote peers [10]. As discussed in [12], Freenet's anonymity limits both its reliability and performance: files are not associated with any predictable server, and thus unpopular content may disappear since no one is responsible for maintaining replicas.

2.3 Steganographic Storage Systems

Although the Graffiti Network model is not a pure steganographic-based storage system, it does share similar properties with this class of systems [18, 17]. The Mnemosyne storage service applies the steganography techniques from a local storage system [8] to a distributed hash table [17]. The StegVault proposal uses secret sharing to build a secure P2P storage system on top of reliable multicast [18]. One key benefit of these systems is that users have plausible deniability of the existence of hidden data because it is concealed inside covering data [6].

2.4 Alternative Storage Sites

Since the Graffiti Network model relies on gaining access to and the circumvention of third-party storage sites to host content, we consider the alternative approach of using dedicated storage services that are explicitly designed for the storage and transfer of large files. The Amazon Simple Storage Service provides a well-defined API for writing arbitrary data files, but it currently charges for both the storage space an account uses as well as the network bandwidth used to transfer data [1]. The Gmail Filesystem enables Google email accounts to be used as a network storage medium, but adopting this approach would require users to share account information [20]. The Usenet news service is another potential storage system, but servers often impose a message retention time and many ISPs have discontinued providing this service to customers for free.

Free web-based file-hosting sites also do not provide the robustness that we seek in our file sharing model [4]. One limitation of these sites is that large files are broken into separate downloads and users must wait for some time period before they are allowed to retrieve the next piece. Furthermore, the user must manually enter each segment URL into their browser and repeatedly pass human-validation tests [24]. These free hosting sites are also under scrutiny because many of their users post illegal content, and thus the site operators streamline the removal process for files and the disclosure of offending users' information for copyright holders in order to quickly defuse any legal action that may disrupt the hosting site's revenue stream. Despite this, it is possible to include file-hosting sites as just one of the many options available in a Graffiti Network deployment (see Section 3.3).

Lastly, another proposed solution is to create a highly volatile storage site by sending data packets to unsuspecting network entities to leverage network latency as a type of durability [26]. The idea is to continuously send data to targets that relay the same data back to the source, so that two copies of the data are always theoretically available. This approach is not practical for the Graffiti Network model because it does not allow the data to be shared amongst multiple peers. Furthermore, it requires that the original data source remain online in order to keep cycling the packets back out over the wire.

3 Graffiti Network Model

We now describe how a file-sharing system based on the Graffiti Network model would operate. We discuss various measures and techniques that ensure the system is stable, usable, and scalable. Such qualities are necessary to facilitate widespread adoption by file-sharing participants, thereby making the threat a real possibility.

To describe the Graffiti model, we adopt the terminology of the BitTorrent protocol [11]. We define a fileset as a set of one or more files that peers wish to share. The fileset's data is divided into multiple fixed-length pieces of n bytes (the last piece can contain less than n bytes) that are numbered sequentially. Each piece is divided further into fixed-length sub-pieces. A Graffiti Network that is deployed to distribute these pieces is comprised of three distinct components: (1) a tracker coordinates the replication and sharing procedures of a fileset, (2) a client downloads and replicates the fileset data managed by the tracker, and (3) third-party storage sites store and provide access to fileset data for peers. Any client that wishes to download and reconstruct the original fileset is required by the tracker to produce multiple sub-piece replicas on as many storage sites as possible.

A high-level overview of the Graffiti Network protocol is shown in Figure 1. To connect to the Graffiti Network, the client first announces itself to the tracker and provides it with a list of all the pieces that the peer has already downloaded. The tracker responds with a series of sub-piece request pairs for a new piece that the client is missing. Each request pair consists of (1) a download location where the peer can retrieve a sub-piece and (2) instructions to produce a new replica on a different storage site for the data it just downloaded. Graffiti trackers follow a strict tit-for-tat protocol: for each sub-piece that a peer downloads, that peer is required to generate a replica for a previously downloaded sub-piece on a different storage site and send the location of this new replica back to the tracker before it can receive the next piece.

Figure 1: For a given fileset, the client communicates with the tracker in the following manner: (1) the client sends the tracker the list of pieces it already has; (2) the tracker responds with a list of instructions on where the client should download a sub-piece and the location of where to upload a replica; (3) after downloading the new sub-piece, the client then navigates to the target storage site and uploads a new encrypted and encoded sub-piece payload; (4) the storage site returns an HTML page and the client verifies that the upload was successful. This process repeats until the client has all the pieces of the fileset and has produced enough replicas for the tracker.

3.1 Central Tracker

The tracker provides a directory service for peers to retrieve a fileset. For each piece of data in a fileset, the tracker maintains a table of the sub-piece replica locations on sites that were generated by clients. Each replica is annotated with three pieces of meta-data: (1) a unique encryption key for that replica, (2) a checksum for each sub-piece, and (3) the first and last byte sequences of the encrypted data block on the storage site. The tracker uses a different encryption key per entry to ensure that each replica is stored as a unique character sequence to prevent the use of tools to discover other replicas. The checksum and sequence markers also
The checksum and sequence markers also 3 allow peersto determinewhether a replica has the proper dom set of sub-pieces at regular intervals (e.g., hours or bytesequenceandtolocatedataboundariesatthestorage days,ratherthanminutes). Thus,itispossibleforarogue sitelocation. clienttoretrieveacompletefilesetwithouteverproducing For each connected peer, the tracker maintains an ac- anewreplicaforthenetwork,butitwouldtakeseveraldays tive pieceset (APS)ofdownload/uploadreplicapairsthat orweekstocyclethroughallofthetracker’sIPScombina- are unfulfilled requests for a client. Each pair consists of tions if there were a significantly large number of pieces. asub-pieceidentifierthatthetrackerprovidedforclientto Theclientisrequiredtoalsoproducetwonewreplicasfor downloadandastoragesite locationwherethetrackerin- each sub-piece in the IPS, even if the client has already structed the clientto make a new replica. Once the client downloadedthepiecespreviously. Thispolicyisakintoa providesthe tracker with informationabouta new replica new tenantpaying“last month’srent” beforemovinginto foradownload/uploadpair,theentryisremovedfromthat anapartment:itensuresthatclientcannotdisconnectfrom client’sAPSandtheclientisallowedtoreceivenewinfor- the network without creating new replicas for each piece mation. ThesizeoftheAPSisdeterminedbythetracker’s thatitdownloads. administrator and prevents a client for downloading too Once the client successfully downloads and generates manysub-pieceswithoutproducinganynewreplicas.Asin sufficient replicas for its IPS, it leaves the initialization theBitTorrentprotocol,theGraffititrackerstrivesforuni- phaseandisthenallowedtoreceivearbitrarypieces. The formavailability of alldata pieces[11]. 
Since the tracker protocolworks the same before: the tracker maintains an decreeswhatpiecesthe clientsmustreplicateforeachre- APSforeachclientandonlygivesnewdownloadlocations quest in the APS, it can decide to replicate the “rarest” oncethatparticularclienthasproducedanewreplicaona piecesfirst. storagesite. Malicious clients in Graffiti Networks are quite differ- 3.3 Storage Sites ent than malicious clients in BitTorrent networks [23]. A rogue Graffiti client may have other ulterior goals: (1) to A potential Graffiti storage site is any accessible network discoverall of the storage site locationsused by a tracker entitythatallowsfordatato bestoredandretrievedusing inordertocontactsite administratorsandhavethereplica a known network protocol. In practice, peers will likely data removed or (2) to falsely identify valid storage sites usepublicallyavailablewebsitesthatprovideservicesthat andreplicalocationsasinvalidinanattempttodisruptop- Graffiticlients repurposeto store arbitraryblocksof data. erations. Inthefirstcaseoftryingtodiscoverallofafile- This approach has the distinction that all data movement set’sreplicas,thetrackercanusethrottlingmeasurestopre- appearsasnormalHTTPtraffic,andthusisimmunetocur- venta clientfromlearningtoomuchin a shortamountof rentISPthrottlingandtrackingtechniques[15]. time. Butforthelatterproblem,thetrackershouldnotac- The idealstoragesite fora GraffitiNetworkis onethat tivelycheckwhetheraclientactuallyuploadedthedataat allows for anyone to post data without CAPTCHA pro- the locationitclaimsit did, due to securityandeconomic tections [24] and is either unmoderatedor has long aban- reasons. Insteaditcanemployproxiesorotherthird-party doned by its owner. A popular and high-traffic wiki site, entitiestodeterminewhetheraclientisbehavingproperly. 
for example, would not be a good storage site candidate For example, the tracker can retrieve a page through the as it likely that non-malicious visitors would quickly no- Coral Cache or Tor services to determine if the data was tice the changes made by Graffiti clients to store replica storedatthelocationclaimedbyaclient[14,13]. data. With the rise of many open-source web-publishing platforms, there are many potential targets that allow for 3.2 Client anonymousorsemi-anonymousdataposting. Notableex- AGraffiticlientallowsausertoautomaticallydownloada amplesincludepaste-bins,wikisites,messageboards,and filesetstoredononeormorestoragesites. Ausermustfirst blogs.AnHTML-basedstoragesitealsoallowsthedatato obtainametadatafileforaspecificfilesetuniquelyidenti- bedisseminatedtopeersthroughdisparatechannelsonceit fiedbyan“infohash”inordertobegindownloading[11]. isonline,suchasthroughCoralCache[14]orTor[13].The Aftertheclientfirstannouncesitselftothetrackeratthe dataembeddedinthesite’s pagescouldalsobepickedup address listed in the metadata file, the tracker places the bysearchenginecachingandarchivingservicesforlonger- peer in an “initialization” mode. This is always done re- termstorage. gardlessofwhethertheclientisconnectingforthefirsttime Other potential storage sites include any photo and file orif itisreturningwithsome piecesalreadydownloaded. hosting sites that allow for automated data uploading. In The tracker sends every new client the same initial piece the case of the former, the data could also be hidden in- set (IPS) that will use for the first phase of downloading side of image files using well-knowntechniques[21, 28]. andreplication.Thisinitialsetisthesameforallclientsar- As the Internet evolves, new targets will emerge that can rivingwithinacertaintimeperiodtopreventaclientfrom be incorporatedinto existingnetworks. Thesystem could initiating multiple new connections without ever creating alsoallowclientstousestoragesitesthatarepasswordpro- newreplicas. 
Thesizeoftheinitialsetisthesamesizeas tectedforwritingdata,butwhereanaccountisnotrequired the APS andits informationischangedto a differentran- toreadbackthedata. Thisobviatestheneedforaclientto 4 sendthetrackeraccountinformation,whichcouldthenbe SitesFound SitesUsed usedimproperlybyotherclientstotamperwithordestroy AnonymousEdits 8,483 3,161 thedata. RegistrationProtected 5,983 2,347 Using involuntary web sites as storage dumps seems PuzzleProtected 1,157 138 counterintuitive if the main goal of the network is data CAPTCHAProtected 1,586 - persistenceandavailability,sincereplicasarepromptlyre- NotPubliclyModifiable 5,946 - moved when site administrators and moderators discover Total: 23,156 5,646 them. The Graffiti model overcomes this challenge and Table1:ThecategoriesofprotectionusedbytheMediaWikisites takes advantageof “freenetworkstorage” througha mas- discoveredduringthecollectionprocessandthesitesusedinthe sive replication and obfuscation process. It is not trivial, experimentaldeployment. however, to automatically store arbitrary data on random websites norisittrivialto discoverwhichsitesare avail- isaltered. able with the properties stated above. The prevalence of WedecidedtotestoursystemonopenMediaWikisites popularwebpublishingsoftwaremeansthatoneonlyneeds that we do not have controlover as this allows us to best to target a small number of platforms in order to circum- measurewhetherourassumptionsabouthowlongthedata vent a large portion of the Internet. Furthermore, many will remain on the sites are correct. We developed a dis- sites, suchaswikisandmessageboards,oftendisplaythe tributed web crawler to discover MediaWiki installations network location of the user responsible for adding new through search engines using keywords that are uniquely contentor makingchangesto their pages, which makesit indicativeofanewlyinstalledsite. 
Thecrawlerpurposely difficult to deny responsibility for participating in illegal ignoredwell-knownMediaWikisites(e.g.,thosesitesthat activities. We argue that by fracturing a fileset’s replicas arepartoftheWikipediaFoundation)andthecommercial- across hundreds of storage sites, it is difficult to be fully izedversionsofMediaWiki(e.g.,Wikia).Foreachsitethat implicated when only a fraction of the evidence is avail- thecrawlerfound,weprobedittodeterminewhatkindof able. A distributed effort to probe websites and uncover protection scheme it utilizes and the last time that it was open storage paths could allow peers to draw on a nearly updated (see Table 1). Of the 23,156 unique MediaWiki limitlesspoolofavailablestorage. installationsthatwefound,8,483sitesallowedforanony- mouseditingand5,983alloweduserstoregisteraccounts 4 Experimental Deployment withoutCAPTCHA oremailprotectionsin orderto make To determine whether the Graffiti Network modelis a vi- edits [24]. The default MediaWiki installation provides able andthusis a potentialthreat, we implementeda pro- aprimitivearithmetic“puzzle”protectioncountermeasure totypeGraffititrackerandclientasanextensiontotheBit- that we found in use on 1,157 sites; this puzzle is easily Torrent protocol. We then stored a sample data set on a brokenwithjustafewlinesofcode,andthusdidnotpre- largenumberofopensitesandmeasuredtheavailabilityof ventoursystemfromstoringdataonthesesites. Lastly,in ourdataforalmostanentireyear. ordertominimizetheimpactofourexperiments,weonly We built our system on top of the open-source libtor- targeted those sites that had not been updated within the rent[5]BitTorrentlibraryinordertoallowclientstopartic- last threemonths, therebyreducingourlist to 5,646sites; ipateintorrentswarmsconcurrentlywithGraffitiNetwork loweringthethresholdtotwomonthswouldhaveyieldeda activities. Whenenoughpeersareavailable,theclientop- totalof11,987potentialstoragesites. 
eratesstrictlyinBitTorrentmode.Butifthenumberofdis- The Graffiti client stores data on MediaWiki sites as tributed copies in the swarm dropsbelow a threshold, the base64-encoded,Blowfish-encryptedblocksoftextthatare clientbeginstocontactthetrackerusingtheGraffitiproto- writteninanewarticletitledwitharandomwordfromthe col in conjunctionwith its BitTorrentoperations. As new dictionary.Amoreresilientapproachwouldbetomodifya pieces are retrieved from storage sites, they are passed to popularpageonagivensite,andthenimmediatelyreverse libtorrent’sstoragemanagerforseedingtootherpeers. thechangesandmarktherevisionasvandalism. Thishas twosignificantimplicationscomparedtowritingdatatoa 4.1 Storage SiteDiscovery newlycreatedarticle. Foremostisthatremovingthisdata In our experimental prototype, we target the open source completelyfromthepage’shistoryrequiresadministrators MediaWiki [3] platform as the potential storage site for todeletetheentirepagefromthedatabaseandrestorethe thenetwork. DuetothepopularityofsiteslikeWikipedia latestrevisionbyhand,therebylosingallthepreviouslegit- that use MediaWiki, we believe that it is the most widely imaterevisions.Second,suchanattackismorelikelytobe deployed wiki platform with a large number of less- overlookedby a site’s operatorssince they may only care experiencedusers that install the software withoutchang- whetherthechangeswerereversed. Wedeemedthistech- ingthepermissivedefaultsettings. Anotherkeycharacter- nique too malevolent for the purpose of our experiments, istic is thatthe MediaWikiplatformmaintainsa complete andthuschosetonotimplementit. 
revisionlogforeacharticle,whichallowsGraffitipeersto To retrieve a sub-piece stored on one of these storage retrievedataevenifthechangesarereversedorthecontent sites, the client downloads the web page and extracts the 5 0.8 0.8 Site Not Found y Puzzle Protected Replicas 0.7 Replica Removed egor 0.7 RegistratioAnn Pornoytemcoteuds RReepplliiccaass s at a c ssing replic 00..56 eplicas per 00..56 mi 0.4 g r 0.4 ntage of 0.3 of missin 0.3 erce 0.2 age 0.2 p nt 0.1 ce 0.1 er p 0 0 0 50 100 150 200 250 300 0 50 100 150 200 250 300 # of days since creation # of days since creation Figure2: Percentageoftotalreplicasremovedovertimecate- Figure3: Theavailabilityof replicas categorized by itscorre- gorizedbythetypeoffailure. spondingstoragesite’sprotectionschemes. textsurroundedbythebytesequencemarkersprovidedby e 0.8 p .com tdheecrtyrapctskethr.eTdahtea,calinedntvtehreifinersetvheartsietsmthaetcbhaesset6h4ecehneccokdsiunmg, ain ty 0.7 ..eodrgu m other-intl providedbythetracker. do 0.6 other-us er p 4.2 System Configuration as 0.5 c pli e 0.4 For our experimental deployment, we used a Linux ISO g r n splitinto512KBpiecesand64KBsub-piecesasoursample ssi 0.3 mi datafilethattheclientswanttoshare.Eventhoughwewere e of 0.2 abletostoreupto512KBpayloadsonasingleMediaWiki g a page,wechoosetouseasmallersub-piecesize. Again,an- ent 0.1 c othermoremaliciousapproachwouldbetostoreapayload per 0 0 50 100 150 200 250 300 withthesizethatcanbeuploadedandretrievedbutcauses # of days since creation eitherabrowserortheservertochokeiftheoperatortries Figure4: Thecumulativeavailabilityofreplicascategorizedby to access the page through the MediaWiki administrative theirdomaintype: .com(42.5%),.edu(3.2%),.org(24.1%),US- interface. For example, we found that it was possible to basedother(14.0%),andNon-US-basedother(16.1%). store 512KB pieces that would exhaustthe default 20MB memorylimitofPHPifsomeonetriedtoremovethedata. 
thenusedaseparatetooltocheckdailywhetherthedatawe Thus,theonlywaytoremovethecontentistoexecutethe storedisstillinplaceandhasnotbeenmodified.Wecheck proper SQL commands directly in the database, which is everyreplicaregardlessifithasnotbeenavailableforsome likelytoodifficultformostusers. timetoensurethattheerrorsarenottransient. WeinitiatedfilesharingactivityonApril10th,2009us- 4.3 Results ing a tracker and five clients deployed in our departmen- tallab. Eachclientconnectstothetrackerandproducesa Wenowreportontheavailabilityofthe5,646replicasthat full copy of a sub-piece on one of the 5,600+MediaWiki westoredinourexperimentsfromApril10thtoFebruary sites. Weassumethatallclientsaretruthfulaboutwhether 28th, 2010. For each missing replica, we categorize the a replicaisavailableanddonotfalsifyreplicaURLs. We replica as (1) removed if the site is available but the orig- instrumented the tracker to target each storage site only inal page is missing, (2) changed if both the site and the once(althoughvariationsinsub-domainsandURLrewrit- originalpageareavailable,butthedatadoesnotmatchour ingledtosomesitesbeingusedmorethanonce). stored checksum, or (3) not found if the site is no longer Along with the data payload, at the top of each wiki available(e.g.,thedomainnamehasexpiredorMediaWiki page we stored a small paragraph with an explanation of wasuninstalled). Ourinvestigationfoundthatthemissing theseeminglyrandomtext. Thisdescriptionalsoincluded replicaswereonlyeitherremovedornotfound;noreplica a unique tracking link back to our web page with fur- haditscontentsaltered. ther informationabout the project. Tracking users’ click- On the last day of our data collection, roughly 40% of throughsfromtheselinksallowsustomeasuretosomeex- thereplicaswerestillavailableandhostingtheoriginaldata tentwhetherhumanswereactuallydiscoveringourpayload thattheprototypeclientsuploaded. ThegraphinFigure2 pagesbeforetheyweredeleted. 
showsa timelineofthepercentageofreplicasthatarenot Oncetheclientspushedoutallofthedatatothesites,we available on each day that we checked. The first notable 6 data point is that an initial 20% of the replicas were re- despitetheinevitableexposuretovandalismandspam.We movedwithin the same week that they were created. The counterthatsuchsitesthatdonotwanttorequireusersto rateinwhichsitesareremovedthentapersoffastimepro- registeranaccountshouldstilluseCAPTCHAprotections, gresses. Weattributethisdrop-offinactivitytotwopossi- suchasbeforeauserisallowedtoeditapage. Inpractice, ble reasons. Foremost is that by default any changes to a wefoundthatthereCAPTCHA[31]projectisthemostef- MediaWikisitewillappearonthefirstpageofrevisionlogs fective protection as it does not require administrators to forsevendaysaftertherevisioniscreated,andthusourac- install special server-side graphics libraries and strikes a tionsaremorelikelytobediscoveredsoonafterthedatais properbalancebetweenavailabilityandcomplexity. More posted.Thesecondpossiblereasonisbecauseastoryabout complexCAPTCHAschemeswouldnotdeterfutureGraf- ourprojectappearedonthefrontpageofapopulartechnol- fiticlientsthatareabletosolveCAPTCHAs(eithermanu- ogy news website on the third day of our experiment[2]. ally or programmatically)andmay only inhibitlegitimate Webelievethatthe“notoriety”oftheprojectduringthispe- visuallyimpairedusers. Ifsites wish to stillremainopen, riodmayhavecausedadministratorstoexaminetheirweb- theCAPTCHAcouldbeselectivelyenabledonlywhenan sitestoseeiftheyweretargetedbyoursystem. Oncethis unverifieduser triesto postdata largerthan some low de- initial attention diminished, the slopes of the lines in Fig- fault threshold or creates too many new pages in a short ure2decreaseandittakesanother35daysbeforeanother timespan. 10% of the replicas are removed. 
After about 100 days, We also believe that other simple protection measures the growth rate of replicas being removed (i.e., the lower could be included in popular web applications to prevent portion of the curve in Figure 2) tapers off and the num- abandoned or forgotten sites from being used for unin- berofsitesthatbecomeunavailablebeginstorise. Thisis tendedpurposes.Forexample,MediaWiki’sdefaultbehav- expectedsincemanyofthesiteswerenotactivelyusedby iorcouldbetolockdowntheeditingfeaturesofasiteafter theirproprietor,andthusaretakendownarbitrarily. a certainnumberofdaysif itwas installedbutthennever The graphin Figure 3 shows how the replicas were re- actually used. This approach is similar to the one used movedovertimeinrelationtotheirstoragesite’sprotection bysomebloggingplatformstodisablecommentsonolder scheme.Thesalientaspectoftheresultisthatinitiallysites posts. Administratorscouldeasilyre-enablethisfunction- that employed some type of protection were faster to re- ality by simply logging into the site again. Another tech- movereplicas.Thisisexpected,sincemanyofthesitesthat niqueistouseapagecounterthatisinvokedontheclient- employed some protection were still being used by users side(e.g.,throughJavaScript)andthencomparetheresults despitehavingnotbeenupdatedrecently,whereasmanyof withserver-sidelogstodeterminewhetherthereareanun- thecompletelyopensitesstilldisplayedthedefaultMedi- usually large number of users accessing pages through a aWiki homepage message and thus were never even used non-browserclient. Web applicationframeworks, such as oncetheywereinstalled. Suchsitesarelikelylongforgot- RubyonRailsandDjango,couldalsoprovidesimilarfea- ten by their owners who may never discover the replicas turestoprotectcustom-madesites. oncetheypassthedefaultsevendayrevisionlogwindow. 
5.2 Variations& Adaptations Butafterapproximately120days,thepercentageofmiss- ing replicas stored on sites allowing for anonymousedits Other than forP2P activities, the Graffitimodelis also of surpassessitesusingthebasicregistrationprotection. potential use for large-scale distributed systems used by Lastly, the graph in Figure 4 charts the availability of criminal organizations, often referred to as botnets. The replicaswithrespecttothedomainnameofthestoragesite. goal of most botnet operators is to gain access to a large We attribute greater durability of data stored on .edu and supplyofcomputationalresourcesforpurposesofnetwork .org sites compared to other domains; such organizations communication (e.g., sending emails or DOS attacks). If arelikelytouseopen-sourcesoftwareforcollaborationand thesegoalsshifttowardsmoredata-centricactivities, then internalsitesareoftennotbehindcorporatefirewalls. systems based on some of the principals of the Graffiti Network model may become prevalent in order to store 5 Discussion largeamountsofdataforthebotnet. Alternatively,instead The results presented in the previous section clearly ofstoringreplicateddata,thecommandeeredstoragesites demonstratethe efficacyof the GraffitiNetworkmodelas couldalsobeusedasacontrolchannelforotherentitiesin ameansforfacilitatinglonger-termfilesharing. Wethere- thebotnet. forearguethatthethreatofsuchasystemdoesindeedexist 6 Acknowledgments andsitesneedtotakemeasurestoprotectthemselvesfrom beingusedinsuchamannerthatwehavedescribe. Theauthorswouldlike to thankArvidNorbergatBitTor- rent,Inc.forhisassistancewiththelibtorrentlibrary[5]. 
5.1 Countermeasures 7 Conclusion Muchofthefeedbackthatwereceivedontheprojectwas from administrators that expressed their desire to provide We have presented an overview of Graffiti Networks, a an open wiki site that allowed anonymous contributions, new file sharing model that allows peers to subversively 7 use third-party storage sites as an intermediary for trans- [15] GARETTO,M.,FIGUEIREDO,D.R.,GAETA,R.,ANDSERENO, ferringfilesbetweenusers. Ourclient-trackerparadigmis M. A Modeling Framework to Understand the Tussle between ISPsandP2PFile-sharing Users. PerformanceEvaluation 64, 9- similar to the BitTorrent protocol, but is designed to pro- 12(2007),819–837. vide long term file availability to users while preserving [16] GUO,L.,CHEN,S.,XIAO,Z.,TAN,E.,DING,X.,ANDZHANG, their anonymity. We do not intend the Graffiti model to X.Measurements,analysis,andmodelingofbittorrent-likesystems. supplant BitTorrent networks, as it will never achieve the InProc.oftheInternetMeasurementConference(2005),pp.4–18. same maximumnetwork throughputnor will it everbe as [17] HAND, S., AND ROSCOE, T. Mnemosyne: P2P Steganographic efficient. Webelieve,however,thatourapproachcanhave Storage. InRevisedPapersfromtheIntl.WorkshoponP2PSystems a symbioticrelationshipwith existingdeployments: peers (2002),pp.130–140. would use a Graffiti Network-like system to improve the [18] HONG, G. C. StegVault: Pervasive Information Hiding in an longtermavailabilityofsharedfiles, whileleveragingthe AnonymousP2PEnvironment. Master’sthesis,NationalUniversity ofSingapore,2003. fasterinitialtransferratesofdirectP2Pcommunicationfor datadissemination. Wehaveimplementedaprototypeand [19] IZAL,M.,KELLER,U.G.,BIERSACK,E.,FELBER,P.,HAMRA, A., AND ERICE, G. L. Dissecting BitTorrent: FiveMonths ina shownthatdatacanbestoredonpublicallyaccessiblesites Torrent’sLifetime. InProc.ofthePassiveandActiveMeasurement for extendedperiodsof time, beyondwhat is often possi- Workshop(2004). bleinotherexistingpeer-to-peersystems. 
Afteralmostan [20] JONES, R. Gmail Filesystem. entireyear,roughly40%ofthedatathatwestoredonsites http://richard.jones.name/. thatarenotunderourcontrolwasstillavailable. Thesere- [21] KATZENBEISSER,S.,ANDPETITCOLAS,F.A.,Eds. Information sults indicate that malicious users may adopt the Graffiti Hiding Techniques for Steganography and Digital Watermarking. Network model, and thus site operatorsshould take mea- ArtechHouse,Inc.,2000. surestopreventtheirsitesfrombeingusedinthismanner. [22] KUBIATOWICZ, J., BINDEL, D., CHEN, Y., CZERWINSKI, S., EATON, P., GEELS, D., GUMMADI, R., RHEA, S., WEATH- References ERSPOON, H., WEIMER, W., WELLS, C., AND ZHAO, B. OceanStore: AnArchitecture for Global-scale Persistent Storage. [1] AmazonS3.http://aws.amazon.com/s3/. SIGPLANNotices35,11(2000),190–201. [2] Grad Student Project Uses Wikis To Stash Data, Miffs Admins. [23] LOCHER, T., MOOR, P., SCHMID, S., AND WATTENHOFER,R. http://tech.slashdot.org/article.pl?sid=09/04/13/01202F2re6e.RidinginBitTorrentisCheap. InWorkshoponHotTopicsin Networking(2006),pp.85–90. [3] MediaWiki. http://www.mediawiki.org/. [24] NAOR,M.VerificationofaHumanintheLooporIdentificationvia [4] RapidShare.com. http://www.rapidshare.com/. theTuringTest. 1996. [5] RasterbarSoftware. http://www.rasterbar.com/. [25] POUWELSE,J.A.,GARBACKI,P.,EPEMA,D.H.J.,ANDSIPS, [6] Steghide. http://steghide.sourceforge.net/. H.J. TheBittorrentP2PFile-sharingSystem: Measurementsand Analysis. InIntl.WorkshoponP2PSystems(2005). [7] MeetingtheChallengeofToday’sEvasiveP2PTraffic.WhitePaper, SandvineInc.,Waterloo,Canada,2004. [26] PURCZYNSKI, W., AND ZALEWSKI, M. Jug- gling with packets: Floating data storage. [8] ANDERSON, R. J., NEEDHAM, R. M., AND SHAMIR, A. The http://lcamtuf.coredump.cx/juggling_with_packets.txt, SteganographicFileSystem. InProc.oftheIntl.WorkshoponIn- 2003. formationHiding(1998),pp.73–82. [27] QIU,D.,ANDSRIKANT,R. 
ModelingandPerformanceAnalysis [9] ANDRADE, N., MOWBRAY, M., LIMA, A., WAGNER, G., AND ofBitTorrent-likeP2PNetworks. SIGCOMMC.Commun.Rev.34, RIPEANU,M. InfluencesonCooperationinBitTorrentCommuni- 4(2004),367–378. ties.InProc.oftheWorkshoponEconomicsofP2PSystems(2005), [28] RAMKUMAR, M., AND AKANSU, A. N. Capacity estimates for pp.111–115. data hiding in compressed images. IEEETransactions on Image [10] CLARKE, I., SANDBERG, O., WILEY, B., AND HONG, T. W. Processing10,8(August2001),1252–1263. Freenet: A Distributed Anonymous Information Storage and Re- [29] ROWSTRON, A., AND DRUSCHEL, P. Storage management and trievalSystem. InIntl.WorkshoponDesigningPrivacyEnhancing caching in PAST, a large-scale, persistent P2P storage utility. Technologies(2001),pp.46–66. SIGOPSOperatingSystemsReview35,5(2001),188–201. [11] COHEN,B. IncentivesBuildRobustnessinBitTorrent. InProc.of [30] STORER,M.W.,GREENAN,K.M.,MILLER,E.L.,ANDVORU- theWorkshoponEconomicsofP2PSystems(2003). GANTI,K.POTSHARDS:securelong-termstoragewithoutencryp- [12] DABEK,F.,KAASHOEK,M.F.,KARGER,D.,MORRIS,R.,AND tion. InProc.oftheUSENIXAnnualTechnicalConference(2007), STOICA,I.Wide-areacooperativestoragewithCFS.InProc.ofthe pp.143–156. ACMSymposiumonOperatingSystemsPrinciples(2001),pp.202– [31] VON AHN, L., MAURER, B., MCMILLEN, C., ABRAHAM, D., 215. AND BLUM, M. reCAPTCHA: Human-Based Character Recog- nitionviaWebSecurityMeasures. Science(August2008),1465– [13] DINGLEDINE, R., MATHEWSON, N., AND SYVERSON, P. Tor: 1468. thesecond-generation onionrouter. InSSYM’04: Proceedings of the13thconferenceonUSENIXSecuritySymposium(Berkeley,CA, [32] WALDMAN,M.,RUBIN,A.D.,ANDCRANOR,L.F. Publius: A USA,2004),USENIXAssociation,pp.21–21. Robust,Tamper-evident,Censorship-resistantWebPublishingSys- tem. InProc.oftheConference onUSENIXSecuritySymposium [14] FREEDMAN,M.J.,FREUDENTHAL,E.,ANDMAZIE`RES,D. De- (2000),pp.59–72. mocratizingcontentpublicationwithcoral. 
InNSDI’04: Proceed- ingsofthe1stconferenceonSymposiumonNetworkedSystemsDe- signandImplementation(Berkeley,CA,USA,2004),USENIXAs- sociation,pp.18–18. 8
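
To make the selective CAPTCHA policy suggested in Section 5.1 more concrete, the following Python sketch shows one way a site could decide when to challenge an unverified user: either the post exceeds a size threshold, or the user has created too many new pages within a short window. It is an illustration under stated assumptions only. The class name CaptchaPolicy, the method needs_captcha, and every threshold value are hypothetical; none of them are part of MediaWiki, reCAPTCHA, or the prototype described in this paper.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class CaptchaPolicy:
    """Decide whether an edit should be challenged with a CAPTCHA.

    All names and default values here are hypothetical and chosen
    purely for illustration.
    """
    max_post_bytes: int = 8 * 1024   # assumed "low default threshold" on post size
    max_new_pages: int = 3           # assumed cap on new pages per window
    window_seconds: float = 600.0    # assumed length of the "short timespan"
    # Per-user timestamps of recent page creations (sliding window).
    _created: dict = field(
        default_factory=lambda: defaultdict(deque), init=False, repr=False
    )

    def needs_captcha(self, user_id, verified, post_bytes, creates_page, now):
        """Return True if this edit should be challenged before it is saved."""
        if verified:
            # Verified users are never challenged in this sketch.
            return False

        too_many_pages = False
        if creates_page:
            recent = self._created[user_id]
            # Discard page creations that fell out of the sliding window.
            while recent and now - recent[0] >= self.window_seconds:
                recent.popleft()
            recent.append(now)
            too_many_pages = len(recent) > self.max_new_pages

        return post_bytes > self.max_post_bytes or too_many_pages
```

A site would call needs_captcha just before accepting an edit, passing the current time in seconds, and present the CAPTCHA only when it returns True. Small edits by anonymous users then remain unchallenged, while the large uploads and bursts of page creation that a Graffiti client would generate are the ones that get challenged.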