DTIC ADA619170: Defending Tor from Network Adversaries: A Case Study of Network Path Prediction PDF


Report Documentation Page
Form Approved OMB No. 0704-0188

Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number.

1. REPORT DATE: 2015
2. REPORT TYPE:
3. DATES COVERED: 00-00-2015 to 00-00-2015
4. TITLE AND SUBTITLE: Defending Tor from Network Adversaries: A Case Study of Network Path Prediction
5a. CONTRACT NUMBER:
5b. GRANT NUMBER:
5c. PROGRAM ELEMENT NUMBER:
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHOR(S):
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Research Laboratory, Washington, DC, 20375
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
10. SPONSOR/MONITOR’S ACRONYM(S):
11. SPONSOR/MONITOR’S REPORT NUMBER(S):
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES:
14. ABSTRACT:
The Tor anonymity network has been shown vulnerable to traffic analysis attacks by autonomous systems and Internet exchanges, which can observe different overlay hops belonging to the same circuit. We evaluate whether network path prediction techniques provide an accurate picture of the threat from such adversaries, and whether they can be used to avoid this threat.
We perform a measurement study by collecting 17.2 million traceroutes from Tor relays to destinations around the Internet. We compare the collected traceroute paths to predicted paths using state-of-the-art path inference techniques. We find that only 20.0% of predicted paths match paths seen in the traceroutes. We also consider the impact that prediction errors have on Tor security. Using a simulator to choose paths over a week, our traceroutes indicate a user could expect 10.9% of paths to contain an AS compromise and 0.9% to have an IX compromise with default Tor selection. We find that modifying the path selection to choose paths predicted to be safe still presents a 5.3–11% chance of compromise in a week while making 5.1% of paths fail, with 96% failing unnecessarily due to false positives in path inferences. Our results demonstrate that more measurement and better path prediction are necessary to mitigate the risk of AS and IX adversaries to Tor.

15. SUBJECT TERMS:
16. SECURITY CLASSIFICATION OF: a. REPORT: unclassified; b. ABSTRACT: unclassified; c. THIS PAGE: unclassified
17. LIMITATION OF ABSTRACT: Same as Report (SAR)
18. NUMBER OF PAGES: 17
19a. NAME OF RESPONSIBLE PERSON:

Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std Z39-18

Defending Tor from Network Adversaries 2

that may have been chosen by a Tor user.¹ We find that AS and IX path prediction significantly overestimates the threat of vulnerability to such adversaries; at the same time, most users do run a significant risk of compromise by an AS-level adversary as determined from the traceroute data, whereas IX-level adversaries affect only a small fraction of paths.

¹ TorPS: http://torps.github.io/

We then modify our simulator to specifically avoid selecting paths that are vulnerable to AS or IX adversaries based on predictions, as has been previously suggested. We show that this significantly limits the choice of paths and frequently results in no paths being available for use while following the Tor practice of maintaining a long-term fixed set of entry guards into the network. These limits would require reconsidering the already complex set of tradeoffs in the design of the mechanisms for selecting and updating the set of entry guards used in Tor [16]; we note that the situation is made worse by the recent move towards using a single entry guard instead of 3 [13].

On the other hand, we find that many of these failures are a consequence of over-prediction, as we are often able to find suitable non-vulnerable paths in our traceroute dataset despite covering only a fraction of the Tor relays. Our work suggests that a defense based on proactive path measurement, rather than AS path models, is likely to be more practical and offer better security guarantees.

2 Background

2.1 Tor

Tor is a popular system for anonymous communication online [14]. Tor consists of a network of volunteer relays that form an overlay network and forward traffic sent by users running Tor clients. In February 2015, it contained approximately 7000 relays and transferred around 70 Gbps of data for a user population estimated at over 2,000,000.²

² https://metrics.torproject.org/

Tor uses onion routing to achieve anonymity. A client sets up a connection to a destination by choosing a sequence of three relays, conventionally called guard, middle, and exit, and establishing a circuit through the sequence. The client encrypts a message once for each circuit relay (a process called onion encryption) and sends it through the circuit, and each relay removes one layer of encryption before forwarding. The final relay sends unencrypted messages to the destination. The reverse process happens for messages from the destination to the client. As a result of this process, the client identity is only directly observable in traffic between the client and the guard relay, and the destination identity is only directly observable in traffic between the exit relay and the destination.

In order to be real-time and efficient, Tor does not mix, pad, or delay traffic. Therefore, it is vulnerable to attacks based on traffic analysis. For example, an adversary that can observe a circuit between the client and guard and also between the exit and destination can correlate the traffic patterns and deanonymize the connection [8]. Thus entities that can observe parts of the underlying network infrastructure, such as Internet Service Providers or Internet Exchanges, are a serious threat to Tor. Previous work has shown that individual autonomous systems and Internet exchanges are in fact frequently in a position to break Tor’s security [15, 17, 22, 23, 30]. However, almost all of this analysis uses heuristic route-inference techniques whose accuracy may not be satisfactory. Murdoch and Zieliński [30] do study Tor security against IXes using traceroutes from Tor relays, but the traceroutes are performed from the UK only, and the analysis does not consider whether IX adversaries can be avoided during path selection.

2.2 Internet routing

Internet routing at the highest level is performed among autonomous systems using the Border Gateway Protocol (BGP) [32]. An AS is a network with an opaque internal routing policy (e.g., using OSPF [29], IS-IS [9], RIP [27], or iBGP [32]) that routes traffic to and from other networks. BGP is a path-vector routing protocol since neighboring networks advertise the whole AS path that they will use to send traffic to a given destination. A path is advertised for an IP prefix and represents the path used for all IP addresses sharing that prefix. Path-vector routing enables each AS to make complex routing decisions based on factors such as individual contracts with other ASes.

Understanding the behavior of such complex routing policies on the Internet is a challenging problem. Routers just propagate the routes that they provide for a given neighbor to use, and so different Internet vantage points reveal different subsets of global routing behavior. Sources of routing data include the Route Views Project,³ which provides BGP routing information from many large ASes, and CAIDA Archipelago,⁴ which provides and analyzes traceroute data from three teams of 17–18 monitors distributed worldwide. Gao describes how to use such data to infer AS-level Internet routes [18]. Gao’s method uses heuristics to classify the observed connections between ASes by their economic relationship (viz. customer-to-provider, provider-to-customer, peer-to-peer, or sibling). Shortest-path valley-free routing is used to infer the route between two hosts. Valley-free routing requires that a path have all costly customer-to-provider links first, then an optional sibling or peer-to-peer link, followed only by preferred provider-to-customer links. Qiu and Gao improve the accuracy of this technique by incorporating the observed advertised BGP paths [31]. In addition, they describe how to infer a set of possible paths rather than just one. Their results show that these techniques can infer the exact correct AS path for 60% of evaluation ASes; furthermore, the exact path is found within the top 5 predicted possible paths for 83% of ASes and within the top 14 paths for 86% of ASes.

³ http://www.routeviews.org/
⁴ http://www.caida.org/projects/ark/

Many links between ASes occur at Internet exchanges. These are facilities that provide space and infrastructure for ASes to locate routers and establish connections. Ager et al. [4] describe how the largest IXes may provide links among hundreds of ASes and carry petabytes of traffic per day. Augustin et al. [7] describe how IXes on Internet routes can be detected using traceroutes and an index of known IXes and their IP prefixes. They identify 44,000 peering relationships between ASes at IXes. Each peering between two ASes indicates that some traceroute passed directly from one AS to another through an IX. Discovering such links can improve the accuracy of AS path inference techniques. However, as we will observe, it does not discern among different router-level paths taken between the same two ASes, which may pass through different IXes.

2.3 Traceroute measurement

The traceroute tool is extraordinarily useful in measuring routing behavior on the Internet. The basic algorithm iteratively issues UDP packets with unique ports and an increasing time-to-live (TTL) value. Then, for any ICMP Time Exceeded response, it uses the contained UDP port number to identify the TTL value used and infers that the source IP address is located at that path position. There are many variations of the basic algorithm [26], which provide different levels of success depending on the traffic engineering (e.g., filtering and load balancing) that occurs en route.

In addition to such problems with traceroute itself, it is not always straightforward to make inferences about Internet paths from a traceroute. For example, Mao et al. [28] describe the difficulties of inferring an AS-level path from traceroutes, which include that different iterations of a single traceroute might take different paths, that reported IP addresses may be from a network interface other than the one that actually received it, and that mapping from IP address to AS number is non-trivial due to inaccurate WHOIS information. Augustin et al. [7] discuss similar issues in inferring the presence of IXes from traceroutes.

Nevertheless, traceroutes do provide a generally accurate picture of how packets are actually routed at the AS level. Research on AS-level routing rarely uses any real ground truth data because routing involves the proprietary information of many parties. Instead, traceroutes and advertised BGP paths are the most frequently used sources of data (e.g., [28, 31, 36]), although each has inaccuracies. A thorough comparison of these data [28] showed that among completed traceroutes (the type that we consider), approximately 90% of their AS paths matched the advertised BGP paths exactly.

Traceroutes serve as an important comparison point to AS-level path predictions. These inferred AS paths are much less consistent with advertised BGP paths, and in this paper, it is the inferred paths that we are trying to evaluate. Only 60% of AS paths inferred using the Qiu-Gao algorithm have an exact match with advertised BGP paths on average [31]. Indeed, Qiu and Gao justify their inference method over using traceroutes only because “traceroute requires the access to source machines and is resource consuming”.

Moreover, the mismatches that do exist between traceroutes and advertised BGP paths often favor traceroutes for Tor security analysis. Mao et al. [28] show that advertised paths miss exchange ASes, sibling ASes, and tail ASes. These ASes should be included when considering Tor security, and such mismatches are found in 1–3% of the completed paths they compare between traceroute AS paths and advertised BGP paths. For IXP inference, traceroutes are the main data used in the literature (e.g., [7, 30]). Indeed, the IXP inference methodology that we apply uses traceroute data as a primary data source.

3 Mapping Network Adversaries

3.1 Measuring Internet Paths

3.1.1 Generating Traceroutes

Our measurement study consists of running traceroutes from Tor relays to various destinations in the Internet. We use the scamper⁵ network tool, which probes multiple destinations in parallel and uses techniques to accurately discover the Internet path traversed by packets in the presence of multi-path load balancing [6, 24].

⁵ http://www.caida.org/tools/measurement/scamper/

For our measurements, we extracted the set of advertised destination IP prefixes from the September 2013 Routing Information Bases (RIBs) of the Route Views routers. Each relay running the measurements picks a random IP address within each of the approximately 500K prefixes and performs a traceroute to that destination. We also collected traceroutes to the Tor relays themselves as well as a scan of all /24 IPv4 subnets, but this data was not used for the analysis in this paper. We focus on the advertised prefixes to make the analysis more tractable. We expect addresses within a prefix to use the same or similar routes, and our analysis of CAIDA’s traceroutes to all /24 IPv4 subnets [3] found that 81% of the time traceroutes destined to the same routable prefix traversed the same set of ASes. Our measurement scripts are available for public review.⁶

⁶ https://bitbucket.org/anupam_das/traceroute-from-tor-relays

3.1.2 Processing Traceroutes

We next process the traceroutes to determine which ASes and IXes an Internet path has traversed. First, we filter out traceroutes that do not successfully reach the destination. Note that because we use randomized destinations, in many cases the destination may not exist or may be down; indeed, only a small fraction (8%) of probes reach their target. However, 49% reach the AS of the destination, as determined by the MaxMind GeoIP database [2].

We further find that 94% of the traceroutes are missing some hops from the path. In some cases, we believe missing hops are caused by routers close to the probe source rate limiting their ICMP responses. To address this, we perform route stitching, where gaps in a traceroute are filled by path segments observed in other traceroutes. For example, if we see a path “A B C D E” and another path “A B * D F,” where “*” denotes a missing hop, we can repair the second path by inferring that the third hop must have also been C in this case. To minimize inaccuracies introduced by this repair mechanism, we only consider path segments that originate from the same host and that are contained within the same batch of 64K traceroutes, which typically occur within an hour or two of each other. We validated this approach on complete paths and found that stitching would have given us the correct AS path result 96% of the time.

We then compute the ASes corresponding to each IP in the path using the GeoIP database. Similar to Mao et al. [28], we consider the corresponding AS path complete if the traceroute reached the AS of the destination and there are no missing hops in the path on the boundary between ASes. For example, an AS path “AS1 AS1 * AS1 AS2 AS3” is considered complete, because the missing hop is contained entirely within AS1, whereas “AS1 AS1 * AS2 AS3” is considered incomplete. Overall, 28% of the traceroutes yield a complete AS path. We discard the other traceroutes from our analysis. We also identify an IX as on the path if the path contains an IP address from the list of known IP addresses of IX points as outlined in the following section.

3.2 Inferring Path ASes and IXes

We are interested in comparing the AS and IX adversaries identified from traceroute data with the AS and IX adversaries inferred from AS maps, which are much easier to obtain and maintain. We predict AS paths from source to destination using Gao’s algorithm [18] to classify relationships and Qiu and Gao’s algorithm [31] to infer the top k paths (for k = 1 to 5). While advances have been made in classifying AS link relationships [25], we find that, when available, Qiu and Gao’s method of matching RIB paths is more accurate than using graph-based methods based solely on AS relationships [23]. It is known that AS relationships are difficult to classify, especially at the highly interconnected core of the AS graph. Violations of the valley-free principle in advertised routes often indicate erroneous AS relationship classification, especially through top-tier ASes. Therefore Qiu and Gao’s method of prepending advertised routes to complete paths yields accurate results even with incorrectly classified AS relationships at the core of the Internet. Since the prepended hops are almost entirely easily classified customer-to-provider hops at the bottom of the AS graph, improving the AS relationship classification of the top-level ASes does little to improve overall AS path prediction accuracy.

To predict the presence of IXes, we recreate the work of Augustin et al. [7]. We scraped Packet Clearing House⁷ and the Peering Database⁸ in February of 2014, creating a list of 732 Internet exchange points and their known prefixes. We parsed over 200 million traceroutes from February and March 2014 collected from both the CAIDA routed IPv4 database [3] and the iPlane project,⁹ and identified IXes in the traceroutes using the list of IXes and IP prefixes. This analysis revealed roughly 130,000 Internet exchange point peerings between pairs of ASes. Our number is roughly twice the number of

⁷ http://www.pch.net
⁸ https://www.peeringdb.com
⁹ http://iplane.cs.washington.edu/

[...]

missing IX of about 0.2 per hop. This result is unsurprising because if there are no IX points in the traceroutes, then there can be no missing IXes in the inference. Unfortunately, the false positives for IX points are problematic, with linearly increasing averages ranging from 10 to 25 for each of our hosts, illustrating the need for better methods of identifying IX points.

For AS adversaries, a k value of 1 or 2 seems most appropriate to identify most AS adversaries without causing too many false positives. Higher values of k give lower rates of return while causing a linear increase in false positives. Identifying IX adversaries is much more problematic. Since the traceroutes identify very few IX adversaries to begin with, a k value of 1 appears to work well. The inaccuracy of the method can be seen in the false positives, which also increase linearly with k but greatly over-predict the number of adversaries even with a k value of 1. The inaccuracy of AS prediction and the high inaccuracy of IX prediction could potentially cause serious problems when designing a system of AS/IX independence in Tor. We analyze the effects of this inaccuracy in the following sections.

5 AS and IX Adversaries in Tor

Errors in path prediction call into question previous work that has used path prediction to both evaluate the security of Tor and propose changes to Tor’s path selection based on path predictions. Understanding the effect of the errors uncovered by our traceroute measurements requires taking into account the specific properties of Tor.

We accomplish such an analysis by simulating the Tor protocol and network at a high level. We use and adapt the Tor Path Simulator (TorPS)¹⁰ to perform Monte Carlo simulation of Tor path selection by a single client. By using the hourly network “consensuses” and server “descriptors” archived by CollecTor,¹¹ we can recreate the state of the Tor network over the period we run our simulations, including features such as the number, bandwidths, and addresses of Tor relays available in any given hour. We simulate “typical” user activity using the recorded volunteer trace of Johnson et al. [22], which includes user behaviors such as web search and webmail on a plausible daily schedule. Over the course of a week, this schedule results in 2632 streams (i.e., TCP connections over Tor), each to one of 205 distinct IP addresses occupying 168 unique ASes, on either port 80 or 443. Finally, we run simulations using the most common client ASes as measured by Juen in Fall 2011 [23].

¹⁰ https://github.com/torps
¹¹ https://collector.torproject.org/

Simulating path selection in Tor allows us to estimate which Internet hosts a user’s traffic is likely to flow over in a typical use case. Then we can use our traceroute data to determine the specific Internet routes that traffic would take and evaluate the resulting security. Specifically, we provide new estimates for how often a Tor stream flows through the same AS or IX between the client and the guard and between the destination and the exit. When this happens, the AS or IX is in a position to deanonymize the client. This issue was previously studied only using inferred AS paths and IX sets.

In addition, using this method we provide an improved evaluation of the repeatedly proposed [15, 17] modification to Tor to use AS/IX path inference to choose relays that are path independent, that is, that result in paths for which the same AS or IX cannot observe both the client and the destination. We modify TorPS to produce the first simulator for path-independent Tor (to our knowledge) that reproduces how path selection occurs over time, including features that have the potential to significantly alter the effectiveness of the path-independence requirement, such as guard lists and circuit reuse. We apply our traceroute measurements to the results of these simulations to evaluate the effectiveness of path inference as a basis for path independence in Tor.

5.1 Vanilla Tor

All of our Tor simulations run over the week of January 19–25, 2014. When producing and analyzing these simulations, we generally use the same data sources and inference algorithms as in Section 3.2 to produce AS path inferences, AS-level IX inferences, and traceroute IX inferences. We use daily AS-path inferences conducted from January 19–25, 2014 and compare them to the traces from each day of the simulation week. We also use the daily Route Views prefix-to-AS datasets to determine routed prefixes and to map IPs to ASes. When analyzing our simulations using traceroutes, we use all of the traceroute measurements gathered during the week of January 19–25, 2014. In our analysis we match a traceroute to a pair of communicating hosts in Tor if the source prefix and destination prefix match.

We first conduct a simulation using the default Tor path selection algorithm. We consider clients coming from 50 of the top 200 most common client ASes (as measured by Juen [23]). Each AS advertises hundreds of possible prefixes in the Route Views data. We select at random twenty prefixes per client AS for a total of 1000 client prefixes for the simulations. The simulator runs 10,000 repetitions of simulated traffic using input data from the week of January 19th–25th 2014, yielding over 24 million traffic streams per client prefix with 18.2 million unique streams. We identify the presence of AS and IX adver-
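
The compromise condition used in Section 5 (the same AS or IX appearing both between the client and the guard and between the exit and the destination) can be illustrated as a set intersection. The following is a minimal sketch under stated assumptions, not the TorPS implementation: the function names and the AS labels are hypothetical, and each path is assumed to be already reduced to a list of AS or IX identifiers.

```python
from typing import Iterable, Set


def common_observers(entry_path: Iterable[str], exit_path: Iterable[str]) -> Set[str]:
    """Return the ASes/IXes that appear on both sides of a circuit.

    entry_path: identifiers seen between the client and the guard.
    exit_path:  identifiers seen between the exit and the destination.
    """
    return set(entry_path) & set(exit_path)


def stream_compromised(entry_path: Iterable[str], exit_path: Iterable[str]) -> bool:
    """A stream counts as compromised if any single AS/IX sees both ends."""
    return bool(common_observers(entry_path, exit_path))


# Hypothetical AS-level paths, for illustration only.
entry = ["AS100", "AS200", "AS300"]  # client -> guard
exit_ = ["AS400", "AS200", "AS500"]  # exit -> destination

print(common_observers(entry, exit_))           # {'AS200'}
print(stream_compromised(entry, exit_))         # True
print(stream_compromised(["AS100"], ["AS400"]))  # False
```

The same check applies whether the two paths come from traceroutes or from inferred AS paths, which is what makes it possible to compare the two data sources on identical simulated streams.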

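
The traceroute processing steps of Section 3.1.2 (route stitching and the completeness rule for AS paths) can be sketched in the same way. This is an illustrative sketch only, not the authors' measurement scripts. It assumes that the other traceroutes passed in have already been restricted to the same source host and the same batch, and that a gap is filled only when exactly one hop was observed between the same two neighbours; that matching rule, the function names, and the single-letter hop labels are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

MISSING = "*"  # marker for an unresponsive hop


def stitch(path: List[str], others: List[List[str]]) -> List[str]:
    """Fill single missing hops using segments observed in other traceroutes.

    Assumption: a gap between hops p and n is filled only if exactly one
    hop h was ever seen as "p h n" in the other paths.
    """
    seen: Dict[Tuple[str, str], Set[str]] = {}
    for other in others:
        for prev, hop, nxt in zip(other, other[1:], other[2:]):
            if MISSING not in (prev, hop, nxt):
                seen.setdefault((prev, nxt), set()).add(hop)
    repaired = list(path)
    for i in range(1, len(repaired) - 1):
        if repaired[i] == MISSING:
            candidates = seen.get((repaired[i - 1], repaired[i + 1]), set())
            if len(candidates) == 1:
                repaired[i] = next(iter(candidates))
    return repaired


def is_complete(as_hops: List[str], dest_as: str) -> bool:
    """True if the AS path ends in dest_as and every gap lies inside one AS."""
    if not as_hops or as_hops[-1] != dest_as:
        return False
    prev = None
    in_gap = False
    for hop in as_hops:
        if hop == MISSING:
            in_gap = True
            continue
        if in_gap and hop != prev:
            return False  # the gap sits on an AS boundary (or starts the path)
        prev, in_gap = hop, False
    return True


print(stitch("A B * D F".split(), ["A B C D E".split()]))   # ['A', 'B', 'C', 'D', 'F']
print(is_complete("AS1 AS1 * AS1 AS2 AS3".split(), "AS3"))  # True
print(is_complete("AS1 AS1 * AS2 AS3".split(), "AS3"))      # False
```

The two example calls to `is_complete` reuse the complete and incomplete AS paths quoted in Section 3.1.2, and the `stitch` call reuses the “A B C D E” / “A B * D F” example from the same section.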