ebook img

Damage detection via shortest path network sampling PDF

6.9 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Damage detection via shortest path network sampling

Damage detection via shortest path network sampling Fabio Ciulla,1 Nicola Perra,1 Andrea Baronchelli,2 and Alessandro Vespignani1,3 1Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston MA 02115 USA 2Department of Mathematics, City University London, Northampton Square, London EC1V 0HB, UK 3Institute for Scientific Interchange (ISI), Torino, Italy (Dated: January 30, 2014) Largenetworkedsystemsareconstantlyexposedtolocaldamagesandfailuresthatcanaltertheir functionality. The knowledge of the structure of these systems is however often derived through sampling strategies whose effectiveness at damage detection has not been thoroughly investigated so far. Here we study the performance of shortest path sampling for damage detection in large 4 scale networks. We define appropriate metrics to characterize the sampling process before and 1 after the damage, providing statistical estimates for the status of nodes (damaged, not-damaged). 0 The proposed methodology is flexible and allows tuning the trade-off between the accuracy of the 2 damage detection and the number of probes used to sample the network. We test and measure the efficiency of our approach considering both synthetic and real networks data. Remarkably, in n all the systems studied the number of correctly identified damaged nodes exceeds the number false a J positives allowing to uncover precisely the damage. 9 PACSnumbers: 89.75.-k,89.75.Hc,89.20.Hh 2 ] h Real world networks are often the result of a self- work by removing nodes according to different strategies p organized evolution without central control [1–3]. The andsampleagainthedamagednetworksupposingnotto - physical Internet and the World Wide Web are two pro- know the location and magnitude of the damage. Dur- c o totypical examples where the interplay of local and non- ing this sampling, we constantly monitor whether the s local evolution mechanisms defines the global structure probabilitynottohaveseenanodeexceedstheexpected . s of the network. In absence of defined blueprints, the value calculated on the basis of the sampling of the un- c only way to characterize the global structure of self- damaged graph. We define a statistical criterion to asses i s organizing systems is by devising sampling experiments, which nodes of the network can be considered damaged, y liketraceroute probingforthephysicalInternet[4–9]and andwetesttheperformanceofthemethodbylookingat h p web crawlers for the WWW [10–12], that try to infer the number of true and false positives it identifies. [ the properties and connectivity patterns of these struc- We perform numerical experiments on synthetic net- tures [13–19]. However, sampling processes are limited works, either heterogeneous, generated with the un- 2 v by time and physical constraints, and in general guaran- correlated configurational model (UCM) [25], or ho- 2 tee access only to a part of the network. In this context, mogeneous, obtained through the Erd¨os-R`eny model 8 the detection of network damage is a difficult task. The (ER) [26]. Remarkably, the magnitude and location of 9 lack of information on the exact structure of the sys- the damage can be detected with fairly good confidence. 6 temmakesitextremelydifficulttoidentifywhatdamage The accuracy improves for nodes that play a central role . 1 has been actually suffered and discriminate damaged el- innetwork’sconnectivity. Namely,thedetectionofdam- 0 ements from those that have been simply not yet probed aged hubs is more reliable than the one of peripheral 4 by the sampling process. nodes. As a practical application, we consider the dam- 1 : Herewestudynumericallytheeffectivenessofshortest age detection on the physical Internet at the level of Au- v path sampling strategies for damage detection in large- tonomous System (AS). We use the AS topology pro- i X scalenetworks. Shortestpathsamplingstrategiesareac- vided by the Dimes project [14]. In this case, to sim- r tually used in Internet probing [1, 20–24] including fail- ulate realistic damages, we remove nodes according to a ure detection, via traceroute like tools and end-to end theirgeographicalposition. Indoingsowesimulatecrit- measurements. We consider different attack strategies, ical events such as large-scale power outages, deliberate namely random or connectivity based and introduce a servers switch offs [27] or other major localized catas- global measure, M, that allows to quickly identify dam- trophic events [28]. Interestingly, also in this case our ages that induce large variations in the routing patterns methodology allows to statistically identify the extent of networks. We then propose a statistical method able and location of the damage with reasonable accuracy. toclassifynodesasdamagedorfunctioninginthecaseof The paper is organized as follows: in the section I partial network sampling. In our analysis, we first sam- we present the sampling method used. In section II we ple a given network structure via shortest path probes introduce a measure that provides a general estimation [1, 20–24] obtaining a partial representation of its nodes ofthedamageextensioninsamplednetworks. Insection and connectivity patterns. Then we damage the net- III we provide a method to infer if a single node is 2 damaged or not using a p-value test. In section IV we validate this method by applying it to the physical Internet network. Finally in section V we present our conclusions and final remarks. SHORTEST PATH SAMPLING OF UNDAMAGED NETWORKS FIG.1: Probabilitydistributionfunction(pdf)ofnodevisit The sampling of networks via shortest paths consists probabilityforundamagedUCM(bluesolidline)andER(or- in sending probes from a set of nodes that have been de- angedashedline)networks. Thetwopeaksthatdeviatefrom finedassourcestowardanothersetofnodeschosentobe the overall heavy-tailed behavior occurs at pi = 1/NS and p =1/N ,andrepresentthevisitprobabilityofsourcesand targets. Each probe travels through the network follow- i T target respectively. The curves are the average over 100 in- ingashortestpathandrecordseachnodeandlinkvisited dependent simulations. returning a path. This is, to a first approximation, what is executed by Internet mapping projects that use the traceroutetool[8]asprobingmethod. Thismethodology infers paths by transmitting a sequence of limited-time- ones. to-liveTCP/IPpacketsfromasourcenodetoaspecified target on the Internet. The nodes visited along the way • All shortest path (ASP). All possible, equivalent, sendtheirIPaddressesasaresponseandcreatethepath. shortestpathbetweenshortestpatharediscovered. Theunionofallthepathsreturnedbytheprobescreates the sampled picture of the network. However, mapping therealphysicalInternetiscomplicatedandexistingap- InthefollowingwewillusetheRSPprobingstrategy[20, proacheshavemajorlimitations[4]. Forexample,visited 29–31]. Both sources and targets are chosen randomly nodes can fail to provide their IP address, and wrong or among all the nodes. Inspired by real Internet probing outdated forwarding routes registries can result in not we investigate scenarios in which the order of magnitude optimal forwarding route indications. Our study is in- of sources is N = O(10), while the order of magnitude S spired by the traceroute tool although we assume that ofdensityoftargets,ρ ,isO(10−1). Alongwiththeraw T our probes follow shortest paths in the network neglect- number of discovered nodes, we also keep track of the ing the real world limitations aforementioned. visit probability p for each visited node i, defined as the i Here we focus on undirected and unweighted networks ratiobetweenthenumberofshortestpathprobespassed constituted of N nodes and M edges G(N,M). We fix through the node i and the total number of probes sent a set of sources S = (s ,s ,...,s ) and a set of target 1 2 NS NS ×NT: T = (t ,t ,...,t ) among the N nodes, being N and 1 2 NT S N the total number of sources and targets respectively. T For each pair of nodes, taken in the two sets, a shortest NS(cid:80)×NT δ i,j path probe is sent. After all of the NS ×NT probes are p = j=1 , (1) executed,alltheresultingpathsaremergedinasampled i N N S T network that we denote as G∗(N∗,M∗). Here and in the rest of the paper the star symbol indicates sampled where δ is equal to 1 if the node i is seen by the probe quantities, so N∗ is the number of discovered nodes and i,j j. In the limit in which both N and N approach N, M∗ the number of discovered links via to the shortest S T the probability p become the betweenness [20]. Instead, path sampling. in more realistic cases in which the number of sources Ingeneral,foreachsource-destinationpairwecanhave and targets is small the nodes visit probability is just two or more equivalent shortest paths. More precisely, an approximation of this quantity [20]. Considering this therecouldbedifferentstrategiestonumericallysimulate limit, we show the distribution of p in Figure 1 in both the shortest path probing: UCM and ER networks with 105 nodes. The number of • Uniqueshortestpath(USP).Theshortestpathbe- sources is 15 and the target density is 0.2. The curves showapowerlawbehaviorexceptforthepresenceoftwo tween a node i and a target T is always the same peaks, representing the visit probability of sources and independently of the source S. Each shortest path targets. Thepeakforlargevaluesofp istheconsequence is selected initially and they will never change. i of the visit of sources, and appears in correspondence of • Randomshortestpath(RSP).Eachshortestpathis p = p = 1/N , ∀i ∈ S. The other peak is due to i source S selectedeverytimerandomlyamongtheequivalent targetsandoccursforp =p =1/N , ∀i∈T[37]. i targets T 3 DAMAGE DETECTION In order to introduce damage in the network, we con- sider that N nodes are not functional, i.e. a fraction D ρ = N /N, of nodes and all their links are removed D D from the network G. We define the damaged network G (N ,M ) where the subscript D denotes damage. D D D Damaged nodes are selected either randomly or accord- ing to a degree based strategy in which nodes are re- moved according to their position in the degree ranking (hubs first or leafs first). Although probes target nodes can be damaged, we assume that no sources are dam- aged. In these settings we aim at inferring the damage using shortest path sampling by looking at the number of nodes discovered before and after the damage occurs. Shortest path probes are sent between each pair of source-targetnodes. Thesamplednetworkafterthedam- age G∗ differs in general from the sampled view G∗ of D FIG. 2: The behavior of M is shown for three different the original undamaged network because of the changed damagestrategies: random(redsquares),smalldegreenodes topologyduetothemissingnodes. Toquantifythedam- first (green diamonds) hubs first (blue circles). (A) UCM age we introduce the quantity scale-free graph. Inset: the behavior of M when high degree nodesareremovedfirstiscomparedtothequantity(cid:104)l(cid:105)ρ as av M =1− ND∗ (2) a function of ρD. (B) ER random graph. For UCM network N∗ the minimum indicates the enhanced discovery given by the lack of hubs. Each plot is the median among 10 independent whereN∗ andN∗ arethenumberofdiscoverednodesin assignment of sources and targets. D theundamagedanddamagednetworkrespectively. Ifno damage occurs, the number of nodes discovered in G is D similar to the one discovered in G so that N∗ (cid:39)N∗ and strategy considering two cases in which nodes are re- D M (cid:39) 0 [38]. If less nodes are seen in GD than in G then moved in increasing (hubs first) or decreasing order of M > 0, with M = 1 representing the extreme case in degree. which no nodes are discovered in the damaged network. Figure 2 shows M as a function of ρ for the three D Interestingly, the quantity M can assume also negative different attack mechanisms, for the two different types values. Indeed, it is possible to see more nodes in GD of network. The top panel presents data for UCM net- respect to G. Although this case may sound counterin- work. Therandomnodesremovalstrategygivesthesame tuitive at first, a closer look to the effect in the topology qualitativebehaviorastheoneinwhichthesmalldegree induced by removing nodes, clearly explain its meaning. nodes are attacked first. This is not surprising, since Indeed, by removing some central nodes in the network the probability that a randomly selected node has small (in the next section we discuss this point in details) the degreeisextremelyhighduetothepower-lawdegreedis- lengthoftheshortestpathsmightincreaseonaverageas tribution of the network. The two strategies select, on well as the number of discovered nodes. average,thesamecategoryofnodes. Abigdifferencecan be noted when nodes are attacked in decreasing order of degree (high degree nodes, hubs, are attacked first). For Numerical simulations small values of ρ the value of M assumes negative val- D ues,meaningthatmorenodesarediscoveredinthedam- We measure the quantity M in damaged homogenous aged network than in the undamaged one. Indeed, hubs and uncorrelated heterogenous networks [1–3, 32], gen- act as shortcuts for network connectivity. Their failure erated through the ER and the UCM algorithm, respec- causesthereroutingofprobestowardlowerdegreenodes tively. The network size is fixed at N = 105 nodes, and and the consequent growth of the average length of the the number of sources and target density are NS = 15, shortest paths (cid:104)l(cid:105). As ρD increases, this trend is con- and ρ = 0.2, respectively. The average degree is k¯ = 8 trasted by the progressive fragmentation of the network T for both the topologies. The exponent γ for UCM is in many disconnected components. 2.5. As mentioned above, sources and targets are ran- In order to estimate how much the network has been domly selected. We consider different damage strategies fragmentedbythedamagingprocesswedefinethequan- in which the removed nodes are selected at random or tity ρ as the average of the ratio between the nodes av considering their degree. We further divide the latter of the components in which each source is located, and 4 thetotalnumberofnodesN oftheundamagednetwork. After a certain amount of nodes are removed, the graph undergoes disconnection and more than one component appears. At this point a shortest path probe can reach only the nodes belonging to the component where the source is located. Components with no sources will be no longer accessible. ρ is a decreasing function of ρ av D assumingthevalue1whenthereisnodamage,whereasit becomes ρ = N /N in the limit of ρ (cid:39) 1, when only av S D thesourcessurviveandeachofthemconstitutesonecom- ponent. Neither (cid:104)l(cid:105) or ρ alone explains the presence of av theminimumofthequantityM intheplots. Insteadthe product of the two (cid:104)l(cid:105)ρ does: it represents the average av number of nodes discovered by each shortest path probe rescaled by the number of nodes effectively available to be discovered. The relation of this quantity with the FIG.3: PrecisionαasafunctionofC for(A)hubsremoval minimum for M is shown in the inset of Figure 2A. The with fraction of removed nodes ρD = 0.001 and (B) random argumentaboveisconfirmedbythebehaviorofM inER nodesremovalwithfractionofremovednodesρD =0.01. The blackdashedlineindicateα=0.9. Dotsrepresenttheaverage graphs. Here,removingthenodeswithhigherdegreehas over100independentsimulationsanderrorbarsillustratethe amuchsmallerimpactonthetopology,andconsequently 95% confidence interval. there is no increase in the amount of nodes discovered in the damaged graph G with respect to the original one D G. The plot of M for hubs removal in the ER network NT. The p-value test consist in imposing the equality does not show a minimum and substantially the damage between this quantity and an arbitrary confidence level detection works similarly for all damage strategies. C: C =(1−p )τi (3) i SINGLE NODE DAMAGE DETECTION Note that after imposing the equality, τ has the index i i as for p . This is because τ is different for each node. i i While the measure M quantifies the damage at the By taking the logarithm on both side of the equation we global level, it does not provide any information about obtain: specific nodes of the network. In this section we address lnC the damage of individual nodes by assuming that the τ = (4) i ln(1−p ) information gathered during the exploration of the un- i damaged network constitutes the null hypothesis of our If the node i has not been seen at least once before τ i measure, namely that no one of the nodes is damaged. probeshavebeensentthenwecanstatethatiisdamaged WestartbymonitoringthenetworkG assumingthatitis withstatisticalconfidenceC. Hereweareassumingthat not damaged. Every time we send a shortest path probe the visit probability of nodes does not change after the weobtainabetterapproximationofthesamplednetwork damage. Thisholdswhenthedamageisarelativelysmall G∗ with increasing number of discovered nodes N∗. At perturbationanddoesnotchangetheconnectivityofthe the same time we collect information about how many network or its dynamical properties [3, 34, 35]. After times a probe passes through a node i resulting on visit the p values have been determined for all the nodes, i probabilityp definedinEq.1. Thenetworkisthendam- the value of C tunes the number of probes to be sent i aged according to one of the strategies discussed above, before declaring a node damaged. If C is selected to be andsampledviashortestpathprobes. Bydefinition,any large, nodes will be considered as damaged much earlier node that is discovered during the sampling is not dam- but with a small statistical confidence, leading to a large aged. However, the situation is less clear for nodes that number of false positive (FP) detections. Conversely, if havenotbeendiscovered. Indeed,thereasonwhyanode C is set to be small, more probes are needed to state is not seen can be either that it is actually damaged or if a node is damaged or not. The accuracy improves that the sampling has missed it because the damage has andthefinalresponsewilleventuallyreturnonlyactually altered the shortest path routing. damaged nodes, the true positive damaged nodes (TP). In order to infer the state of undiscovered nodes we Ontheotherhandthenumberofprobesneededtoreach use a p-value test [33] applied to the visit probability p . this level of statistical confidence will be much higher i More precisely, we calculate the probability (1−p )τ of resulting in a longer sampling process. i not seeing the node i after a number τ of shortest path The value of C is an input of the method and it can probes. τ can assume any integer value from 1 to N × be chosen by opportunely tuning the trade-off between S 5 FIG.4: Probabilitydistributionfunction(pdf)ofnodesvisit FIG. 5: Recall r as a function of C for (A) hubs removal probabilityintheundamagedUCM(A)andER(B)networks with fraction of removed nodes ρ = 0.001 and (B) random D restricted to nodes that will be later damaged with two dif- nodes removal with fraction of removed nodes ρ = 0.01. D ferentstrategies.Curvesaretheaverageover100independent Dots represent the average over 100 independent simulations simulations. and error bars illustrate the 95% confidence interval. a poor but fast sampling that produces high number of FPs and an accurate but slow sampling that generates so determinant and the visit probability of high degree more TPs. We evaluate the performances of the damage nodesisalmostindistinguishablefromtheoneofrandom detection strategy measuring its precision and recall. In nodes. For the same value of C in Eq. 4 the difference particular, the precision, α, is defined as in p between UCM and ER translates in smaller values of τ for the UCM network. This leads to higher number i TP α= . (5) of FPs and hence smaller precision. In Figure 3B we TP +FP show the same curve for the random damaging strategy The recall, r, is instead setting ρD = 10−2. In this case the behavior of α in the two topologies is very similar. Such behavior is due TP to the p distributions that in both networks span the r = , (6) TP +FN entire range of possible visit probabilities irrespective of the topology reproducing the same behavior shown in where FN indicates the number of false negatives, i.e. Figure 1. nodes damaged but not detected. In a given network precision and recall are functions of the parameter C. In Figure 3 we plot α for different values of C in both In Figure 5 we study the recall r as a function of C. UCM and ER networks. In the top panel we remove In this case we can see that r reaches 1 for all the values top degree nodes and set ρ = 10−3. As expected α of C investigated when damaging hubs in an UCM net- D increases as C decreases. Interestingly, in the case of work. All damaged nodes are detected during the sam- UCM topologies the increase is slower. Indeed, we can pling. This is consequence of the very large visit proba- notice that α reaches an arbitrary level of 90% (dashed bilities for high degree nodes in the UCM network. In- line) for C =10−10 while in the case of ER networks the deed,largep scombinedwithC viaEq.4,producesmall i same level is reached for C = 10−5. The extremely low values of τ , allowing all removed nodes to be promptly i valuesofC,aboveallinthecaseoftheUCMnetwork, is declareddamaged. Thedownsideofthiseffectisthelack justified by the presence of the logarithm function at the of precision for large values of C (see Figure 3). In the numerator of Eq. 4. Considering the big absolute value ER network, instead, the presence of low visit probabil- of the denominator for big p , very low values of C are ity nodes among the ones in the top degree ranks makes i required to have values of τ that possibly ranges from 0 their discovery more lengthy. Indeed, in this case larger i to N ×N . The quite different value of C in the two values of C are required to produce τ s small enough to S T i networks for top degree nodes removal can be explained allow the algorithm to declare the nodes damaged. It is considering the distribution of p of the damaged nodes crucial to stress that τ = N ×N . Any node that i max S T in case of hubs removal as shown in Figure 4. In the for a given C is characterized by τ > τ will not be i max UCM network top degree nodes have much higher visit evaluated by the algorithm. In the bottom panel we see probability respect to the rest of the nodes. In the ER the recall for random nodes removal in both topologies. network, instead, the role of high degree nodes is not Here the behavior of r is similar for both UCM and ER 6 FIG.6: Recallr(blucircles)andnormalizednumberoffalse FIG. 7: Recall r (blu circles) and normalized number of positivefp(orangesquares)forhighdegreenodesremovalas false positive fp (orange squares) for random nodes removal a function of number of probes in UCM (A) and ER (B) as a function of number of probes in UCM (A) and ER (B) networks. The fraction of removed nodes is ρ =0.001. The networks. The fraction of removed nodes is ρ = 0.01. The D D values of confidence level C are 10−10 and 10−5 for UCM valueofconfidencelevelC is10−3 forboththeUCMandER and ER networks respectively. Points are the median among network. Points are the median among 100 realization with 100realizationwithindependentchoiceofsourcesandtarget. independentchoiceofsourcesandtarget. Errorbarsillustrate Error bars illustrate the 95% confidence interval. the 95% confidence interval. networks as consequence of the similar visit probability that a specific number of probes equal to distributions. lnC lnC τ = = (7) targets ln(1−p ) ln(1−1/N ) targets T Numerical simulations in synthetic networks is necessary to be able calling targets as damaged or not. Since targets are 20% of the nodes, once τ is targets Weapplythestatisticalcriteriondevelopedinthepre- reached a conspicuous amount of nodes is characterized vious section to the two types of synthetic networks, bythep-valuetest. Itisworthnotingthatonlythenum- UCM and ER, with 105 nodes and two damaging strate- ber of TPs increases in correspondence with the jump, gies, high degree and random nodes removal. We send andfpdonotexhibitanydiscontinuity. Thismeansthat shortestpathprobesfrom15sourcestoanumberoftar- wehaveabetterviewofthedamagewithoutaffectingthe gets equal to a fraction ρT =0.2 of total nodes. In order accuracy. tocomparetheresultsofthispartofthestudyfordiffer- Let us now consider ER networks subject to removal of enttopologiesanddamagestrategieswearbitrarilyfixed nodes in decreasing order of degree. In Figure 6B we the C value to the one correspondent to a precision of plot r and fp as a function of τ. As for the case of UCM α=0.9 in each system. network, the recall increases, even if slower, and reaches Let us first consider UCM networks subject to the re- the maximum values at 0.9. Similar behavior for both moval of the top hundred nodes ranked according to the the topologies is observed in the case of random removal degree(ρD =10−3). InFigure6weplottherecall,r,and of nodes (see Figure 7). the normalized number of FPs, fp = FP/(TP +FN) as a function of τ. Interestingly, the recall reaches 1 quickly. The absence of the hubs is promptly detected NUMERICAL DAMAGE DETECTION IN by the method. In Figure 7 we show the behavior of the GEOLOCALIZED NETWORKS same quantities in the case of random removal of nodes considering ρ = 10−2. In this case the recall increase In this section we consider a sample of the real In- D slowlywhilefpremainsconstantafteraninitialincrease. ternet topology network at the level of ASs where each An interesting feature of the r curve is the presence of a nodeisanautonomoussystemofknowngeographicallo- jump. This jump is the consequence of the peak in the cation [13, 21, 36], and links represent the physical con- distribution of p that is mapped into τ via Eq. 4. It oc- nectionsamongthem. Topologiesareavailablefordown- i i curs at the value of τ corresponding to p = 1/N loads in the DIMES project webpage [14]. We focus on targets T and is caused by the enhanced visit probability of tar- the largest connected component of ASs that is made by get nodes. Since targets are assigned randomly, in UCM 32852 nodes. networkstheyaremostprobablysmalldegreenodesthat We test damage detection in two relevant classes of are visited just if they are set to be targets. This imply realistic attacks: all nodes in the same country are at- 7 FIG. 8: Europe map showing the detected Italian damaged FIG.9: MapofpartoftheUnitedStateseastcoastshowing nodes in the real AS network. Each circle represent one AS theoutcomeafterdamagingnodesaroundthecityofBoston and its size is proportional to the degree. Green circles are within a radius of 50 km in the real AS network. Each circle the working nodes, blue ones are the TPs while orange ones represent one AS and its size is proportional to the degree. aretheFPs. Nodeslocatedatseaareeffectoffiniteaccuracy Green circles are the working nodes, blue ones are the TPs of geographical coordinates provided. while orange ones are the FPs. tacked, and all nodes inside a radius ξ with epicenter E ure 9. In both cases of geographical damaging the recall are attacked. Both of these strategies are geography- is almost constant and close to the value 0.2 as shown based but they provide different scenarios. The first in Figure 10. Indeed, the algorithm is detecting almost represents a deliberate shut down, for instance as al- only the fraction of damaged nodes that are also target. legedly happened in several countries during the Arab Becauseofthehomogeneousdistributionoftargetnodes, spring [27]. The second one is referred to localized event this fraction corresponds to ρT =0.2. The choice for the such as blackouts, earthquakes, or others catastrophic statistical confidence affects more the measures related events[28]. Alsointhiscasewefixthenumberofsources to local damaging (Figure 10B). Here, for big values of N =15 and the density of targets ρ =0.2. According C the precision drops while the recall slightly increases. S T to one of the two geographic based strategies we remove This means that a less strict choice of C allows the dis- N nodesfromtheoriginalASnetwork. Themaindiffer- covery of more nodes. However the FPs grow more than D encebetweenthiscaseandthosediscussedintheprevious theTPs. Soalittlegaininrecalliscontrastedbybigloss sections is that the networks here have geographical at- in precision. As for the artificial networks, in the case of tributes. The measure of damage detection should then entire country damaging we choose C to achieve a pre- be able to return not only the entity of the damage as a cision of 0.9 (C = 10−2). In the case of local damaging whole, but also tell us where the damage is localized. there is no value of C that allow to reach such a preci- sion. For this reason and considering the diverse nature We use the same method already discussed for syn- ofthetwostrategieswefixthearbitraryvaluetoα=0.75 thetic networks. As an example of entire country switch (C =10−5). Despite the recall never exceeds 0.3 in both off we decided to damage all the AS nodes in Italy. This damage detections this is a good result considering the translatesinremovingN =1246nodes,equaltoafrac- D small number of damaged nodes, the completely random tion ρ = 0.038 of total nodes. Figure 8 shows the out- D displacement of source and target nodes and the lack of come of our analysis. We want to stress that the algo- any ad-hoc search strategy. rithm does not have any a priori information about the locationofthedamage. Despitethat,themethodclearly returns Italy as affected country. Few other nodes are wrongly classified as damaged. The reason for the pres- CONCLUSIONS ence of FPs can be just statistical fluctuations or, more interesting, that some FP nodes turn out to be strongly In this paper we addressed the problem of damage de- linked to the Italian TPs, so that the deletion of the lat- tection in large-scale networks. We assessed the effec- ter prevents them to be visited. For the second type tiveness of shortest path probing for damage detection of geographical damage we decide to switch off all the in the case of incomplete network sampling. We consid- ASs within a radius of 50 km around the city of Boston, ered different network topologies, damage strategies and MA, in the USA. This corresponds to N = 176 and defined basic metrics for the measurement of damage. D ρ = 0.0054. Also in this case the method is able to We provided a statistical criterion for the classification D detect correct location of the damage as shown in Fig- (damage/undamaged) of single nodes based on the p- 8 Agency or the U.S. Government. [1] Newman, M.E.J., Networks, an Introduction (Oxford University Press, 2010). [2] Caldarelli,G.,Scale-FreeNetworksComplexWebsinNa- ture and Technology (Oxford Finance Series, 2007). [3] Barrat, A. and Barth`elemy, M. and Vespignani, A., Dynamical Processes on Complex Networks (Cambridge University Press, 2008). [4] Z. M. Mao, J. Rexford, J. Wang, and R. H. Katz, in Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (ACM, New York, NY, USA, 2003), FIG. 10: Precision α (blu circles) and recall r (orange SIGCOMM ’03, pp. 365–378, ISBN 1-58113-735-4, URL squares)asafunctionofC forItaliannodesremoval(A)and http://doi.acm.org/10.1145/863955.863996. damagingofnodesaroundthecityofBostonwithinaradius [5] M. Luckie, Y. Hyun, and B. Huffaker, in Internet of 50 km (B) in the real AS network. Measurement Conference (IMC) (Internet Measurement Conference(IMC),Vouliagmeni,Greece,2008),pp.311– 324. [6] B. Huffaker, D. Plummer, D. Moore, and k. claffy, in value test. Although this criterion allows false positives, Symposium on Applications and the Internet (SAINT) i.enodeswronglyconsideredasdamaged,itispossibleto (SAINT, Nara, Japan, 2002), pp. 90–96. fine tune the statistical confidence level in order to opti- [7] M. Luckie, in Proceedings of the 10th ACM SIG- mize the trade-off between precision and probing load in COMM Conference on Internet Measurement (ACM, thesystem. Thenumericalinvestigationaccordingtothis NewYork,NY,USA,2010),IMC’10,pp.239–245,ISBN criterionallowsthestudyofdamageinpartiallysampled 978-1-4503-0483-2,URLhttp://doi.acm.org/10.1145/ networkswithtunableprecision. Inthecaseofreal-world 1879141.1879171. [8] Van jacobson, traceroute, ftp://ftp.ee.lbl.gov/ network such as the Internet AS graph, we damaged the traceroute.tar.gz, URL ftp://ftp.ee.lbl.gov/ networkaccordingtogeographicalfeaturesthatsimulate traceroute.tar.gz. criticaleventsonspecificareasordeliberateshutdownof [9] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, an entire country, as for political reasons. Also in this IEEE/ACMTrans.Netw.12,2(2004),ISSN1063-6692, cases, our methodology is able to identify the entity of URL http://dx.doi.org/10.1109/TNET.2003.822655. the damage and, more importantly, its location. [10] Bing, L., Web Data Mining (Springer, 2011). Themethodwehaveproposedcanrepresentafirststep [11] F. Menczer, G. Pant, and P. Srinivasan, ACM Trans. Internet Technol. 4, 378 (2004), ISSN 1533-5399, URL towardsastrategyforthecontinuousmonitoringoflarge- http://doi.acm.org/10.1145/1031114.1031117. scale,self-organizingnetworks. Possiblevariationsofthe [12] S. M. Mirtaheri, M. E. Dincturk, S. Hooshmand, G. V. shortest path sampling can be envisioned and combined Bochmann,G.-V.Jourdan,andI.-V.Onut,inCASCON with more elaborate diffusive walkers strategies that op- (2013). timizes network discovery. Furthermore, we have stud- [13] The cooperative association for internet data analysis, iedonlytherandomdisplacementofsourcesandtargets. caida, http://www.caida.org/home/, URL http:// Detection of damages could be improved by opportune www.caida.org/home/. [14] The dimes project http://www.netdimes.org, URL choice of sources and targets or by different schedule of http://www.netdimes.org. probes delivery. This points remain to be addressed in [15] Topology project, electric engineering and computer sci- future works. ence department, university of michigan, URL http: //topology.eecs.umich.edu/. [16] The internet mapping project at bell labs, URL (http: ACKNOWLEDGMENTS //www.cheswick.com/ches/map/index.html. [17] Rocketfuel: An isp topology mapping engine, http://www.cs.washington.edu/research/ The authors thank Bruno Gon¸calves for valuable networking/rocketfuel/, URL http://www.cs. discussions and the Dimes Project for providing Au- washington.edu/research/networking/rocketfuel/. tonomous System topology free to download. This work [18] Routeviews project,URL(http://www.routeviews.org. was partially funded by the DTRA-1-0910039 award to [19] T. Bu, N. Duffield, F. L. Presti, and D. Towsley, SIG- A.V. The views and conclusions contained in this doc- METRICS Perform. Eval. Rev. 30, 21 (2002), ISSN 0163-5999,URLhttp://doi.acm.org/10.1145/511399. ument are those of the authors and should not be in- 511338. terpreted as representing the official policies, either ex- [20] L.Dall’Asta,I.Alvarez-Hamelin,A.Barrat,A.Va`zquez, pressed or implied, of the Defense Threat Reduction 9 and A. Vespignani, Theoretical Computer Science 355, 201 (2004). 6 (2006). [31] D. Achilioptas, A. Clauset, D. Kempe, and C. Moore, J. [21] R. Pastor-Satorras and A. Vespignani, Evolution and ACM 56, 4 (2009). Structure of Internet: A Statistical Physics Approach [32] R. Albert and A.-L. Baraba´si, Rev. Mod. Phys. 74, 47 (Cambridge University Press., 2004). (2002). [22] K. Claffy, T. Monk, and D. McRobb, Nature (1999). [33] G. Cowan, Statistical Data Analysis (Oxford Science [23] Y. Shavitt and E. Shir, SIGCOMM Comput. Commun. Publications, 1998). Rev. 35, 71 (2005), ISSN 0146-4833, URL http://doi. [34] R. Cohen and S. Havlin, Complex Networks: Structure, acm.org/10.1145/1096536.1096546. Robustness and Function (Cambridge University Press, [24] A. Abdelkefi, Y. Efthekhari, and Y. Jiang, CoRR Cambridge, 2010). abs/1209.5074 (2012). [35] J. Platig, E. Ott, and M. Girvan, Phys. Rev. E [25] M.Catanzaro,M.Bogun˜a´,andR.Pastor-Satorras,Phys. 88,062812(2013),URLhttp://link.aps.org/doi/10. Rev. E 71, 027103 (2005), URL http://link.aps.org/ 1103/PhysRevE.88.062812. doi/10.1103/PhysRevE.71.027103. [36] A. Va´zquez, R. Pastor-Satorras, and A. Vespignani, [26] Erdo¨s, P. and R´enyi, A, Publ. Math. 6, 290 (1959). Phys. Rev. E 65, 066130 (2002). [27] A. Dainotti, C. Squarcella, E. Aben, K. C. Claffy, [37] Bothsourcesandtargetsarerandomlyselected,sowecan M. Chiesa, M. Russo, and A. Pescap´e, in Proceedings assume that in average they will be visited N times if T of the 2011 ACM SIGCOMM conference on Internet theyaresourcesorN timesiftheyaretargets.Consider- S 2m0e1a1s)u,rIeMmCen’t11c,onpfpe.re1n–c1e8,(IASBCNM,97N8e-1w-4Y50o3rk-1,0N13Y-0,,UUSRAL, ingthisandapplyingEq.1wegetpi =NjS(cid:80)=N1Tδi,j/NSNT = http://doi.acm.org/10.1145/2068816.2068818. NT = 1 and p = NS = 1 , respectively [28] J.Chen,V.S.A.Kumar,M.V.Marathe,R.Sundaram, [38] TNhSeNTnumbNeSrN∗maiydiffNeSrNfTromNNT∗ alsowhennodamage M. Thakur, and S. Thulasidasan, Technical report, Vir- D is present because of the not deterministic behavior of ginia Tech (2011). the shortest path algorithm. As we wrote in section we [29] A. Lakhina, J. W. Byers, M. Crovella, and P. Xie, Tech- usetheRPSprobingstrategy,thatreturnrandomlyone nical Report BUCS-TR-2002-021, Department of Com- of the possibly equivalent shortest path hence providing puter Sciences, Boston University (2002). different view of the same network [30] T.PetermannandP.DeLosRios, ExplorationofScale- Free Networks - Do we measure the real exponents? 38,

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.