EPJ manuscript No. (will be inserted by the editor)
arXiv:cond-mat/0401065v1 [cond-mat.dis-nn] 6 Jan 2004

Exploration of Scale-Free Networks
Do we measure the real exponents?

Thomas Petermann and Paolo De Los Rios

Laboratoire de Biophysique Statistique, ITP-FSB, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.

Received: date / Revised version: date

Abstract. The increased availability of data on real networks has favoured an explosion of activity in the elaboration of models able to reproduce both qualitatively and quantitatively the measured properties. What has been less explored is the reliability of the data, and whether the measurement technique biases them. Here we show that tree-like explorations (similar in principle to traceroute) can indeed change the measured exponents of a scale-free network.

PACS. 89.75.-k Complex systems; 87.23.Ge Dynamics of social systems; 05.70.Ln Nonequilibrium and irreversible thermodynamics

1 Introduction

In recent years networks have become one of the most promising frameworks to describe systems as diverse as the Internet and the WWW, email and social communities, distribution systems, food-webs, protein interaction, genetic and metabolic networks [1]. The collected data have allowed the discovery of many important properties: in particular two of them have become prominent, namely the small-world [2] and scale-free features [3]. Small-world implies that the average distance between nodes of the network increases at most logarithmically with the number of nodes, and formalizes the concept of "six degrees of separation" typical in social contexts. Scale-free refers to the lack of an intrinsic scale in some of the properties of the network. In particular, the quantity that has been most thoroughly studied is the degree (or connectivity) distribution: the degree $k$ of a node is the number of other nodes it has links to (here we do not distinguish between directed and undirected links), and the degree distribution $P(k)$ is simply the histogram of the number of nodes with a given degree $k$. Scale-free networks exhibit a power-law behavior of the distribution, $P(k) \sim k^{-\gamma}$, with $\gamma$ values often between 2 and 3 [1].

The small-world and scale-free properties turned out to be quite ubiquitous, and some general, qualitative models to explain their appearance have been put forward. At the same time various versions of these models have also been proposed in order to capture the detailed values of some quantities, such as the exponent $\gamma$. Yet, as this new field is slowly coming of age, and as a consequence is also becoming more quantitative, an analysis of the data, and of their reliability, is due. The main problem that should be addressed is whether the data we are using have been skewed somehow by the detection method. In the absence of such an analysis on real data and methods, we propose to work on synthetic models and data to explore their robustness in some simple test cases.

In the next section we address the tree-like exploration technique and discuss how it can bias the measurements, and in the third section we show that a random graph can be distorted by the exploration so as to look like a SF one: in this case the exponent $\gamma$ is completely spurious.

2 Tree-like Exploration of Scale-Free Networks

Scale-free (SF) networks can be explored in many different ways. One of the most popular methods, which has been extensively used for example for the Internet, is a sort of tree-like exploration implemented by the recursive use of the traceroute command. In short, traceroute finds a path (usually a short one, but not necessarily the shortest) from the node where the command is executed to another given node. By repeating the procedure, asking traceroute to find paths to all other possible nodes (addressed by their IP number), one ends up with a representation of the Internet that shows just a small number of loops. This is due to the fact that traceroute mostly uses the same paths: if a node D can be reached from A through both B and C, traceroute most of the time detects only one of them. Actually, chances are that traceroute can find more than a single path if traffic over an already discovered one is so high that it becomes more convenient to switch to a different path. Data collected with this technique have shown that degrees in the Internet are distributed according to a power law with exponent $\gamma \simeq 2.2 \pm 0.1$ [4]. In order to analyze the effects of a tree-building exploration algorithm on SF networks, we have synthesized our own networks according to two different models: the Barabási-Albert (BA) model [3], and the hidden variable model [5].

The BA model describes the growth of a network as new nodes are added at a constant rate, and they connect to older nodes in the network according to the preferential attachment rule. Preferential attachment means that an old node has a probability proportional to its degree of acquiring a connection from a new one. It is useful to recall a simple derivation of the degree distribution starting from these two simple rules, growth and preferential attachment. The rate of change of the degree $k_i$ of node $i$ is

$$\frac{dk_i(t)}{dt} = m\,\frac{k_i(t)}{2mt} \qquad (1)$$

where $m$ is the number of connections that a new node establishes with older nodes, and the denominator on the right hand side of Eq. 1 represents the sum over all the degrees of the network. Eq. 1 has the simple solution $k_i(t) = m\,(t/\tau_i)^{1/2}$, where $\tau_i$ is the time at which node $i$ entered the network. Since the relation between $k_i$ and $\tau_i$ is monotonic, we can classify nodes according either to their degree $k$ or to their age $\tau$. As a consequence we can apply the usual formula to transform probability distributions: $P(k)\,dk = \rho(\tau)\,d\tau$. Since nodes enter the network at a constant rate, we have $\rho(\tau) = \mathrm{const}$ and therefore $P(k) \sim k^{-3}$. As mentioned above for the Internet, the exponent $\gamma$ is in general not equal to the BA prediction $\gamma = 3$; yet it has been shown that, as long as the attachment rate in (1) is asymptotically linear in $k$, the distribution is $P(k) \sim k^{-\gamma}$ with $2 < \gamma < \infty$ depending on the pre-asymptotic behavior [6]. Another important feature of the BA model, which we are going to exploit for our analytical approach, is the lack of correlations between the degree of a node and the degrees of its neighbors. This is best represented through the average neighbor degree $k_{nn}(k)$, that is, the average degree of the neighbors of a node of degree $k$: this quantity is essentially constant for the BA model.

The first step in our analysis is to build a BA network, whose degree distribution is shown in Fig. 1. Then, starting from a node (we choose a highly connected node) we begin our exploration procedure:

1) each edge connecting that node to its neighbors is followed with probability $p$; edges that are lost at this stage are lost forever;
2) from each of the reached nodes, repeat step 1) until no new nodes are reachable.

In this procedure, edges to nodes that have already been reached are not followed. The result of this algorithm is a network that has fewer nodes than the original one and, on average, fewer links per node, and that is, topologically, a tree. The intuitive result would be that every node sees just a fraction $p$ of its edges, so that all degrees should be reduced by a factor $p$, without consequences on the power-law behavior of $P(k)$. Actually the effects of the probability $p$ are much more dramatic. As can be seen from our simulations (Fig. 1), the measured exponent $\gamma_m$ actually changes. For a network grown with $m = 1$ and explored with $p = 0.5$ the measured exponent is close to $\gamma_m = 2.5$. We can therefore wonder whether this is a crossover effect, and the correct exponent is recovered for very large networks, or whether this change of exponent is real.

A simple analytical argument in favour of this second interpretation can be formulated using the lack of correlations in the BA model. Indeed, since there is no correlation between the degree of a site and the degrees of its neighbors, due to (1) there is no correlation between the age of a node and the age of its neighbors. This allows us to look at exploration during the growth of the network. In particular we can say that, in a growing network formalism, any time a new node is added to the network, we label it as reachable if it connects to at least one reachable site through a followed connection (with probability $p$). We assume that the first site is reachable. Then, the density of reachable nodes at time $t$ is given by

$$\frac{dN(t)}{dt} = 1 - \left(1 - p\int_0^t \frac{dN(t')}{dt'}\,q(t')\,dt'\right)^{m} \qquad (2)$$

where $q(t')$ is the probability to choose a node introduced in the network between $t'$ and $t' + dt'$: the preferential attachment rule translates to $q(t') = 1/[2(t \cdot t')^{1/2}]$ (this trick is similar to assigning to each node a hidden variable corresponding to the time $t'$ at which it entered the network, with a connection probability that depends on the hidden variables of both the new and old nodes; for more details see below [7,8]). Since $N(t)$ can grow at most linearly, we make the assumption that $dN(t)/dt \sim t^{\alpha}$ with $\alpha$ expected to be negative. After some algebra, and keeping only the leading terms, we find $\alpha = (mp-1)/2$: as long as $mp < 1$ the density of reachable nodes decreases in time. Then, the measured degree distribution can again be obtained from the relation $P_m(k)\,dk = \rho(\tau)\,d\tau$, with $\rho(\tau) \sim \tau^{\alpha}$, from which we obtain $P_m(k) \sim k^{-\gamma_m}$, with $\gamma_m = 2 + mp$. For $m = 1$, $p = 0.5$ we have $\gamma_m = 2.5$, in agreement with simulations. We expect therefore that, as long as $mp < 1$, the measured exponent could be different from the real one.

To check whether the distortion of the exponent is a feature only of BA networks, we have also studied networks generated according to the hidden variables model. Hidden variable networks are characterized by a quantity $x$ (the "fitness") assigned to every node and taken from some probability distribution $p(x)$; every pair of nodes $i$ and $j$ is then connected with a probability $q(x_i, x_j)$. As a consequence the average degree of a node of fitness $x$ is

$$k(x) = N \int_U q(x, x')\,p(x')\,dx' \qquad (3)$$

where $N$ is the number of nodes in the network and $U$ is the support of $p(x)$. Eq. 3 gives a relation between $k$ and $x$ that can be used in $P(k)\,dk = p(x)\,dx$ to obtain $P(k)$. Suitable choices of $p(x)$ and of $q(x, x')$ give SF networks. Here we use the same examples provided in [5], namely Zipf and exponential fitness distributions. Zipf distributed fitnesses are inspired by the idea that many quantities such as personal wealth, company size, city population and others are power-law distributed [9]. In this case a connection probability $q(x, x') \sim x \cdot x'$ ensures that the resulting network is SF. Using $p(x) \sim x^{-3}$ we obtain $P(k) \sim k^{-3}$, and in Fig. 2 we show that this network's exponent has also changed following the exploration, with a value $\gamma_m \simeq 2.7$.

Next we look at SF networks obtained using an exponential fitness distribution $p(x) = \exp(-x)$ and a connection probability $q(x, x') = \theta(x + x' - x_c)$, that is, a link between two nodes of fitness $x$ and $x'$ is present only if the sum $x + x' > x_c$. Eq. 3 yields $k(x) = N e^{-x_c} \exp(x)$ and as a consequence $P(k) \sim k^{-2}$ (see Fig. 3) [5]. The tree-like exploration of this network shows that, also in this case, the measured exponent can change, $\gamma_m \simeq 1.5$. We do not have at the moment an analytical derivation of the measured exponents. Indeed, degree-degree correlations between nearest neighbors and the lack of an explicit time evolution hinder the formulation of equations similar to (2).

Interestingly, in all cases we have analysed, the measured exponent $\gamma_m < \gamma$, an indication that the exploration process penalizes nodes with small degree with respect to nodes with large degree. This is reasonable, since a node with few connections has fewer paths reaching it (and some bottlenecks, since all these paths ultimately have to flow through its few connections) than a high degree node. The final result is therefore that high degree nodes are fairly well represented in the final distribution, whereas the number of nodes with few connections is underestimated. This intuitive picture rationalizes our numerical and analytical finding that the measured exponent is smaller than the real one.

3 Edge-Picking Exploration of Erdős-Rényi and Complete Graphs

The last example is also a good example of how other link detection techniques can change the apparent topological properties of networks in an even more dramatic way, in particular if the probability to detect a link depends on some intrinsic properties of the nodes it connects. Indeed, we can interpret the generation of SF networks from hidden variables as the result of the exploration of complete or Erdős-Rényi [10] (ER) networks. Starting from a complete or ER graph, we assign to each node $i$ a variable $x_i$ taken from a probability distribution $p(x) = \exp(-x)$. Then, we prune the graph by discarding (that is, they are not detected) all those edges that join nodes $i$ and $j$ such that $x_i + x_j < x_c$, with $x_c$ a properly chosen threshold. This is tantamount to saying that, during the exploration of the graph, the edge between nodes $i$ and $j$ is detected with probability $q(x_i, x_j) = \theta(x_i + x_j - x_c)$, which brings us back to the formalism used above, which shows that the resulting network is scale-free with degree distribution $P(k) \sim k^{-2}$ (data shown in Fig. 3, black symbols, refer to the procedure over a complete graph).

Fig. 4 (main panel) shows the results for a starting graph that is an Erdős-Rényi network of 12800 vertices with a connection probability $p = 0.025$, corresponding to a graph with average degree $\langle k \rangle = 320$; the threshold is $x_c = 13$. The resulting degrees are power-law distributed with exponent $\gamma \simeq 2$ (black thick line). The peak for large values is what is left of the Poisson distribution of the underlying ER network: whenever a node has a variable $x > x_c$, all of its connections are detected, and its degree is not distorted. Since in SF networks a special role is played by hubs, that is, nodes with a very large degree, we checked whether the hubs of the explored network fall into the Poisson cutoff or in the power-law distributed part. The result, shown in the inset of Fig. 4, clearly shows that the distribution of the maximum degree nicely obeys a Fréchet distribution $P_m(k) = (\gamma - 1)\,k^{-\gamma}\exp(-k^{-\gamma+1})$ [11], with $\gamma_{\max} = 2.1(1)$. Such a Fréchet distribution is indeed the expected distribution of the maxima of a set of variables taken from a power-law distribution $k^{-\gamma}$. This implies that, apart from about 1% of the networks (we show the distribution of $k_{\max}$ over 10000 networks), the maximum degree is almost always drawn from the power-law part of the degree distribution, an indication that typical networks can be considered genuinely scale-free.

4 Conclusions

The increase in the amount of real network data is prompting the community to study networks in more detail and to elaborate models able to predict qualitatively, but also quantitatively, the measured properties. Yet, before taking these data at face value, a thorough investigation of the measurement techniques is necessary to ascertain whether, and what kind of, data distortion they could introduce. This is customary in physics, where systematic errors always have to be taken into account and possibly corrected, and the same kind of attention should also be paid to data from different disciplines. In this work we have shown, through simple examples, that tree-like explorations, inspired by the traceroute command, can indeed skew the data so that the measured exponent of the degree distribution of a scale-free network can change with respect to the real one. In the simplest case (BA networks), simulations and simple analytical arguments agree with each other. We have also shown that a recently proposed model of SF networks based on hidden variables can be interpreted as an exploration technique leading to the appearance of a power-law degree distribution where the underlying network is topologically much simpler.

This work has been supported by the FET Open Project IST-2001-33555 COSIN, and by the OFES-Bern (CH).

References

1. R. Albert and A.L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
2. D.J. Watts and S.H. Strogatz, Nature (London) 393, 440 (1998).
3. A.L. Barabási and R. Albert, Science 286, 509 (1999).
4. G. Caldarelli, R. Marchetti and L. Pietronero, Europhys. Lett. 52, 386 (2000).
5. G. Caldarelli, A. Capocci, P. De Los Rios and M.A. Muñoz, Phys. Rev. Lett. 89, 258702 (2002).
6. P.L. Krapivsky, S. Redner and F. Leyvraz, Phys. Rev. Lett. 85, 4629 (2000).
7. M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 68, 036112 (2003).
8. V.D.P. Servedio, G. Caldarelli and P. Buttà, cond-mat/0309659.
9. G.K. Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley, Cambridge, 1949); M. Marsili and Y.-C. Zhang, Phys. Rev. Lett. 80, 2741 (1998).
10. P. Erdős and A. Rényi, Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960).
11. A.A. Moreira, J.S. Andrade Jr. and L.A.N. Amaral, Phys. Rev. Lett. 89, 268703 (2002).

Fig. 1. Degree distribution for a Barabási-Albert network grown with $m = 1$, with $10^5$ nodes. Circles: original network; stars: explored network with $p = 0.5$. The best fit to the original network is with $\gamma \simeq 3$, and to the explored network with $\gamma \simeq 2.5$. Inset: rescaled degree distribution $k^3 P(k)$, such that the data for the original network are constant, and the residual power-law behavior of the explored network is more evident.

Fig. 2. Degree distribution for a hidden variables network with $p(x) \sim x^{-3}$ and $q(x, x') \propto x \cdot x'$; $10^5$ nodes. Circles: original network; squares: explored network with $p = 0.5$. The best fit to the original network is with $\gamma \simeq 3$, and to the explored network with $\gamma \simeq 2.7$. Inset: rescaled degree distribution $k^3 P(k)$, such that the data for the original network are constant, and the residual power-law behavior of the explored network is more evident.

Fig. 3. Degree distribution for a hidden variables network with $p(x) \sim e^{-x}$ and $q(x, x') = \theta(x + x' - x_c)$; $10^5$ nodes. Circles: original network; squares: explored network with $p = 0.5$. The best fit to the original network is with $\gamma \simeq 2$, and to the explored network with $\gamma \simeq 1.45$. Inset: rescaled degree distribution $k^2 P(k)$, such that the data for the original network are constant, and the residual power-law behavior of the explored network is more evident.

Fig. 4. Main panel: Degree distribution for ER networks ($p = 0.025$) of $N = 12800$ nodes, where every node has been assigned a variable $x$ taken from an exponential distribution. A link between $i$ and $j$ is detected only if $x_i + x_j > x_c = 13$; data are averaged over 10000 realizations. The solid line is $k^{-2}$. Inset: maximum degree distribution for the 10000 networks; the solid line is a Fréchet distribution of exponent 2.1(1).