On the robustness of power-law random graphs in the finite mean, infinite variance region 8 0 Ilkka Norros and Hannu Reittu 0 2 VTT Technical Research Centre of Finland n a J Abstract 7 ] We consider a conditionally Poissonian random graph model where the mean degrees, R ‘capacities’, follow a power-tailed distribution with finite mean and infinite variance. Such P a graph of size N has a giant component which is super-small in the sense that the typical . h distance between vertices is of the order of loglogN. The shortest paths travel through a t a coreconsistingofnodeswithhighmeandegrees. Inthispaperwederiveupperboundsofthe m distance between two random vertices when an upper part of the core is removed, including [ thecasethatthewholecoreisremoved. 1 v 9 7 1 Introduction 0 1 . 1 Power-law random graphs (or ‘scale-free random graphs’) have become popular objects of both 0 appliedandtheoreticalinterest,becausetheyaresimpletodefineandgenerateandyetsharesome 8 0 important characteristics with many large and complex real-world networks. The mathematical : models help to understand how and on what conditions those characteristics emerge, and this v i understandingcanbevaluablealsointhedesignofnew,artificialnetworks. X r The general characteristic of power-law random graphs is that their degree distribution possesses a a regularly varying tail, the most interesting region of tail exponents being that of finite mean and infinite variance. Several models with these features have been studied. In this paper, we work with the conditionally Poissonian random graph [7], a modification of the expected degree sequence model of Chung and Lu [3]. We draw an i.i.d. sequence of mean degrees, ‘capacities’, thatfollowaParetodistributionwithfinitemeanandinfinitevariance. Our graph of size N has a giant component which is ultra-small in the sense that the typical distance between vertices is proportional to loglogN (more precisely, this is an upper bound, like most of our distance results), whereas a similar random graph whose degrees have the same mean but finite variance cannot offer better than logN scalability of distances. Remarkably, the speciality of the infinite variance case is the emergence of a ‘core network’ consisting of nodes with high degrees, through which the short paths typically travel. It is now natural to ask, what happens to these distances, if vertices with highest degrees are deleted. Our main findings can be summarizedasfollows: 1 (i) Any deletion of core vertices leaves the asymptotical size of the largest, giant component intact. (ii) If vertices with capacity higher than Nγ are deleted, γ not depending on N, then paths that would have gone through the deleted vertices can be repaired using ‘back-up paths’ that increase the distances only by a constant depending on γ but not on N. This increase is negligibleontheloglogN scalingofdistances. (iii) If the whole core is removed (for the exact meaning of this, see Section 2), the distances stillscaleslightlybetterthanlogN. Because of structural similarity (compare [8] with [7]), we conjecture that our results are trans- ferable to the configuration model of Newman et al. [6, 8]. Perhaps they can be extended even to preferential attachment models in the finite mean and infinite variance region, since van der Hof- stad and Hooghiemstra showed recently that these models share with the previously mentioned modelstheloglog-scalabilityofdistances[5]. The next section specifies the graph model and reviews the relevant results of [7]. Some of them are slightly sharpened in Section 3. Section 4 applies results on the diameter of classical random graphs to measure the core network in the horizontal direction. The main result on robustness againstcorelossesisstatedandproveninSection5. 2 Model and earlier results Throughout the paper, we work with the following conditionally Poissonian power-law random graph model [7]. There are N vertices that possess, respectively, i.i.d. ‘capacities’ Λ ,...,Λ 1 N withdistribution P(Λ > x) = x−τ+1, x ∈ [1,∞); τ ∈ (2,3). GivenΛ = (Λ ,...,Λ ),eachpairofvertices{i,j},isconnectedwithE edges,where 1 N ij (cid:18) (cid:19) N Λ Λ (cid:88) i j E ∼ Poisson , L = Λ , ij N j L N j=1 andtheE ’sareindependent. Loops,i.e.thecasei = j,areincludedforprinciple,althoughthey ij havenosignificanceinthepresentstudy. GivenΛ,thedegreeofvertexithenhasthedistribution Poisson(Λ ). TheresultingrandomgraphisdenotedasG . i N The following coupling between neighborhood shells and branching processes will play an im- portant role, as it did in [7]. Fix N, and assume that the sequence Λ has been generated and probabilities are now conditioned on it. We shall consider a marked branching process, i.e., a branching process where each individual is associated with some element of the mark space {1,...,N}. More specifically, let us define a process (Z,J) = (Z ,(J )), where Z is the size n n,i n of generation n, and J ∈ {1,...,N} is the mark of member i of generation n. We set Z ≡ 1 n,i 0 and take J from the uniform distribution U{1,...,N}. The process then proceeds so that an 0,1 2 invidual bearing mark i gives for each j = 1,...,N birth independently to a Poisson(Λ Λ /L ) i j N distributednumberofj-markedmembersofthenextgeneration. On the other hand, we consider the neighborhood sequence around a random vertex of G . Take N avertexi fromuniformdistribution,anddefinerecursively 0 N (i ) = {i } 0 0 0 (cid:40) (cid:32) (cid:33)c (cid:41) k (cid:91) N (i ) = j ∈ N (i ) : j ↔ N (i ) . k+1 0 l 0 k 0 l=0 Thefollowingcouplingwasprovenin[7]. Proposition2.1 Let (Z,J) be the marked branching process defined above. Define a reduced processbyproceedinggenerationbygeneration,i.e.,intheorder J ;J ,J ,...,J ;J ,..., 0,1 1,1 1,2 1,Z1 2,1 and pruning (that is, deleting) from (Z,J) each individual, whose mark has already appeared before, together with all its descendants (these are considered as not seen in the procedure). ˆ ˆ ˆ Denote the resulting finite process by (Z,J), and let J be the set of the marks in generation k of k ˆ the reduced process. Then, the sequence of the sets J has the same distribution as the sequence k N . k Moreover, we observed that the size of each generation and its marks can be generated indepen- dently. Denotebyq thedistribution N . Λ j q (j) = , j = 1,...,N, (1) N L N and by π∗ the mixed Poisson distribution Poisson(Γ), where Γ is a random variable with distribu- tion P(Γ ∈ dx) = xP(Λ ∈ dx)/E{Λ}. Obviously,thedistributionofPoisson(Λ ),J(N) ∼ q ,convergesweaklytoπ∗ inprobability. J(N) N Let J(N),J(N),... be i.i.d. random variables with distribution q , and let J(N) be a random vari- 1 2 N 0 ablewithuniformdistributionon{1,...,N},independentofthepreviousones;weoftensuppress the superscript ·(N). Start now by (Z˜ = 1,J ), and proceed generationwise by drawing the size 0 0 ofthen+1’thgenerationfromthedistribution (cid:88)Mn ˜ Zn+1 ∼ Poisson ΛJi, i=Mn−1+1 where M = 0, M = (cid:80)nZ˜ , and giving then each individual a fresh mark from the sequence −1 n 0 k J ,J ,.... Mn+1 Mn+2 3 Proposition2.2 Themarkedgenerationsequence(Z ,(J ))isstochasticallyequivalentto n n,i ˜ (Z ,(J ,...,J )). n Mn−1+1 Mn Wenextturntothedefinitionofthecorenetworkmentionedintheintroduction. Fixanincreasing function(cid:96) : N → Rsuchthat (cid:96)(N) (cid:96)(N) (cid:96)(1) = 1, → 0, → ∞ asN → ∞, (2) logloglogN loglogloglogN anddefine (cid:96)(N) (cid:15)(N) := . logN Notethat (cid:15)(N) → 0, N(cid:15)(N) = e(cid:96)(N) → ∞ asN → ∞. (3) Definerecursivelythefunctions 1 (cid:15)(N) β (N) = + , 0 τ −1 τ −2 β (N) = (τ −2)β (N)+(cid:15)(N), j = 1,2,..., (4) j j−1 anddenote (cid:24) (cid:25) . loglogN k∗ = . −log(τ −2) For N sufficiently large, we have β (N) > (cid:15)(N)/(3 − τ), and the sequence (β (N)) 0 k k=0,1,... decreases toward the limit value (cid:15)(N)/(3−τ). With k = k∗, we are already in the (cid:15)(N) order of magnitude: (cid:32) (cid:33) ∞ (cid:88) 4−τ β (N) ≤ (cid:15)(N) 1+ (τ −2)i = (cid:15)(N). (5) k∗(N) 3−τ i=0 The key of our analysis of G is the notion of its core C, consisting of all vertices with capacity N higherthanNβk∗. Notethattheexactboundaryofthecoredependson(cid:96)(N)andisthussomewhat arbitrary. Bythedefinitionof(cid:15)(N),wecanchooseanaturalnumberκ = κ(N)suchthat κ(N) κ(N) → ∞, → 0, asN → ∞, (6) Nθ(cid:15)(N) k∗(N) where θ = (τ −2)(4−τ)/(3−τ). We can now collect the main results of [7] to the following theorem: Theorem2.3 (i) The graph G has a.a.s. (asymptotically almost surely) a giant component, N whoserelativesizeapproachesthevalue ∞ 1−(cid:88)P(D = j)P(cid:0)Z(π∗) = 0(cid:1)j, (7) ∞ j=1 4 where D is distributed as the conditionally Poissonian variable Poisson(Λ), and (Z(π∗)) is n aGalton-Watsonbranchingprocesswithoffspringdistributionπ∗. (ii) A random vertex of the giant component is connected with the core C in less than κ(N) hops. The distribution of the capacity of the first core vertex found in a random-order breadth-first neighborhood search is asymptotically identical to the distribution of J(N), (cid:110) (cid:111) nC wheren = inf n : J(N) ∈ C . C n (iii) Arandomvertexofthecoreisa.a.s.connectedwiththevertexi∗ withhighestcapacityinat mostk∗ hops. (iv) Asaconsequenceofthepreviousitems,thedistancebetweentworandomlychosenvertices ofthegiantcomponentisatmost2k∗(N)(1+o(1))a.a.s.. (cid:8) (cid:9) DenoteT = i ∈ {1,...,N} : Λ ∈ (Nγ,Nδ] . Theideaoftheproofofclaim(iii)ofTheorem γ,δ i 2.3isthatifthecoreisdividedinto‘tiers’ V = {i∗}, 0 V = T \{i∗}, 1 β1,∞ V = T , k = 2,...,k∗, k β ,β k k−1 then,foranyk > 0,arandomvertexoftierk hasa.a.s.anedgetoV ∪···∪V . k−1 0 3 Auxiliary results In this section we sharpen the estimate used in [7] for the aggregated capacity of a set of form {i : Λ > Nγ}. In particular, we obtain that the aggregated capacity of a tier T is asymp- i γ,γ+(cid:15)(N) totically overwhelmingly larger than all strictly higher vertex capacities together. This holds also whenγ dependsonN. Weusefrequentlythenotationx+I := {x+y : y ∈ I},wherex ∈ RandI ⊂ Risaninterval. Lemma3.1 Letα = α(N) ∈ (0,α¯),whereα¯ < 1/(τ −1). Then,forN sufficientlylarge, (cid:32)(cid:80)N Λ 1 1 (cid:33) (cid:18)E{Λ}(cid:19)−τ+1 P i=1 i {Λi>Nα} ∈ ( ,4) ≥ 1− N−(1−(τ−1)α¯)(τ−2). N1−(τ−2)αE{Λ} 4 2 Proof Denote p(n) = P(cid:18)n1 (cid:80)ni=1Λi (cid:54)∈ (1,2)(cid:19), N = (cid:88)n 1 . E{Λ} 2 α {Λi>Nα} 1 5 SinceP[Λ > y|Λ > x] = P(xΛ > y),wehavethedistributionequality (cid:88)N (cid:88)Nα Λ 1 =D Nα Λ˜ , (8) i {Λi>Nα} k i=1 k=1 ˜ wheretheΛ ’sarefreshindependentcopiesofΛ. Then k (cid:32) 1 (cid:80)Nα Λ˜ 1 (cid:33) P Nα k=1 k (cid:54)∈ [ ,2] = E{p(N )} E{Λ} 2 α (cid:18) (cid:19) 1 (cid:110) (cid:111) ≤ P N < E{N } +E p(N )1 . (9) α 2 α α {Nα≥21E{Nα}} The distribution of N is Bin(N,N−(τ−1)α). The well-known bounds for a binomial random α variableX, x2 P(X < E{X}−x) ≤ exp(− ), 2E{X} x2 x3 P(X > E{X}+x) ≤ exp(− + ), 2E{X} E{X}3 yield (cid:18) (cid:19) (cid:18) (cid:19) N 1 N P α < ≤ e−E{Nα}/8, P α > 2 ≤ e−E{Nα}/8+1. (10) E{N } 2 E{N } α α Bythewell-knownresultonsubexponentialvariableswithfinitemean(see,e.g.,[4]), (cid:32) (cid:33) n 1 (cid:88) P Λ > 2E{Λ} ∼ P(max(Λ ,...,Λ ) > nE{Λ}) ∼ E{Λ}−τ+1n−τ+2. i 1 n n i=1 For the large deviation to the opposite direction, the classical Cramér theorem applies and the probabilitygoestozeroatexponentialspeed. Thus,forN sufficientlylarge, 3 (cid:18)E{N }(cid:19)−τ+2 E{p(N )} ≤ e−E{Nα}/8 + E{Λ}−τ+1 α . (11) α 2 2 Combining (8)-(11), replacing α by the worse case α¯ and upperbounding the exponential terms byreplacingthefactor3/2in(11)by2wegettheassertion. (cid:3) Lemma3.2 Let α (N),α (N) ∈ (0,α¯), where α¯ < 1/(τ − 1), and α (N) ≤ α (N) − (cid:15)(N). 0 1 0 1 Then,forN sufficientlylarge, (cid:32)(cid:80)N Λ 1 1 (cid:33) (cid:18)E{Λ}(cid:19)−τ+1 P i=1 i {Λi∈(Nα0,Nα1]} ∈ ( ,5) ≥ 1−2 N−(1−(τ−1)α¯)(τ−2). N1−(τ−2)α0E{Λ} 5 2 Proof ThisfollowsbyapplyingLemma3.1tobothα andα andnotingthatN−(τ−2)(α1−α0) → 0 1 0. (cid:3) 6 Lemma3.3 Foranyfixedb > 1, (cid:80)N Λ 1 i=1 i {Λi>Nb(cid:15)(N)} → 0 inprobability. (cid:80)N Λ 1 i=1 i {Λi>N(cid:15)(N)} Proof ByLemma3.1, (cid:80)N Λ 1 1 i=1 i {Λi>Nb(cid:15)(N)} ·N(b−1)(τ−2)(cid:15)(N) ∈ [ ,16] a.a.s., (cid:80)N Λ 1 16 i=1 i {Λi>N(cid:15)(N)} andtheclaimfollowssinceN(cid:15)(N) → ∞. (cid:3) Lemma3.3,combinedwithourearlierresultsin[7],entailstheimportantobservationthatthefirst contacttothecoreisa.a.s.closetoitsbottom: Proposition3.4 Fix b > 0. Let I be a random node of the giant component of G . Asymptoti- N N (cid:8) (cid:9) callyalmostsurely,IN isconnectedtoTβk∗,bβk∗ withapaththatavoidstheset i : Λi > Nbβk∗ , andwhoselengthcoincideswiththedistancebetweenI andthecore. N Proof Sincetheproportionalsizeofthecoreshrinkstozero,itsufficestoconsiderthecasethat I is picked from outside the core. By the results in [7], conditional that I is connected with N N the core, one of the minimal length paths to the core has its core endpoint distributed as the first core element in an i.i.d. sequence J ∈ {1,...,N} with common distribution (1). Now, β is n k∗ proportional to (cid:15)(N), and Lemma 3.3 can be applied with β in the role of (cid:15)(N). We conclude k∗ that the first core element in the sequence J belongs a.a.s. to the set T , and the statement n βk∗,bβk∗ ofthepropositionfollows. (cid:3) 4 Horizontal paths in the core Inthissection,weshowthatquitethintiersoftheformT areinternally(asinducedsubgraphs) γ(cid:48),γ(cid:48)(cid:48) almost connected in the sense that the proportional size of the largest component approaches one and, moreover, the diameter of that largest component is obtained with high accuracy from remarkable results on classical random graphs. For deterministic γ, the tiers T are a.a.s. γ−(cid:15)(N),γ connected, and their diameters can be picked almost unequivocally from the following classical theorem[1]. Theorem4.1 Consider the Erdös-Rényi random graph G and let p = p(n) and d = d(n) > 2 n,p satisfy logn −3loglogn → ∞ d pdnd−1 −2logn → ∞ pd−1nd−2 −2logn → −∞ ThenG hasdiameterda.a.s. n,p 7 Toincludealsolowerpartsofthecore,weapplythefollowing,ratherrecentresultbyChungand Lu[2]. Theorem4.2 Ifnp → ∞and(logn)/(lognp) → ∞,thenalmostsurely logn diam(G ) = (1+o(1)) , n,p lognp wherediam(G)denotesthediameterofthelargest(giant)componentofG. Denote (cid:24) (cid:25) 1−(τ −1)γ w(γ) := . (12) (3−τ)γ Proposition4.3 Let γ(N) be a non-increasing function such that γ(N) ∈ [(cid:15)(N), 1) for all N. If 2 limγ(N) > 0, assume also that γ(∞) does not correspond to a jump of the ceiling function in (12). Then,thereexistsanotherfunctionγ (N)suchthat ∗ γ (N) < γ(N), γ (N)/γ(N) → 1, |T | → ∞, ∗ ∗ γ∗(N),γ(N) and diam(T ) = (1+o(1))w(γ(N)) a.a.s., (13) γ∗,γ wherediam(·)isgenerallyinterpretedasinTheorem4.2. Whenγ(N)isboundedawayfromzero, T isa.a.s.connectedanditsdiameterisa.a.s.equaltow(γ(N)). γ∗,γ Proof WeshallsuppresstheN’sinγ(N)etc.whenitimprovestheclarityofpresentation. Consideredasaninducedsubgraph,T isdenser(resp.sparser)thantheclassicalrandomgraph γ∗,γ G (resp.G )with n,p∗ n,p∗ N2γ∗ N2γ n = n(N) = |T |, p = p (N) = , p∗ = p∗(N) = . γ∗,γ ∗ ∗ L L N N Denote γ (N) = γ − δ(cid:15)(N), where δ ∈ (0,1). We fix δ for a while and choose γ = γ . By δ ∗ δ Lemma3.2,|T | → ∞,and γ∗,γ |T | 1 γ∗,γ ∈ [ ,5], a.a.s. N1−(τ−1)γ∗ 5 Ontheotherhand, L /N 1 N ∈ [ ,2] a.a.s. E{Λ} 2 Thus,a.a.s., logn ∈ (1−(τ −1)γ )logN +[−log4,log4], ∗ log(np ) ∈ (3−τ)γ logN −logE{Λ}+[−log8,log8], ∗ ∗ log(np∗) ∈ (2γ −(τ −1)γ )logN −logE{Λ}+[−log8,log8]. ∗ 8 Note that p∗(N) → 0 and n(N)p (N) → ∞, so Theorem 4.2 applies to G and G . We ∗ n,p∗ n,p∗ distinguishbetweentwocases. First, if γ(N) is bounded away from zero, we are in fact in the regime of Theorem 4.1, and a computation of the diameter d leads to the expression of w(γ). Thus, Theorem 4.1 yields the inequalities (cid:24) (cid:25) 1−(τ −1)γ ∗ diam(T ) ≤ a.a.s., γ∗,γ (3−τ)γ ∗ (cid:24) (cid:25) 1−(τ −1)γ ∗ diam(T ) ≥ a.a.s. γ∗,γ 2γ +(1−τ)γ ∗ Now, the right hand sides are with sufficiently large N both equal to the right hand side of w(γ), andtheclaimfollows. Thevalueofδ didnotmatter. The second case is that γ(N) (cid:38) 0. Since n(N)p (N) → ∞, the subgraph still has a giant com- ∗ ponent whose proportional size approaches one, although it is not necessarily connected. Choose anysequenceδ (cid:38) 0,anddenote k (cid:26) (cid:27) (3−τ)γ 2γ +(τ −1)γ δ η = max , k . k (3−τ)γ (3−τ)γ δ k Notethatη → 1. NowdefineasequenceN as k k N = inf(cid:110)M : P(cid:18)|T |N2γδk ≥ k(cid:19) ≥ 1−1/k ∀N ≥ M, k γ ,γ δk L N (cid:18) (cid:19) 1 (cid:111) P diam(T ) ∈ (η−2,η2)· ≥ 1−1/k ∀N ≥ M . γ∗,γ k k (3−τ)γ Thanks to Theorem 4.2, the N ’s are finite numbers, and obviously they grow unboundedly. Fi- k nally,denote k(N) := max{k : N ≤ N} k andchooseγ (N) = γ . Thenobviously ∗ δ k(N) 1 diam(T ) = (1+o(1)) a.a.s. γ∗,γ (3−τ)γ(N) (cid:3) In a similar way, we can prove a diameter result for the tiers V = V(N), defined at the end of k k Section2: Proposition4.4 The tiers V ,...,V , considered as induced subgraphs of G , are almost con- 0 k∗ N nectedinthesensethattherelativesizesoftheirlargestcomponentsapproach1asN → ∞,and theirdiameterssatisfy,a.a.s., diam(V ) = 0, 0 diam(V ) = 2, 1 diam(V ) ∈ (1+o(1))[w(β ),w(β )], k = 2,...,k∗. k k−1 k 9 ByProposition4.3,wecansaythatthewidthofthecoreatheightNγ,measuredbyhopdistance, isw(γ). Moreover,thisholdsalsoforvaryingγ = γ(N)downtothebottomofthecore. Whenγ ∈ (0, 1),wehave 2 w((τ −2)γ) > w(γ)+2. This has the following heuristic consequence. For connecting two vertices with capacities ≈ Nγ, a ‘horizontal’ path staying at about the same height in the core is longer than a path that jumps at bothendswithonehoptotheheightN(τ−2)γ andfindsthehorizontalconnectionatthatlevel. On the other hand, this is how high in the core the neighborhood of a vertex typically reaches. Thus, ahorizontalmoveshouldbemadeatashighlevelaspossible. 5 The main result We have now essentially collected all elements needed to understand the situation where a top part of the core is deleted. First, since the aggregated capacity of any top part is concentrated at its bottom, the remains of the core are found from outside almost as easily as the original core was found. Second, if a lower part of the core is alive, we can find ‘vertical’ paths from the bottom of the core to a top tier of its remaining part, as we did with the undamaged core in [7] (a complete proof of this step requires a slight modification of the proof of Proposition 4.3 of [7] andispresentedbelowindetail). Third,Proposition4.3guaranteestheexistenceofa‘horizontal’ path between almost any pair of vertices within this top tier, and provides also an explicit upper boundforitslength. Itremainstocouplethesepiecestogether. Theorem5.1 Let γ := γ(N) ∈ ((cid:15)(N), 1) be such that γ(N) is non-increasing and γ(N)/(cid:15)(N) 2 non-decreasing w.r.t. N. Denote by H = H(N) the graph obtained from G by deleting all γ N vertices with capacity higher than Nγ, together with the edges attached to them. The following hold: (i) ThegraphH hasa.a.s.agiantcomponentwhoserelativesizeapproachestherelativesize γ ofthegiantcomponentofG . N (ii) The distance of two randomly chosen vertices of the giant component of H is a.a.s. less γ than (cid:18) (cid:18) (cid:19) (cid:19) 2 1 (1+o(1)) loglogN −log +w(γ) , (14) −log(τ −2) γ wherew(γ)isgivenin(12). Proof 1o. Denote γ = γ, 0 γ = γ −(cid:15)(N), 1 0 γ = (τ −2)γ +(cid:15)(N), k = 1,2,..., k+1 k 10