ebook img

Raising Graphs From Randomness to Reveal Information Networks PDF

2.2 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Raising Graphs From Randomness to Reveal Information Networks

Raising Graphs From Randomness to Reveal Information Networks Róbert Pálovics1,2 András A. Benczúr1 1InstituteforComputerScienceandControl,HungarianAcademyofSciences 2BudapestUniversityofTechnologyandEconomics [email protected], [email protected] ABSTRACT We study the growth of information networks by consid- ering processes where each node and edge is added to the We analyze the fine-grained connections between the aver- 7 network only once, and no node or edge is deleted from the age degree and the power-law degree distribution exponent 1 network. Our key finding is related to how the average de- in growing information networks. Our starting observation 0 gree and the power-law degree distribution of the network is a power-law degree distribution with a decreasing expo- 2 evolves over time. More specifically, nentandincreasingaveragedegreeasafunctionofthenet- n work size. Our experiments are based on three Twitter at- • the exponent of the power-law degree distribution in the a mentionnetworksandthreemorefromtheKoblenzNetwork network decreases down to two over time, and J Collection. Weobservethatpopularnetworkmodelscannot • theaveragedegreegrowsasa+cnb,wherenisthenumber 2 explain decreasing power-law degree distribution exponent of nodes in the network. and increasing average degree at the same time. For example, as seen in Fig. 1, in graphs generated by the ] Weproposeamodelthatisthecombinationofexponential Baraba´si-Albertmodel[3],thedegreedistributionexponent I S growth, and a power-law developing network, in which new stays very close to constant. The starting point of our ob- . “homophily”edgesarecontinuouslyaddedtonodespropor- servations is that in a real network, the degree distribution s tionaltotheircurrenthomophilydegree. Parametersofthe log-log plot lines get flattened as the networks grow. c [ average degree growth and the power-law degree distribu- Weemphasizetheimportanceoftheconstantaintheav- tion exponent functions depend on the ratio of the network erage degree formula. The constant was considered negligi- 1 growth exponent parameters. Specifically, we connect the bleintheexperimentsofLeskovecetal.[15]. Inourresults, v growth of the average degree to the decreasing exponent of however, the constant helps capture the mixture of edges 6 the power-law degree distribution. Prior to our work, only that appear at random vs. as a result of common interest, 0 oneofthetwocaseswerehandled. Existingmodelsandeven and fit to the actual measurements as seen in Fig. 2. 4 theircombinationscanonlyreproducesomeofourkeynew To our best knowledge, there is no graph model yet that 0 observations in growing information networks. captures both effects simultaneously. Leskovec et al. [15] 0 observe densification, a power-law growth for the average . 1 degree. Their models apply to graphs where the exponent 0 1. INTRODUCTION of the degree distribution is less than 2 and remains con- 7 stant over time. They predict densification for networks 1 While information spread is a main effect in the world- with power law exponent larger than 2 that is the case for v: wesipdeecisaolclyiahlanredtwtoorakn,atlhyezea,catsuanlomteedchamanoincsgoofththerisspbyroLceibssenis- all of our real networks, however they give no models. i Ourexponentialgrowthandpreferentialattachmentbased X NowellandKleinberg[16]. Onlinesocialnetworksarewidely model results in networks with initially slowly growing av- used for information network analysis [2, 5, 6, 14]. While r erage degree in addition to decreasing power law exponent. a manyonlinesocialservicesexist,itishardtofinddatasets Themaindifferenceofourmodelcomparedtoearliermodels thatrevealdetailedtemporalrecordsofthenetworkgrowth. can be summarized in three points. Theconceptof growth isablendoftwomechanismsthat • The power law exponent, as in all our real networks, is can be hardly separated. First, information networks grow greater than 2 that could not be modeled in [15]. naturallybynewlyestablishedconnectionsbetweenindivid- • Ourmodelexplainstheinitialbehaviorofthedegreesasa uals. Second, onecanviewthegrowthasanetworkdiscov- natural mixture of influence and preferential attachment ery process. Interactions of users observed via information edges, also correctly predicting the ratio of these edges. flow partly reveal the hidden social network of them. • Ourmodelgeneratestheeffectsofbothincreasingaverage degree and decreasing power law exponent. As a general overview of the possible models based on ACMacknowledgesthatthiscontributionwasauthoredorco-authoredbyanem- ourobservations,networksstarttogrowatrandom,likean ployee,contractororaffiliateofanationalgovernment. Assuch,theGovernment Erdo˝s-R´enyi graph. Then certain rule such as preferential retainsanonexclusive,royalty-freerighttopublishorreproducethisarticle,ortoal- lowotherstodoso,forGovernmentpurposesonly. attachment [3] intensifies during the growth process, and WSDM2017,February06-10,2017,Cambridge,UnitedKingdom causesscale-freedegreedistributionwithadecreasingexpo- nent. The stronger the rule is, the closer the exponent of (cid:13)c 2017ACM.ISBN978-1-4503-4675-7/17/02...$15.00 DOI:http://dx.doi.org/10.1145/3018661.3018664 the degree distribution gets down to two in a more coher- 1.1 Reproducibility Alldatasetsusedinourmeasurementsareavailableinan anonymized format at our website1. Measurements covered inthenextsectionsarefurtherdetailedandillustratedwith morefiguresinJupyterNotebooks2 thatareavailableinour GitHub repository3. 2. RELATEDRESULTS InformationspreadinsocialnetworkslikeTwitterorFace- book have been analyzed in [2, 5, 6, 14]. Several properties Figure 1: Degree distribution snapshots of grow- ofthesenetworksarewidelyconsideredtobeapproximately ing networks at different sizes. Left: The Barab´asi- distributed as power law, with a few exceptions found in Albertmodelyieldsfixedexponent. Right: TheOc- [19]. Newman reviews the theoretical background of power cupy data set (see Section 4) with flattening slope lawfunctionsanddistributionsfoundinempiricaldatasets as the network grows. in [8, 17]. We rely on these calculations in Section 4, where weestimatetheexponentofseveralpower-lawdegreedistri- butions. Alargenumberofmodelsdescribinggrowingnetworksop- erate by assuming a constant average degree. For instance, themodelofBaraba´sietal.[3]iscapableofgeneratingnet- works with power-law distributions, but the average degree of the network remains fixed during the growth of the net- work. Anothermodelthatassumesconstantaveragedegree is the popular copying model [12]. However,severalmeasurementsfromthepastreportthat theaveragedegreeofagrowingnetworkincreasesovertime [4, 11, 21, 22, 24]. Some state that the average degree is a power-law function of the number of nodes. This effect has Figure 2: Growth of the average degree in the Oc- beennamedacceleratedgrowthbyDorogovtsevetal.[9],and cupy data set (see Section 4). densification law by Leskovec et al. [15]. Densification Laws. Leskovec et al. [15] mention that the larger the exponent of the densification law, the denser the network over the growing process. They demonstrate densification law on six different networks, and introduce thecommunityguidedattachmentmodelandtheforestfire model. They claim that both can lead to densification law, and give a theoretical explanation for community guided attachment. Related to our work, they derive a formula connecting the exponent of densification law and the exponent of the power-law degree distribution of the network. Their model worksifboththedegreeandthedensificationexponentsare constantovertime. Theyactuallypredictthatifthedegree distribution exponent is fixed and larger than 2, then there isnodensificationandtheaveragedegreeremainsconstant. They observe that for degree distribution exponents larger Figure 3: The Occupy data set when the number than 2 and changing over time, densification is possible in of nodes is 1,000, with several disconnected com- the network, however they give no models in this case. ponents, random node pairs, and some high degree In contrast to [15], we find that the increasing average nodes. Isolated nodes are not drawn. degreeisnotasimplepower-lawfunctionofthenetworksize. We connect this growth to the decreasing power-law degree entnetwork. Asthedegreedistributionlog-logplotflattens, distribution exponent. We propose models that reproduce the chance for very high degree nodes in a strongly skewed both increasing average degree and decreasing power law distributionincreasesthatusuallyactasthemainorganizer exponent in evolving networks. of the network structure. The intermediate stage in the life Work of Dorogovtsev et al. In [9], the notion of“ac- of a growing network is illustrated in Figure 3, with the celeratedgrowth”forpower-lawdegreedistributionwithin- numberofnodesaround1,000. Onecanobserveseveraldis- creasing average degree is introduced. Similar to Leskovec connected random edges and some larger components with etal.,theirformulasconnecttheexponentofthedegreedis- high degree nodes that have already started evolving their tribution and the accelerated growth exponent. The main internalstructure. Asourmainresult,weaimtomodelthe transitionfromarandomandmostlydisconnectedgraphto 1https://dms.sztaki.hu/en/letoltes/networkgrowth a highly organized and very skew degree distribution net- 2http://jupyter.org/ work. 3https://github.com/rpalovics/networkgrowth differenceisthattheypredictdensificationfornetworkswith of Dorogovtsev et al. and Va´zquez, vertex copying models fixed degree distribution exponents above 2 as well. How- generate networks with fixed degree distributions. ever,incontrasttoourkeyobservation,theexponentofthe Summary. We can conclude that numerous network power-law degree distribution remains fixed in all models models result in power-law degree distributions where the proposedin[9]. Furthermore,wefindadifferentgrowthfor exponentofthedistributionisthefunctionofthemodelpa- theaveragedegreeandconnectthistothedecreasingpower rameters. However, it is common in all of the listed models law exponent. thatforagivenparametersetting,thedegreedistributionre- Another work of Dorogovtsev et al. is a growing network mainsthesameduringthegrowththenetwork. Besidesthe model in [10]. The concept of the model is that while ran- aboveclassofpopularmodels,severalpapersreportpower- dom users continuously connect to the network, edges ap- law growth for the average degree. The proposed models pear at constant c rate between already existing nodes via forthiseffectmisstocapturethefactthatwhilethedegree preferential attachment. The model results in a network increases,theexponentofthepower-lawdegreedistribution with power-law degree distribution where the exponent can decreases in real networks when the exponent is above 2. be derived from the only model parameter c (see [10]). A Furthermore,inourmeasurementswereportthattheaver- similar model has been analyzed by Baraba´si et al. [4], and agedegreegrowsasa+c·nb,andthatisdifferentfromthe byChungin[7]. Notethatwhilethemodelleadstodifferent densification law. degree distribution exponents for different values of c, for a givengrowthprocesscisalwaysfixedinthecitedworksand 3. POWER-LAWDEGREEDISTRIBUTIONS the degree distribution remains the same. In this section, we summarize the main properties and Models of V´azquez. In contrast to densification law the measurement of the exponent and the average degree and accelerated growth, in [22] the average degree is mea- in graphs with power-law degree distributions. We follow sured to increase as a logarithmic function in the network Newman [17] for properties of power-law distributions and size. Va´zquez investigates network models that generate [8] for fitting the exponent. Our main goal is to review and power-law degree distributions with various exponents and emphasize seemingly simple facts that often become prob- introduces the triangle closing model. This dynamic model lematic or misleading in experimentation. We also extend has only one parameter u, and at each time step (i) with the discussion to prepare the background for our new mod- probability 1−u we add a new node to the network and els. connect it to one uniform randomly chosen previous node, and(ii)withprobabilityuwecloseapossible,previouslynot 3.1 NotationsforDegreeDistributions existing triangle in the graph. An in-depth analysis of the A graph has power-law degree distribution, if the proba- modelcanbefoundin[23],wheretheauthorclaimsthatthe bilitythatagivennon-zerodegreenodehasdegreek inthe processresultsinanetworkwithpower-lawdegreedistribu- network, p(d(i)=k) is tion and the exponent of the distribution depends on the model parameter u. While the model is capable of generat- p(d(i)=k)=Ck−α, α>2. ing networks with various degree distribution exponents, in a fixed parameter setting, the exponent of the distribution Iftheexponentαisgreaterthan1,asinallofourexample remains constant over time. networks, we can compute the value of C by Random Edge Sampling by Pedarsani et al. They (cid:90) ∞ (cid:90) ∞ questiondensificationlawin[18]andstatethatdensification p(d(i)=k)dk=1, C =1/ k−αdk=α−1. can be a result of the fact that measurements on networks 1 1 areusuallycarriedoutonedgessampledfromanunderlying The number of degree zero nodes in real networks is less fixed network. In other words, they conclude that densifi- explored in the literature. For simplicity, we introduce the cation may arise as a feature of the common edge sampling notation NZ for the fraction of nodes with non-zero degree. procedure to measure dynamic networks. They show that networkgrowthcanbeadirectconsequenceofthesampling 3.2 RelationofExponentandAverageDegree process, therefore the sampling process itself is a plausible Byasimplecomputationoftheaveragedegreeinnetworks explanation of network densification law. In Section 4.6 we with power-law distribution [8], introduce a simple experiment indicating that the growth of the average degree can not be explained by random edge d(n)=NZ·(cid:82)∞(α−1)·k−α·kdk=NZα−1 =NZ(cid:104)1+ 1 (cid:105). (1) 1 α−2 α−2 sampling. We simply shuffle our time series that contains timestamped information about the growth of the network. Let∆:=α−2. NotethatbothNZand∆candependon We notice that the uniform shuffling procedure yields sig- the size of the network n: nificantly different average degrees at different sizes of the d(n)=NZ(n)[1+1/∆(n)], (2) network. Vertex Copying Models. Another popular model or equivalently thatiscapableofgeneratingpower-lawdegreedistributions (cid:2) (cid:3) ∆(n)=1/ d(n)/NZ(n)−1 . (3) is the vertex copying model. Here at each time step we add a new node that chooses an ambassador node uniform (2) indicates that a size dependent average degree in a net- randomly. Next, we copy each of the connections of the work with power-law degree distribution is caused by ambassador node to the new node with probability q, or • NZ(n), if the fraction of non-zero degree nodes signifi- attachanewedgetothenewnodeuniformatrandomwith cantly changes with n, probability 1−q. The model leads to a power-law degree • ∆(n), if the exponent of the degree distribution changes distributionwithexponentα= 1−q. Similarlytothemodels q during the growth of the network. 3.3 Maximum Likelihood Estimate of Expo- network numberofnodes numberofedges nentandGoodnessofFit TwitterOccupy [oc] 364,649 1,004,961 Twitter15O [15] 55,976 120,632 Estimating the exponent of a power-law distribution can TwitterMH17 [mh] 1,366,691 3,489,068 be problematic in case of shifts in the distribution [20], or Facebookwallpost [fb] 46,947 183,335 when exponential cutoffs [17] modify the shape of the dis- DBLPcoauthorship [db] 1,314,050 5,362,414 tribution. To overcome these problems, specific estimates Wikipedia [wi] 1,870,709 36,532,531 have been proposed [8, 20] and reviewed in [19]. We follow Table 1: Summary of the data sets used. and extend the methodology of [8] by applying the maxi- mum likelihood approximation for the exponent based on the subset of nodes with a degree at least a certain x,  −1 αˆ=1+|{i:xi ≥x}| (cid:88) lnxxi . (4) i:xi≥x Foreachx,wemaydefinetheKolmogorov-Smirnovstatistic KS(x)betweentheobserveddistributionx andagenerated i power-law distribution of the corresponding αˆ. A common method to fit the exponent is to select x opt that yields the lowest KS(x). However, in our experiments inSection4.4,themaximumvalueturnsouttobeunstable: forseveralxvalueswithverydifferentαˆ,KSisverycloseto thatofx . Forthisreason,wewillcomparethreedifferent opt methods to fit the power law exponent in our experiments. 1. We compute the estimate αˆ with all i, i.e. x=1. all 2. We select x that yields minimum KS and use the cor- opt responding αˆ . opt 3. We use the entire set of estimates αˆ :={αˆ for x with |KS(x)−KS(x )|<0.05)}. (5) 0.05 opt 4. MEASUREMENTSONREALNETWORKS In this section, we measure the degree distribution in six realnetworks. Allthesenetworksaregrownintimebypro- cesses where each node and edge is added to the network only once, and no node or edge is ever deleted. Our first task is to measure the connection between the power law exponent, the average degree, and the fraction of nonzero degree nodes as in equations (2) and (3) as the networks Figure4: Numberofedges(top)andaveragedegree grow. We will also investigate the importance of time in (bottom) as the function of the number of nodes. how different types of new nodes and edges may appear in these graphs. user. Otherwise,weaddedtheauthoruserofthetweetasa new node without any connection to the network. 4.1 DataSets KONECT is the Koblenz Network Collection of large We use three Twitter mention networks and three com- network data sets [13]. We selected networks from the col- pletelydifferentnetworksfromtheKONECTdatacollection lection with timestamped edges evenly distributed in time, in our experiments. In Table 1 we summarize the sizes of anddegreedistributionclosetopowerlaw. Forexample,we thedatasets,byshowingforeachnetworkthefinalnumber discarded networks that were crawled for several weeks but of nodes n and the number of edges e. a significant part of their edges appeared within one day. Twitter, a rich information source, lets us define several 4.2 AverageDegree graphs: the follower network, the retweet graph formed by retweetcascades,andthementionnetwork. Inourwork,we Inourmeasurementswefirstmeasurethenumberofedges will analyze mention networks defined by authors mention- eandtheaveragedegreed:=2e/nasthenetworkgrows,as ing other users by their Twitter user names in their tweets. thefunctionofthenumberofnodesn. Fig.4(top)showsthe We collected data sets from Twitter that contain tweets growthofnumberofedgeseforthesixdifferentnetworksin aboutacertaintopic,i.e. containagivenhashtag. Weused log-logplot. Forallnetworks,thenumberofedgesincreases the OccupyWallstreet and the 15O tweet collection of [1]. super-linearly and apparently fit well to a power function, Furthermore, we used our own crawl, where we collected as in [9, 15]. tweetsabouttheMalaysiaAirlinesFlight17. Westartedto Analyzingeasthefunctionofnmayhoweverbemislead- download tweets immediately after the first news reports. ing, since e already grows at least linearly with n and the Wedefinedourgraphsbysortingthetweetsbytime. Ifa functionwillautomaticallybeveryclosetolinearaslongas tweetcontainedamentioninourdata,weaddedadirected the average degree is small. Indeed, as seen in Fig. 4 (bot- edge to the network between the author andthe mentioned tom), initially the average degree is around 1, and it slowly a b ln(c) c RMSE oc 0.8 0.29 -2.02 0.132 0.0528 15 0.72 0.21 -1.26 0.284 0.073 mh 0.3 0.24 -1.1 0.33 0.078 fb 1.1 0.93 -7.2 0.00075 0.023 db 1.1 0.41 -3.9 0.02 0.091 wi 1.4 0.6 -5.4 0.00452 0.099 Table 2: Parameters of the fitted a+c·nb curves for the average degree in case of the six data sets. ∆n ∆e Z 1 0 R 2 1 I 1 1 H 0 1 Table 3: Growth events. For each event, the orange part is added to the network. data. For our six networks at different sizes, we estimate Figure5: Fittedcurvesforthegrowthoftheaverage ∆(n)=α(n)−2 and d(n). degree for the six different networks. WefirstinvestigateNZ(n),thefractionofnodeswithnon- starts to increase in all of our networks, which results in a zerodegreeinourdata,sinceequation(2)alsoinvolvesthis lineargrowthofefor3–4ordersofmagnitude,withoutany function. The networks from the Koblenz collection do not structural reason. containzerodegreenodes,NZ(n)=1. TheTwittermention We find that the increase of the average degree as the networks include users that have adopted the certain hash- function of n can be best modeled by tag, but have not been mentioned. For this reason, NZ(n) is typically less than one for the Twitter data sets, as seen d(n)=a+cnb =a+eb·ln(n)+ln(c). (6) in Figure 7. Interestingly, NZ(n) is roughly constant in all three cases. In Fig. 5 we plot the average degrees as the function of Figure 8 shows d(n) and ∆(n) for all the six data sets at log n. We show the fit to equation (6) for each network 10 exponentially growing size of the networks. On the left, we separately. Our results indicate good fit. We summarized show the observed d(n), while on the right, the fitted ∆(n) the parameters of our fitted curves in Table 2, where we by the methods of Section 3.3, i.e. the values αˆ , αˆ and indicated the root mean squared error (RMSE) for each fit. all opt the set αˆ as in equation (5). Note that for all networks 0.05 the exponent is always above 2. 4.3 Evolving Exponent of Power-law Degree Next, for both sets of plots of in Figure 8, we show the Distributions transformationbetweend(n)and∆(n). Ontheleft,weplot Next we analyze the degree distribution of the observed the result of equation 2 when applied to all the fitted α networks that we compute at exponentially scaled sizes. In values from the corresponding plot on the right. And on Fig.6,weplotthedegreedistributionwhennreaches2i,i= the right, we show the result of equation (3) applied to the 5,6,7,...forthesixdifferentnetworks. Thedistributionsare observed average degree. The exponent is always above 2, exponentially binned and shown on log-log scale. andasitgetscloserto2,∆takesverysmallvaluesshownon For all data sets, we may visually observe that the expo- logarithmic scale, and d(n) increases in the network. Since nent of the distribution decreases as the network grows that the calculation for the average degree tends to infinity as we will confirm by the methods of Section 3.3 in the next thepowerlawexponenttendsto2in(2),thevalueishighly subsection. We may conclude that the degree distribution sensitive to the variation of the fitted exponent when it is changes in time such that the probability of larger degree close to 2. Keeping the sensitivity of the estimates near nodes increases while the power law exponent decreases. exponent2inmind,weobservethattheconnectionbetween theaveragedegreeandtheexponentestimatesareconsistent 4.4 DegreeDistributionandAverageDegree with the equations (2) and (3). In this section we connect two previous observations, the AsthemainconclusionofthemeasurementsinSections4.2– growthoftheaveragedegreeandthedecreaseofthedegree 4.4, in all of the six data sets we observe that distribution exponent. Recall from Section 3.2 that d(n) is • theexponentofthepower-lawdegreedistributiondecreases the average degree and α(n) is the degree exponent of the – i.e. nodes with higher degree appear more likely – but networkasfunctionsofthenumberofnodesn. Thesestatis- always stay above 2; ticsareconnectedviathenotion∆:=α−2inequations(2) • the average degree grows as a+c·nb; and (3). In particular, (2) predicts a connection match- • the connection between the growing average degree and ingourobservationthattheaveragedegreegrowswhilethe theshrinkingexponent,uptotheuncertaintyintheesti- exponent shrinks. Next we investigate how well the pre- mation of the exponent, is as predicted by the power-law dicted connection between degree and exponent fits to our distribution. Figure 6: Degree distributions for different sizes of the six networks. log n is indicated in the figures. 2 From left to right and top to bottom: oc, 15, mh, fb, db, wi. Figure 7: Fraction of nonzero nodes NZ(n) for the three Twitter data sets. 4.5 Edge Formation: From Random to Joint Figure 8: Left: Comparison of the average degree Interest (red curve) and its estimates computed from the Thegrowthoftheaveragedegreecouldbeanaturalresult exponentestimatesbyusing(2). Right: Comparison ofneweventsthatgenerateedgesalreadyhavingoneorboth of the estimates of ∆. The red curve is based on (3). endpoints in the network. Initially, while the network is very small, edges that connect two nodes will likely appear can appear without any connection to the already existing between nodes that have not yet been part of the network, part to the network (Z). Second, a new random edge may hence the number of edges will increase by one but at the appear with nodes that are both new in the network (R). sametime,thenumberofnodesmayincreasebyoneoreven Third, one node can appear and immediately join with a by two. Later however, when many of the eventual nodes new edge to the network (I). This event may be a result have already joined the network, the events will bring an of influence between the nodes. Finally, two existing nodes extra edge but no new nodes, thus increasing the average can connect to each other as a result of common interest, degree. possibly also involving preferential attachment that we call Inordertocharacterizeeventswithoneortwoendpoints homophily edges (H). already in the network, we distinguish four different events Tomeasuretheincreaseinaveragedegreeasthepossible intheprocessasdetailedinTable3. First,anewrootnode result of increased homophily events H, we measure the ra- actual growth of the network always yields denser graphs than random sampling from the final snapshot of the net- work. 5. MODELING WeshowedinSection4.2thattheaveragedegreeincreases in evolving networks. Next, in Section 4.3 we saw that the exponent of the degree distribution of a growing network decreases. In Section 4.4 we connected the two observa- tions, and concluded that while the evolving exponent of thepower-lawdegreedistributiondecreases,theaveragede- gree grows. Finally we demonstrated in Section 4.5 that as more and more edges appear between existing nodes, natu- rallytheaveragedegreeincreases. Weintendtogivenetwork models that capture these effects simultaneously, i.e. give a dynamic network model with the following requirements: • increasing average degree according to a+cnb, • decreasing power law exponent that tends to 2, • vanishing fraction of random edges, • increasing number of homophily edges. Asfarasweknow,thereexistsnonetworkmodelyetthat Figure 9: Ratio of different types of edges as the isabletodescribeallourobservations. Modelsthatexplain function of n. R, I, and H are defined in Table 3. increasingaveragedegreeareoftencalleddensificationlaws [15]. InSection5.1wewillreviewhowexistingdensification law models predict our other three observations. Then in Section 5.2 we detail the model of Dorogovtsev et al. [10] that is closest to our new set of models. In Section 5.3, we introduce our basic model of growing networks with de- creasingpower-lawdegreedistribution. Afterextendingthe model in 5.4, we verify our models with simulations in Sec- tion 5.5. 5.1 RelationtoDensificationLaws Modelsgivenby[9,15]foracceleratedgrowthanddensifi- Figure 10: Result of shuffling the events in the time cationlawareforgraphswithfixeddegreedistributionexpo- series of the Occupy (left) and MH17 (right) data nents. Moreover,bothofthempredictfortheaveragedegree sets. The original graph is always denser than the power-law growth, d(n)=cnb. As stated above, in our ex- randomized one. periments we measured that the average degree first slowly growsfromaconstantvaluethatisaround1. Furthermore, tio of different edge types (R,I,H) as the function of the noneoftheproposedmodelsforacceleratedgrowthandfor network size. The plots for the six data sets are in Fig- densificationarecapableofgeneratingpower-lawdegreedis- ure 9. The most important observation is that the ratio of tributionwithadecreasingexponent. Finally,inSection4.6 randomedgesRdeclines,andinfluenceeventsI areroughly weshowedthatthesimpleuniformedgesamplingproposed constant,andtheratioofH increasesasthenetworkgrows. by Pedarsani et al. [18] leads to a different network growth process. 4.6 NoFittoUniformEdgeSampling 5.2 TheConceptofPreferentialEdges Pedarsani et al. [18] state that the growth of the average degreecanbearesultofuniformedgesamplingfromahid- ClosesttoourresultisagrowingnetworkmodelofDoro- den network. We made a simple experiment to investigate govtsev et al. [10]. The model is capable of generating net- theirassumptions. Werandomizedtheedgesequenceofthe works with power-law degree distribution. In this genera- network, and measured the growth of the average degree tivemodelonepartoftheedgesareaddedbetweenalready in the randomized time series. In other words, instead of existing preferentially chosen nodes in the network. More observing the edges in a temporal order according to their specifically, at each time we perform two steps. creationtimestamps,weprocessedtheminarandomorder. • A new node is added to the network with c new edges. InFigure10wecomparethegrowthofd(n)fortherandom- The endpoints of the edges are selected at random with izedandtheoriginaltimeseriesintheOccupy(left)andthe preferentialattachment. Ifthenumberofedgesiseinthe MH17(right)datasets. Theresults,accessibleasdescribed network, then the probability that node i with degree d i in Section 1.1, are similar for the remaining data sets. gets connected to the new node is d /2e. i Accordingtoourmeasurements,thegrowthoftheaverage • cnewedgesareaddedbetweenalreadyexistingnodes. If degreecannotbecompletelyexplainedbyrandomsampling nodes i and j have degrees d and d , the probability of i j from a hidden social web. While the average degree gener- an edge between them is proportional to d d . i j ally grows in both cases, it is always larger in the original Themodelresultsinanetworkwithpowerlawexponent, timeseriesthanintherandomizedone. Inotherwords,the whosevaluecanbederivedfromasinglemodelparameterc, see[10]. Sinceforafixedc,weobtainafixedexponent,the model cannot explain the shrinking phenomenon. Similar rn models have been analyzed by Baraba´si et al. [4], and by Chung in [7], both of which yield a constant exponent. 2se 5.3 ModelI:RandomandHomophilyEdges h Next we introduce our new model that relies on prefer- ential attachment, and can generate growing networks with decreasing power law exponents. In our first growing net- n workmodelweaddateachtimestep(i)randomnewedges that connect two new nodes in the network, (ii) and edges betweenalreadyexistingnodesinthenetwork. Morespecif- ically, as indicated in Figure 11, at time t: • For some constant r, r·n(t) new random edges appear that indicate the random growth of the network. Figure 11: Visualization of Model I. At each time t, • Eachnodeiselectsothernodestoconnectwithhomophily we add rn(t) random, and 2se (t) preferential edges. edges randomly. Theexpectednumberofnewhomophily h connectionscreatedbynodeiiss·dh(i),wheredh(i)isthe that tends to 1 as t → ∞. (12) and (13) are consistent homophilydegreeofnodei,i.e. thenumberofhomophily with our measurements in Section 4.5 where we observed edges connected to node i. For a given new connection thatthenumberofrandomlyappearingRedgesdecreaseas of node i, the target node is selected by preferential at- the network grows while the number of homophily H edges tachment. In other words, the probability of selection for increases over time. node j as a new neighbor of i is d(j). In our model, at the beginning of the growth random First we analyze the growth of the average degree in the edges dominate and the network is similar to an Erdo˝s- model. The number of nodes n(t) in the network at time R´enyigraph. Then,theaddedpreferentialedgesslowlystart t can be derived from the number of created random new to dominate the process as computed in (13). This effect edges by time t. Each new random edge brings in two new pushes the network towards a scale-free graph. The model nodes, therefore of Dorogovtsev et al. [10] adds preferential edges between dn already existing nodes at constant rate c at each time step. dt =2rn, n(0)=N0, n(t)=N0e2rt, (7) Theirmodelresultsinpower-lawdegreedistributionwitha fixedexponent. Inourmodelasthenumberofpreferentially where N is the initial number of nodes in the network. 0 added edges increase during the growth, the exponent will The number of generated random edges e (t) is half of the r start to decrease. We verify this assumption that is based number of nodes, on the derivations of [10] by simulations in Section 5.5. e (t)=1/2·n(t)=N /2·e2rt. (8) r 0 5.4 Model II: Extension with Influenced and The number of preferentially added edges eh(t) can be RootNodes derived from a separated equation. At each time step the In this section we extend our model of random and ho- expected number of preferentially crated edges by node i is mophily edges by introducing two additional events. s·d (i), therefore h • We add p·n(t) random influenced nodes that newly con- deh =(cid:88)sd (i)=2se ,e (0)=H ,e (t)=H e2st, (9) nect to the network with one edge. For a new influenced dt h h h 0 h 0 node, the influencer is selected uniform random from the i already existing nodes in the network. where H0 is the initial number of the homophily edges. • We add q ·n(t) random root nodes that appear in the As the number of edges e(t) = eh(t)+er(t), the average network with 0 degree. d(t) is then With similar notations, the number of nodes is d(t)= 2(er(t)+eh(t)) = N0e2rt+2H0e2st (10) dn =(p+q+2r)n, n(0)=N , n(t)=N e(p+q+2r)t. (14) n(t) N0e2rt dt 0 0 =1+ 2H0e2(s−r)t =1+ 2H0 n(t)rs−1 (11) The number of influenced nodes ni(t), the number of zero N0 N0s/r degreerootnodesnz(t),andthenumberofnodesaddedby random edges n (t) are then Note that the model generates a growing average degree r thatincreasesfromconstant1. Theexponentofthepower- n (t)= p N e(p+q+2r)t, (15) law second term is s/r−1 that is larger than 0 if s>r. i p+q+2r 0 The fraction of added random edges is n (t)= q N e(p+q+2r)t, (16) z p+q+2r 0 e (t) 1 1 r = = . (12) 2r er(t)+eh(t) 1+eh(t)/er(t) 1+2H0/N0erst nr(t)= p+q+2rN0e(p+q+2r)t. (17) If s > r, that tends to 0 as t → ∞. Similarly, the fraction Hencethenumberofinfluenceedges,rootedgesandrandom of homophily edges is specified by the sigmoid function, edges are e (t) 1 er(t)h+eh(t) = 1+N0/(2H0)erst (13) ei(t)=ni(t), ez(t)=0, er(t)=1/2nr(t). (18) Figure 12: Results of simulations for Model I. Dif- Figure 13: Comparison of estimated and calculated ferent values of s correspond to different colors as b values for Model I. Calculated values are based on noted above. Left: d(n) Right: ∆(n). the setting of s and r, b=s/r-1. The estimated val- uesarecomputedfromthesimulatedaveragedegree Thedifferentialequationforthenumberofhomophilyedges curves. remains the same, deh =(cid:88)sd (i)=2se ,e (0)=H ,e (t)=H e2st (19) dt h h h 0 h 0 i The average degree based on the previous formulas is 2(e (t)+e (t)+e (t)) d(t)= r i h (20) n(t) 2 r+p N e(p+q+2r)t+2H e2st = p+q+2r 0 0 (21) N e(p+q+2r)t 0 2(r+p) 2H = + 0e(2s−p−q−2r)t (22) p+q+2r N 0 = 2(r+p) + 2H0 n(t)p+2qs+2r−1. (23) p+q+2r N2s/(p+q+2r) 0 Notethattheprocessgeneratesagrowingnetworkwhere the average degree grows by d(n)=a+c·nb, where • a= 2r+2p . In other words, the fraction of nodes added p+q+2r by influence p, the fraction of nodes added as root q, and the fraction of nodes added by random edges r sets to- gether the constant term. • b= 2s −1 is the exponent of the power-law growth. p+q+2r • c= 2H0 is the multiplier of the power-law term. N2s/(p+q+2r) 0 Themodeliscapableofgeneratingnetworkswithisolated nodes as in our Twitter data sets. In case of networks from the Koblenz Network Collection, the average degree is al- ways greater than 1, and there are no root nodes in the growing network. By setting the proper fraction of random edgesrandinfluencednodesp,andsettingq=0,themodel Figure 14: Results of fitting the model to real data is also capable of modeling these networks. sets. Left: Occupy, right: Facebook. Top: Growth 5.5 VerifyingtheModelsbySimulations of the average degree. Middle: Decrease of the ex- ponent of the degree distribution. Bottom: Visual- First we investigate the properties of networks generated ization of the degree distributions at different net- by Model I of Section 5.3. In Fig. 12 we fixed H0 = 2,r = work sizes. 0.05,andransimulationsatdifferenthomophilyratess. On the left, we show the increase of the average degree, while Next we turn to analyzing how Model II fits to real net- ontheright,theestimateof∆whilegrowingthesimulated works. We present the experiments for Occupy and Face- network. Note that different colors correspond to different book; for the other networks, see Section 1.1. For both values of s, and we ran 40 different simulations in case of data, we compute parameters p,q,r,s,N and H from the 0 0 each parameter setting. The results indicate that s sets the values of a,b and c in Table 2. Then we analyze d(n) and velocityofthegrowthofd(n). Furthermore,∆(n)decreases ∆(n) in networks simulated by the corresponding parame- in our simulations at each value of s. ters. Specifically we set p = 0.002, q = 0.022, r = 0.038, InFig.13,weshowthatsimulationsfitwelltothemodel s = 0.0645, N = 14, H = 2 for Occupy, and p = 0.0089, 0 0 equation (11). We estimated b from the observed function q = 0.0, r = 0.04, s = 0.0857, N = 85, H = 2 for Face- 0 0 d(n) and compared that to s/r−1 that we calculated from book. the model parameters. Our results indicate the growth of In Fig. 14, we show our measurements for the two data the average degree in our simulations follows (11). sets in separate columns. By measuring in the real and the simulated data, we show the growth of d(n) on the top [7] F.R.ChungandL.Lu.Complex graphs and networks, and of ∆(n) in the middle. Finally, at the bottom, we in- volume 107. American Mathematical Society, cludesnapshotsofthedegreedistributionsofthesimulated Providence, 2006. graphs. [8] A. Clauset, C. R. Shalizi, and M. E. Newman. Wemayconcludethatbysettingthepropervaluesofthe Power-law distributions in empirical data. SIAM model parameters, our models are capable for reproducing review, 51(4):661–703, 2009. the two main effects, increasing d(n) and decreasing ∆(n), [9] S. N. Dorogovtsev and J. Mendes. Accelerated growth as well as generating growing networks that fit well to our of networks. arXiv preprint cond-mat/0204102, 2002. real-world data. [10] S. N. Dorogovtsev and J. F. F. Mendes. Scaling behaviour of developing and decaying networks. EPL , 6. CONCLUSION 52(1):33, 2000. In growing networks, we measured that the exponent of [11] J. S. Katz. Scale-independent bibliometric indicators. the power-law degree distribution decreases in time. We Measurement: Interdisciplinary Research and connected this observation with the growth of the average Perspectives, 3(1):24–28, 2005. degree and gave models for this phenomenon. In general, [12] R. Kumar, P. Raghavan, S. Rajagopalan, we model the initial growth of information networks by a D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic random process as in an Erdo˝s-R´enyi graph. Then we ob- models for the web graph. In Foundations of servethatanorganizingrulebeginsdominatingthenetwork Computer Science, 2000. Proceedings. 41st Annual building process, driving the network towards a scale-free Symposium on, pages 57–65. IEEE, 2000. degree distribution. A model and its extended version that [13] J. Kunegis. Konect: the koblenz network collection. In arebasedonexponentialgrowthandpreferentialedgeshave Proceedings of WWW 2013, pages 1343–1350. been introduced in Section 5. This new model proposed in International World Wide Web Conferences Steering this paper yields a power-law degree distribution with de- Committee, 2013. creasing exponent in a growing graph with increasing aver- [14] H. Kwak, C. Lee, H. Park, and S. Moon. What is age degree. twitter, a social network or a news media? In Inourfutureworkweplantoextendourmodeltoincorpo- Proceedings of WWW 2010, pages 591–600. ACM, rateothernetworkbuildingmechanisms. Mostinterestingly, 2010. weintendtoincorporatetriangleclosingandvertexcopying [15] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph rules and analyze their effects on the degree distribution of evolution: Densification and shrinking diameters. the network. We also plan to analytically derive the power ACM TKDD, 1(1):2, 2007. law exponent as the function of the network size and the [16] D. Liben-Nowell and J. Kleinberg. Tracing parameters of our models. information flow on a global scale using internet chain-letter data. PNAS, 105(12):4633–4638, 2008. 7. ACKNOWLEDGMENTS [17] M. E. Newman. Power laws, pareto distributions and Thepublicationwassupportedby“BigData—Momentum” Zipf’slaw.Contemporary physics,46(5):323–351,2005. grantoftheHungarianAcademyofSciencesandthePIAC 13- [18] P. Pedarsani, D. R. Figueiredo, and M. Grossglauser. 1-2013-0197 project. Densification arising from sampling fixed graphs. In ACM SIGMETRICS Performance Evaluation Review, 8. REFERENCES volume 36, pages 205–216. ACM, 2008. [1] P. Arago´n, K. E. Kappler, A. Kaltenbrunner, [19] C.Petersen, J.G.Simonsen, andC.Lioma.Powerlaw D. Laniado, and Y. Volkovich. Communication distributions in information retrieval. ACM TOIS, dynamics in twitter during political campaigns: The 34(2):8, 2016. case of the 2011 Spanish national election. Policy & [20] E´. Ra´cz, J. Kert´esz, and Z. Eisler. A shift-optimized Internet, 5(2):183–206, 2013. Hill-type estimator. arXiv preprint arXiv:0905.3096, [2] E. Bakshy, J. M. H., W. A. Mason, and D. J. Watts. 2009. Everyone’s an influencer: quantifying influence on [21] S. Redner. Citation statistics from more than a Twitter. In Proceedings of WSDM 2011, pages 65–74. century of physical review. arXiv preprint ACM, 2011. physics/0407137, 2004. [3] A.-L. Baraba´si and R. Albert. Emergence of scaling in [22] A. Vazquez. Statistics of citation networks. arXiv random networks. Science, 286(5439):509–512, 1999. preprint cond-mat/0105031, 2001. [4] A.-L. Barabaˆsi, H. Jeong, Z. N´eda, E. Ravasz, [23] A. Va´zquez. Growing network with local rules: A. Schubert, and T. Vicsek. Evolution of the social Preferential attachment, clustering hierarchy, and network of scientific collaborations. Physica A: degree correlations. Physical Review E, 67(5):056104, Statistical mechanics and its applications, 2003. 311(3):590–614, 2002. [24] A. Va´zquez, R. Pastor-Satorras, and A. Vespignani. [5] M. Cha, H. Haddadi, F. Benevenuto, and Large-scale topological and dynamical properties of K. Gummadi. Measuring user influence in Twitter: the Internet. Physical Review E, 65(6):066130, 2002. The million follower fallacy. In ICWSM, 2010. [6] M. Cha, A. Mislove, B. Adams, and K. P. Gummadi. CharacterizingsocialcascadesinFlickr.InProceedings of the first workshop on Online social networks, pages 13–18. ACM, 2008.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.