ebook img

Scalable Bicriteria Algorithms for the Threshold Activation Problem in Online Social Networks PDF

0.55 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Scalable Bicriteria Algorithms for the Threshold Activation Problem in Online Social Networks

1 Scalable Bicriteria Algorithms for the Threshold Activation Problem in Online Social Networks Alan Kuhnle, Tianyi Pan, Md Abdul Alim, and My T. Thai Department of Computer & Information Science & Engineering University of Florida Gainesville, Florida 32611 Email: {kuhnle, tianyi, alim, thai}@cise.ufl.edu 7 Abstract—We consider the Threshold Activation Problem be met with the least expense possible. Thus, we consider 1 (TAP): given social network G and positive threshold T, find a the following threshold activation problem (TAP): given a 0 minimum-sizeseedsetAthatcantriggerexpectedactivationofat thresholdT,minimizethesizeofthesetofseedusersinorder 2 least T. We introduce the first scalable, parallelizable algorithm to activate at least T users of the network in expectation. n with performance guarantee for TAP suitable for datasets with Goyaletal.[11]providedbicriteriaperformanceguarantees a millions of nodes and edges; we exploit the bicriteria nature J of solutions to TAP to allow the user to control the running for a greedy approach to TAP based upon Monte Carlo 0 time versus accuracy of our algorithm through a parameter sampling at each iteration to select the best seed node, an 3 α ∈ (0,1): given η > 0, with probability 1−η our algorithm algorithm reminiscent of the greedy algorithm for IM in returns a solution A with expected activation greater than Kempe et al. [1]; this approach is inefficient and impractical T − 2αT, and the size of the solution A is within factor ] I 1+4αT+log(T)oftheoptimalsize.Thealgorithmrunsintime for large networks. Although TAP is related to IM, scalable S O(cid:0)α−2log(n/η)(n+m)|A|(cid:1), where n, m, refer to the number solutions that already exist for IM are unsuitable for TAP: the s. ofnodes,edgesinthenetwork.Theperformanceguaranteeholds TIM [10] and IMM [9] algorithms require knowledge of the c for the general triggering model of internal influence and also size k of the seed set ahead of time; the SKIM algorithm [3] [ incorporates external influence, provided a certain condition is foraveragereachabilityhasbeenshowntobeeffectiveforIM met on the cost-effectivity of seed selection. 1 inspecializedsettings,butitisunclearhowtoapplySKIMto v moregeneralsituationsortoTAPwhileretainingperformance 9 I. INTRODUCTION guarantees. 9 Moreover, empirical studies have shown that in the viral 7 With the growth of online social networks, viral marketing 8 where influence spreads through a social network has become marketing context, it is insufficient to consider internal prop- 0 agation alone; external activation, i.e. activations that cannot acentralresearchtopic.Usersofasocialnetworkcanactivate 1. their friends by influencing them to adopt certain behaviors be explained by the network structure, play a large role in 0 influence propagation events [12]–[15], and recent works on or products. In this context, the influence maximization (IM) 7 scalable algorithms for IM [3], [9], [10] have neglected the problem has been studied extensively [1]–[8]: given a budget, 1 consequences of external influence. For internal diffusion, : the IM problem is to find a seed set, or set of initially v two basic models have been widely adopted, the independent activatedusers,withinthebudgetthatmaximizestheexpected i cascade and linear threshold models; Kempe et al. [1] showed X influence. Much recent work [3], [4], [9], [10] has developed these two models are special cases of the triggering model, r scalable algorithms for IM that are capable of running on a a powerful, general model that has desirable properties in a social networks with millions of nodes while retaining the viral marketing context. provable guarantees on the quality of solution; namely, that Motivatedbytheaboveobservations,themaincontributions the algorithm for IM will produce a solution with expected of this work are: influence within 1−1/e of the optimal activation. However, a company with a specific target in mind may • We establish a new connection between the triggering model and a concept of generalized reachability that al- adopt a more flexible approach to its budget: instead of lowsanaturalcombinationofexternalinfluencewiththe having a fixed budget k for the seed set, it is natural to triggeringmodel.Weshowanyinstanceofthetriggering minimize the size of the seed set while activating a desired model combined with any model of external influence is thresholdT ofuserswithinthenetwork.Forexample,suppose monotone and submodular. acompanydesiresacertainlevelofexposureonsocialmedia; such exposure could boost the sales of any of its products. • We show how to use the generalized reachability frame- work to efficiently estimate the expected influence of Alternatively, suppose a profit goal T for a product must the triggering model combined with external influence, (cid:13)c2017IEEE.Personaluseofthismaterialispermitted.Permissionfrom leveraging scalable estimators of average reachability by IEEE must be obtained for all other uses, in any current or future media, Cohen et al. [3], [16], [17]. This efficient estimation includingreprinting/republishingthismaterialforadvertisingorpromotional results in a parallelizable algorithm (STAB) for TAP purposes,creatingnewcollectiveworks,orresaleorredistributiontoservers orlists,orreuseofanycopyrightedcomponentofthisworkinotherworks. with performance guarantee in terms of user-adjustable 2 trade-offs between efficiency and accuracy. The desired (u,v).IntheLTmodeleachnetworkuseruhasanassociated accuracy is input as parameter α ∈ (0,1) which de- thresholdθ(u)chosenuniformlyfrom[0,1]whichdetermines termines running time as O(cid:0)α−2log(n/η)(n+m)|A|(cid:1), how much influence (the sum of the weights of incoming where n,m are number of nodes, edges in the network, edges) is required to activate u. u becomes active if the total andAistheseedsetreturnedbySTAB.Withprobability influence from its activated neighbors exceeds the threshold 1−η, the expected activation is guaranteed to be with θ(u). 2αT of threshold T, and the size of the seed set A is These well-studied models are both examples of the Trig- guaranteed to be within factor 1 + 4αT + log(T) of gering Model, also introduced in [1]: Each node v ∈ G the optimal size. If the cost-effectivity of seed selection independentlychoosesarandom“triggeringset”T according v falls below 1, this performance bound may not hold; we to some probability distribution over subsets of its neighbors. provide a looser bound for this case. A seed set S is activated at time t = 0; a node v becomes • Throughacomprehensivesetofexperiments,wedemon- active at time t if any node in Tv is active at time t−1. strate that on large networks, STAB not only returns a Two important properties that it is desirable for a model σ better solution to TAP, but it runs faster than existing ofinfluencepropagationtosatisfyarefirstlythesubmodularity algorithms for TAP and algorithms for IM adapted to property of the expected activation function: for any sets solve TAP, often by factors of more than 103 even S,T ⊆V,σ(S∪T)+σ(S∩T)≤σ(S)+σ(T),andsecondly for the state-of-the-art IMM algorithm [9]. In addition, the monotonicity property: if S ⊆ T ⊆ V, σ(S) ≤ σ(T). we investigate the effect of varying levels of external These properties together allow a greedy approach to have influence on the solution of STAB. a performance ratio for the influence maximization problem The rest of this paper is organized as follows: in Section (IM): given k, find a seed set A of size k such that the II, we introduce models of influence, including the triggering expected activation of A is maximized. Kempe et al. [1] model and our concept of generalized reachability. We prove showed that the triggering model is both submodular and these two concepts are equivalent. In Section III, we formally monotonic. Both properties are also important in proving define TAP and first prove bicriteria guarantees in a general performance guarantees for TAP, the problem studied in this setting. Next, we employ the generalized reachability concept work and defined in Section III. to show how combination of triggering model and external Itis#P-hardtocomputetheexactinfluenceofasingleseed influence can be estimated efficiently, and we present and node under even the restricted version of IC where each edge analyze STAB, our scalable bicriteria algorithm. In Section is assigned the probability 0.5 [18]. Therefore, it is necessary IV, we analyze STAB experimentally and compare with prior to estimate the value of σ(S) by sampling the probability work. We discuss related work in Section V. distribution determined by σ. Sampling efficiently such that the estimated value σˆ(S) satisfies |σˆ(S) − σ(S)| < ε for II. MODELSOFINFLUENCE all seed sets is a difficult problem; we discuss this problem A social network can be modeled as a directed graph when σ is a combination of the triggering model and external G = (V,E), where V is the set of users and directed edges influence in Section III-B2. Because of the errors associated (u,v) ∈ E denote social connections, such as friendships, withestimatingthevalueσ(S),weintroduceslightlygeneral- betweentheusersu,v.Inthiswork,westudythepropagation ized versions of the above two properties. First, let us define of influence through a social network; for example, say a user ∆ σ(A)=σ(A∪{x})−σ(A), for any model σ, and subset x on the Twitter network posts a message to her account; this A ⊂ V. The following property is equivalent to σ satisfying message may be reposted by the friends of this user, and submodularity and monotonicity together. as such it propagates across the social network. In order to Property 1 (Submodularity and monotonicity). For all A ⊆ study such events from a theoretical standpoint, we require B, and x∈V, ∆ σ(A)≥∆ σ(B). the concept of a model of influence propagation. x x Intuitively, the idea of a model of influence propagation Let us define a σ to be ε-approximately monotonic and in a network is a way by which nodes can be activated submodular if the following property is satisfied instead: given a set of seed nodes. In this work, we use σ to denote a model of influence propagation. Such a model is usually Property 2 (ε-submodularity and monotonicity). Let ε > 0. probabilistic, and the notation σ(S) will denote the expected For all A⊆B, and x∈V, ε+∆xσ(A)≥∆xσ(B). number of activations under the model σ given seed set S ⊆ V. Kempe et al. [1] studied a variety of models in A. Triggering model from the perspective of generalized their seminal work on influence propagation on a graph, reachability including the Independent Cascade (IC) and Linear Threshold (LT) models. For completeness, we briefly describe these two Nextwedefineaclassofinfluencepropagationmodelsthat models. An instance of influence propagation on a graph G naturally generalize the notion of reachability in a directed follows the IC model if a weight can be assigned to each graph: that is, these models generalize the simple model that edge such that the propagation probabilities can be computed a node is activated by a seed set S if it is reachable from the as follows: once a node u first becomes active, it is given seed set by edges in the graph. Somewhat surprisingly, the a single chance to activate each currently inactive neighbor triggering model is equivalent to this notion of generalized v with probability proportional to the weight of the edge reachability. 3 SupposeinsteadofasingledirectedgraphG,wehaveaset 1) Submodularity of σ ⊕σ : In this subsection, we trig ext ofdirectedgraphs{(G =(V,E ),p )}l onthesamevertex establish the submodularity and monotonicity of Φ≡σ ⊕ i i i i=1 trig set V, and associated probabilities such that (cid:80)l p = 1. σ . First, we require the following proposition. i=1 i ext Thendefineaninfluencepropagationmodelσinthefollowing Proposition 2. Let σ be a submodular, monotone-increasing way: when a seed set S is activated, graph G is chosen with i model of influence propagation. Define σ (T) to be the ex- probability p . Then, influence is allowed to propagate from A i pectedinfluenceofseedingsetT whenAisalreadyactivated; seed set S in G according to the directed edges of G . Let i i thatisσ (T)=σ(A∪T).Then,foranyA,σ issubmodular σ (S) be defined as the number of vertices reachable from S A A i and monotone increasing. in G . Then the expected activation of a seed set S is given i by σ(S) = (cid:80)li=1piσi(S). We will term a model σ of this Proof: Let T1,T2 ⊆ V. We prove submodularity only, form a model of generalized reachability, since it generalizes monotonicity is similar. thenotionofsimplereachabilityonadirectedgraph.Wehave σ (T ∪T )+σ (T ∩T ) the following important proposition: A 1 2 A 1 2 =σ(T ∪T ∪A)+σ(T ∩T ∪A) 1 2 1 2 Proposition 1. Generalized reachability is equivalent to the =σ((T ∪A)∪(T ∪A))+σ((T ∪A)∩(T ∪A)) Triggering Model. 1 2 1 2 ≤σ(A∪T )+σ(A∪T ) 1 2 Proof: Suppose we have an instance σ of the triggering =σ (T )+σ (T ). model. For each node v, triggering set T is chosen indepen- A 1 A 2 v dently with probability p(T ). If all nodes choose a triggering v set, then define graph H for this choice by adding directed T Theorem1. Letσ beanyinstanceoftheTriggeringModel, edges (u,v) for each u ∈ T . Assign graph H probability trig v T and σ be any instance of external influence. Then, Φ ≡ (cid:81) p(T ) and we have instance of generalized reachability ext v v σ ⊕σ is submodular and monotonic. with the same expected activation. trig ext Conversely, suppose σ is an instance of generalized reach- Proof: For any A ⊆ V, let pA be the probability that ability. For v ∈V, let Tv be any subset of nodes excluding v A is activated via σext, the external influence. Then, Φ(S)= (cid:80) itself. Assign p(Tv) to be the sum of the probabilities of the A⊂V pAσtrig(A∪S).ByProp.2,σtrig,A(S)=σtrig(A∪S) graphs in which the in-neighbors of v are exactly the set T . forfixedAismonotoneandsubmodular.Sinceanon-negative v Then we have an instance of the triggering model. combination of submodular and monotonic set functions is also submodular and monotonic, the result follows. B. External influence III. THRESHOLDACTIVATIONPROBLEM In this section, we outline our model of external influence The framework has been established to define the problem σ in a social network. We wish to capture the idea that ext we consider in this work. We suppose a company wants to users in the network may be activated by a source external minimize the number of seed users while expecting a certain to the network; that is, these activations do not occur through level of activation in the social network. Formally, we have friendshipsorconnectionswithinthesocialnetwork.Themost general model of external influence is simply an arbitrary Problem 1 (Threshold activation problem (TAP)). Let G = probability distribution on the set of subsets of nodes. That (V,E) be a social network, n = |V|. Given influence propa- is, for any S ⊂V, there is a probability p that S is activated gation model σ and threshold T such that 0≤T ≤n, find a S fromanexternalsource.Inthiswork,weadoptthismodeland subset A⊆V such that |A| is minimized with σ(A)≥T. denote such a model of external influence by σ . In order ext First, we consider performance guarantees for a greedy to consider both external and internal influence in our social approach to TAP with only the assumption that the influence networks, we next define the concept of combining models of propagation σ is ε-submodular. We do not discuss how to influence together. sample σ for these results. Subsequently, we specialize to the Definition 1 (Combination of σ ,σ ). Let σ and σ be two case when σ is the the estimated value of the combination of 1 2 1 2 models of influence propagation. We define the combination the triggering model and external influence, which is approxi- σ of these two models in the following way: At any timestep matelysubmodularuptotheerrorofestimation,andweshow t,ifsetA ⊂V isactivatednewnodesmaybeactivatedfrom how to efficiently estimate to a desired accuracy in Sections t A accordingtoeitherσ orσ ;thatis,ifinthenexttimestep III-B2 and III-B3b. We detail the algorithm STAB in Section t 1 2 σ activatesA1 andσ activatesA2 ,thenA ≡A1 ∪ III-B1, a scalable algorithm with performance guarantees for 1 t+1 2 t+1 t+1 t+1 A2 . We denote the combination model as σ ⊕σ and write TAP utilizing this estimation and analysis. t+1 1 2 σ1 ⊕σ2(A) for the expected number of activations resulting A. Results when σ is ε-submodular from seeding A under this model. We analyze the greedy algorithm to solve TAP, which adds In this work, we are most interested in combining external anodethatmaximizesthemarginalgainateachiterationtoits activationwiththetriggeringmodel.Thatis,ifσ isamodel chosenseedsetAuntilσ(A)≥T,atwhichpointitterminates. ext ofexternalinfluenceasdefinedabove,andσ isaninstance One might imagine in analogy to the set cover problem that trig of the triggering model, then we consider σ ⊕σ . there would be a logn-approximation ratio for the greedy trig ext 4 algorithmforTAP–however,thisresultonlyholdsforintegral 1) Approximationratios: Next,weconsiderwaysinwhich submodularfunctions[11].Wenextgiveabicriteriaresultfor the bicriteria guarantees of Theorem 2 can be improved. In the greedy algorithm that incorporates the error ε inherent in viral marketing, we may suppose a company seeks to choose ε-approximate submodularity into its bounds. In this context, athresholdT suchthatthemarginalgaintoreachT isalways a bicriteria guarantee means that the algorithm is allowed to at least 1; seeding nodes with a marginal gain of less than 1 violate the constraints of the problem by a specified amount, would be cost-ineffective. In other words, it would cost more and also to approximate the solution to the problem. In the toseedanodethanthebenefitobtainedfromseedingit.There viral marketing context, this means that we may not activate is little point in activating T users if the marginal gain drops the intended threshold T of users, but we will guarantee to toolow;intuitively,thecompanyhasalreadyactivatedasmany activateanumberclosetoT.Furthermore,wewillnotachieve as it cost-effectively can. a solution of minimum size, but there is a guarantee on how We term this assumption the cost-effectivity assumption large the solution returned could be. (CEA): In an instance of TAP, if B ⊆V such that σ(B)<T, there always exists a node u such that ∆ σ(B) ≥ 1. Under u Theorem 2. Consider the TAP problem for σ when σ is ε- CEA,thegreedyalgorithminTheorem2wouldbeanapprox- approximately submodular. Then the greedy algorithm that imationalgorithm;thatis,itwouldensureσ(A)≥T,withthe terminates when the marginal gain is less than 1 returns a solution A of size within factor 1 + ε + log(cid:0) T (cid:1) of same bound on solution size as stated in the theorem. To see OPT this fact, notice that once inequality (4) above is satisfied, the the optimal solution size, OPT, and the solution A satisfies algorithm must add at most o(1+ε) more elements before σ(A)≥T −(1+ε)·OPT. σ(A)≥T, by CEA. Proof:LetA ={a ,...,a }bethegreedysolutionafter More generally, each node v ∈ V has an associated i 1 i i iterations, and let Ag be the final solution returned by the reticence rv ∈ [0,1]; rv is the probability that v will remain greedy algorithm. Let o = OPT be the size of an optimal inactive even if all of the neighbors of v are activated. Then solution C ={c ,...,c } satisfying σ(C)≥T. Then we have the following theorem, whose proof is analogous to 1 o Theorem 2. T −σ(A )≤σ(A ∪C)−σ(A ) i i i Theorem 3. Let σ be ε-approximately submodular and let =(cid:88)o ∆ σ(A ∪{c ,...,c }) r∗ =minv∈V rv. Suppose r∗ >0. Then the greedy algorithm cj i 1 j−1 forTAPisanapproximationalgorithmwhichreturnssolution j=1 A within factor 1+ε +log(cid:0) T (cid:1) of optimal size. o r∗ OPT (cid:88) ≤ ∆ σ(A )+oε (by Property 2) cj i B. Scalable bicriteria algorithm for σ ⊕σ j=1 int ext ≤o·[σ(A )−σ(A )+ε]. (1) Inthissubsection,wedetailthescalablebicriteriaAlgorithm i+1 i 1 for TAP when the propagation is given by an instance of Therefore, T −σ(Ai+1)−ε≤(cid:0)1− 1o(cid:1)(T −σ(Ai)). Then the triggering model in the presence of external influence; that is, when σ = σ ⊕ σ . We describe our scalable trig ext (cid:18) 1(cid:19)i (cid:88)i−1(cid:18) 1(cid:19)j algorithm STAB first and then discuss the necessary sampling T −σ(A )≤T 1− +ε 1− i o o and estimation techniques in the subsequent sections. j=0 1) Description of algorithm: As input, the algorithm takes (cid:18) 1(cid:19)i a graph G = (V,E) representing a social network, an exter- ≤T 1− +εo. (2) o nal influence model σ , internal influence model σ , an ext trig instance of the triggering model, and the desired threshold of Fromhere,thereexistsanisuchthatthefollowingdifferences activated users T. In addition, the user specifies the fractional satisfy error α, on which the running time of the algorithm and the accuracy of the solution depend. Using α, in line 1 the T −σ(A )≥o(1+ε), and (3) i algorithm first determines (cid:96), the number of graph samples it T −σ(Ai+1)<o(1+ε), (4) requires according to Section III-B2. Next in the for loop on line 2, the algorithm constructs Thus, by inequalities (2) and (3), o ≤ T exp(cid:0)−i(cid:1), and a collection of oracles which will be used to estimate the i ≤ olog(cid:0)T(cid:1). By inequality (4) and the assumoption on average reachability in the sampled graphs H , which is used o i the termination of the algorithm, the greedy algorithm adds to approximate the expected influence. Each graph H is i at most o(1 + ε) more elements, so g ≤ i + o(1 + ε) ≤ needed only while updating the oracle collection in iteration o(cid:0)1+ε+log(cid:0)T(cid:1)(cid:1). Finally, if the algorithm terminates be- i; once this step is completed, H may be safely discarded. o i fore σ(Ag)≥T, then the marginal gain is less than 1. Hence, Since the samples are independent, this process is completely by (1), σ(A)≥T −(1+ε)o. parallelizable. Notice that the above argument requires only that σ is a Once the set of oracles has been constructed, a greedy ε-submodularsetfunction;inparticular,itdidnotusethefact algorithm is performed in attempt to satisfy the threshold T that σ represents expected influence propagation on a social of expected activation with a minimum-sized seed set. The network. estimation in line 11 may be done in one of two ways: using 5 estimator C1 or C2; both are described in detail in Section Algorithm 1: Scalable TAP Algorithm with Bicriteria III-B3b. The estimator chosen has a strong effect on both the guarantees (STAB) runningtimeandperformanceofthealgorithm:giventhesame Data: G,σ ,σ ,T,α,δ trig ext oracles, C1 can be computed in time O(k), and in practice Result: Seed set A is much faster to compute than C2. However, the quality of 1 Choose (cid:96)≥log(2/δ)/α2, k ≥(2+c)logn(αT)−2 as C1 degrades with the size of seed set. On the other hand, discussed in Section III-B2; C2 takes time O(k|B|log|B|) time to compute, where B is 2 for i=1 to (cid:96) do the seed set for which the average reachability is estimated; 3 Sample graph Gi from σtrig; our experimental results show that the quality of C2 is vastly 4 Sample external seed set Aeixt for σext; superior to C1 for larger seed set sizes |B|; however its 5 Construct Hi =Gi−σi(Aeixt), and store the value of running time increases. |σ (Aext)|, as described in Section III-B3a; i i Thisalgorithmachievesthefollowingguaranteesonperfor- 6 Update the oracle collection {Xu :u∈V} for τˆ mance: according to H as in Section III-B3a; i Theorem 4. Suppose we have an instance of TAP with σ = 7 end σtrig⊕σext andthatT hasbeenchosensuchthatCEAholds. 8 A≡∅, O ≡(cid:80)(cid:96)i=1|σi(Aeixt)|/(cid:96); Then, if η > 0, by choosing δ = η/n3, the solution 9 while σˆ(A)<T −αT do returned by Alg. 1 satisfies the following two conditions, with 10 for u∈V do probability at least 1−η: 11 Estimate ∆u ={τˆ(A∪{u})−τˆ(A)} using one of the estimators C1 or C2 as described in 1) σ(A)≥T −2αT Sections III-B3b, III-B3d; 2) If A∗ is an optimal solution satisfying σ(A∗) ≥ T, |A| ≤1+4αT +log(T). 12 end |A∗| 13 A=A∪{u∗}, where u∗ maximizes ∆u; If Assumption CEA is violated, the algorithm can detect this 14 Compute σˆ(A)=τˆ(A)+O by Lemma 2; violation by terminating when the marginal gain drops below 15 end 1. In this case, bound 1 above becomes σ(A) ≥ T −(1+ ε)|A∗|. Using estimator C1 for average reachability in line 11 yieldsrunningtimeO(cid:0)log(n/η)(n+m)|A|/α2(cid:1).Ifestimator is input to Alg. 1). C2 is used, a factor of |A|log|A| is multiplied by this bound. Theorem 5 (Hoeffding’s inequality). Let X ,...,X be in- 1 (cid:96) Proof: Let A∗ be a seed set of minimum size satisfying dependent random variables in [0,T]. Let X be the empirical σ(A∗)≥T.Then,asdiscussedinSectionIII-B2,ifδ =η/n3, mean of these variables, X = 1(cid:80)(cid:96) X . Then for any t then with probability at least 1 − η, A∗ satisfies σˆ(A∗) ≥ Pr(cid:0)(cid:12)(cid:12)X−E[X](cid:12)(cid:12)≥t(cid:1)≤2exp(cid:16)−(cid:96)2(cid:96)t2i=(cid:17)1. i T − αT by the choice of (cid:96) in Alg. 1, and the analysis in T2 Section III-B2. Hence |A∗| ≥ |B∗|, where B∗ is a set of Since σ is an instance of the triggering model, by trig minimum size satisfying σˆ(B∗)≥T −αT. Theorem 1, there exists a set {(G ,p ) : j ∈ J} such that j j Notice σˆ is 4αT-approximately submodular on the sets for any seed set A, σ (A) = (cid:80) p σ (A), where σ trig j∈J j j j considered by the greedy algorithm with probability at least is simply the size of the set of nodes reachable from A in 1−η. By Section III-A1, in this case, the solution A returned G . Thus, if A is fixed, then by taking independent samples j byAlg.1satisfies|A|≤(1+4αT+logT)|B∗|≤(1+4αT+ of graphs from the probability distribution on {G }, we get j logT)|A∗|. Furthermore, σ(A) ≥ σˆ(A) − α ≥ T − 2αT, independent samples of σ (A). i if CEA holds, otherwise the alternate bound follows from In the general model of external influence presented above, Theorem 2. every seed set B ⊂V has a probability p of being activated B Next, we consider the running time of Alg. 1. Let m be by the external influence. By sampling from this distribution the number of edges that have a nonzero probability to exist on subsets of nodes, and independently sampling as above in one of the reachability instances (m is at most the number from σ , σ(A) could be computed exactly in the following trig of edges in input graph G). Lines 2 and 3 clearly take time way: σ(A) = (cid:80) (cid:80) p p σ (A∪B). In most cases, B⊂V j∈J j B j (cid:96)(n+m).ByCohenetal.[3],line4takestimeO(k(cid:96)m).The thissumcannotbecomputedinpolynomialtime,andcertainly while loop on line 9 executes exactly |A| times, and the for it has Ω(2n) summands. Accordingly, we estimate its value loop on line 10 requires time O(kn) if estimator C1 is used, by independently sampling (cid:96) externally activated sets {Aj } ext andtimeO(k|A|log|A|n)ifestimatorC2bySectionIII-B3b. fromσ ;wealsoindependentlysample(cid:96)reachabilitygraphs ext Bythechoicesofk and(cid:96)online1,wehavethetotalrunning G according to σ and estimate by averaging the size of j trig tim2e)sEbsotuimndaetidonasosftσated.⊕σ (A): Let σ be a model of the reachability from a seed set A in this context: σ(A) ≈ internal influence protpriaggationex,twhich in thtirsigsection will be σˆ(A)≡ 1(cid:96) (cid:80)(cid:96)j=1σj(cid:0)A∪Aejxt(cid:1).Toestimateσ(A)inthisway an instance of the triggering model. Let σ be the model of withinerrorαT withprobabilityatleast1−δfromHoeffding’s ext external influence activation, and let σ = σ ⊕σ be the inequality we require (cid:96)≥ log(2/δ) such samples. trig ext 2α2 combinationofthetwoasdefinedabove.Weusethefollowing Now, in our analysis of the greedy algorithm in Theorem 2 versionofHoeffding’sinequality(T willbethethresholdthat only at most n3 sets were considered. All that is required for 6 the analysis to be correct as it that those sets were estimated (5). For convenience, we refer to this method of estimation in within the error αT; if δ < η/n3, where η < 1, then by the the rest of the paper as method C1. union bound, with probability at least 1−η, the analysis for Each pair (v,i) ∈ V ×{1,...,(cid:96)}, consisting of a node v Theorem2holds.Inpractice,wewereabletogetgoodresults in graph H , is assigned an independent, random rank value i with much higher values of δ, see Section IV. r(i) uniformly chosen on the interval [0,1]. The reachability v 3) Estimationofσˆ: Intheprevioussection,wedescribean sketch for a set A is defined as follows: let k be an integer, (cid:16) (cid:17) approach to estimate σ(A), based upon independent samples and consider X = bottom-k {r(i)|(v,i)∈τ (A)} , where A v i {G } of graphs from the triggering model, and independent i bottom-k(S) means to take the k smallest values of the set samples of externally activated nodes {Aeixt}. Next, we need S, and R(i) is the set of nodes reachable from A in graph to compute the value of the estimator σˆ(A). One method A H . The threshold rank of a set A⊂V is then defined to be i would be to compute it directly using breadth-first search γ = maxX , if |X | = k, and γ = 1 if |X | < k. The from the sets A∪Aext in each graph G . This method would A A A A A i i estimator for τ(A) is then τˆ(A)=(k−1)/((cid:96)γA) unfortunately add a factor of Ω(n+m) to the running time, If k =(2+c)((cid:15))−2logn, the probability that this estimator which would result in a running time of Alg. 1 of Ω(n2), too has error greater than (cid:15) is at most 1/nc [3]. The bounds on k large for our purposes. Thus, we would like to take advantage needed for the proof of Theorem 4 are determined by taking of estimators formulated by Cohen et al. [3] for the average (cid:15)=αT. reachability of a seed set over a set of graphs. However, c) Computation of X : First, we compute X for all because the external seed sets Aext for each graph G vary A u i i u ∈ V using Algorithm 2 of [3] in time O(k(cid:96)m), where m with i, we must first convert the problem into an average is the maximum number of edges in any H . This collection reachability format. i {X :u∈V} is referred to as oracles. a) Conversion to an average reachability problem: u Next, we discuss how to compute, for an arbitrary node Suppose we have sampled as above (cid:96) pairs of sample graphs u, X given that X has already been computed. This and external seeds: {(G ,Aext),...,(G ,Aext)} To compute A∪{u} A σˆ(A) = 1(cid:80)(cid:96) σ (A1∪ A1ext) efficien(cid:96)tly,(cid:96)we first convert computation will take time O(k) and is necessary for the (cid:96) i=1 i i bicriteria algorithm: given that X and X are both sorted, this sum to a generalized reachability problem: we construct u A we compute X by merging these two sets together until graphH fromG byremovingallnodes(andincidentedges) A∪{u} i i the size of the new set reaches k values. reachable from Aext: H = G − σ (Aext). The average i i i i i d) Estimation of τ(A), method C2: Alternatively to reachability of a set A in the graphs {H }, which we term i estimator C1, we can estimate τ(A) from the oracles τ(A), is {X : u ∈ V} in the following way. Let γ be the (cid:96) u u 1(cid:88) threshold rank as defined above for X . Then τˆ(A) = τ(A)= τ (A), (5) u (cid:96) i=1 i 1(cid:96) (cid:80)z∈(cid:83)v∈AXv\{γv} maxu∈A|z∈1Xu\{γu}γu. For convenience, we refer to the estimator in the rest of the paper as estimator whereτi(A)isthesizeofthesetreachablefromAinHi.The C2; it was originally introduced in [17]. estimators formulated by Cohen et al. are suitable to estimate the value of τ, and the following two lemmas show how we can compute σˆ(A) from τ(A). IV. EXPERIMENTALEVALUATION Lemma 1. The size of the reachable set from A in Hi can In this section, we demonstrate the scalability of STAB as be computed from reachability σi in Gi as follows: τi(A) = compared with the current state-of-the-art IMM algorithm [9] σi(A∪Aeixt)−σi(Aeixt). and with the greedy algorithm for TAP in Goyal et al. [11]. ThemethodologyisdescribedinSectionIV-A,comparisonto Proof: Suppose, in G , x is reachable from A, but not i from Aext. This is true iff there exists a path from A to x in existingIMalgorithmsisinSectionIV-B,andinvestigationof i G avoidingσ (Aext),whichisequivalenttothepathexisting the effect of external influence on the performance of STAB i i i is in Section IV-C. All experiments were run on an Intel(R) in in H . i Core(TM)[email protected]. The next lemma shows explicitly how to get σˆ(A) from τ(A): Lemma 2. τ(A)=σˆ(A)− 1(cid:80)(cid:96) σ (Aext). A. Datasets and framework (cid:96) i=1 i i Proof:ThisstatementfollowsdirectlyfromLemma1and We evaluated the following algorithms in our experiments. the definitions of τˆ, σˆ. 1) CELF: the greedy algorithm by naive sampling of In Alg. 1, for each i, σi(Aeixt) is computed in the con- Kempe et al. [1] can be modified to find a solution to TAP, as struction of Hi; its size can be stored as instructed on line 5, shown in Goyal et al. [11]. The modified algorithm performs and used in the computation of O for line 14 in the greedy MonteCarlo samplingateachstep toselectthe nodewiththe algorithm’s stopping criterion. highestmarginalgainintotheseedsetuntilthethresholdT is b) Estimation of τ(A), method C1: In this subsection, satisfied. The Cost-Effective Lazy Forward (CELF) approach we utilize methods developed by Cohen et al. [3] to estimate by Leskovec et al. [19] improves the running time of this efficiently the average reachability problem τ(A) defined in algorithm by reducing the number of evaluations required. 7 2) IMM: The IMM algorithm [9] is the current state-of- model that each node u is activated externally independently the-art algorithm to solve the IM problem, where the number with uniformly chosen probability ep ∈[0,ep ]. The set- u max of seeds k is input. Since TAP asks to minimize the number ting of ep is discussed in the context of each experiment. max of seeds, this algorithm cannot be applied directly. For the Unless otherwise stated, we set δ = 0.01, which we found purposeofcomparisontoourmethods,weadaptthealgorithm sufficienttoreturnasolutionwithintheguaranteeprovidedin byperformingabinarysearchonkintheinterval[1,T],where Theorem 4 in most cases with estimator C1, and in all cases T is the threshold given in TAP. At each stage of the search, with estimator C2. IMM utilizes the value of k in question until the minimum k as estimated by IMM is found. Since binary search can B. Performance comparison and demonstration of scalability identify the minimum in at most logT iterations, we chose this approach over starting at k =1 and incrementing by one In this section, we compare the performance of STAB-C1 until the minimum is found, which in the worst case would and STAB-C2 to CELF and IMM as described above; we require T iterations. experimented on the datasets listed in Table I, where the total 3) STAB: theSTABalgorithm(Alg.1)usingestimatorsC1 CPU time required to compute the oracles is shown. The andC2,referredtoasSTAB-C1,STAB-C2respectively.Since oracle computation is parallelizable, and in our experiments thesearegreedyalgorithmswithanapproximatelysubmodular we used 7 threads of computation. The oracles for each value function,wealsousetheCELFapproachtoreducethenumber of α were computed once and stored, and thereafter when of evaluations performed by STAB-C2; for STAB-C1, we running STAB the oracles were simply read from a file. The found this optimization unnecessary. running times we report for the various versions of STAB Network topologies: We generated networks according to do not include the oracle computation time unless otherwise the Erdos-Renyi (ER) and Barabasi-Albert (BA) models. For specified.AlsoinTableIarethevaluesofip usedforeach max ER random graphs, we used varying number of nodes n; the dataset in this set of experiments. Since IMM and CELF do independent probability that an edge exists is set as p=2/n. notconsiderexternalinfluence,theexperimentsinthissection The BA model was used to generate scale-free synthesized had no external influence; that is, ep =0 for all datasets. max graphs;theexponentinthepowerlawdegreedistributionwas To evaluate the seed set returned by the algorithms we used set at −3 for all BA graphs. the average activation from 10000 independent Monte Carlo The following topologies of real networks collected by samples. the Stanford Network Analysis Project [20] were utilized: 1) We show typical results in Fig. 1 on the following four Facebook,asectionoftheFacebooknetwork,withn=4039, datasets: ER 1000 (n = 1000), Nethept, Youtube, and Wik- m = 88234; 2) Nethept, high energy physics collaboration italk. The first row of Fig. 1 shows the expected activation, network, n = 15,229, m = 62,752; 3) Slashdot, social normalized by the threshold value T, of the seed set returned network with n = 77,360, m = 828,161; 4) Youtube, by each algorithm plotted against T. Thus, a value of 1 from the Youtube online social network, n = 1,134,890, indicates the algorithm successfully achieved the threshold of m = 2,987,624; and 5) Wikitalk, the Wikipedia talk (com- activation. The second row of the figure shows the running munication) network, n=2,394,385, m=5,021,419. timeinminutesofeachalgorithmagainstT,andthethirdrow shows the size of the seed set returned by each algorithm. α=0.1 α=0.2 α=0.3 ipmax For the ER 1000 network results are shown in the first ER1000 1.0 0.4 0.2 1 columnofFig.1;thisdatasetwastheonlyoneonwhichCELF BA15000 6.4 1.6 0.9 0.1 Nethept 954 230 102 1 finished under the time limit of 60 minutes. We see that IMM Slashdot 25 6.4 3.3 0.01 isconsistentlyreturningalargerseedsetthanSTABandCELF Youtube 385 91 41 0.01 and overshooting the threshold value in activation by as much Wikitalk 1704 274 122 0.01 asafactorof2.5.Thus,ithaspoorperformanceofminimizing TABLEI the size k of the seed set for TAP. This behavior appears to ORACLECOMPUTATIONTIME(SEC) be a result of IMM underestimating the influence of its seed setinternally.Asexpected,CELFperformsverywellinterms Models of internal and external influence: In all experi- of the size of the seed set and meeting the threshold T, but ments, we used the independent cascade model for internal is running on a timescale larger than the other algorithms by influence propagation: each edge e in the graph is assigned factor of at least 100 and as large as 104. Notice that STAB- a uniform probability ip ∈ [0,ip ]. For synthesized net- C2 with α=0.1 has virtually identical activation and size of e max works, we usually set ip = 1; for real networks, we seed set to CELF, while running at a much faster speed; as max observed that if ip = 1, then in most cases a single expected, the quality of solution of STAB deteriorates as α is max node can activate a large fraction of the network (often more raised, but the running time decreases drastically. In addition, than33%).However,alargenumberofempiricalstudieshave noticethatnoneoftheversionsofSTABseedtoomanynodes confirmedthatmostactivationeventsoccurwithinafewhops and overshoot the threshold as IMM does; instead, STAB ofaseednode[12]–[14];theseworksindicatethatip =1 errs on the side of seeding too few nodes and only partially max is not a realistic parameter value for the IC model. Therefore, achieving the threshold T of activation. Finally, it is evident we also ran experiments using lower values for ip . that STAB-C1 runs faster than STAB-C2 and has a similar max For external influence, we adopted in all experiments the amount of activation when the seed set required is relatively 8 (a) ER,activation (b) Nethept,activation (c) Youtube,activation (d) Wikitalk,activation (e) ER,runningtime (f) Nethept,runningtime (g) Youtube,runningtime (h) Wikitalk,runningtime (i) ER,numberofseeds (j) Nethept,numberofseeds (k) Youtube,numberofseeds (l) Legend Fig. 1. First row: expected activation normalized by threshold T versus T. Second row: running time in minutes versus T. Third row: size of seed set versusT. small. However, the larger the seed set required, the farther demonstratedasthemostpreciseversion,STAB-C2withα= is STAB-C1 from achieving the threshold T while STAB-C2 0.1, runs faster than IMM by as much as a factor of 50. On does not suffer from this drawback. ourlargestdataset,theWikitalknetwork,IMMagainexceeded The results from Nethept are shown in the second column 32 GB memory after T =6000; notice that the running time of Fig. 1; with n = 15,229, CELF was unable to run within of STAB in all cases is under 2 minutes and STAB-C2 with the time limit, and IMM was able to complete its binary α=0.3maintainedactivationgreaterthan0.6T whilerunning search only for T ≤ 7000; for the higher threshold values, in less than 5 seconds, as shown in Fig. 1(h). With inclusion IMM exceeded the 32 GB memory usage limit imposed in of the parallelizable and reusable oracle computation time of our setup. On this network, STAB-C2 again exhibits the best 122 seconds from Table I, the total time taken by STAB-C2 performance. Despite an initial decrease in activation relative is less than 3 minutes. The total running time for α=0.1 of to T shown in Fig. 1(b), the trend reverses around T =8000 STAB-C2 including the oracles at T =10000 is less than 30 andseedsetsize100andthealgorithmgetsclosertoachieving minutes. the threshold. This behavior is explained by lower coefficient Choice of α: The above discussion demonstrates that α = of variation of estimator C2 as analyzed in [17]; the CV 0.1 provides the close activation to the threshold T while (cid:112) can be lower than estimator C1 by up to a factor of |A|, maintaining high scalability. If faster running time is desired, where A is the seed set. In stark contrast, STAB-C1 was αmaybeincreased,whichresultsinalossofaccuracyshown unable to proceed past T = 8000 even for α = 0.1 because clearlyinFig.1(d),forα∈{0.1,0.2,0.3}.Ontheotherhand, the estimator C1 appears to lose accuracy as the number of if activation closer to T is required, smaller values of α may seeds increases. By this mechanism, STAB-C1 had achieved be used at higher computational cost. its maximum estimation of influence of any set and thereby C. External influence could not increase it before reaching an estimated activation of T, for T >8000. In this section, we analyze the performance of STAB when Next, the Youtube network is shown in the third column external influence is present in the network; that is, when of Fig. 1. As in the ER network, IMM is underestimating the ep > 0. For this section, we considered a BA network max influence of its seed set and thereby picking too many seeds with 100,000 nodes, with a threshold of T = 1000, and the and overshooting the threshold, as shown in Fig. 1(c); IMM FacebooknetworkwithT =1500.Resultsonothertopologies picks nearly twice as many seed nodes as STAB-C1, α=0.1, were qualitatively similar. For all experiments in this section, as shown in Fig. 1(k). In Fig. 1(g), the scalability of STAB is we set α=0.1. 9 guarantees; Goyal et al. [11] studied the TAP problem with monotonic and submodular models of influence propagation; their bicriteria guarantees differ from ours, and provide no method of efficient sampling required for scalability. Chen et al.[23]consideredexternalinfluenceinaviralmarketingcon- text. However, their model of external influence is much less generalthanours.Furthermore,theyrestrictexternalinfluence to only pass through seed consumers and have no discussion (a) BA100,000 (b) Facebook of sampling, scalability, or the TAP problem. Nguyen et al. Fig.2. Theeffectofexternalinfluenceonactivationandsizeofseedsetof [24], [25] studied methods to restrain propagation in social STAB-C2. networks. VI. CONCLUSION In 2(a), we plot the expected activation (Act) of the seed set returned by STAB-C2 normalized by the threshold T, as We establish equivalency between the triggering model ep varies from 0 to 0.006. As in the previous section, the and generalized reachability, which allows incorporation of max expected activation of the returned seed set A is estimated by external influence into our efficient sampling techniques. We independentMonteCarlosampling.Wealsoplottheexpected gaveprecisetrade-offbetweenaccuracyandrunningtimewith fraction of T activated by the external influence, along with a bound on the number of samples required to solve TAP. the size of the seed set returned by STAB-C2, normalized by Our algorithm is highly scalable and outperforms adaptations its maximum value. As the effect of external influence in the to TAP of the current state-of-the-art algorithm for the IM network increases, the algorithm requires fewer seed nodes problem. to ensure the expected activation is within the specified error ACKNOWLEDGMENT tolerance to threshold T. This work is supported in part by the NSF grant #CCF- In 2(b), we show an analogous plot for the Facebook 1422116. network; interestingly, the size of the seed set chosen by STAB-C2 nearly doubles as ep increase from 0 to 0.01, max REFERENCES before beginning to decrease to 0. This increase differs from [1] DavidKempe,JonKleinberg,andE´vaTardos. Maximizingthespread what we expected; it is counterintuitive that the algorithm ofinfluencethroughasocialnetwork. InProceedingsofthe9thACM would require more seeds to reach the threshold as external SIGKDD International Conference on Knowledge Discovery and Data Mining,pages137–146,2003. influence increases. One possible explanation for this effect [2] ThangNDinh,HuiyuanZhang,DzungTNguyen,andMyTThai.Cost- is that the external influence both decreases and distributes effectiveviralmarketingfortime-criticalcampaignsinlarge-scalesocial networks. IEEE/ACMTransactionsonNetworking(TON),22(6):2001– the marginal gain more evenly among seed nodes, so that 2011,2014. the greedy algorithm has a more difficult time identifying the [3] Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck. Sketch-basedInfluenceMaximizationandComputation:Scalingupwith best seed nodes, especially in the presence of the error of Guarantees. InProceedingsofthe23rdACMInternationalConference estimation. onConferenceonInformationandKnowledgeManagement,pages629– 638,2014. [4] ChristianBorgs,MichaelBrautbar,JenniferChayes,andBrendanLucier. V. RELATEDWORK MaximizingSocialInfluenceinNearlyOptimalTime.InProceedingsof theTwenty-FifthAnnualACM-SIAMSymposiumonDiscreteAlgorithms, Kempe et al. introduced the triggering model in a seminal pages946–957,2014. [5] HuiyuanZhang,ThangNDinh,andMyTThai.Maximizingthespread work on the IM problem [1], where they exhibited a Monte ofpositiveinfluenceinonlinesocialnetworks.InDistributedComputing Carlogreedysamplingalgorithmthatachieves1−1/eperfor- Systems(ICDCS),2013IEEE33rdInternationalConferenceon,pages 317–326.IEEE,2013. manceratioforIM;thisalgorithm,althoughitrunsinpolyno- [6] HuiyuanZhang,DungTNguyen,HuilingZhang,andMyTThai.Least mial time, is very inefficient and cannot scale well. Since the costinfluencemaximizationacrossmultiplesocialnetworks.IEEE/ACM TransactionsonNetworking,24(2):929–939,2016. maximum coverage problem prohibits performance guarantee [7] Thang N Dinh, Dung T Nguyen, and My T Thai. Cheap, easy, and better than 1−1/e+o(1) under standard assumptions, the massivelyeffectiveviralmarketinginsocialnetworks:truthorfiction? In Proceedings of the 23rd ACM conference on Hypertext and social ratio for IM likely cannot be improved, but much work has media,pages165–174.ACM,2012. improved the scalability of the algorithm while retaining the [8] Huiyuan Zhang, Huiling Zhang, Alan Kuhnle, and My T Thai. Profit MaximizationforMultipleProductsinOnlineSocialNetworks.InIEEE guarantee.Leskovecetal.[19]introducedtheCELFmethodto InternationalConferenceonComputerCommunications,2016. [9] YouzeTang. InfluenceMaximizationinNear-LinearTime:AMartin- exploit submodularity and improve the running time. Reverse galeApproach.InProceedingsofthe2015ACMSIGMODInternational Influence Sampling (RIS) was introduced in [4] to further ConferenceonManagementofData,pages1539–1554,2015. [10] YouzeTang,XiaokuiXiao,andYanchenShi. Influencemaximization: improvethegreedyperformance;algorithmsusingRISinclude Near-optimaltimecomplexitymeetspracticalefficiency.InProceedings [4], [9], [10], [21], [22]; the current state-of-the-art are the ofthe2014ACMSIGMODInternationalConferenceonMangementof Data,pages75–86,2014. SSA [21] and the IMM algorithm [9] to which we compare [11] Amit Goyal, Francesco Bonchi, Laks V S Lakshmanan, and Suresh in this paper. Cohen et al. [3] introduced a methodology for a Venkatasubramanian. On minimizing budget and time in influence propagationoversocialnetworks. SocialNetworkAnalysisandMining, highly scalable IM algorithm SKIM; in this work, we extend 3(2):179–192,2013. thismethodologytosolvetheTAPproblemwiththetriggering [12] SharadGoel,AshtonAnderson,JakeHofman,andDuncanJWatts.The StructuralViralityofOnlineDiffusion.ManagementScience,62(1):180– model and external influence. 196,2016. As compared with IM, much less effort has been devoted [13] Sandra Gonza´lez-Bailo´n, Javier Borge-Holthoefer, and Yamir Moreno. BroadcastersandHiddenInfluentialsinOnlineProtestDiffusion.Amer- to scalable solutions to TAP while maintaining performance icanBehavioralScientist,57:943–965,2013. 10 [14] Seth A. Myers and Jure Leskovec. The bursty dynamics of the Twitter information network. In Proceedings of the 23rd International ConferenceonWorldWideWeb,pages913–924,2014. [15] Seth A. Myers, Chenguang Zhu, and Jure Leskovec. Information diffusion and external influence in networks. In Proceedings of the 18thACMSIGKDDInternationalConferenceonKnowledgeDiscovery andDataMining,pages33–41,2012. [16] EdithCohen. Size-EstimationFrameworkwithApplicationstoTransi- tiveClosureandReachability.JournalofComputerandSystemSciences, 55(3):441–453,1997. [17] Edith Cohen, H Kaplan, and Tel Aviv. Leveraging discarded samples for tighter estimation of multiple-set aggregates. ACM SIGMETRICS PerformanceEvaluationReview,37(1):251–262,2009. [18] WeiChen,ChiWang,andYajunWang.Scalableinfluencemaximization forprevalentviralmarketinginlarge-scalesocialnetworks.InProceed- ingsofthe16thACMSIGKDDInternationalConferenceonKnowledge DiscoveryandDataMining,pages1029–1038,2010. [19] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective Outbreak De- tection in Networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages420–429,2007. [20] JureLeskovecandAndrejKrevl.SNAPDatasets:Stanfordlargenetwork datasetcollection. http://snap.stanford.edu/data,June2014. [21] H.T. Nguyen, M. T. Thai, and T. N. Dinh. Stop-and-Stare: Optimal SamplingAlgorithmsforViralMarketinginBillion-ScaleNetworks. In ACMSIGMOD/POSDConference,2016. [22] H.T.Nguyen,T.N.Dinh,andM.T.Thai. Cost-awareTargetedViral MarketinginBillion-scaleNetworks. InIEEEInternationalConference onComputerCommunications,2016. [23] Wei Chen, Fu Li, Tian Lin, and Aviad Rubinstein. Combining Tradi- tionalMarketingandViralMarketingwithAmphibiousInfluenceMax- imization. In Proceedings of the 16th ACM Conference on Economics andComputation,pages779–796,2015. [24] N. P. Nguyen, Y. Xuan, and M. T. Thai. A novel method for worm containmentondynamicsocialnetworks. InMilitaryCommunications Conference,pages2180–2185,2010. [25] NamP.Nguyen,T.N.Dinh,D.T.Nguyen,andM.T.Thai.Overlapping communitystructuresandtheirdetectiononsocialnetworks.InPrivacy, Security,Risk,andTrust(PASSAT)and2011ThirdInternationalCon- ferenceonSocialComputing,2011.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.