Information-Theoretic Lower Bounds for
Recovery of Diffusion Network Structures

Keehwan Park                          Jean Honorio
Department of Computer Science        Department of Computer Science
Purdue University                     Purdue University
park451@purdue.edu                    jhonorio@purdue.edu
Abstract

We study the information-theoretic lower bound of the sample complexity of the correct recovery of diffusion network structures. We introduce a discrete-time diffusion model based on the Independent Cascade model for which we obtain a lower bound of order Ω(k log p), for directed graphs of p nodes and at most k parents per node. Next, we introduce a continuous-time diffusion model, for which a similar lower bound of order Ω(k log p) is obtained. Our results show that the algorithm of [16] is statistically optimal for the discrete-time regime. Our work also opens the question of whether it is possible to devise an optimal algorithm for the continuous-time regime.

1 Introduction

In recent years, the increasing popularity of online social network services, such as Facebook, Twitter, and Instagram, has allowed researchers to access large influence propagation traces. Influence diffusion on social networks has since been widely studied in the data mining and machine learning communities. Several studies showed how influence propagates in such social networks, as well as how to exploit this effect efficiently. Domingos et al. [6] first explored the use of social networks in viral marketing. Kempe et al. [11] proposed the influence maximization problem on the Independent Cascade (IC) and Linear Threshold (LT) models, assuming all influence probabilities are known. [10, 18] studied the learning of influence probabilities for a known (fixed) network structure.

The network inference problem consists in discovering the underlying functional network from cascade data. The problem is particularly important since, regardless of having some structural side information, e.g., friendships in online social networks, the functional network structure, which reflects the actual influence propagation paths, may look greatly different. Adar et al. [2] first explored the problem of inferring the underlying diffusion network structure. Subsequent research [9, 13] has been carried out in recent years, and continuous-time extensions [7, 8, 17] have also been explored in depth.

Basic diffusion model. Consider a directed graph G = (V, E), where V = {1, ..., p} is the set of nodes and E is the set of edges. Next, we provide a short description of the discrete-time IC model [11]. Initially, we draw an initial set of active nodes from a source distribution. The process unfolds in discrete steps. When node j first becomes active at time t, it independently makes a single attempt to activate each of its outgoing, inactive neighbors i, with probability θ_{j,i}. If j succeeds, then i will become active at time t + 1. If j fails, then it makes no further attempts to activate i. This process runs until no more activations are possible. (A minimal simulation sketch of this model appears at the end of this section.)

Related works. Research on the sample complexity of the network inference problem is very recent [1, 5, 14–16]. Netrapalli et al. [15] studied the network inference problem based on the discrete-time IC model and showed that for graphs of p nodes and at most k parents per node, O(k² log p) samples are sufficient and Ω(k log p) samples are necessary. However, as Daneshmand et al. [5] have pointed out, their model only considers the discrete-time diffusion model, and their correlation decay condition is rather restrictive since it limits the number of new activations at every step. Abrahao et al. [1] proposed the First-Edge algorithm to solve the network inference problem and also suggested lower bounds, but their results are specific to their algorithm, i.e., the lower bounds are not information-theoretic.

In [5], Daneshmand et al. worked on the continuous-time network inference problem with ℓ1-regularized maximum likelihood estimation and showed that O(k³ log p) samples are sufficient, using the primal-dual witness method. Narasimhan et al. [14] explored various influence models, including the IC, LT, and Voter models, under the Probably Approximately Correct learning framework. Pouget-Abadie et al. [16] studied various discrete-time models under restricted eigenvalue conditions. They also proposed the first algorithm which recovers the network structure with high probability in O(k log p) samples.

It is important to note that, as we will see later in the paper, we show information-theoretic lower bounds of order Ω(k log p), confirming that the algorithm in [16] is statistically optimal. However, since their algorithm only considers discrete-time models, developing a new algorithm for continuous-time models with a sufficient condition on the sample complexity of order O(k log p) is an interesting direction for future work.
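To make the cascade semantics above concrete, the following is a minimal simulation sketch of the discrete-time IC model. The adjacency representation, the function name simulate_ic, and the example graph are our own illustrative choices, not constructs from the paper.

```python
import random

def simulate_ic(out_probs, seeds, rng=None):
    """Simulate one cascade of the discrete-time IC model.

    out_probs: dict mapping node j to {i: theta_ji} for each out-neighbor i.
    seeds: initial set of active nodes (activated at time 0).
    Returns a dict node -> activation time; absent nodes were never activated.
    """
    rng = rng or random.Random(0)
    active_time = {s: 0 for s in seeds}
    frontier, t = list(seeds), 0
    while frontier:                    # until no more activations are possible
        next_frontier = []
        for j in frontier:
            for i, theta_ji in out_probs.get(j, {}).items():
                # j makes a single attempt to activate each inactive neighbor i
                if i not in active_time and rng.random() < theta_ji:
                    active_time[i] = t + 1
                    next_frontier.append(i)
        frontier, t = next_frontier, t + 1
    return active_time

# A path 0 -> 1 -> 2: node 2 can only become active if node 1 did first.
print(simulate_ic({0: {1: 0.9}, 1: {2: 0.9}}, seeds={0}))
```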
2 Ensemble of Discrete-time Diffusion Networks

Lower bounds of the sample complexity for general graphs under the IC and LT models [11] seem to be particularly difficult to analyze. In this paper, we introduce a simple network under the IC model, which fortunately allows us to show sample complexity lower bounds that match the upper bounds found in [16] for discrete-time models.
2.1 A simple two-layer network

Here we consider the two-layer IC model shown in Figure 1. Although not realistic, the considered model allows us to show that even in this simple two-layer case, we require Ω(k log p) samples in order to avoid network recovery failure.

[Figure 1: Diffusion Model with Two Layers.]

In Figure 1, each circle indicates a node, and each edge (j, i) with its influence probability θ indicates that a cascade can be propagated from node j to i, or equivalently, that node j activates i with probability θ. The model assumes that there exists a super source node s_1 which is already activated at time zero; at time 1, it independently tries to activate the p parent nodes with probability θ_0 each, and s_2 with probability 1. There exists a child node p+1, which has exactly k+1 parents including s_2. Then at time 2, s_2 and all direct parents of p+1 which have been activated at time 1 independently try to activate the child node p+1, with probability θ_0 and θ, respectively. We use t_i = ∞ to indicate that a node i has not been activated during the cascading process. Note that these influence probabilities can be generalized without too much effort.
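As a concrete companion to this construction, here is a minimal sketch of ours that draws one sample from the two-layer model given a hypothesis π; the function name and parameter values are illustrative assumptions, not from the paper.

```python
import random

def sample_two_layer(p, parents, theta, theta0, rng):
    """Draw one sample from the two-layer discrete-time model.

    parents: the set pi of k true parents of the child node p+1.
    Returns (t, t_child): parent activation times in {1, inf} and the
    child activation time in {2, inf}.
    """
    inf = float("inf")
    # At time 1, s_1 activates each of the p parent nodes w.p. theta0.
    t = [1 if rng.random() < theta0 else inf for _ in range(p)]
    # At time 2, s_2 (always active) tries w.p. theta0, and every
    # activated true parent tries independently w.p. theta.
    child_active = rng.random() < theta0
    for i in parents:
        if t[i] == 1 and rng.random() < theta:
            child_active = True
    return t, (2 if child_active else inf)

rng = random.Random(0)
print(sample_two_layer(p=6, parents={0, 1}, theta=0.5, theta0=0.25, rng=rng))
```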
Given the model with unknown edges between the parent nodes and the child node p+1, and a set of n samples t^(1), t^(2), ..., t^(n) ∈ {1, ∞}^p × {2, ∞}, the goal of the learner is to recover the k edges, or equivalently, to identify the k ≪ p direct parents of the child node p+1. Each sample is a (p+1)-dimensional vector, t = (t_1, ..., t_p, t_{p+1}), and includes all the activation times of the parent and child nodes. A parent node i ∈ {1, ..., p} is either activated at time 1 (i.e., t_i = 1) or not (i.e., t_i = ∞). The child node p+1 is either activated at time 2 (i.e., t_{p+1} = 2) or not (i.e., t_{p+1} = ∞).

Now, we define the hypothesis class F as the set of all combinations of k nodes from the p possible parent nodes, that is, |F| := (p choose k). Thus, a hypothesis π is a set of k parent nodes such that for all i ∈ π, there exists an edge from i to p+1 with influence probability θ. We also let π^c := {1, ..., p} \ π be the complement set of π. Given a hypothesis π and a sample t, we can write the data likelihood using independence assumptions:

    P(t; π) = P(t_π) P(t_{π^c}) P(t_{p+1} | t_π, t_{s_2})    (1)

The conditional probability can be expressed as follows:

    P(t_{p+1} = 2 | t_π, t_{s_2}) = 1 − (1 − θ)^{Σ_{i∈π} 1[t_i = 1]} (1 − θ_0)
    P(t_{p+1} = ∞ | t_π, t_{s_2}) = (1 − θ)^{Σ_{i∈π} 1[t_i = 1]} (1 − θ_0)

where 1[·] is an indicator function. Lastly, for simplicity, we define

    θ_0 := (1 − θ)^k    (2)

which decreases as the child node p+1 has more parents. The latter agrees with the intuition that, as we have more parents, the chance of a single parent activating the child node gets smaller.

We will study the information-theoretic lower bounds on the sample complexity of the network inference problem. We will use Fano's inequality in order to analyze the necessary number of samples for any conceivable algorithm in order to avoid failure.
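The conditional probabilities above are straightforward to evaluate numerically. The sketch below, our own illustration with a hypothetical function name, computes the last factor of Eq. (1) for a sample such as the one produced by the generator above.

```python
inf = float("inf")

def child_conditional(t, t_child, parents, theta, theta0):
    """Last factor of Eq. (1): P(t_{p+1} | t_pi, t_{s_2}) under hypothesis `parents`."""
    m = sum(1 for i in parents if t[i] == 1)    # active parents in pi at time 1
    p_never = (1 - theta) ** m * (1 - theta0)   # P(t_{p+1} = inf | t_pi, t_{s_2})
    return p_never if t_child == inf else 1.0 - p_never

# Example: both hypothesized parents were active at time 1.
print(child_conditional([1, 1, inf, inf], 2, parents={0, 1}, theta=0.5, theta0=0.25))
# -> 1 - 0.5^2 * 0.75 = 0.8125
```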
2.2 Lower Bounds with Fano's inequality

First, we will bound the mutual information by using a pairwise Kullback-Leibler (KL) divergence-based bound [3], and show the following lemma.

Lemma 1. Under the settings of the discrete-time diffusion model, for any pair of hypotheses π, π' ∈ F,

    KL(P_{t|π} || P_{t|π'}) ≤ log(1/θ_0)

Proof. First, we notice that the maximum KL divergence between two distributions P_{t|π} and P_{t|π'} is achieved when the two sets π and π' do not share any node, or equivalently, when there is no overlapping edge between parent and child nodes; that is, π ∩ π' = ∅. We then compute the KL divergence for two disjoint parent sets, as follows:

    KL(P_{t|π} || P_{t|π'}) = Σ_{t ∈ {1,∞}^p × {2,∞}} P(t|π) log( P(t|π) / P(t|π') )
Using Jensen’s inequality and Eq (1), we have Proof. We first bound the mutual information by the
pairwise KL-based bound [3].
(cid:18) (cid:88) P(t|π)(cid:19)
KL(Pt|π||Pt|π(cid:48))≤log P(t|π)P(t|π(cid:48)) I(π¯,S)< 1 (cid:88) (cid:88) KL(P ||P )
t∈{1,∞}p×{2,∞} |F|2 S|π S|π(cid:48)
(cid:18) P(t|π)(cid:19) π∈Fπ(cid:48)∈F
≤log t∈{1,∞m}ap×x{2,∞}P(t|π(cid:48)) = |Fn|2 (cid:88) (cid:88) KL(Pt|π||Pt|π(cid:48))
(cid:32) (cid:33) π∈Fπ(cid:48)∈F
P(t )P(t )P(t |t t )
=log max π πc p+1 π s2
t∈{1,∞}p×{2,∞}P(tπ(cid:48))P(tπ(cid:48)c)P(tp+1|tπ(cid:48)ts2) NtioonwafsrofmolloLwems.ma 1, we can bound the mutual informa-
(cid:32) (cid:33)
P(t |t t )
=log max p+1 π s2 (3) 1
t∈{1,∞}p×{2,∞}P(tp+1|tπ(cid:48)ts2) I(π¯,S)<nlogθ0 (4)
Now as we have argued earlier, the maximum value can Finally, by Fano’s inequality [4], Eq (4), and the well-
be attained when π∩π(cid:48) =∅. Without loss of generality, known bound, log(cid:0)p(cid:1)≥k(logp−logk), we have
k
we assume that π connects the first k nodes to p+1 and
π(cid:48) connects the subsequent k nodes to p+1. Thus we nlog 1 +log2
P[fˆ(cid:54)=f¯]≥1− θ0
have log(cid:0)p(cid:1)
k
P(tp+1 =2|tπts2) ≤ 1−(1−θ)(cid:80)ki=11[ti=1](1−θ0) ≥1− nlogθ10 +log2
P(tp+1 =2|tπ(cid:48)ts2) 1−(1−θ)(cid:80)2i=kk+11[ti=1](1−θ0) k(logp−logk)
1
=
Similarly, we have 2
By solving the last equality we conclude that, if n ≤
P(tp+1 =∞|tπts2) ≤ (1−θ)(cid:80)ki=11[ti=1](1−θ0) klogp−klogk−2log2, then any conceivable algorithm will
P(tp+1 =∞|tπ(cid:48)ts2) (1−θ)(cid:80)2i=kk+11[ti=1](1−θ0) fail wi2thlogaθ10large probability, P[πˆ (cid:54)=π¯]≥1/2.
We can use the above expressions in order to obtain an
upper bound for Eq (3). Thus, by Eq (2) we have 3 Ensemble of Continuous-time
KL(P ||P ) Diffusion Networks
t|π t|π(cid:48)
(cid:32) (cid:26)1−(1−θ)k(1−θ ) 1−θ (cid:27)(cid:33)
≤log max 0 , 0 In this section, we will study the continuous-time ex-
θ0 (1−θ)k(1−θ0) tensiontothetwo-layerdiffusionmodel. Forthispurpose,
(cid:18) (cid:19) weintroduceatransmissionfunctionbetweenparentand
1
≤log childnodes. Fortheinterestedreaders,Gomez-Rodriguez
θ
0 et al. [8] discuss transmission functions in full detail.
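As a numerical sanity check of ours (not part of the paper), the exact KL divergence between P_{t|π} and P_{t|π'} can be enumerated on a small instance and compared against log(1/θ_0); the parameter values below are arbitrary illustrative choices.

```python
import itertools, math

def joint(t, t_child, parents, theta, theta0):
    """P(t | pi): i.i.d. parent activations times the child conditional."""
    prob = 1.0
    for ti in t:
        prob *= theta0 if ti == 1 else (1 - theta0)
    m = sum(1 for i in parents if t[i] == 1)
    p_never = (1 - theta) ** m * (1 - theta0)
    return prob * (p_never if t_child == math.inf else 1 - p_never)

p, k, theta = 4, 2, 0.5
theta0 = (1 - theta) ** k                 # Eq. (2)
pi, pi2 = {0, 1}, {2, 3}                  # disjoint hypotheses (worst case)
kl = 0.0
for t in itertools.product([1, math.inf], repeat=p):
    for tc in (2, math.inf):
        P, Q = joint(t, tc, pi, theta, theta0), joint(t, tc, pi2, theta, theta0)
        kl += P * math.log(P / Q)
print(kl, "<=", math.log(1 / theta0))     # roughly 0.18 <= 1.39 here
```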
By using the above results, we show that the necessary number of samples for the network inference problem is Ω(k log p).

Theorem 2. Suppose that nature picks a "true" hypothesis π̄ uniformly at random from some distribution of hypotheses with support F. Then a dataset S of n independent samples t^(1), t^(2), ..., t^(n) ∈ {1,∞}^p × {2,∞} is produced, conditioned on the choice of π̄. The learner then infers π̂ from the dataset S. Under the settings of the two-layered discrete-time diffusion model, there exists a network inference problem of k direct parent nodes such that if

    n ≤ (k log p − k log k − 2 log 2) / (2 log(1/θ_0))

then learning fails with probability at least 1/2, i.e.,

    P[π̂ ≠ π̄] ≥ 1/2

for any algorithm that a learner could use for picking π̂.

Proof. We first bound the mutual information by the pairwise KL-based bound [3]:

    I(π̄; S) ≤ (1/|F|²) Σ_{π∈F} Σ_{π'∈F} KL(P_{S|π} || P_{S|π'})
             = (n/|F|²) Σ_{π∈F} Σ_{π'∈F} KL(P_{t|π} || P_{t|π'})

Now, from Lemma 1, we can bound the mutual information as follows:

    I(π̄; S) ≤ n log(1/θ_0)    (4)

Finally, by Fano's inequality [4], Eq. (4), and the well-known bound log(p choose k) ≥ k(log p − log k), we have

    P[π̂ ≠ π̄] ≥ 1 − (n log(1/θ_0) + log 2) / log(p choose k)
              ≥ 1 − (n log(1/θ_0) + log 2) / (k(log p − log k))

Setting the last expression equal to 1/2 and solving for n, we conclude that if n ≤ (k log p − k log k − 2 log 2) / (2 log(1/θ_0)), then any conceivable algorithm will fail with a large probability, P[π̂ ≠ π̄] ≥ 1/2.
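The threshold in Theorem 2 is explicit, so it is straightforward to tabulate. A small illustrative helper of ours (the function name and parameter values are assumptions):

```python
import math

def discrete_lower_bound(p, k, theta):
    """Sample-size threshold from Theorem 2: below this, any algorithm
    fails with probability >= 1/2 (theta0 = (1 - theta)^k as in Eq. (2))."""
    theta0 = (1 - theta) ** k
    return (k * math.log(p) - k * math.log(k) - 2 * math.log(2)) \
        / (2 * math.log(1 / theta0))

for p in (100, 1000, 10000):
    print(p, discrete_lower_bound(p, k=5, theta=0.2))
```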
3 Ensemble of Continuous-time Diffusion Networks

In this section, we will study the continuous-time extension to the two-layer diffusion model. For this purpose, we introduce a transmission function between parent and child nodes. For the interested reader, Gomez-Rodriguez et al. [8] discuss transmission functions in full detail.

3.1 A simple two-layer network

Here we use the same two-layer network structure shown in Figure 1. However, for a general continuous model, the activation time of a child node is dependent on the activation times of its parents. For our analysis, we relax this assumption by considering a fixed time range for each layer. In other words, we first consider a fixed time span, T. Then the p parent nodes are only activated within [0, T], and the child node p+1 is only activated within [T, 2T]. Our analysis for the continuous-time model largely borrows from our understanding of the discrete-time model.

The continuous-time model works as follows. The super source node s_1 tries to activate each of the p parent nodes with probability θ_0, and s_2 with probability 1. If a parent node gets activated, it picks an activation time from [0, T] based on the transmission function f(t; π). Then, s_2 and all the direct parents which have been activated in t ∈ [0, T] independently try to activate the child node p+1, with probability θ_0 and θ, respectively. If the child node p+1 gets activated, it picks an activation time from [T, 2T] based on the transmission function f(t; π).

For the continuous-time model, the conditional probabilities can be expressed as follows:

    P(t_{p+1} ∈ [T, 2T] | t_π, t_{s_2}) = [1 − (1 − θ)^{Σ_{i∈π} 1[t_i ∈ [0,T]]} (1 − θ_0)] · f(t_{p+1} − T; π)
    P(t_{p+1} = ∞ | t_π, t_{s_2}) = (1 − θ)^{Σ_{i∈π} 1[t_i ∈ [0,T]]} (1 − θ_0)

Lastly, we define the domain of a sample t to be T := ([0, T] ∪ {∞})^p × ([T, 2T] ∪ {∞}).

3.2 Boundedness of Transmission Functions

We will start with the general boundedness of the transmission functions. The constants in the boundedness condition will later be directly related to the lower bound of the sample complexity. In the later part of the paper, we will provide an example for the exponentially distributed transmission function. Often, transmission functions used in the literature fulfill this assumption, e.g., the Rayleigh distribution [5] and the Weibull distribution for µ ≥ 1 [12].

Condition 1 (Boundedness of transmission functions). Suppose t ∈ [0, T] is a transmission time random variable, dependent on its parents π. The probability density function f(t; π) fulfills the following condition for a pair of positive constants κ_1 and κ_2:

    min_{t ∈ [0,T]} f(t; π) ≥ κ_1 > 0
    max_{t ∈ [0,T]} f(t; π) ≤ κ_2 < ∞
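For a given candidate density, the constants in Condition 1 can be estimated numerically; the following is a rough grid-based sketch of ours, applied here to the censored exponential anticipating the example of Section 3.3 (function names and values are illustrative assumptions).

```python
import math

def boundedness_constants(f, T, grid=10000):
    """Numerically estimate kappa_1 = min f and kappa_2 = max f on [0, T]
    (Condition 1), by evaluating f on a fine grid including both endpoints."""
    values = [f(i * T / grid) for i in range(grid + 1)]
    return min(values), max(values)

# Censored exponential on [0, T] with lambda = 1, T = 2 (cf. Corollary 5):
lam, T = 1.0, 2.0
f = lambda t: lam * math.exp(-lam * t) / (1 - math.exp(-lam * T))
k1, k2 = boundedness_constants(f, T)
print(k1, k2, k2 / k1)   # k2/k1 should be close to e^{lam*T}, cf. Eq. (7) below
```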
3.3 Lower Bounds with Fano's inequality

First, we provide a bound on the KL divergence that will later be used in analyzing the necessary number of samples for the network inference problem.

Lemma 3. Under the settings of the continuous-time diffusion model, for any pair of hypotheses π, π' ∈ F,

    KL(P_{t|π} || P_{t|π'}) ≤ log( max{ (κ_2/κ_1) (1/θ_0 − (1 − θ_0)) , 1/θ_0 } )

Proof. We note that the proof is very similar to that of Lemma 1:

    KL(P_{t|π} || P_{t|π'}) = Σ_{t∈T} P(t|π) log( P(t|π) / P(t|π') )
                            ≤ log( max_{t∈T} P(t|π) / P(t|π') )
                            = log( max_{t∈T} P(t_{p+1} | t_π, t_{s_2}) / P(t_{p+1} | t_{π'}, t_{s_2}) )    (5)

Now, with the same argument we made in Lemma 1, consider that π connects the first k nodes to p+1 and π' connects the subsequent k nodes to p+1. Thus, we have

    P(t_{p+1} ∈ [T,2T] | t_π, t_{s_2}) / P(t_{p+1} ∈ [T,2T] | t_{π'}, t_{s_2})
        ≤ [1 − (1 − θ)^{Σ_{i=1}^{k} 1[t_i ∈ [0,T]]} (1 − θ_0)] f(t_{p+1} − T; π)
          / ( [1 − (1 − θ)^{Σ_{i=k+1}^{2k} 1[t_i ∈ [0,T]]} (1 − θ_0)] f(t_{p+1} − T; π') )

Similarly, we have

    P(t_{p+1} = ∞ | t_π, t_{s_2}) / P(t_{p+1} = ∞ | t_{π'}, t_{s_2})
        ≤ [(1 − θ)^{Σ_{i=1}^{k} 1[t_i ∈ [0,T]]} (1 − θ_0)] / [(1 − θ)^{Σ_{i=k+1}^{2k} 1[t_i ∈ [0,T]]} (1 − θ_0)]

We can use the above expressions in order to obtain an upper bound for Eq. (5). Thus, by Eq. (2) and Condition 1, we have

    KL(P_{t|π} || P_{t|π'}) ≤ log( max{ [1 − (1 − θ)^k (1 − θ_0)] κ_2 / (θ_0 κ_1) , (1 − θ_0) / [(1 − θ)^k (1 − θ_0)] } )
                            = log( max{ (κ_2/κ_1) (1/θ_0 − (1 − θ_0)) , 1/θ_0 } )

By using the above results, we show that the necessary number of samples for the network inference problem is also Ω(k log p) in the continuous-time model.

Theorem 4. Suppose that nature picks a "true" hypothesis π̄ uniformly at random from some distribution of hypotheses with support F. Then a dataset S of n independent samples t^(1), t^(2), ..., t^(n) ∈ ([0,T] ∪ {∞})^p × ([T,2T] ∪ {∞}) is produced, conditioned on the choice of π̄. The learner then infers π̂ from the dataset S. Assume that the transmission function f(t; π) satisfies Condition 1 with constants κ_1 and κ_2. Under the settings of the two-layered continuous-time diffusion model, there exists a network inference problem of k direct parent nodes such that if

    n ≤ (k log p − k log k − 2 log 2) / (2 log( max{ (κ_2/κ_1)(1/θ_0 − (1 − θ_0)) , 1/θ_0 } ))

then learning fails with probability at least 1/2, i.e.,

    P[π̂ ≠ π̄] ≥ 1/2

for any algorithm that a learner could use for picking π̂.

Proof. The proof is very similar to that of Theorem 2. First, by the pairwise KL-based bound [3] and Lemma 3, we have

    I(π̄; S) ≤ n log( max{ (κ_2/κ_1)(1/θ_0 − (1 − θ_0)) , 1/θ_0 } )    (6)
By Fano’s inequality [4], Eq (6), and the well-known From the above, we can obtain the minimum and maxi-
bound, log(cid:0)p(cid:1)≥k(logp−logk), we have mumvaluesofthe densityfunction, κ and κ , inCondi-
k 1 2
tion 1 as follows.
P[fˆ(cid:54)=f¯]
λe−λT λ κ
(cid:32) (cid:26) (cid:18) (cid:19) (cid:27)(cid:33) κ1 = 1−e−λT , κ2 = 1−e−λT ⇒ κ2 =eλT
nlog max κκ21 θ10 −(1−θ0) ,θ10 +log2 1 (7)
≥1−
log(cid:0)p(cid:1)
k Finally using Theorem 4 and Eq (7), we show that if
(cid:32) (cid:26) (cid:18) (cid:19) (cid:27)(cid:33)
nlog max κκ21 θ10 −(1−θ0) ,θ10 +log2 n≤ (cid:32) klog(cid:26)p−(cid:18)klogk−2log2(cid:19) (cid:27)(cid:33)
≥1−
k(logp−logk) 2log max eλT 1 −(1−θ ) , 1
θ0 0 θ0
1
=
2
thenanyconceivablealgorithmwillfailwithalargeprob-
By solving the last equality we conclude that, if n ≤ ability, P[πˆ (cid:54)=π¯]≥1/2.
(cid:32) klo(cid:26)gp−(cid:18)klogk−2log2(cid:19) (cid:27)(cid:33), then any conceivable
2log max κκ12 θ10−(1−θ0) ,θ10 4 Conclusion
algorithm will fail with a large probability, P[πˆ (cid:54)= π¯] ≥
We have formulated the two-layered discrete-time
1/2.
and continuous-time diffusion models and derived the
information-theoretic lower bounds of the sample com-
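As with Theorem 2, the continuous-time threshold is explicit once θ_0, κ_1, and κ_2 are given. A small helper of ours (names and values are illustrative assumptions):

```python
import math

def continuous_lower_bound(p, k, theta0, kappa1, kappa2):
    """Sample-size threshold from Theorem 4 for a transmission function
    satisfying Condition 1 with constants kappa1 <= f <= kappa2."""
    rate = math.log(max((kappa2 / kappa1) * (1 / theta0 - (1 - theta0)),
                        1 / theta0))
    return (k * math.log(p) - k * math.log(k) - 2 * math.log(2)) / (2 * rate)

print(continuous_lower_bound(p=1000, k=5, theta0=0.3, kappa1=0.15, kappa2=1.16))
```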
Lastly, we present an example for the exponentially distributed transmission function.

Corollary 5 (Exponential Distribution). Suppose that nature picks a "true" hypothesis π̄ uniformly at random from some distribution of hypotheses with support F. Then a dataset S of n independent samples t^(1), t^(2), ..., t^(n) ∈ ([0,T] ∪ {∞})^p × ([T,2T] ∪ {∞}) is produced, conditioned on the choice of π̄. The learner then infers π̂ from the dataset S. Assume that the transmission function f(t; π) = λe^{−λt} / (1 − e^{−λT}) is of the censored (rescaled) exponential distribution form, defined over [0, T]. Under the settings of the two-layered continuous-time diffusion model, there exists a network inference problem of k direct parent nodes such that if

    n ≤ (k log p − k log k − 2 log 2) / (2 log( max{ e^{λT} (1/θ_0 − (1 − θ_0)) , 1/θ_0 } ))

then learning fails with probability at least 1/2, i.e.,

    P[π̂ ≠ π̄] ≥ 1/2

for any algorithm that a learner could use for picking π̂.

Proof. Since the probability density function should only be defined on [0, T], we need to rescale the probability density function of the standard exponential distribution, g(t) ~ Exp(λ), whose cumulative distribution function is G(t). Given this, we have the censored (rescaled) transmission function

    f(t; π) = g(t) / (G(T) − G(0)) = g(t) / G(T) = λe^{−λt} / (1 − e^{−λT})

From the above, we can obtain the minimum and maximum values of the density function, κ_1 and κ_2, in Condition 1 as follows:

    κ_1 = λe^{−λT} / (1 − e^{−λT}),   κ_2 = λ / (1 − e^{−λT})   ⇒   κ_2/κ_1 = e^{λT}    (7)

Finally, using Theorem 4 and Eq. (7), we show that if

    n ≤ (k log p − k log k − 2 log 2) / (2 log( max{ e^{λT} (1/θ_0 − (1 − θ_0)) , 1/θ_0 } ))

then any conceivable algorithm will fail with a large probability, P[π̂ ≠ π̄] ≥ 1/2.
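Plugging Eq. (7) into the threshold of Theorem 4 recovers Corollary 5 directly from λ and T; the snippet below is our own numeric illustration with arbitrary parameter values.

```python
import math

# Censored exponential on [0, T]: kappa2 / kappa1 = e^{lam * T}  (Eq. (7)).
lam, T, p, k, theta0 = 1.0, 2.0, 1000, 5, 0.3
kappa1 = lam * math.exp(-lam * T) / (1 - math.exp(-lam * T))
kappa2 = lam / (1 - math.exp(-lam * T))
assert abs(kappa2 / kappa1 - math.exp(lam * T)) < 1e-9

rate = math.log(max(math.exp(lam * T) * (1 / theta0 - (1 - theta0)),
                    1 / theta0))
n_max = (k * math.log(p) - k * math.log(k) - 2 * math.log(2)) / (2 * rate)
print(n_max)  # below this many samples, recovery fails w.p. >= 1/2
```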
4 Conclusion

We have formulated the two-layered discrete-time and continuous-time diffusion models and derived information-theoretic lower bounds of order Ω(k log p) on the sample complexity. Our bound is particularly important since it implies that the algorithm in [16], which only works under discrete-time settings, is statistically optimal.

Our work opens the question of whether it is possible to devise an algorithm for which the sufficient number of samples is O(k log p) in continuous-time settings. We also see potential future work in analyzing sharp phase transitions for the sample complexity of the network inference problem.

References

[1] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 491–499. ACM, 2013.

[2] E. Adar and L. A. Adamic. Tracking information epidemics in blogspace. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pages 207–214, September 2005.

[3] B. Yu. Assouad, Fano, and Le Cam. In D. Pollard, E. Torgersen, and G. Yang, editors, Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, pages 423–435. Springer New York, 1997.

[4] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, 2nd edition, 2006.

[5] Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schoelkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In Proceedings of the 31st International Conference on Machine Learning, 2014.

[6] Pedro Domingos and Matt Richardson. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57–66. ACM, 2001.

[7] N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS '13: Advances in Neural Information Processing Systems, 2013.

[8] M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In ICML '11: Proceedings of the 28th International Conference on Machine Learning, 2011.

[9] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1019–1028. ACM, 2010.

[10] Amit Goyal, Francesco Bonchi, and Laks V. S. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 241–250. ACM, 2010.

[11] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146. ACM, 2003.

[12] Takeshi Kurashima, Tomoharu Iwata, Noriko Takaya, and Hiroshi Sawada. Probabilistic latent network visualization: Inferring and embedding diffusion networks. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1236–1245. ACM, 2014.

[13] Seth Myers and Jure Leskovec. On the convexity of latent social network inference. In Advances in Neural Information Processing Systems, pages 1741–1749, 2010.

[14] Harikrishna Narasimhan, David C. Parkes, and Yaron Singer. Learnability of influence in networks. In Advances in Neural Information Processing Systems, pages 3168–3176, 2015.

[15] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 211–222. ACM, 2012.

[16] Jean Pouget-Abadie and Thibaut Horel. Inferring graphs from cascades: A sparse recovery framework. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 625–626. International World Wide Web Conferences Steering Committee, 2015.

[17] Kazumi Saito, Masahiro Kimura, Kouzou Ohara, and Hiroshi Motoda. Learning continuous-time information diffusion model for social behavioral data analysis. In Advances in Machine Learning, pages 322–337. Springer, 2009.

[18] Kazumi Saito, Ryohei Nakano, and Masahiro Kimura. Prediction of information diffusion probabilities for independent cascade model. In Proceedings of the 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Part III, KES '08, pages 67–75, Berlin, Heidelberg, 2008. Springer-Verlag.