Table Of Content

Pairwise Network Information and Nonlinear Correlations Elliot A. Martin,1 Jaroslav Hlinka,2,3 and Jörn Davidsen1,∗ 1Complexity Science Group, Department of Physics and Astronomy, University of Calgary, Calgary, Alberta, Canada, T2N 1N4 2Institute of Computer Science, The Czech Academy of Sciences, Pod vodarenskou vezi 2, 18207 Prague, Czech Republic 3National Institute of Mental Health, Topolova´ 748, 250 67 Klecany, Czech Republic (Dated: September 30, 2016) Reconstructing the structural connectivity between interacting units from observed activity is a challengeacrossmanydifferentdisciplines. Thefundamentalfirststepistoestablishwhetherorto whatextenttheinteractionsbetweentheunitscanbeconsideredpairwiseand,thus,canbemodeled asaninteractionnetworkwithsimplelinkscorrespondingtopairwiseinteractions. Inprinciplethis 6 canbedeterminedbycomparingthemaximumentropygiventhebivariateprobabilitydistributions 1 tothetruejointentropy. Inmanypracticalcasesthisisnotanoptionsincethebivariatedistributions 0 needed may not be reliably estimated, or the optimization is too computationally expensive. Here 2 we present an approach that allows one to use mutual informations as a proxy for the bivariate p probabilitydistributions. Thishastheadvantageofbeinglesscomputationallyexpensiveandeasier e to estimate. We achieve this by introducing a novel entropy maximization scheme that is based on S conditioning on entropies and mutual informations. This renders our approach typically superior 9 to other methods based on linear approximations. The advantages of the proposed method are 2 documented using oscillator networks and a resting-state human brain network as generic relevant examples. ] C PACSnumbers: 89.75.Hc,89.70.Cf,05.45.Tp,87.18.Sn N . o Pairwise measures of dependence such as cross- maximum entropy distribution, consistent with all the i correlations (as measured by the Pearson correlation co- pairwisecorrelationsdescribedthesystem. Ifthesystem b - efficient or covariance matrix) and mutual information is not well described by this maximum entropy distribu- q are widely used to characterize the interactions within tionthenweknowfromtheworkofJaynes[18]thatother [ complexsystems. Theyareakeyingredienttotechniques information beyond pairwise relationships would need to 2 suchasprincipalcomponentanalysis, empiricalorthogo- be taken into account. Similar analyses have since been v nalfunctions,andfunctionalnetworks(networksinferred appliedinneuroscience[19–21],aswellasingenetics[22], 4 from dynamical time series) [1–3]. These techniques are linguistics[23], economics[24], andtothesupremecourt 3 widespread since they provide greatly simplified descrip- of the United States [25]. 3 0 tions of complex systems, and allow for the analysis of However, the data to accurately estimate the needed 0 what might otherwise be intractable problems [4]. In bivariate probability distributions may not be available. 1. particular, functional networks have been widely applied To get around this some researchers have used the first 0 in fields such as neuroscience [4, 5], genetics [6], and cell two moments of the variables as constraints instead of 6 physiology [7], as well as in climate research [1, 8]. thefullbivariatedistributions[26,27]—effectivelyusing 1 : In this paper we study how faithfully these measures the cross-correlations as their constraints. In the case of v alone can represent a given system. With the increas- binaryvariables,asintheoriginalwork[17],thisisequiv- i X ing use of functional networks this topic has received alent to conditioning on the bivariate distributions. For r much attention recently, and many technical concerns largercardinalityvariablesthisisonlyanapproximation a have been brought to light dealing with the inference of though, as the cross-correlation is only sensitive to lin- these networks. Previous studies have shown that the ear relationships [28]. Systems where larger cardinalities estimates of the functional networks can be negatively andnonlinearbehaviourarethoughttoplayasignificant affected by properties of the time series [9–11], as well role such as in coupled oscillators — which have been as properties of the measure of association, e.g. cross- used to model systems as diverse as pacemaker cells and correlations [12–15]. In this work however, we address a crickets[29]—are,however,ratherthenormthananex- more fundamental question: How well do pairwise mea- ception [28]. In particular, we show here that this plays surements represent a system? asignificantroleinaresting-statehumanbrainnetwork. Inprinciplethiscanbeevaluatedusingamaximumen- Inordertoretaintheattractivepropertiesofthecross- tropy approach. The corresponding framework was first correlation and simultaneously capture a much wider laid out in [16] and later applied in [17], where they as- range of relationships we propose using the mutual in- sessed the rationale of only looking at the pairwise cor- formation. Mutualinformationcandetectarbitrarypair- relations between neurons. They examined how well the wiserelationshipsbetweenvariables,andisonlynon-zero 2 when the variables are pairwise independent, making it itlyinclude informationbeyond thesevalues. Asa result the ideal measure [30]. However, while calculating the thetruejointentropy,H({X} ),willalwaysbelessthan N maximum entropy given the moments of a distribution or equal to H ({X} ). m N resultsinsimpleequationsintheprobabilities,usingmu- If the variables are all independent the entropy of tualinformationsasconstraintsresultsintranscendental the system is the sum of the entropies of the individ- (cid:80) equationswhicharemuchhardertosolve. Wecircumvent ual variables, H ({X} ) = H(X ). The most this I N i i this problem here using the set theoretic formulation of uncertainty can be reduced is if the true joint entropy information theory [31], which gives us an upper bound is known, and is given by the multi-information (also onthemaximumentropythatissaturatedinmanycases. called total correlation [35]), I ({X} )=H ({X} )− N N I N The set theoretic formulation of information theory H({X} ) ≥ 0. We similarly define the measure infor- N allows us to map information theoretic quantities to mationIm({X} )=H ({X} )−H ({X} ),tobethe N I N m N the regions of an information diagram, a variation reduction in uncertainty given a measure. The fraction of a Venn diagram. The information diagram for of information retained by describing the system with three variables is shown in Fig. 1 with the associated a given measure, as opposed to the true joint entropy, is information theoretic quantities labeled [51]: entropy, then0≤Im/I ≤1. Iftheusedmeasureisthebivariate N H(X) = −(cid:80)p(x)log(p(x)); conditional entropy, probabilitydistribution, wecallIm thepairwisenetwork (cid:80) H(X|Y,Z) = − p(x,y,z)log(p(x|y,z)); mutual in- information, or the second order connected information (cid:80) formation, I(X;Y) = p(x,y)log(p(x,y)/(p(x)p(y))); as defined in [16]. This is approximated linearly if the conditional mutual information, I(X;Y|Z) = measure used is the cross-correlation, and nonlinearly if (cid:80) p(x,y,z)log(p(x;y|z)/[p(x|z)p(y|z)]); multivariate the measure used is the mutual information. mutual information, I(X;Y;Z)=I(X;Y)−I(X;Y|Z). When using the cross-correlation, estimating H is m conceptually straightforward, though finding the op- timum value can be computationally expensive. Es- timates of the first two moments of the variables uniquely determine the cross-correlations, and can be used as constraints in a Lagrange multiplier problem solving for H . The resulting probability distribution m �� P ({X} ) is the Boltzmann distribution P ({X} ) = m N m N �� (cid:16)(cid:80) (cid:80) (cid:17) exp h x + J x x ,whereh andJ arethe i i i i≥j i,j i j i i,j Lagrange multipliers [3]. When using mutual information, estimating H with FIG. 1: (Color online) The information diagram for 3 vari- m Lagrange multipliers is much harder as the derivatives ables. Itcontains7regionscorrespondingtothepossiblecom- of the Lagrange function are transcendental functions binationsof3variables,withtheircorrespondinginformation theoretic quantities defined in the text. The univariate en- in Pm({X}N). Instead, we use the mutual informations tropyH(X)isthesumofalltheregionsintheredcircle,and and univariate entropies as constraints, and draw on the the mutual information I(Y;Z) is the sum of all the regions structure of information diagrams. Each univariate en- in the blue oval. tropy and mutual information corresponds to a region in the information diagram that can be written as a sum of We illustrate our method using systems of coupled os- a number of atomic regions (atoms). The sum over all cillators, astheycommonlyoccurinnatureandareused atoms is simply H({X} ). Thus, as seen in Fig. 1, we N tomodelalargevarietyofsystems[29]. Inparticularwe obtain constraints of the form: look at the Kuramoto model [32, 33] as a paradigmatic examplethatiscapableofawiderangeofdynamicsfrom const=I(Y;Z)=I(Y;Z|X)+I(X;Y;Z), (1) synchronization to chaos [34], and hence provides an ex- const=H(X)=H(X|Y,Z)+I(X;Y|Z) cellent test bed for our method. +I(X;Z|Y)+I(X;Y;Z). (2) Method: GivenasetofN variables({X} ),wewant N to know how well the cross-correlation or mutual infor- In general, a system of N variables results in (cid:0)N(cid:1) uni- mationbetweenallpairsofvariablescanencodethestate 1 variateentropyconstraints,(cid:0)N(cid:1)mutualinformationcon- of the system. To do this we first determine the maxi- 2 mum entropy consistent with the given measure of sim- straints,andA=(cid:80)N (cid:0)N(cid:1)=2N−1atomstobedeter- k=1 k ilarity, H ({X} ), which represents a standard varia- mined. In the simplest case of N = 3 variables we have m N tionalproblem. Thismeansthatanymodelofthesystem six constraints and A = 7 regions to specify, see Fig 1. consistent with the (cid:0)N(cid:1) values of the similarity measure This means we only have one free parameter, making 2 can have an entropy of at most Hm({X}N). From the themaximizationprocesstogetHm({X}N)particularly work of Jaynes [18], we also know that any model of the easy in this case; in general there are (cid:80)N (cid:0)N(cid:1) free pa- k=3 k system with a smaller entropy must implicitly or explic- rameters. 3 Apartfromthechosenconstraintsdefinedabove,there fore, the only nonzero atoms in the information diagram are also general constraints on the values of the sub- willbetheN-variatemutualinformationI(X ;...;X )= 1 N regions ensuring they define a valid information dia- I(X ;X ), and the conditional entropies H(X |{X} − 1 2 i N gram, i.e. that there exists a probability distribution X ) = 2/3 bits. This is the maximum entropy diagram i with corresponding information-theoretic quantities. A consistent with the pairwise mutual informations and family of such constraints (so-called Shannon inequali- univariateentropies,sotheexpectedresultusingthemu- ties) can be inferred from the fundamental requirement tual information is Im/I = 1. We can see why this is N that, for discrete variables, (conditional) entropies and the case by starting with the information diagram for 2 mutual informations are necessarily non-negative: A) variables (which is fixed from our conditions), and suc- H(X |{X} −X ) ≥ 0; B) I(X ;X |{X} ) ≥ 0, where cessivelyaddingnewvariables. Theadditionofeachnew i N i i j K i(cid:54)=j and {X} ⊆{X} −{X ,X } [52]. Each inequal- variable adds 2/3 bits to the total entropy — which is K N i j ity can also be written as a sum of atoms, e.g. the maximal amount consistent with the mutual informations. Kuramoto Model: The Kuramoto model is a dynam- I(X1;X2|X3)=I(X1;X2|X3,X4)+I(X1;X2;X4|X3)≥0. icalsystemofN phaseoscillatorswithalltoallcoupling (3) proportional to K [32, 33]. The ith oscillator has an Not so-well known, for N ≥ 4, there are also inequal- intrinsic frequency of ω , a phase of θ , and its dynam- i i ities that are not deducible from the Shannon inequali- ics is given by ∂θi = ω + K (cid:80)N sin(θ −θ )+η (t). ∂t i N j=1 j i i ties, so called non-Shannon inequalities [31]. In princi- Here, we have followed [36] and added a dynamical noise ple, these inequalities may be included in our maximiza- term to mimic natural fluctuations and environmental tion problem; however, they have not yet been fully de- effects; η (t) is drawn from a Gaussian distribution with i scribed. Therefore, we suggest constructing the diagram correlation function (cid:104)η (t)η (t(cid:48))(cid:105) = Gδ δ(t−t(cid:48)), where i j i,j with the maximum entropy that satisfies the problem G determines the amplitude of the noise. For values of specific constraints and is consistent with the Shannon K above a critical threshold, K > K , synchronization c inequalities. As it may violate the non-Shannon inequal- occurs[29]. Inthelimitofconstantphasedifferencesthe ities, it may not represent a valid distribution. How- dynamicsaretrivial,andknowledgeofoneoscillatorwill ever, the sum of the atomic regions would still be an specify the phase of all others. Therefore, pairwise infor- upper bound on the entropy H , and thus provide a mation is sufficient to describe the system in this case. m lower bound on Im/I . Notably, for the particular (and Yet, the presence of noise results in random perturba- N in our simulations common) case where all A elements tionsofthephasesandtypicallypreventsconstantphase are non-negative — which is always true for N = 3 — differences [36] such that only Im/I (cid:46) 1 is expected. N onecanprovethattheboundisattainable(seeTheorem Intheweakcouplingregimewhensynchronizationisab- 3.11 [31]). sent, it is nontrivial what Im/I should be. N To summarize, the task of finding the maximum en- Toestimate Im/I andtoestablishtheimportanceof N tropy conditioned on the univariate entropies, mutual thelevelofdiscretizationorcardinality,wefirstdiscretize informations, and elemental Shannon inequalities, can the phase of each oscillator into n equally likely states be solved using linear optimization: Each constraint will [53]. Alternatively, estimators for continuous variables taketheformofalinear(in-)equality, asinEqs.(1),(3), canbeusedaswediscussin[37]. Toprovideclearproofs and we maximize the N-variate entropy by maximizing of principle, we first focus on three-oscillator systems in the sum over all A atoms of the information diagram. the following as this is the smallest system size at which Thus,weavoidhavingtoperformthemaximizationover theresultsarenon-trivial. Specifically,weconsiderthree probability distributions. different cases: (i) all oscillators have the same intrin- Example of an Nonlinear Pairwise Distribution: We sic frequency, (ii) all oscillators have unique intrinsic fre- now give an example illustrating how the mutual infor- quenciesandarestillsynchronized,and(iii)alloscillators mation can better detect pairwise relationships then the have unique intrinsic frequencies and the entire system cross-correlation. Considerasetofvariables{X} : each and all subsystems are unsynchronized (“weak coupling N variable is drawn uniformly from the set {−1,0,1}, and regime”). For three-oscillator systems, the correspond- all variables are simultaneously 0 or independently dis- ing parameter regimes in the absence of noise have been tributed among {−1,1}. The cross-correlation between carefully documented in Ref. [34]. anypairofvariablesiszero,andthereforeconsistentwith Foreachofthethreecasesexaminedwecreatedensem- thehypothesisthatallvariablesareindependent. There- bles of 100 three-oscillator systems, where each element fore, the fraction of information captured by the cross- of the ensemble will have randomly sampled frequencies correlation is Im/I =0. However, there is a significant [54]. These ensembles are studied in two different noise N amount of mutual information between the variables. regimes, G = 0.001 and G = 0.5. The same ensemble of Since P(X |{X} − X ) = P(X |X )∀i and j, all frequencies is used in both noise regimes. i N i i j(cid:54)=i the conditional mutual informations are zero. There- In the first case all oscillators are synchronized with 4 ω = ω = ω , and K = 1.65. Recall, in the synchro- oscillators in two regimes: i) All oscillators are synchro- 1 2 3 nizedcaseweexpectE[Im/I ]≈1. Thisisindeedwhat nized, K = 4; ii) the oscillators are partially synchro- N weseeinthelownoisecase, G=0.001, Fig.2A;though nized with more than 20 different synchronized clusters, the mutual information preserves slightly more informa- K =1.75. In both cases we use the same set of intrinsic tion at larger cardinalities. However, for increased noise, frequencies(drawnfromanormaldistributionwithmean G = 0.5, the cross-correlation performs poorly at larger zero and unit variance), and a noise level G=0.001. cardinalities, while the mutual information behaves ro- As in the analysis done in Ref. [17], we analyzed the bustly, Fig. 2 D. effects of sampling from a larger system by randomly se- lectingT ofthe100oscillatorsandcalculatingIm/I for T 1 thoseoscillators. Foreachtuplesize,T,werepeatedthis 1 0.8 0.6 100 times, using the same sets of tuples in both regimes, 0.99A: Case (i) Synchronized 0.4 D: Case (i) Synchronized G = 0.001 0.2 G = 0.5 and computed Eˆ[Im/I ] as the average of these values. 0.98 0 T ] 2 3 4 2 3 4 Asinourpreviousexamples,ourmethodoutperformsthe 3 1 1 /I0.8 0.8 cross-correlationinthesynchronizedcase(seeFig.3), as ˆmE[I000...02462BG: =C 0a.s0e0 (1ii) Synchron3ized 4 000...02462EG: =C 0a.s5e (ii) Synchro3nized 4 wreeslulltassinforEˆw[Iema/kIeTr]c(cid:46)ou1pliinngb(osteherFegigim. e4s),. anOduracmroestshoadll 000...1468 CG: =C 0a.s0e0 1(iii) Unsynchronized 000...1468 FG: =C a0s.5e (iii) Unsynchronized donislcyredtoizeastsioonfsoarnbdintaurpylevsairzieasb,lwesh.ilethecross-correlation 0.2 0.2 0 0 2 3 4 2 3 4 States 1 FIG. 2: (Color online) The fraction of shared information 0.9 CC T = 3 coded by the mutual information (blue diamonds, solid line) I]T CCCC TT == 57 andthecross-correlation(greensquares,dashedline). Notice m/0.8 MI T = 3 ftohretshcealreeostn.lyTghoeeessftriommat0e.d98ex−p1ecitnatPioannse,lEÂ[,..a.]n,darferoamver0a−ge1s Ê[I MMII TT == 57 over the ensemble of 100 realizations where we draw ω from 0.7 3 anormaldistributionwithzeromeanandunitvariance. Un- certainties corresponding to the 25% and 75% quantiles are 0.6 2 3 4 smaller than the symbol sizes. States For the second case, where the oscillators are syn- FIG. 3: (Color online) The fraction of shared information chronized with different intrinsic frequencies, we use coded by the mutual information (MI, diamonds with solid ∆ /∆ = 1.11, K/∆ = 4, and K = 2.20, where 1 2 2 lines) and the cross-correlation (CC, squares with dashed ∆ =ω −ω ,and∆ =ω −ω . Nowatbothnoiselev- 1 2 1 2 3 2 lines) for a tuple of size T. We simulated 100 nodes with els, at cardinalities greater than 2, the cross-correlation K =4,G=0.001,andtheestimatedexpectations,Eˆ[...],are failstocaptureasignificantportionoftheavailableinfor- averages over 100 randomly selected tuples of the given size. mation — as Eˆ[Im/IN] is significantly less than one — All oscillators are synchronized, and their intrinsic frequen- Fig.2BandE.Thisindicatesthatevensmallamplitude cies are drawn from a normal distribution with zero mean noise can prevent the cross-correlation from accurately and unit variance. Error bars are 25% and 75% quantiles. encodinginformationaboutthesysteminthiscase. The mutual information again robustly encodes almost all of Resting-State Human Brain Networks: To illustrate the possible information, Eˆ[Im/IN] ≈ 1, in both noise the applicability of our methodology in real-world data regimes and across all discretizations analyzed. situations, we apply it to neuroimaging data, in a sim- In the final case, the weak coupling regime (K/∆ = ilar context as in [19]. In particular, we want to as- 2 0.99, all other parameters as in the second case), we do sesstowhatextentthemultivariateactivitydistribution not have a strong hypothesis for what E[Im/I ] should is determined by purely bivariate dependence patterns. N be. InFig.2CandFwecanseethatthecross-correlation The used data consist of time series of functional mag- encodes virtually no information about the system for netic resonance imaging signals from 96 healthy volun- cardinalities greater than 2, Eˆ[Im/I ] ≈ 0. The mutual teersmeasuredusinga3TSiemensMagnetomTrioscan- 3 information again robustly encodes the vast majority of ner in IKEM (Institute for Clinical and Experimental the multi-information, with Eˆ[Im/I ] > 0.8 for all noise Medicine) in Prague, Czech Republic. Average signals 3 levels and discretizations. from 12 regions of the fronto-parietal network were ex- Similaroverallresultsholdforlargersystemsandwhen tracted using a brain atlas [38]. After preprocessing and only a subset of oscillators is observed. As an example, denoising as in [39], the data were temporally concate- weconsiderhereasystemof100non-identicalKuramoto nated. Each variable was further discretized to 2 or 3 5 with as few as 2H/2 data points [42] (H is measured 1 in bits). While this maximization has previously been prohibitively difficult, our work shows that it is feasible 0.8 CC T=3 allowing it to become widely applicable and serve as a /I]T0.6 CCCC TT==57 starting point before considering multivariate informa- m MI T=3 tion measures [43, 44]. Additionally, if our method re- I Ê[0.4 MI T=5 turnsasmallIm/I thissuggestsboththatthefaithful- MI T=7 N ness assumption used in causal inference is violated [45– 0.2 48], and that there is synergy among the variables [49]. 0 Our calculation of H for the mutual information is 2 3 4 m States free of distributional assumptions, computing the maximum entropy in the general space of arbitrary cardinality variables. This may result in higher entropy esti- FIG. 4: (Color online) The fraction of shared information matesthanmethodsthatconsiderpredefinedcardinality, coded by the mutual information (MI, diamonds with solid e.g., binary variables. Notably, our simulations suggest lines) and the cross-correlation (CC, squares with dashed lines) for a tuple of size T as in Fig. 3 but for weak cou- that estimating Hm in this way provides comparable, or pling K =1.75 leading to partial synchronization with more substantially lower, entropy estimates than H for the m than 20 different clusters. cross-correlation, which explicitly constrains the cardinality. This makes the technique competitive even when a specific cardinality could be reasonably assumed. states using equiquantal binning. Using our approach, Conclusions: In this work we introduced a novel we find Im/IN =0.88 for the 2-state and Im/IN =0.77 methodtodeterminetheimportanceofpairwiserelation- for the 3-state discretizations, suggesting that bivariate shipsbyestimatingthemaximumentropyconditionedon dependencepatternscapturethedominantproportionof the mutual informations. We showed that by mapping theinformation. For2-statediscretization,thisissmaller this problem to a linear optimization problem it could than in [19]. However, for the 3-state discretization it also be efficiently computed. Using the generic case of provides a much higher estimate of the bivariate depen- coupled oscillators we gave a proof of principle example dencerolethanthemethodtakingintoaccountonlycor- where our method was able to widely out-preform con- relations, as in the case of the Kuramoto model. This ditioning on the cross-correlations. The example of the suggests that only when accounting also for non-linear resting-state brain network showed that this also carries coupling, the bivariate dependencies provide sufficient over to real world applications, highlighting the poten- data structure approximation resolving the apparent in- tialofthemethodwhencardinalitieslargerthantwoand consistency of the results in [19]. This is also true for nonlinear behavior are important. other brain networks [37]. Our results indicate that in many relevant cases func- Discussion: Our method allows for potential tional networks based on mutual informations can in speedups over the maximum entropy calculation when principle more accurately capture the dynamics of the conditioning on the bivariate distributions, as well as system than those functional networks based on cross- when conditioning on the cross-correlations. In both of correlations. These types of analyses should be applied these cases solving the associated Lagrange multiplier before studying functional networks, both to assess the equations are non-linear optimization problems. The validity of the network paradigm, as well as to test the maximumentropydistributioncouldalsobefoundusing appropriateness of using the given measure of associa- iterative fitting routines like [40], but in these cases tion. Only high values in the fraction of shared informa- the problem will still scale like nN (n is the cardinality tion ensure that this is the case. This has not been done of the variables). While there are pathological linear in the vast majority of applications in the past. Due to optimization problems that scale exponentially with N, the computational efficiency, our proposed methodology there will always be a slightly perturbed problem such should allow to revisit this question, especially in areas that our method will scale polynomially [41]. where functional networks have already been widely ap- Researchers have so far relied on conditioning on the plied such as in climate research [1, 8, 9, 50]. cross-correlations when insufficient data is available to ThisprojectwasfinanciallysupportedbyNSERC(EM estimate the bivariate distributions. They either coarse and JD) and by the Czech Science Foundation project grain to binary variables where it is equivalent to con- No. 13-23940S and the Czech Health Research Coun- ditioning on the distributions [19] — potentially losing cil project NV15-29835A (JH). All authors would like to important information — or use higher cardinality vari- thank the MPIPKS for its hospitality and hosting the ableswhereitisonlyalinearapproximation[26,27]. Our international seminar program “Causality, Information approach based on mutual information can be applied in Transfer and Dynamical Networks”, which stimulated these cases; the associated entropies can be estimated some of the involved research. We also would like to 6 thank P. Grassberger for many helpful discussions. [27] K.Wood,S.Nishida,E.D.Sontag, andP.Cluzel,Proc. Natl. Acad. Sci. 109, 12254 (2012). [28] H.KantzandT.Schreiber,Nonlineartimeseriesanalysis (Cambridge University Press, Cambridge, UK, 1997). [29] A.Pikovsky,M.Rosenblum, andJ.Kurths,Synchroniza- ∗ [email protected] tion: a universal concept in nonlinear sciences, Vol. 12 [1] J. F. Donges, I. Petrova, A. Loew, N. Marwan, and (Cambridge university press, 2003). J. Kurths, Clim. Dynam. 45, 2407 (2015). [30] T.M.CoverandJ.A.Thomas,Elementsofinformation [2] A.Haimovici,E.Tagliazucchi,P.Balenzuela, andD.R. theory (Wiley, 2006). Chialvo, Phys. Rev. Lett. 110, 178101 (2013). [31] R. W. Yeung, Information theory and network coding [3] M. Timme and J. Casadiego, Journal of Physics A 47, (Springer, 2008). 343001 (2014). [32] Y.Kuramoto,inInternationalSymposiumonMathemat- [4] E.BullmoreandO.Sporns,Nat.Rev.Neurosci.10,186 ical Problems in Theoretical Physics, Lecture Notes in (2009). Physics,Vol.39,editedbyH.Araki(SpringerBerlinHei- [5] V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, delberg, 1975) pp. 420–422. andA.V.Apkarian,Phys.Rev.Lett.94,018102(2005). [33] J.A.Acebroń, L.L.Bonilla, C.J.P.Vicente, F.Ritort, [6] A. A. Margolin, I. Nemenman, K. Basso, C. Wiggins, and R. Spigler, Rev. Mod. Phys. 77, 137 (2005). G. Stolovitzky, R. D. Favera, and A. Califano, BMC [34] Y.Maistrenko,O.Popovych,O.Burylko, andP.A.Tass, Bioinformatics 7, S7 (2006). Phys. Rev. Lett. 93, 084102 (2004). [7] A. Stoˇzer, M. Gosak, J. Dolenˇsek, M. Perc, M. Marhl, [35] S. Watanabe, IBM Journal of research and development M. S. Rupnik, and D. Koroˇsak, PLoS Comput Biol 9, 4, 66 (1960). e1002923 (2013). [36] H. Sakaguchi, Prog. Theor. Phys. 79, 39 (1988). [8] J. Runge, V. Petoukhov, J. F. Donges, J. Hlinka, N. Ja- [37] E.A.Martin,J.Hlinka,A.Meinke,F.Dˇechtˇerenko, and jcay, M. Vejmelka, D. Hartman, N. Marwan, M. Paluˇs, J. Davidsen, arXiv preprint arXiv:1601.00336 (2016). and J. Kurths, Nat. Commun. 6 (2015). [38] W.R.Shirer,S.Ryali,E.Rykhlevskaia,V.Menon, and [9] E. A. Martin, M. Paczuski, and J. Davidsen, EPL 102, M. D. Greicius, Cerebral Cortex (2011). 48003 (2013). [39] J. Hlinka, M. Paluˇs, M. Vejmelka, D. Mantini, and [10] M. Paluˇs, D. Hartman, J. Hlinka, and M. Vejmelka, M. Corbetta, NeuroImage 54, 2218 (2011). Nonlinear Proc. Geoph. 18, 751 (2011). [40] J.N.DarrochandD.Ratcliff,Ann.Math.Stat.43,1470 [11] S. Bialonski, M. Wendler, and K. Lehnertz, PLoS ONE (1972). 6, e22826 (2011). [41] R. Vershynin, SIAM J. Comput. 39, 646 (2009). [12] G. Tirabassi, R. Sevilla-Escoboza, J. M. Buldu´, and [42] I. Nemenman, Entropy 13, 2013 (2011). C. Masoller, Sci. Rep. 5 (2015). [43] N. Timme, W. Alford, B. Flecker, and J. M. Beggs, J. [13] J.Hlinka,D.Hartman, andM.Paluˇs,Chaos22,033107 Comput. Neurosci. 36, 119 (2014). (2012). [44] B.Kralemann,A.Pikovsky, andM.Rosenblum,NewJ. [14] E. A. Martin and J. Davidsen, Nonlinear Proc. Geoph. Phys. 16, 085013 (2014). 21, 929 (2014). [45] M.Eichler,Philos.T.Roy.Soc.A371,20110613(2013). [15] W. Mader, M. Mader, J. Timmer, M. Thiel, and [46] J. Sun, C. Cafaro, and E. M. Bollt, Entropy 16, 3416 B. Schelter, Sci. Rep. 5, 10805 (2015). (2014). [16] E. Schneidman, S. Still, M. J. Berry, and W. Bialek, [47] J.Runge,J.Heitzig,V.Petoukhov, andJ.Kurths,Phys. Phys. Rev. Lett. 91, 238701 (2003). Rev. Lett. 108, 258701 (2012). [17] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, [48] K. Hlava´ˇckova´-Schindler, M. Paluˇs, M. Vejmelka, and Nature 440, 1007 (2006). J. Bhattacharya, Phys. Rep. 441, 1 (2007). [18] E. T. Jaynes, Phys. Rev. 106, 620 (1957). [49] V. Griffith and C. Koch, in Guided Self-Organization: [19] T. Watanabe, S. Hirose, H. Wada, Y. Imai, T. Machida, Inception, Emergence, Complexity and Computation, I. Shirouzu, S. Konishi, Y. Miyashita, and N. Masuda, Vol. 9, edited by M. Prokopenko (Springer Berlin Hei- Nat. Commun. 4, 1370 (2013). delberg, 2014) pp. 159–190. [20] S. Yu, H. Yang, H. Nakahara, G. S. Santos, D. Nikolić, [50] J. Deza, M. Barreiro, and C. Masoller, Eur. Phys. J- and D. Plenz, J. Neurosci. 31, 17514 (2011). Spec. Top. 222, 511 (2013). [21] I. E. Ohiorhenuan, F. Mechler, K. P. Purpura, A. M. [51] We use the convention p(x,y,z)=P(X =x,Y =y,Z = Schmid,Q.Hu, andJ.D.Victor,Nature(London)466, z). 617 (2010). [52] This set of equalities is minimal in the sense that no in- [22] T.R.Lezon,J.R.Banavar,M.Cieplak,A.Maritan, and equalityisimpliedbyanycombinationoftheothers[31]. N.V.Fedoroff,Proc.Natl.Acad.Sci.103,19033(2006). [53] Using n equally sized bins did not noticeably alter our [23] G. J. Stephens and W. Bialek, Phys. Rev. E 81, 066119 results. (2010). [54] For the numerical simulations, we use the Euler- [24] N.Xi,R.Muneepeerakul,S.Azaele, andY.Wang,Phys- Maruyamamethod,withatimestepdt=2−6,andunless ica A 413, 189 (2014). otherwise stated we use an integration time of T =2000 [25] E.D.Lee,C.P.Broedersz, andW.Bialek,J.Stat.Phys. inallofourresults.Wealsodiscardtimesupto50tore- 160, 1 (2013). movetransienteffects,andresamplethedataonlytaking [26] W.Bialek,A.Cavagna,I.Giardina,T.Mora,E.Silvestri, every 8th data point. M.Viale, andA.M.Walczak,Proc.Natl.Acad.Sci.109, 4786 (2012).