The interdependent network of gene regulation and metabolism is robust where it needs to be David F. Klosik,1,∗ Anne Grimbs,2,† Stefan Bornholdt,1,‡ and Marc-Thorsten Hu¨tt2,§ 1Institute for Theoretical Physics, University of Bremen 2Department of Life Sciences and Chemistry, Jacobs University Bremen The major biochemical networks of the living cell, the network of interacting genes and the networkofbiochemicalreactions,arehighlyinterdependent,however,theyhavebeenstudiedmostly as separate systems so far. In the last years an appropriate theoretical framework for studying interdependent networks has been developed in the context of statistical physics. Here we study the interdependent network of gene regulation and metabolism of the model or- ganism Escherichia coli using the theoretical framework of interdependent networks. 7 In particular we aim at understanding how the biological system can consolidate the conflicting 1 tasks of reacting rapidly to (internal and external) perturbations, while being robust to minor 0 environmental fluctuations, at the same time. For this purpose we study the network response to 2 localizedperturbationsandfindthattheinterdependentnetworkissensitivetogeneregulatoryand protein-level perturbations, yet robust against metabolic changes. n This first quantitative application of the theory of interdependent networks to systems biology a shows how studying network responses to localized perturbations can serve as a useful strategy for J analyzing a wide range of other interdependent networks. 1 3 Amainconceptualapproachofcurrentresearchinthe processesofnetworkformation[7]orthespreadingofdis- ] N life sciences is to advance from a detailed analysis of in- ease on networks [8]. In the early 2000s, gene regulation dividual molecular components and processes towards a and metabolism have been among the first applications M description of biological systems and to understand the of the formalisms of ’network biology’ [9]. Among the . emergenceofbiologicalfunctionfromtheinterdependen- diverse studies of network structure for these systems, o i cies on the molecular level. the most prominent ones on the gene regulatory side are b Supported by the diverse high-throughput ’omics’ the statistical observation and functional interpretation - q technologies, the relatively recent discipline of systems of small over-represented subgraphs (’network motifs’) [ biologyhasbeenthemajordrivingforcebehindthisnew [10, 11] and the hierarchical organization of gene regu- perspective which becomes manifest, for example, in the latory networks [12]. On the metabolic side, the broad 1 effort to compile extensive databases of biological infor- degreedistributionofmetabolicnetworksstandsout[13], v 2 mation to be used in genome-scale models [1–3]. with the caveat, however, that ’currency metabolites’ 0 Despite its holistic ’game plan’, however, systems bi- (like ATP or H2O) can severely affect network proper- 0 ology frequently operates on the level of subsystems: ties[14],aswellasthehierarchicalmodularorganization 9 Even when considering cell-wide transcriptional regula- of metabolic networks [15, 16]. 0 tory networks, as, e.g., in a network motif analysis [4], Over the last decade, the field of complex networks . 1 this is only one of the cell’s networks. Likewise, the moved its focus from the investigation of single-network 0 popularapproachtostudyingmetabolicnetworksinsys- representations of systems to the interplay of networks 7 tems biology, constraint-based modeling, accounts for that interact with and/or depend on each other. Strik- 1 steady-state predictions of metabolic fluxes of genome- ingly,itturnedoutthatexplicitinterdependencebetween : v scale metabolic networks [5], which again, is only one of network constituents can fundamentally alter the per- Xi the other networks of the cell. colation properties of the resulting interdependent net- In the analysis of such large networks, systems bi- works, which can show a discontinuous percolation tran- r a ology draws its tools considerably from the science of sition in contrast to the continuous behavior in single- complexnetworkswhichprovidesamathematicalframe- networkpercolation[17–22]. Ithasalsobeenfoundthat, work especially suitable for addressing interdisciplinary contrary to the isolated-network case, networks with questions. Combining the mathematical subdiscipline of broaderdegreedistributionbecomeremarkablyfragileas graph theory with methods from statistical physics, this interdependent networks [23]. new field greatly contributed to the understanding of, However, this set of recent developments in network e.g., the percolation properties of networks [6], potential science still lacks application to systems biology. Arguably, the most prominent representative of inter- dependent networks in a biological cell is the combined system of gene regulation and metabolism which are in- ∗ [email protected] terconnected by various forms of protein interactions, † [email protected] e.g., enzyme catalysis of biochemical reactions couples ‡ corresponding author: [email protected] the regulatory to metabolic network, while the activa- § [email protected] tion or deactivation of transcription factors by certain 2 metabolic compounds provides a coupling in the oppo- I. THE SYSTEM site direction. Although it is well-known that gene regulatory and The core object of our investigation is an E. coli metabolicprocessesarehighlydependentononeanother network representation of its combined gene regulation only few studies addressed the interplay of gene regula- and metabolism, which can be thought of as function- tionandmetabolismonalargerscaleandfromasystemic ally divided into three domains: the representation cap- perspective [24–26]. The first two studies have aimed at turesbothgeneregulatoryandmetabolicprocesses,with finding consistent metabolic-regulatory steady states by theseprocessesbeingconnectedbyanintermediatelayer translating the influence of metabolic processes on gene that models both, the enzymatic influence of genes on activity into metabolic flux predicates and incorporating metabolic processes, as well as signaling-effects of the high-throughput gene expression data. This can be con- metabolism on the activation or inhibition of the ex- sidered as an extension of the constraints-based model- pression of certain genes. The underlying interaction ingframeworkbeyondthemetabolicnetworksubsystem. graph G = (V,E) = {G ,G ,G } with its set of R I M In the paper of Samal and Jain [26], on the other hand, nodes (vertices) V and links (edges) E consists of three thetranscriptionalregulatorynetworkofEscherichiacoli interconnected subgraphs, the gene regulatory domain (E. coli) metabolism has been studied as a Boolean net- G , the interface domain G and the metabolic domain R I work model into which flux predicates can be included G . From the functional perspective, G is the union M as additional interactions. These models were first im- of gene regulatory (G ) and metabolic processes (G ), R M portantstepstowardsintegratingthesubsystemsofgene andtheirinteractionsandpreparatorystepsformthein- regulation and metabolism from a systems perspective. terface (G ). Figure 1 shows a sketch of the network I The formalism of interdependent networks now allows model. The integrated network representation has been us to go beyond these pioneering works on integrative models,byanalyzingtherobustnessofthecombinedsys- tem in terms of the maximal effect a small perturbation G R can have on such interdependent systems. In particular, the findings can be interpreted in the context of cascad- ing failures and percolation theory. |V |=2208 R We here undertake a first application of the new |E |=4219 R methodological perspective to the combined networks of gene regulation and metabolism in E. coli. 8 6 Using various biological databases, particularly 6 6 7 3 EcoCyc as the main core [27, 28], we have compiled a G graph representation of gene regulatory and metabolic I sptrrouccetsusreaslodfeEsc.rcipoltiioinn,clduidstininggauihshiginhglebveetlwoefedneatacilominptahre- 294 ||EVII||==11817072 97 atively large number of node and link types according to theAirsbtrioulcotguircaallafunnaclytisoisnaolfittyh.is compilation reveals that, 1032 206 in addition to a small set of direct links, the gene- regulatoryandthemetabolicdomainsarepredominantly coupledviaathirdnetworkdomainconsistingofproteins |V |=6298 G M and their interactions. Figure 1 shows this three-domain M |E |=15769 M functional division. Details about the data compilation, thenetworkreconstructionandthedomain-levelanalysis are given in Grimbs et al. [29]. This rich structural description, together with purpose-built,biologicallyplausiblepropagationrulesal- Figure 1. A sketch of the domain organization of the inte- lows us to assess the functional level with methods de- grative E. coli network. The regulatory domain, G , at the R rivedfrompercolationtheory. Moreprecisely,wewillin- topisconnectedtothemetabolicdomain,G ,shownatthe M vestigate cascading failures in the three-domain system, bottom via a protein-interface layer, G . The figure shows I emanating from small perturbations, localized in one of thenumberofverticeswithinandthenumberofedgeswithin thedomains. Bynetworkresponsetolocalizedperturba- andbetweenthedomains. Forillustrativepurposessnapshots tions analysis we will observe below that (i) randomized of the largest weakly-connected components of GR, GI and versions of the graph are much less robust than the orig- GM are included in the figure. inal graph and (ii) that the integrated system is much more susceptible to small perturbations in the gene reg- assembled based on the EcoCyc database (version 18.5; ulatory domain than in the metabolic one. [27,28])whichoffersbothdataaboutmetabolicprocesses 3 and (gene) regulatory events incorporated from the cor- Every vertex is assigned a Boolean state variable responding RegulonDB release 8.7 [30]. The extensive σ ∈{0,1};sinceweintendtomimicthepropagationofa metadataallowsfortheassignmentoftheverticestoone perturbation(ratherthansimulateatrajectoryofactual of the three functional domains. Details of this process biological states) we identify the state 1 with not yet af- andadetailedcharacterizationoftheresultingmodelwill fected by the perturbation while the state 0 corresponds be described elsewhere [29]. The corresponding graph toaffected by the perturbation. Westressthatthetrajec- representation consists of 10,383 vertices and 24,150 di- tory(cid:126)σ(t)doesnotcorrespondtothetimeevolutionofthe rected edges. abundance of gene products and metabolic compounds, Since we are interested in the propagation of a signal but the rules have been chosen such that the final set of between the domains, in the following we will refer to affected nodes provides an estimate of all nodes poten- the domains of the source and target vertices of the edge tially being affected by the initial perturbation. A node e =(v(i),v(i))assource domain,SD,andtarget domain, not in this set is topologically very unlikely of being af- i s t TD, respectively. The metadata can be used to assign fected by the perturbation at hand (given the biological propertiestothenodesandedgesofthegraphbeyondthe processes contained in our model). domainstructure,someofwhichareusedinthefollowing Astepwiseupdatecannowbedefinedforvertexiwith analysis, namely in the construction of the propagation in-neighbours Γ− in order to study the spreading of per- i rules of the system and of the randomization schemes. turbations through the system by initially switching off We distinguish between biological categories of edges a fraction q of vertices: (capturing the diverse biological roles of the edges) and thelogicalcategories(determiningtherulesoftheperco- σi(t+1)=fi(σj(t)|σj ∈Γ−i ) (1) lation process). According to their biological role in the (cid:104)(cid:80) (cid:105) 1 if c (1−σ )=0 ∧ iscyasltecmat,ebgootrhy;vweretiacbesbraenvdiaetdegtehseabreioalosgsiicganledcattoegaorbyioolofga- (cid:104)(cid:80)jdijσ >0j∨(cid:80) d =0(cid:105)∧ f = j ij j j ij (2) vertex as BCV and the biological category of an edge as i (cid:104) (cid:105) BthCeEei(gfhotrBdeCtEaislscsaenetShuepnpbleemmenatpapreydMuantieqruiaellys).toEoanche ooff 0 oth(cid:80)erwji|sreij|(1−σj)=0 only three logical categories of an edge, LCE, where c is 1 if v is connected to v via a C-link and 0 eLCE ∈{C,D,R} ij j i i otherwise; d and r are defined analogously. ij ij Thus, a vertex will be considered unaffected by the whichareofcentralimportanceforthespreadingdynam- perturbation if none of its in-neighbours connected via ics in our system: either a C or an R edge have failed (regardless of the C, ’conjunct’: The target vertex of an edge with this signoftheregulatoryinteraction), andatleastoneofits logical AND property depends on the source node, in-neighboursconnectedviaaDedgeisstillintact. With i.e., it will fail once the source node fails. For ex- an additional rule it is ensured that an initially switched ample, for a reaction to take place, all of its educts offvertexstaysoff. Thechoiceoftheupdaterulesensures have to be available. thattheunperturbedsystemstateisconservedunderthe dynamics: f(cid:126)((cid:126)1)=(cid:126)1. D, ’disjunct’: Edges with this logical OR property are As a side remark, the spreading of a perturbation ac- considered redundant in the sense that a vertex cording to the rules defined above could also be consid- onlyfailsifthesourceverticesofallofitsincoming ered as an epidemic process with one set of connections D-edges fail. For instance, a compound will only with a very large, and a second set of connections with a become unavailable once all of its producing reac- very low probability of infection [31]. tions have been canceled. R, ’regulation’: Edges of this category cover 14 differ- entkindsofregulatoryevents(describedindetailin III. ELEMENTS OF PERCOLATION THEORY theSupplementaryMaterials). Asshownbelow, in terms of the propagation dynamics we treat these Insystemswhichcanbedescribedwithoutexplicitde- edges similar to the ’conjunct’ ones. pendencies between its constituents but with a notion of functionality that coincides with connectivity, percola- tion theory is a method of first choice to investigate the II. PERTURBATION RULES system’sresponsetoaverageperturbationsofagivensize that can be modelled as failing vertices or edges [6, 32] Next we describe the dynamical rules for the propa- Thefractionalsizeofthegiantconnectedcomponentasa gation of an initial perturbation in the network in terms function of the occupation probability p of a constituent of the logical categories of an edge (LCE), which distin- typically vanishes at some critical value p , the percola- c guish between the different roles a given edge has in the tion threshold. In the following, we will mostly use the update of a target vertex. complementary quantity q = 1−p so that q = 1−p c c 4 canbeinterpretedasthecriticalsizeoftheinitialattack in which the decoupling of nodes from the largest com- or perturbation of the system. The strong fluctuations ponent yields dependent nodes to fail, in our directed ofthesystem’sresponsesinthevicinityofthispointcan model both, connectivity and dependency links are eval- serve as a proxy for the percolation threshold, which is uated in every time step and (apart from nodes failing especially useful in finite systems in which the transition due to dependency) only fully decoupled vertices cause appearssmoothedout. Inouranalysis,thesusceptibility further dependency failures. χ˜=(cid:104)S2(cid:105)−(cid:104)S(cid:105)2,whereS isthesizeofthelargestcluster, is used [33]. Upon the introduction of explicit dependencies be- tween the system’s constituents, the percolation prop- IV. NETWORK RESPONSE TO LOCALIZED erties can change dramatically. The order parameter no PERTURBATION ANALYSIS longer vanishes continuously but typically jumps at p c in a discontinuous transition [17, 18] as cascades of fail- Due to the functional three-domain partition of our ures fragment the system. A broader degree distribution E. coli gene regulatory and metabolic network recon- nowenhancesagraph’svulnerabilitytorandomfailures, struction, we have the possibility to classify perturba- in opposition to the behavior in isolated graphs [23, 34]. tions not only according to their size, but also with re- Details of the corresponding theoretical framework have spect to their localization in one of the domains com- been worked out by Parshani et al. [18], Son et al. [19], prising the full interdependent system, thereby enabling Baxter et al. [35] and more recently the notion of ’net- us to address the balance of sensitivity and robustness worksofnetworks’hasbeenincluded[36–38]. Therehave of the interdependent network of gene regulation and also been attempts to integrate this class of models into metabolism. the framework of multilayer networks [39]. In addition to random node failure other procedures Here we introduce the concept of network response to for initial node removal have been explored, e.g., node localized perturbations analysis. This analysis will re- removal with respect to their degree (targeted attacks) veal that perturbations in gene regulation affect the sys- [40]. teminadramaticallydifferentwaythanperturbationsin Currently, two notions of localized attacks have been metabolism. Thuswestudytheresponsetolocalized per- described. Attacks of the first sort are defined on spa- turbations. We denote such perturbations by Per(X,q), tially embedded networks and are ’local’ with respect to where X is the domain, in which the perturbation is lo- adistanceinthisembedding,i.e.ina’geographical’sense calized (X ∈{R,I,M,T}, with T representing the total [41]. The second approach considers locality in terms of network G, i.e., the case of non-localized perturbations). connectivity: aroundarandomlychosenseed,neighbours The perturbation size q = 1 − p is measured in frac- are removed layer by layer [42, 43]. In contrast, as de- tions of the total network size N = |G|. A perturbation scribedbelowinourapproach,attacksarelocalizedwith Per(M,0.1) thus is a perturbation in the metabolic do- respect to the three network domains, while within the main with (on average) 0.1|G| nodes initially affected. domains nodes are chosen randomly. Note that sizes q of such localized perturbations are lim- ited by the domain sizes, e.g., q|G| < |G | for a pertur- At this point we would like to shortly comment on R bation in the gene regulatory domain. the applicability of the mathematical concepts of inter- dependent networks to real-world data. Aiming at ana- After the initial removal of a fraction q of the ver- lytical tractability, typical model systems need to choose ticesfromthedomainX thestepwisedynamicsdescribed a rather high level of abstraction. While certainly many above will lead to the deactivation of further nodes and systems can be accurately addressed in that way, we ar- run into a frozen state (cid:126)σ∞ in which only a fraction gue that especially in the case of biological systems the A(X,q)oftheverticesareunaffectedbytheperturbation theoretical concepts can require substantial adjustment (i.e., are still in state 1). In addition to being directly to cover essential properties of the system at hand. affected by failing neighbors, in the process of network Whenaskingforthesystemicconsequencesofinterde- fragmentationnodesmayalsobecomepartsofsmallcom- pendency,thedistinctionbetweenseveralclassesofnodes ponentsdisconnectedfromthenetwork’score, andcould and links may be required. Effectively, some classes of in this sense be considered non-functional; we therefore links may then represent simple connectivity, while oth- alsomonitortherelativesizesofthelargest(weakly)con- ers can rather be seen as dependence links. In Biol- nectedcomponentinthefrozenstate,B(X,q),fordiffer- ogy,suchdependenciesaretypicallymediatedbyspecific ent initial perturbation sizes. molecules (e.g., a small metabolite affecting a transcrip- In the limit of infinite system size we could expect a tion factor, or a gene encoding an enzyme catalyzing a direct investigation of B(X,q) as a function of q to yield biochemical reaction). Such implementations of depen- thecriticalperturbationsizeq =1−p atwhichB van- c c dence links are no longer just associations and it is hard ishes. In our finite system, however, we have to estimate to formally distinguish them from the functional links. q ; following Radicchi and Castellano [33] and Radicchi c In contrast to the explicitly alternating ’percolation’ [44] we measure the fluctuation of B(X,q) which serves and ’dependency’ steps in typical computational models as our order parameter and look for the peak position of 5 the susceptibility will be presented elsewhere [29]), already from the ver- tex and edge counts in Figure 1 we see that the domains χ˜(X,q)=(cid:104)B2 (cid:105)−(cid:104)B (cid:105)2 (3) are of different structure. While the regulatory and the ∞ ∞ metabolic subgraphs, G and G have average (inter- R M asafunctionofparameterqinordertoestimatethetran- nal)degreesofabout1.9and2.5,theinterfacesubgraph, sition point from the finite system data. Supplementary G , is very sparse with (cid:104)k(cid:105)≈0.6 and we can expect it to I Figure S1 schematically illustrates the analysis. befragmented. Hence,inthefollowingwedecidetoonly perturb in G and G . R M In a first step we sample some full cascade trajecto- V. RANDOMIZATION SCHEMES ries in order to check our expectation of different re- sponses of the system to small perturbations applied in In order to interpret the actual responses of a given either G or G ; two rather large values of q are cho- R M network to perturbations, one usually contrasts them to sen and the raw number of unaffected nodes is logged those of suitably randomized versions of the network at during the cascade. Indeed, already this first approach hand. Thereby, the often dominant effect of the node implies a different robustness of the gene regulatory and degree distribution of a network can be accounted for the metabolic domains in terms of the transmission of and the effects of higher-order topological features that perturbation cascades to the other domains. Cascades shape the response of the network to perturbations can seeded in the metabolic domain of the network tend to be studied systematically. be rather restricted to this domain, while the system The same is true for the localized perturbation re- seems much more susceptible to small perturbations ap- sponse analysis introduced here. In fact, due to the plied in the gene regulatory domain. This effect can be substantially larger number of links from gene regula- seen both in the overall sizes of the aggregated cascades tion to metabolism (both, directly and via the interface as well as in the domain which shows the largest change component of the interdependent network) than from withrespecttotheprevioustimestep,whichweindicate metabolismtogeneregulationwecanalreadyexpectthe by black markers in Figure 2. More sample trajectories response to such localized perturbations to vary. areshownintheSupplementaryMaterialsand,although Here we employ a sequence of ever more stringent theyillustrateoccasionallylargefluctuationsbetweenthe randomization schemes to generate sets of randomized behaviors of single trajectories, they are consistent with networks serving as null models for the localized per- this first observation. They also show that considerably turbation response analysis. In all of the four schemes larger metabolic perturbations are needed for large cas- the edge-switching procedure introduced by Maslov and cades and back-and-forth propagation between domains Sneppen [45] is employed which conserves the in- and to emerge. out-degrees of all vertices. After this first glance at the system we aim for a Our most flexible randomization scheme (DOMAIN) more systematic approach and apply our analysis as de- only considers the domains of the source and target ver- scribedabove: wecomputecascadesteady-states(cid:126)σ but ∞ tices of an edge (SD and TD): only pairs of edges are nowwechoosethelargest(weakly)connectedcomponent flipped which share both, the source and the target do- B(X,q)astheorderparameterandcomputethesuscep- main(e.g.,bothlinkavertexinthemetabolicdomainto tibility according to equation (3), the peak-position of a vertex in the interface). The remaining three random- which, when considered as a function of q = 1−p, we ization schemes all add an additional constraint. The use as a proxy for the perturbation size at which the DOMAIN LCErandomizationfurtherrequirestheedges interdependent system breaks down. to be of the same logical categories of an edge (i.e., C, The results for different initially perturbed domains D,orR),whiletheDOMAIN BCVschemeonlyswitches illustratethat,indeed,aconsiderablylowerp (i.e.,larger c edgeswhosetargetverticesalsosharethesamebiological critical perturbation size q ) is estimated in the case of c category of a vertex, BCV. The strictest randomization, metabolic perturbations compared to regulatory or non- DOMAIN BCE, finally, only considers edges with, addi- localized ones (Figure 3, panel a). For each point we tionally, the same biological category of an edge, BCE. average500runsforthecorrespondingsetofparameters. A tabular overview of the four schemes is given in Sup- In order to assess whether the above-described behav- plementary Table S1. iorisduetospecificpropertiesofthenetworkweusethe sets of randomized graphs. For each of the four random- ization schemes we prepared 500 graph instances and re- VI. RESULTS peatedtheanalysisforeachofthemasdonebeforeforthe single original graph. The corresponding results for the The main feature of our reconstructed network, the susceptibility (Figure 3, panels b–e) yield two major ob- three-domainstructurebasedonthebiologicalroleofits servations: firstly, metabolic perturbations still lead to, constituents, allows us to study the influence of localiz- albeitonlyslightly,higherq =1−p estimates(withex- c c ing the initial perturbation. Thus, although we will not ceptionofDOMAINrandomization). Thus,thesystem’s focus on (topological) details of the graph here (which tendencytobemorerobusttowardsmetabolicperturba- 6 tions is largely preserved. Secondly, we see that overall a functionally relevant balance between robustness and the original network seems to be much more robust than sensitivity: The biological system can achieve a robust- the randomized networks; very small perturbations are nesstowardsenvironmentalchanges,while–viathemore sufficient to break the latter ones. The robustness of the sensitive gene regulatory domain – it still reacts flexibly originalgraph,thus,cannotbesolelyduetotheedgeand to other systemic perturbations. vertex properties kept in the randomization schemes. Discovering this design principle of the biological sys- Finally, let us focus on the practical aspect of these tem required establishing a novel method of analyzing findings. Beyondthecarefulstatisticalanalysisdescribed the robustness of interdependent networks, the network above,aquantityofpracticalrelevanceistheaveragesize response to localized perturbations: An interdependent oftheunaffectedpartofthesystemunderaperturbation. network can have markedly different percolation thresh- For this purpose, we examine the fractions of unaffected olds, when probed with perturbations localized in one vertices, A(X,q), after cascades emanating from pertur- network component compared to another. bationsofdifferentsizesandseededindifferentdomains, Lastly,wewouldliketoemphasizethattheapplication regardless of the resulting component structure and for ofthetheoreticalconceptsofinterdependentnetworksto both,theoriginalgraphandtheshuffledones(Figure4). real-life systems involves several non-trivial decisions: Thenumberofunaffectedverticesfortherealnetwork Inthevastmajorityof(theoretical)investigations,two is much larger than for all four randomization schemes, definitions of interdependent networks coincide: the one suggesting a strong overall robustness of the biological derivedfromadistinctionbetweendependencylinksand system. Distinguishing, however, between the metabolic connectivity-representinglinksandtheonebasedontwo and the gene regulatory components reveals that the functionally distinguishable, but interconnected subnet- metabolic part is substantially more robust than the works. regulatory part (for not too large initial perturbations, Here we have three classes of nodes: those involved p>0.94). in gene regulation, metabolic nodes, and nodes associ- atedwiththe(protein)interfacebetweenthesetwomain domains. These nodes are interconnected with (func- VII. DISCUSSION AND OUTLOOK tionally) different classes of links. These link classes are necessarytodefinemeaningfulupdaterulesforperturba- We investigated the spreading of perturbations tions. As a consequence, the notion of dependency links through the three domains of a graph representation of vs. connectivity links is no longer applicable. We expect the integrated system of E. coli’s gene regulation and that such adjustments of the conceptual framework will metabolism. Our results quantify the resulting cascad- often be required when applying the notion of interde- ing failures as a function of size and localization of the pendent networks to real-life systems. initial perturbation. Asmentionedabove,onemajortaskwhendealingwith Our findings show that the interdependent network of biologicaldataistoabstractfromtheminorbutkeepthe gene regulation and metabolism unites sensitivity and essential details; we have outlined that in this study we robustness by showing different magnitudes of damage chose to keep a rather high level of detail. dependent on the site of perturbation. With only incomplete information available, a chal- While the interdependent network of these two do- lenge is to find the right balance between radical sim- mainsisingeneralmuchmorerobustthanitsrandomized plifications of systemic descriptions and an appropriate variants (retaining domain structure, degree sequence, level of detail still allowing for a meaningful evaluation and major biological aspects of the original system), a of biological information. Here we incorporate high level pronounced difference between the gene regulatory and of detail in the structural description, distinguishing be- metabolic domain is found: Small perturbations origi- tween a comparatively large number of node and link nating in the gene regulatory domain typically trigger types. This rich structural description, together with a far-reachingsystem-widecascades,whilesmallperturba- setofupdaterulesmotivatedbygeneralbiologicalknowl- tions in the metabolic domain tend to remain more local edge, allows us to assess the dynamical/functional level and trigger much smaller cascades of perturbations. withthecomparativelysimplemethodsderivedfromper- Inordertoarriveatamoremechanisticunderstanding colation theory. of this statistical observation, we estimated the percola- An important question is, whether the analysis of the tionthresholdofthesystem,p ,andfoundthatitismuch fragmentation of such a network under random removal c lower(i.e,largerperturbations,q =1−p ,arerequired) of nodes can provide a reliable assessment of functional c c for perturbations seeded in the metabolic domain than properties,sincetheresponseofsuchamolecularnetwork for those applied to the gene regulatory domain. clearlyfollowsfarmoreintricatedynamicalrulesthanthe This is in accordance with the intuition that the percolation of perturbations can suggest. metabolic system is more directly coupled to the envi- A future step could include the construction of a ronment(viatheuptakeandsecretionofmetaboliccom- Boolean network model for the full transcriptional regu- pounds) than the gene regulatory domain. The distinct latory network and the connection of this model to flux perturbationthresholdsthereforeallowforimplementing predictions obtained via flux balance analysis, a first at- 7 tempt of which is given in Samal and Jain [26] (where theusualmodels(e.g.,infinitesystemlimit,graphsasin- the model of Covert et al. [24] with still fewer interde- stances of network model). While this formalism allows pendence links has been used). for the investigation of many real-world systems there are still restrictions as to the possible level of detail. In Our perturbation spreading approach might help our special case, for instance, a considerable amount of bridging the gap between theoretical concepts from sta- information would be lost if the system was restricted to tistical physics and biological data integration: Integrat- verticeswithconnectionsinboththeC/R-andD-layers. ingdiversebiologicalinformationintonetworks,estimat- Theexistenceofdifferentpercolationthresholdsforlo- ing’responsepatterns’tosystemicperturbationsandun- calizedperturbationsininterdependentnetworksmayre- derstanding the multiple systemic manifestations of per- vealitselfasauniversalprincipleforbalancingsensitivity turbed,pathologicalstatesisperceivedasthemainchal- and robustness in complex systems. The application of lenge in systems medicine (see, e.g., Bauer et al. [46]). these concepts to a wide range of real-life systems is re- Concepts from statistical physics of complex networks quired to make progress in this direction. may be of enormous importance for this line of research [47, 48]. While the simulation of the full dynamics is still prob- ACKNOWLEDGMENTS lematic as our knowledge of the networks is still incom- plete,ourpresentstrategyextractsfirstdynamicalprop- S.B. and M.H. acknowledge the support of Deutsche erties of the interdependent networks. At a later time Forschungsgemeinschaft (DFG), grants BO 1242/6 and point, wecanexpectqualitativelyadvancesfromfulldy- HU 937/9. namical simulations, however, dependent on the quality of the data sets. On the theoretical side, future studies might shift AUTHOR CONTRIBUTIONS the focus onto recasting the system into an appropriate spreading model, e.g., in the form of an unordered bi- M.H. and S.B. designed and supervised the study; nary avalanche [49, 50], or as an instance of the Linear D.K.andA.G.performedthereconstruction,simulations Thresholdmodel[31]withasetoflinkswithaveryhigh and analyses; and D.K., A.G., S.B. and M.H. wrote the andasecondsetwithaverylowtransmissionprobability manuscript. (C/R and D-links, respectively). Radicchi [22] presents an approach for the investiga- tion of the percolation properties of finite size interde- COMPETING FINANCIAL INTERESTS pendent networks with a specific adjacency matrix with thegoaloflooseningsomeoftheassumptionsunderlying The authors declare no competing financial interests. [1] Kitano, H. Systems Biology: A Brief Overview. Science Escherichia coli. Nat. Genet. 31, 64–68 (2002). 295, 1662–1664 (2002). [11] Alon, U. Network motifs: theory and experimental ap- [2] Ideker, T., Galitski, T. & Hood, L. A new approach to proaches. Nat. Rev. Genet. 8, 450–61 (2007). decoding life: Systems Biology. Annu. Rev. Genomics [12] Yu, H. & Gerstein, M. Genomic analysis of the hierar- Hum. Genet. 2, 343–372 (2001). chical structure of regulatory networks. Proc. Natl Acad. [3] Aderem,A.SystemsBiology: ItsPracticeandChallenges. Sci. USA 103, 14724–14731 (2006). Cell 121, 511–513 (2005). [13] Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & [4] Milo,R.etal. NetworkMotifs: SimpleBuildingBlocksof Baraba´si, A.-L. Thelarge-scale organizationof metabolic Complex Networks. Science 298, 824–827 (2002). networks. Nature 407, 651–654 (2000). [5] Llaneras, F. & Pic, J. Stoichiometric modelling of cell [14] Ma, H. & Zeng, A.-P. Reconstruction of metabolic net- metabolism. J. Biosci. Bioeng. 105, 1–11 (2008). worksfromgenomedataandanalysisoftheirglobalstruc- [6] Cohen, R., ben Avraham, D. & Havlin, S. Percolation ture for various organisms. Bioinformatics 19, 270–277 critical exponents in scale-free networks. Phys. Rev. E (2003). 66, 036113 (2002). [15] Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. [7] Dorogovtsev, S. N. & Mendes, J. F. F. Evolution of net- &Baraba´si,A.L. Hierarchicalorganizationofmodularity works. Adv Phys 51, 1079–1187 (2002). in metabolic networks. Science 297, 1551–5 (2002). [8] Newman,M.E.J.Spreadofepidemicdiseaseonnetworks. [16] Guimera, R. & Amaral, L. Functional cartography of Phys. Rev. E 66, 016128 (2002). complexmetabolicnetworks.Nature433,895–900(2005). [9] Baraba´si, A.-L. & Oltvai, Z. N. Network biology: un- [17] Parshani,R.,Buldyrev,S.V.&Havlin,S. Criticaleffect derstanding the cell’s functional organization. Nat. Rev. of dependency groups on the function of networks. Proc. Genet. 5, 101–13 (2004). Natl Acad. Sci. USA 108, 1007–1010 (2011). [10] Shen-Orr, S., Milo, R., Mangan, S. & Alon, U. Net- [18] Parshani, R., Buldyrev, S. V. & Havlin, S. Interdepen- work motifs in the transcriptional regulation network of dentNetworks: ReducingtheCouplingStrengthLeadsto 8 aChangefromaFirsttoSecondOrderPercolationTran- [33] Radicchi, F. & Castellano, C. Beyond the locally tree- sition. Phys. Rev. Lett. 105, 048701 (2010). likeapproximationforpercolationonrealnetworks. Phys. [19] Son, S.-W., Bizhani, G., Christensen, C., Grassberger, Rev. E 93, 030302 (2016). P. & Paczuski, M. Percolation theory on interdependent [34] Albert, R., Jeong, H. & Barab´asi, A.-L. Error and at- networks based on epidemic spreading. EPL 97, 16006 tacktoleranceofcomplexnetworks. Nature 406,378–382 (2012). (2000). [20] Radicchi,F.&Arenas,A.Abrupttransitioninthestruc- [35] Baxter, G. J., Dorogovtsev, S. N., Goltsev, A. V. & turalformationofinterconnectednetworks. Nat. Phys.9, Mendes, J. F. F. Avalanche Collapse of Interdependent 717–720 (2013). Networks. Phys. Rev. Lett. 109, 248701 (2012). [21] Zhou, D. et al. Simultaneous first- and second-order [36] Gao,J.,Buldyrev,S.V.,Stanley,H.E.&Havlin,S.Net- percolationtransitionsininterdependentnetworks. Phys. works formed from interdependent networks. Nat. Phys. Rev. E 90, 012803 (2014). 8, 40–48 (2011). [22] Radicchi,F.Percolationinrealinterdependentnetworks. [37] Gao, J., Li, D. & Havlin, S. From a single network to a Nat. Phys. 11, 597–602 (2015). network of networks. Natl Sci Rev 1, 346–356 (2014). [23] Buldyrev,S.V.,Parshani,R.,Paul,G.,Stanley,H.E.& [38] Kenett, D. Y., Perc, M. & Boccaletti, S. Networks of Havlin, S. Catastrophic cascade of failures in interdepen- networks - An introduction. Chaos Solitions Fractals 80, dent networks. Nature 464, 1025–1028 (2010). 1–6 (2015). [24] Covert, M. W., Knight, E. M., Reed, J. L., Herrgard, [39] Kivel, M. et al. Multilayer networks. J. Comp. Netw. 2, M. J. & Palsson, B. O. Integrating high-throughput and 203–271 (2014). computationaldataelucidatesbacterialnetworks. Nature [40] Huang, X., Gao, J., Buldyrev, S. V., Havlin, S. & Stan- 429, 92–96 (2004). ley, H. E. Robustness of interdependent networks under [25] Shlomi, T., Eisenberg, Y., Sharan, R. & Ruppin, E. targeted attack. Phys. Rev. E 83, 065101 (2011). A genome-scale computational study of the interplay be- [41] Berezin, Y., Bashan, A., Danziger, M. M., Li, D. & tween transcriptional regulation and metabolism. Mol. Havlin, S. Localized attacks on spatially embedded net- Syst. Biol. 3 (2007). works with dependencies. Sci. Rep. 5, 8934 (2015). [26] Samal, A. & Jain, S. The regulatory network of E. coli [42] Yuan, X., Shao, S., Stanley, H. E. & Havlin, S. How metabolismasaBooleandynamicalsystemexhibitsboth breadth of degree distribution influences network robust- homeostasisandflexibilityofresponse. BMC Syst Biol 2, ness: Comparinglocalizedandrandomattacks.Phys.Rev. 21 (2008). E 92, 032122 (2015). [27] Keseler, I. M. et al. EcoCyc: a comprehensive database [43] Shao, S., Huang, X., Stanley, H. E. & Havlin, S. Per- resourceforEscherichiacoli.NucleicAcidsRes.33,D334– colation of localized attack on complex networks. New J. D337 (2005). Phys. 17, 023049 (2015). [28] Keseler, I. M. et al. EcoCyc: fusing model organism [44] Radicchi, F. Predicting percolation thresholds in net- databases with systems biology. Nucleic Acids Res. 41, works. Phys. Rev. E 91, 010801 (2015). D605–D612 (2013). [45] Maslov, S. & Sneppen, K. Specificity and stability [29] Grimbs, A., Klosik, D. F., Bornholdt, S. & Hu¨tt, M.-T. in topology of protein networks. Science 296, 910–913 Integrative system-wide modeling of metabolic and regu- (2002). latory processes in Escherichia coli (2016). Unpublished. [46] Bauer, C. R. et al. Interdisciplinary approach towards a [30] Salgado,H.etal. RegulonDBv8.0: omicsdatasets,evo- systemsmedicinetoolboxusingtheexampleofinflamma- lutionaryconservation,regulatoryphrases,cross-validated tory diseases. Brief. Bioinformatics bbw024 (2016). gold standards and more. Nucleic Acids Res. 41, D203– [47] Baraba´si, A.-L., Gulbahce, N. & Loscalzo, J. Network D213 (2013). medicine: a network-based approach to human disease. [31] Watts, D. J. A simple model of global cascades on ran- Nat. Rev. Genet. 12, 56–68 (2011). dom networks. Proc. Natl Acad. Sci. USA 99, 5766–5771 [48] Hu¨tt,M.-T. Understandinggeneticvariation–thevalue (2002). of systems biology. Br J Clin Pharmacol 77, 597–605 [32] Callaway, D. S., Newman, M. E. J., Strogatz, S. H. & (2014). Watts, D. J. Network Robustness and Fragility: Percola- [49] Samuelsson, B. & Socolar, J. E. S. Exhaustive percola- tion on Random Graphs. Phys. Rev. Lett. 85, 5468–5471 tiononrandomnetworks.Phys.Rev.E 74,036113(2006). (2000). [50] Gleeson, J. P. Mean size of avalanches on directed ran- dom networks with arbitrary degree distributions. Phys. Rev. E 77, 057101 (2008). 9 REGULATORY METABOLIC 0 0 2000 2000 s 4000 4000 e c p=0.99 erti 6000 6000 not affected v 8000 8000 affected in gene regulatory domain 10000 10000 0 10 20 30 40 0 10 20 30 40 affected in 0 0 interface 2000 2000 affected in metabolic domain s 4000 4000 p=0.97 ce domain of erti 6000 6000 largest change v 8000 8000 10000 10000 0 10 20 30 40 0 10 20 30 40 steps steps Figure 2. Sample trajectories of the integrative E. coli network after perturbations of size q = 1−p = 0.01 (top row) and q = 0.03 (bottom row). The left column illustrates perturbations seeded in the regulatory domain, while the right column shows results for perturbations in the metabolic domain. 10 0.14 (a) nonlocalized 0.12 regulatory 0.10 metabolic 0.08 χ˜ 0.06 0.04 0.02 0.00 0.80 0.85 0.90 0.95 1.00 0.6 (b) DDOOMMAAIINN 0.5 0.6 0.4 0.5 0.4 χ˜0.3 0.3 0.2 0.2 0.1 0.0 0.1 0.980 0.985 0.990 0.995 1.000 0.0 0.80 0.85 0.90 0.95 1.00 0.30 (c) DOMAIN_LCE 0.25 0.30 0.20 0.25 0.20 χ˜0.15 0.15 0.10 0.10 0.05 0.00 0.05 0.980 0.985 0.990 0.995 1.000 0.00 0.80 0.85 0.90 0.95 1.00 0.45 (d)0.40 DOMAIN_BCV 0.35 0.45 0.30 0.40 0.35 0.25 0.30 χ˜ 0.25 0.20 0.20 0.15 0.15 0.10 0.05 0.10 0.00 0.05 0.980 0.985 0.990 0.995 1.000 0.00 0.80 0.85 0.90 0.95 1.00 0.25 (e) DOMAIN_BCE 0.20 0.25 0.15 0.20 0.15 χ˜ 0.10 0.10 0.05 0.05 0.000.980 0.985 0.990 0.995 1.000 0.00 0.80 0.85 0.90 0.95 1.00 p Figure3. ThesusceptibilityoftheintegrativeE. coli model anditsrandomizedversionsforperturbationsindifferentdo- mains. The results for the original graph are shown in the top frame, while the subsequent frames show results for the four different randomization schemes with the least strict on top and the strictest one at the bottom. The original system is more robust than its randomized versions; perturbations in the metabolism consistently need to be larger than in the regulatorypartinordertoreachthemaximumsusceptibility.