Salesetal.BMCBioinformatics2012,13:20 http://www.biomedcentral.com/1471-2105/13/20 SOFTWARE Open Access graphite - a Bioconductor package to convert pathway topology to gene network Gabriele Sales1†, Enrica Calura1†, Duccio Cavalieri2 and Chiara Romualdi1* Abstract Background: Gene set analysis is moving towards considering pathway topology as a crucial feature. Pathway elements are complex entities such as protein complexes, gene family members and chemical compounds. The conversion of pathway topology to a gene/protein networks (where nodes are a simple element like a gene/ protein) is a critical and challenging task that enables topology-based gene set analyses. Unfortunately, currently available R/Bioconductor packages provide pathway networks only from single databases. They do not propagate signals through chemical compounds and do not differentiate between complexes and gene families. Results: Here we present graphite, a Bioconductor package addressing these issues. Pathway information from four different databases is interpreted following specific biologically-driven rules that allow the reconstruction of gene-gene networks taking into account protein complexes, gene families and sensibly removing chemical compounds from the final graphs. The resulting networks represent a uniform resource for pathway analyses. Indeed, graphite provides easy access to three recently proposed topological methods. The graphite package is available as part of the Bioconductor software suite. Conclusions: graphite is an innovative package able to gather and make easily available the contents of the four major pathway databases. In the field of topological analysis graphite acts as a provider of biological information by reducing the pathway complexity considering the biological meaning of the pathway elements. 1 Background the pathway enrichment and the topology of signaling A great deal of effort has been recently directed towards pathways. In particular, SPIA enhances the impact of a the study of gene sets (hereafter GSA) in the context of pathway if the DEGs tend to lie near its entry points. microarray data analysis. The aim is to identify groups Massa et al. [6] introduced an alternative approach that of functionally related genes with possibly moderate, but is based on a correlation structure test. Specifically, the coordinated, expression changes. Several GSA tests, graphicalmodeltheoryisused todecomposetheoverall both univariate and multivariate, have been recently pathwayintosmallercliques,withtheaimofexploringin developed. See [1] for a comprehensive review, and [2-4] detailsmallportionsoftheentiremodel.Recently,Isciet for a detailed description and a critical investigation of al.[7]proposedaBayesianPathwayAnalysisthatmodels the tested hypotheses. eachbiologicalpathway asaBayesian network (BN) and Theseapproaches,althougheffective,misstheinforma- considers the degree to which observed experimental tionofthetopologicalpropertiesofthepathways.Tothis datafitsthemodel.Finally,Laurentetal.[8]developeda end, the seminal paper by Draghici et al. [5] proposed a graph-structuredtwo-sampletestofmeansforproblems radically different approach (called impact analysis, in which the distribution shift is assumed to be smooth SPIA)attemptingtocapture severalaspectsofthe data: onagivengraph. thefoldchangeofdifferentiallyexpressed genes (DEGs), In this perspective the retrieval of pathway informa- tion and the subsequent conversion into a gene/protein network is crucial. However, pathway annotations com- *Correspondence:[email protected] †Contributedequally prise a myriad of interactions, reactions, and regulations 1DepartmentofBiology,UniversityofPadova,viaU.Bassi58/B,Padova,Italy which is often too rich for the conversion to a network. Fulllistofauthorinformationisavailableattheendofthearticle ©2012Salesetal.;licenseeBioMedCentralLtd.ThisisanopenaccessarticledistributedunderthetermsoftheCreativeCommons AttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,andreproductionin anymedium,providedtheoriginalworkisproperlycited. Salesetal.BMCBioinformatics2012,13:20 Page2of12 http://www.biomedcentral.com/1471-2105/13/20 In particular, challenges are posed by the presence of Language (BCML, [18]). Thus, different databases are chemical compounds mediating interactions and by dif- characterised by different annotations and only a part of ferent types of gene groups (e.g. protein complexes or the whole set of reactions are confirmed by all the repo- gene families) that are usually represented as single sitories. On the other hand, Cerami et al. [19] have nodes. Available R packages (KEGGgraph, [9] and recently developed a web repository aiming at collecting NCIpath) share some drawbacks: i) they are focused and integrating all public pathway data available in stan- on a single pathway database each; ii) they do not con- dard formats. It currently contains data from nine data- sider gene connections through chemical compounds; bases with over 1400 pathways and 687,000 interactions. iii) they do not handle differently the various kinds of From the graphical point of view, a number of soft- biological gene groups. ware tools [20-28] have been developed to visually build Here we present graphite (GRAPH Interaction computable models of pathways. For additional details from pathway Topological Environment) a Bioconductor see the web page http://www.sbgn.org/. These tools are package that provides networks from the pathways of usually based on graphical models in which nodes repre- four databases (Biocarta; KEGG, [10]; NCI/Nature Path- sent genes, proteins or chemical compounds, and edges way Interaction Database, [11]; Reactome, [12]). It dis- represent various types of interactions or associations. criminates between different types of biological gene In order to gather curated pathway data we collect groups; propagates gene connections through chemical pathway information from the three public databases compounds; allows the selection of edges by type of that have emerged as reference points for the system interaction; uniformly converts heterogeneous node IDs biology community. Reactome [12] (data was retrieved to EntrezGene IDs and HUGO symbols; and finally in the BioPax format from the Reactome web site), allows the user to directly run SPIA, DEGraph, and backed by the EBI, is one of the most complete reposi- topologyGSA analyses over the provided networks. tories; it is frequently updated and provides a semanti- cally rich description of each pathway. KEGG Pathways 2 Implementation [10] (retrieved in KGML format) provides maps for both graphite was implemented using the statistical program- signalling and metabolic pathways, supplemented by 19 ming language R and the package is included in the highly interconnected databases with genomic, chemical open-source Bioconductor project [13]. In section 2.1 and phenotypic information. BioCarta (http://www.bio- we report a brief state of the art of pathway formats, carta.com) and NCI (NCI/Nature Pathway Interaction databases and tools, while in section 2.2 we report the Database) [11] whose data were retrieved in BioPax for- rules that graphite uses to convert pathway topology mat from the PDI database web page. to gene networks. 2.2 Pathway topology conversion to gene network 2.1 Pathways Background Pathway topologies converted to gene-gene networks A variety of databases containing information on cell (simple interaction format, hereafter SIF) is available signaling pathways have been developed in conjunction either on the Reactome [12] or on Pathway Commons with methodologies to access and analyse the data [14]. [19] web sites. However, they provide a single huge file Pathway databases serve as repositories of current of protein-protein reaction for each database. knowledge on cell signaling. They present pathways in a Unfortunately topological pathway analysis (TPA) graphical format comparable to the representation pre- methods are not based on the analysis of the interaction sent in text books, as well as in standard formats allow- network as a whole, but needs a separate graph for each ing the exchange between different software platforms pathway, in order to identify those that are significantly and further processing by network analysis, visualization involved in the biological problem under investigation. and modeling tools. At the present day, there exist a Moreover, TPA benefits of long paths in which the gene vast variety of databases containing biochemical reac- signal can spread across compounds and cell compart- tions, such as signaling pathways or protein-protein ments. Paxtools, a Java library for working with BioPAX interactions. The Pathguide resource serves as a good (http://www.biopax.org/paxtools.php), defines some overview of current pathway databases [15]. It lists more rules to convert a BioPax file to a SIF. However, it does than 200 pathway repositories; over 60 of those are spe- not take into account signal compound propagation and cialized on reactions of the human species. However, gene group expansion. We start from Paxtools rules and only half of them provide pathways and reactions in we extend them in order to reduce both BioPAX and computer-readable formats needed for automatic retrie- KGML interactions to pairwise relationships with com- val and processing, such as Biological Pathway Exchange pound mediated signal propagation. Table 1 and Figure (BioPAX, [16]), Systems Biology Markup Language 1 report respectively pathway summary statistics and (SBML, [17]) and Biological Connection Markup nodes/edges distributions for the four databases Salesetal.BMCBioinformatics2012,13:20 Page3of12 http://www.biomedcentral.com/1471-2105/13/20 Table 1Numberofpathways converted to networks with average number ofedges andnodes according tothe selected database. Database N.of Mean(Median) Mean(Median) pathways nodes edges KEGG 232 71.86(54) 211.12(75.5) Reactome 1070 33.22(14) 780.64(33) BioCarta 254 15.18(14) 36.88(28) NCI 177 76.79(48) 165.18(81) obtained after the conversion. In the following we describe in detail some of our rules. 2.2.1 Pathway definition The KEGG database provides separate xml files, one for each pathway. In the the other databases, we consider as a pathway every instance of the BioPax class pathway. 2.2.2 Nodes with multiple elements Within a pathway, nodes often correspond to multiple gene products. These can be divided into protein com- plexes (proteins linked by protein-protein interactions) and groups containing alternative members (like gene families,geneswithsimilarbiochemicalfunctions).These groups should be considered differently: the first kind (hereaftergroupAND)shouldbeexpandedintoaclique (all proteins connected to the others), while the second (hereafter group OR) should be expanded without con- nectionsamongthemseeFigure2(panelAandB). Figure 2 Toy examples of nodes with multiple elements In the KGML format there are two ways of defining convertedtogene-network.GroupAND(proteincomplexes, PanelA),groupOR(memberofgenefamily,PanelB)and nodeswithmultiple elements: proteincomplexes (group compoundmediatedsignal(panelC). ANDdefinedbyentry type=“group”,seeFigure2A) andgroupswithalternative members(group OR defined byentrytype=“gene”,seeFigure2B).Anexampleof theKGMLforANDandORgroupscanbeseeninAddi- tionalfile1. BioPax allows only one type of group: protein com- plexes (group AND) corresponding to the complex class. However, it often happens that a protein instance contains multiple xrefs pointing to alternative elements of the process (group OR). An example of Bio- Pax definitions for both the AND and the OR groups can be seen in Additional file 1. 2.2.3 Compound mediated interactions Compound-mediated interactions are interactions for whichacompoundactsasabridgebetweentwoelements (see Figure 2C). As chemical compounds are not usually measuredwithhigh-throughputtechnologies,theyshould beremovedfromthenetwork.However,thetrivialelimi- nation of the compounds will strongly bias the topology interruptingthesignalspassingthroughthem.IfelementA islinkedtocompoundcandcompoundcislinkedtoele- mentB,thenAshouldbelinkedtoB.Moreover,tobestfit the biological model we take into account cell compart- Figure 1 Edges and nodes distribution of networks after mentmembership:theconnectionamonggenesAandBis pathwayconversionaccordingtotheselecteddatabase. established only if the shared compound c has the same Salesetal.BMCBioinformatics2012,13:20 Page4of12 http://www.biomedcentral.com/1471-2105/13/20 localizationinboththereactions.However,therearesome chemical compounds that are highly frequent in map description(suchashydrogen,H O,...).Signalpropagation 2 throughthemwouldleadtodegenerateandlongchainsof compounds.Toremovethisartifact,wedecidedtoignore thesecompoundsduringthesignalpropagation.Afterpar- singalltheBioPaxandKGMLdataweobtaincompound chainswhosedistributionarereportedinTable2. Within the KGML format there are two different ways of describing a compound mediated interaction: i) direct interaction type = “PPrel” (A interacts to B through compound c) and ii) indirect one type = “PCrel” (A interacts to compound c and c interacts to B). The BioPax format, on the other hand, provides only an indirect way of defining compound mediated signals (see Additional file 1). 2.2.4 Relation attributes graphite allows the user to see the single/multiple relation types that characterize an edge. The edge types havebeenkeptascloseaspossibletothoseannotatedin the original formats. Some new types have been intro- ducedduetotheneedsofthetopologicalconversion. 3 Results and discussion In sections 3.1 and 3.2 we provide two examples of pathway topologies converted to gene networks by graphite, while in section 3.3 we show the core func- Figure 3 Differences in signal reconstruction of a selected tions to retrieve, convert and display graphite net- portionoftheinsulinsignalingpathwayofKEGG(hsa04910). works. In section 3.4 we perform a simulation study to PanelA.Theoriginalsignalcascade.PanelB.graphitesignal verify the efficacy of our signal propagation strategy in reconstructionthroughchemicalcompoundpropagation.Numbers representEntrezGeneIDs.PanelC.KEGGgraphsignal terms of topological analyses. In section 3.5 we run an reconstruction. example of topological gene set analysis using gra- phite networks and in section 3.6 we critically com- pare graphite with other available R/Bioconductor Insulin is an hormone controlling the balance between packages providing pathway topologies. mobilization and storage of energy molecules. Insulin binds the Insulin Receptor (IR) and through phosphori- 3.1 Pathway conversion example 1: Insulin lation of the IRS adaptors is able to recruit and activate signaling pathway PI3K. PI3K is a kinase that converts PIP2 in PIP3 which Figure 3 represent an example in which the simple elim- is a secondary messenger involved in the regulation of ination of compounds leads to an incorrect network various processes. The conversion between PIP3 into PI topology. (3,4)P2 or PI(4,5)P2 operated by phosphatases like SHIP1/2 or PTEN induce a depletion of PIP3 levels and of consequence a reduced activity on its downstream Table 2Frequency ofcompound chains thatwe targets [29]. propagate according to different databases. PIP3 associates with the inner lipid bilayer of the Chainlength KEGG Reactome Biocarta NCI plasma membrane to promote the recruitment of pro- 2 19790 55155 502 2790 teins with pleckstrin homology (PH) domains, like 3 0 874 9 134 PDPK and AKT, which is a crucial mediator of various 4 0 736 8 11 cell process, such as apoptosis, cell cycle, protein synth- 5 0 140 0 0 esis, regulation of metabolism [30]. 6 0 39 0 0 Among other functions, AKT activates also the cyclic 7 0 6 0 0 nucleotide phosphodiesterases (PDEs), that is a group of 8 0 17 0 0 enzymes able to regulate the localization, duration, and 9 0 1 0 0 amplitude of the cyclic nucleotides. Signaling PDEs are Salesetal.BMCBioinformatics2012,13:20 Page5of12 http://www.biomedcentral.com/1471-2105/13/20 therefore important regulators of signal transduction (respectively panel B and C) and the graphite final mediated by these second messenger molecules [31]. In network (panel D). In the graphite network the this pathway, PDE, depleting cAMP, indirectly inhibits nodes are annotated using the XRefs informations while the PKC mediated phosphorilation, and the activation of edges preserve the type of the reaction annotated the LIPE that is a lipase able to mobilize lipid energy stores. OWL model. Distinction between OR complexes PDE acts, in this way, as a anti-lipolytic agents [32]. (formed by all the possibile variants of each protein) This hormonal mediated signaling cascade, from the nested inside the AND complex of the Gamma secretase insulin receptor to the inhibition of HSL, involves two are topologically preserved in the resulting graph. “second messenger” compounds (PIP3 and cAMP) cru- cial for the transduction of the signal. 3.3 graphite functions In panel A of Figure 3 we report a part of the insulin To access the Reactome, KEGG, Biocarta and NCI data- signaling pathway of KEGG (hsa4910) that contains bases graphite uses respectively the lists reactome, three groups OR (PDE3, AKT and PKA), and two com- kegg, biocarta and nci. A pathway network can be pound mediated interactions (through PIP3 and cAMP). retrieved from one of the lists using the name of the This is a clear examples of a signal cascade in which the pathway, propagation of the signal through compounds is crucial > names(biocarta)[1:3] to keep the whole signaling path. In panel B we report [1] “acetylation and deacetylation of graphite reconstructed signal cascade while in panel rela in nucleus” C the KEGGgraph partially reconstructed signal. [2] “actions of nitric oxide in the heart” An extract of the xml file corresponding to the signal [3] “activation of camp-dependent pro- reported in Figure 3 of the main text is present in Addi- tein kinase pka” tional file 1. From the xml definition it is evident how > biocarta[["ras signaling pathway"]] entry 2 (SKIP) and entry 3 (SHIP) are linked to com- “ras signaling pathway” pathway from pound 15 (PIP3) while there is no direct interaction BioCarta between compound 15 (PIP3) and entry 62 (PDK1/2). Number of nodes = 18 This is why KEGGgraph misses the signal, while gra- Number of edges = 22 phite captures it by splitting the relation between Type of identifiers = native entry 52 (protein complex P13K) and 62 (PDK1/2) Retrieved on = 2011-05-12 through compound 15 (PIP3) into both 52 to 15 and 15 or its position in the list of pathways: to 62. This dissection allows the reconstruction of the > p <- biocarta [[175]] signal, otherwise impossible. > p@title [1] “ras signaling pathway” 3.2 Pathway conversion example 2: catalysis and cleavage In the network, nodes represent genes and edges func- of Notch 1 by Gamma Secretase Complex tional relationships among them. Nodes can have het- We selected the reaction 1784.3 from the Reactome erogeneous IDs (according to the pathway original pathway called “A third proteolytic cleavage releases annotation) and edges can be characterized by multiple NICD”. Gamma secretase is a multi-subunit protease functional relationships. complex, itself an integral membrane protein, that The function pathwayGraph builds a graphNEL cleaves single-pass transmembrane proteins at residues object from a pathway p: within the transmembrane domain. Here represented > g <- pathwayGraph(p) the processing of the Notch 1 protein. The gamma- > g secretase complex is composed of Presenilin homodimer A graphNEL graph with directed edges (PSEN1 variant 1 or 2 or 3 or 4 or 5 and PSEN2 variant Number of Nodes = 18 1 or 2), Nicastrin (NCSTN variant 1 or variant 2), Number of Edges = 23 APH1 (APH1A or APH1B) and PEN2. Maturation of > edgeData(g) [1] the Notch receptor involves a cleavage of the protein, $ ‘EntrezGene:10928 | EntrezGene:5879’ the intracellular domain is liberated from the plasma $ ‘EntrezGene: 10928 | EntrezGene:5879 membrane that can enter into the nucleus to engage ‘$weight other DNA-binding proteins regulating gene expression. [1] 1 The cleavage is catalyzed and performed by Gamma $ ‘EntrezGene:10928 | EntrezGene:5879 Secretase Complex. ‘$edgeType Figure 4 shows Reactome representation of the reac- [1] “catalysisOut (ACTIVATION)” tions (Panel A), the BioPax information as it is stored in The function converterIdentifiers allows the owl model and in Cytoscape plug-in for BioPax user to map such variety of IDs to a single type (Entrez- Salesetal.BMCBioinformatics2012,13:20 Page6of12 http://www.biomedcentral.com/1471-2105/13/20 Figure4CatalysisandcleavageofNotch1byGammaSecretaseComplex.Reactomerepresentationofthereactions(PanelA),BioPax informationasitisstoredinowlmodelandinCytoscapeplug-inBioPaxdedicated(respectivelypanelBandC)andthegraphitefinal network(panelD). Gene or Gene Symbol). For the ID conversion gra- > pEntrez <- convertIdentifiers (p, phite uses the data provided by the Bioconductor “entrez”) package org.Hs.eg.db. This mapping process, how- > pEntrez ever, may lead to the loss of some nodes (not all identi- “ras signaling pathway” pathway from fiers may be recognized) and has an impact on the BioCarta topology of the network (one ID may correspond to Number of nodes = 20 multiple IDs in another annotation or vice versa). Number of edges = 20 Salesetal.BMCBioinformatics2012,13:20 Page7of12 http://www.biomedcentral.com/1471-2105/13/20 Type of identifiers = Entrez Gene take the p-value of the topological analysis (pPERT); 4) Retrieved on = 2011-05-12 we repeat from step 1 10,000 times. > nodes(pEntrez)[1:10] As shown in Figure 6B the distribution of the topolo- [1] “10928” “1147” “3265” “387” “4303” gical significance p-values in case of signal propagation “5295” “572” “5879” “5894” “5898” is shifted towards lower values with respect to the case Several pathways have a huge number of nodes and of non-propagation. Propagation p-value distribution is edges, thus there is the need of an efficient system of not only centered on 0.1 (while the one with non-propa- visualization. To this end graphite uses the Rcytos- gation is centered on 0.3) but is also less variable. As cape package to export the network to Cytoscape [27]. expected the same results are obtained simulating nega- Cytoscape is a Java based software specifically built to tive fold changes (data not shown). This finding demon- manage biological network complexity and for this rea- strate that compound mediating signal propagation son it is widely used by the biological community. Run improves topological analyses giving more reliable Cytoscape with RPC plugin enabled and type at the R results. command prompt: > cytoscapePlot(convertIdentifiers(a 3.5Exampleoftopologicalanalysis:B-lineageAdultAcute $’toll-like receptor pathway’, “symbol”)) Lymphocytic Leukemia The network will be automatically loaded into Cytos- 3.5.1 Data cape. See Figure 5 for the result of this operation. The dataset, recently published by [33], characterizes gene expression signatures in acute lymphocytic leuke- 3.4 Simulation study: compound propagated signal mia (ALL) cells associated with known genotypic improves topological analysis abnormalities in adult patients. Several distinct genetic In order to verify our signal propagation strategy we mechanisms lead to acute lymphocytic leukemia (ALL) perform a simulation study. Using the insulin signaling malignant transformations deriving from distinct lym- pathway of the KEGG database we select as differentially phoid precursor cells that have been committed to expressed 22 genes lying on the signal paths highlighted either T-lineage or B-lineage differentiation. Chromo- in Figure 6A. These genes are connected if propagation some translocations and molecular rearrangements are is employed, otherwise they are disconnected (see Figure common events in B-lineage ALL and reflect distinct 6C and 6D for propagation and non-propagation respec- mechanisms of transformation. The relative frequencies tively). We expect that propagation will lead to better of specific molecular rearrangements differ in children results in terms of topological analyses. and adults with B-lineage ALL. The BCR breakpoint Our simulation is based on the following steps: 1) we cluster region and the c-abl oncogene 1 (BCR/ABL) randomly generate μFC ~ U (2, 10); 2) we randomly gen- gene rearrangement occurs in about 25% of cases in erate log fold change values (δi for i = 1,..., 22) of the adult ALL, and much less frequently in pediatric ALL. differentially expressed genes as δi ~ N (μFC, 2) (interac- Data is available at the Bioconductor site (http://www. tions of the signal paths selected are characterized all by bioconductor.org/help/publications/2003/Chiaretti/chiar- activation, thus, fold changes have the same sign); 3) we etti2/) Expression values, appropriately normalized run the SPIA[5] algorithm on the Insulin signaling according to rma and quantile normalization, derived pathway with and without signal propagations and we from Affymetrix single channel technology, consist of 37 observations from one experimental condition (n = 37, 1 BCR; presence of BCR/ABL gene rearrangement) and 41 observations from another experimental condition (n = 2 41, NEG; absence of rearrangement). Probes platform have been annotate using EntrezGene custom CDF ver- sion 14 [34]. Given the involvement of BCR and ABL genes in the chimera rearrangement, we expect these genes playing a central role in the gene set analysis; thus, most of the pathways containing BCR and/or ABL genes should be found as significant. 3.5.2 Results We report the results obtained by SPIA[5] and topo- logyGSA[6] on the graphite networks. These statis- tical tests are based on completely different null Figure5VisualizationofgraphitenetworkusingRCytoscape hypotheses; while SPIA needs the list of differentially package. expressed genes, topologyGSA performs two statistical Salesetal.BMCBioinformatics2012,13:20 Page8of12 http://www.biomedcentral.com/1471-2105/13/20 Figure6ResultsofthesimulationstudyontheInsulinsignalingpathwaycompoundmediatedsignalpropagation.PanelA.Signal pathsselectedtobedifferentiallyexpressed.PanelB.p-valuedistributionofthetopologicalanalysisSPIA(pPERT)withandwithout propagation.PanelC.graphitenetworkobtainedfrominsulinpathwaywithpropagation.PanelD.networkobtainedfrominsulinpathway withoutpropagation. tests (to compare the mean and the variance of the Tables 3 and 4 reports the list of significant pathways pathway between two groups) on the entire list of genes identified by the above approaches; pathways marked belonging to a pathway. Here, differentially expressed with ✓ are those containing BCR and/or ABL genes. It genes required for SPIA package have been identified is interesting to observe that several pathways contain- using RankProd test [35] (F D R <0.01), while the test ing either BCR and ABL genes were identified as on the mean has been chosen for topologyGSA deregulated especially with topologyGSA. Then, as package. expected, several additional pathways associated to Salesetal.BMCBioinformatics2012,13:20 Page9of12 http://www.biomedcentral.com/1471-2105/13/20 Table 3Pathway analysisperformed usingSPIAstatistical test ongraphite networks. Name FDR Signal Database BCR ABL 1 Leishmaniasis 0.03 Activated KEGG 2 Phase1-Functionalizationofcompounds 0.02 Activated Reactome 3 Syndecan-4-mediatedsignalingevents 0.00 Activated NCI 4 RegulationofRAC1activity 0.00 Activated NCI 5 RAC1signalingpathway 0.00 Activated NCI 6 RhoAsignalingpathway 0.00 Activated NCI 7 RegulationofRhoAactivity 0.00 Activated NCI 8 NoncanonicalWntsignalingpathway 0.00 Activated NCI 9 Wntsignalingnetwork 0.00 Activated NCI 10 BCRsignalingpathway 0.00 Inhibited NCI 11 IL6-mediatedsignalingevents 0.00 Inhibited NCI 12 HypoxicandoxygenhomeostasisregulationofHIF-1-alpha 0.00 Inhibited NCI 13 StabilizationandexpansionoftheE-cadherinadherensjunction 0.00 Activated NCI 14 E-cadherinsignalinginthenascentadherensjunction 0.00 Activated NCI 15 E-cadherinsignalingevents 0.00 Activated NCI 16 HIF-1-alphatranscriptionfactornetwork 0.00 Inhibited NCI 17 ALK1signalingevents 0.01 Activated NCI 18 CanonicalWntsignalingpathway 0.02 Activated NCI 19 ALK1pathway 0.02 Activated NCI 20 S1P2pathway 0.02 Inhibited NCI 21 RegulationofnuclearSMAD2/3signaling 0.02 Activated NCI 22 RegulationofcytoplasmicandnuclearSMAD2/3signaling 0.02 Activated NCI 23 TGF-betareceptorsignaling 0.02 Activated NCI 24 C-MYBtranscriptionfactornetwork 0.02 Activated NCI 25 Osteopontin-mediatedevents 0.02 Inhibited NCI 26 Directp53effectors 0.02 Inhibited NCI 27 ValidatedtranscriptionaltargetsofAP1familymembersFra1andFra2 0.03 Activated NCI 28 Regulationofnuclearbetacateninsignalingandtargetgenetranscription 0.03 Activated NCI 29 S1P4pathway 0.03 Inhibited NCI 30 amb2Integrinsignaling 0.03 Activated NCI 31 p38MAPKsignalingpathway 0.04 Activated NCI 32 Posttranslationalregulationofadherensjunctionstabilityanddissassembly 0.04 Activated NCI 33 N-cadherinsignalingevents 0.04 Activated NCI 34 Lissencephalygene(LIS1)inneuronalmigrationanddevelopment 0.05 Activated NCI ✓ 35 C-MYCpathway 0.06 Inhibited NCI 36 p53pathway 0.06 Activated NCI cancer progression, apoptosis, cell cycle, cell prolifera- the NCI databases allow the user to have deeper insight tion and inflammation have been selected as significant. into the data. Leaving the comparison between topological analyses To highlight the usefulness of topological analysis in aside (because it is out of the scope of the present the context of transcriptomic data interpretation, we work), the results testify the feasibility of performing report two graphite networks identified as signifi- analyses using graphite and the ability to obtain reli- cantly altered in the previous analysis. able results independently of the chosen analysis Chronic myeloid leukemia pathway includes both method. In addition, for the first time, thanks to gra- genes, BCR and ABL1, and was identified as differen- phite all the topological methods gain the access to tially expressed between BCR/ABL positive and negative pathway repositories previously not considered. patients by topologyGSA. Figure 7 shows the chronic Our results highlight that the hierarchical pathway myeloid leukemia graphite network from KEGG structure and the reduced dimension of the pathways database with differentially expressed genes mapped characterizing respectively the Reactome and Biocarta with different colors according to fold change sign. It is databases jointly with the specialized cancer pathways of interesting to note the presence of several OR groups (e. Salesetal.BMCBioinformatics2012,13:20 Page10of12 http://www.biomedcentral.com/1471-2105/13/20 Table 4Pathway analysisperformed usingtopologyGSA statistical test ongraphite networks. Name FDR Database BCR ABL 1 CDOinmyogenesis 0.00 Reactome ✓ 2 RegulationofcytoskeletalremodelingandcellspreadingbyIPPcomplexcomponents 0.00 Reactome 3 RoleofAblinRobo-Slitsignaling 0.00 Reactome ✓ 4 NF-kBactivationthroughFADD/RIP-1pathwaymediatedbycaspase-8and-10 0.01 Reactome 5 TNFsignaling 0.01 Reactome 6 G1Phase 0.02 Reactome 7 mTORsignalling 0.02 Reactome 8 PI3KCascade 0.02 Reactome 9 CyclinDassociatedeventsinG1 0.02 Reactome 10 PI-3Kcascade 0.03 Reactome 11 E2FmediatedregulationofDNAreplication 0.04 Reactome 12 CyclinA/B1associatedeventsduringG2/Mtransition 0.04 Reactome 13 IntrinsicPathwayforApoptosis 0.04 Reactome 14 ExtrinsicPathwayforApoptosis 0.05 Reactome 15 Lissencephalygene(LIS1)inneuronalmigrationanddevelopment 0.00 NCI ✓ 16 ErbB4signalingevents 0.01 NCI 17 Regulationofretinoblastomaprotein 0.00 NCI ✓ 18 CanonicalNF-kappaBpathway 0.01 NCI 19 p73transcriptionfactornetwork 0.01 NCI ✓ 20 AtypicalNF-kappaBpathway 0.02 NCI 21 Neurotrophicfactor-mediatedTrkreceptorsignaling 0.00 NCI ✓ 22 PathogenicEscherichiacoliinfection 0.00 KEGG ✓ 23 Chronicmyeloidleukeamia 0.00 KEGG ✓ ✓ 24 Cellcycle 0.0 KEGG ✓ 25 Axonguidance 0.00 KEGG ✓ 26 Neurotrophinsignalingpathway 0.00 KEGG ✓ 27 mtorsignalingpathway 0.01 Biocarta 28 nf-kbsignalingpathway 0.01 Biocarta 29 tnf/stressrelatedsignaling 0.02 Biocarta 30 p53signalingpathway 0.03 Biocarta 31 tnfr1signalingpathway 0.02 Biocarta 32 integrinsignalingpathway 0.02 Biocarta 33 erkandpi-3kinasearenecessaryforcollagenbindingincornealepithelia 0.02 Biocarta 34 rbtumorsuppressor/checkpointsignalinginresponsetodnadamage 0.03 Biocarta 35 egfsignalingpathway 0.04 Biocarta 36 tgfbetasignalingpathway 0.04 Biocarta 37 roleofmitochondriainapoptoticsignaling 0.04 Biocarta 38 inhibitionofcellularproliferationbygleevec 0.04 Biocarta 39 atmsignalingpathway 0.05 Biocarta ✓ 40 influenceofrasandrhoproteinsong1tostransition 0.05 Biocarta g. PI3K, AKT, IKK, CBL gene families), single members that parses KGML format but i) considers all group of of which resulted to be differentially expressed. Two genes as groups OR and ii) completely removes com- clear deregulated paths starting from BCR and ABL1 pounds without propagation and NCIgraph that genes towards apoptosis and NFKB pathways highlight imports BioPax models from Cytoscape [27] without the power of topological analysis to deeper investigate taking into account groups and compound propagation signal cascades within large pathways. (compounds become nodes of the network) and uses internal IDs as node labels. The package allows to retain 3.6 R/Bioconductor packages only nodes with EntrezGene IDs loosing all the other Currently there are two Bioconductor packages that try nodes. Thus, both of them are not suitable for topologi- to convert pathway to the SIF model. KEGGgraph[9] cal pathway analyses.