
Schellenberger et al. BMC Systems Biology 2012, 6:9
http://www.biomedcentral.com/1752-0509/6/9

RESEARCH ARTICLE Open Access

Predicting outcomes of steady-state 13C isotope tracing experiments using Monte Carlo sampling

Jan Schellenberger1, Daniel C Zielinski2, Wing Choi2, Sunthosh Madireddi2, Vasiliy Portnoy2, David A Scott3, Jennifer L Reed4, Andrei L Osterman3 and Bernhard Ø Palsson2*

Abstract

Background: Carbon-13 (13C) analysis is a commonly used method for estimating reaction rates in biochemical networks. The choice of carbon labeling pattern is an important consideration when designing these experiments. We present a novel Monte Carlo algorithm for finding the optimal substrate input label for a particular experimental objective (flux or flux ratio). Unlike previous work, this method does not require the flux distribution to be assumed beforehand.

Results: Using a large E. coli isotopomer model, different commercially available substrate labeling patterns were tested computationally for their ability to determine reaction fluxes. The optimal labeled substrate was found to depend on the desired experimental objective. Many commercially available labels are predicted to be outperformed by complex labeling patterns. Based on Monte Carlo sampling, the dimensionality of experimental data was found to be considerably lower than anticipated. This suggests that 13C experiments are less effective for determining reaction fluxes across a large-scale metabolic network than previously believed.

Conclusions: While 13C analysis is a useful tool in systems biology, high redundancy in measurements limits the information that can be obtained from each experiment. It is, however, possible to compute potential limitations before an experiment is run. One can then predict whether, and to what degree, the rate of each reaction can be resolved.

Background

In vivo metabolic reaction flux data provide insight into the dynamic function of the cell [1-3]. One widely used experimental method for measuring in vivo reaction fluxes is steady-state substrate 13C isotope labeling [4-6]. An overview of the general 13C methods is given in Figure 1.

Isotopomers are isomers created by inserting labeled isotopes (often 13C) at different positions in a molecule. They provide a unique way to track the progress of carbon through a metabolic network. By measuring the 13C enrichment of metabolite pools after growth on a 13C-labeled substrate, inferences about the internal flux state can be made. The approach can be summarized as a data-fitting problem between simulated and experimentally measured 13C-labeled metabolite concentrations.

An isotopomer model describes the positional transfer of carbon atoms for all or a subset of reactions in the network, and is used to simulate data (Figure 1a). For a specified carbon input label, an isotopomer model enables the calculation of an isotopomer distribution vector (IDV) corresponding to a particular simulated steady-state flux distribution (Figure 1b). Mass spectrometry (MS) experiments on 13C-labeled metabolites (e.g. macromolecules) generate fractional 13C enrichments from fragmented macromolecules, forming a mass distribution vector (MDV) (Figure 1c). The error between the measured MDV and the MDV corresponding to the simulated IDV summarizes how well the presumed flux distribution fits the 13C experiment. The flux distribution v that minimizes this error can be computed by solving a non-linear optimization problem. Simulating 13C enrichment given a flux distribution is computationally inexpensive. The inverse problem, calculating the flux distribution that best fits a 13C experiment, is both of greater interest and significantly more computationally difficult (Figure 1d).
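To make the forward calculation concrete, the sketch below collapses an isotopomer distribution into a mass distribution and scores it against measurements. It uses the two-carbon toy numbers shown in Figure 1c. This is a minimal illustration, not the paper's isotopomer model, and it rests on two assumptions:

- Isotopomers #0 to #3 are mapped onto the bit strings "00", "01", "10", "11".
- The figure's "total error" is reproduced as a sum of absolute differences. The paper's own fitting objective is a variance-weighted sum of squares.

```python
def idv_to_mdv(idv):
    """Collapse an isotopomer distribution vector (IDV) into a mass
    distribution vector (MDV).

    Keys of `idv` are carbon bit strings ('1' = 13C at that position);
    entry m of the result is the total fraction carrying m labeled carbons (M+m).
    """
    n_carbons = len(next(iter(idv)))
    mdv = [0.0] * (n_carbons + 1)
    for pattern, fraction in idv.items():
        mdv[pattern.count("1")] += fraction
    return mdv


def total_abs_error(simulated, measured):
    """Sum of absolute MDV differences, as tabulated in Figure 1c."""
    return sum(abs(s - m) for s, m in zip(simulated, measured))


# Toy two-carbon metabolite from Figure 1c (isotopomers #0..#3 written in binary).
idv = {"00": 0.25, "01": 0.25, "10": 0.20, "11": 0.30}
measured = [0.20, 0.51, 0.29]

mdv = idv_to_mdv(idv)
print([round(x, 2) for x in mdv])                 # [0.25, 0.45, 0.3]
print(round(total_abs_error(mdv, measured), 2))   # 0.12
```

Each isotopomer contributes to exactly one mass isotopomer, so this forward step is a cheap linear map. The expensive part is inverting the whole chain to recover fluxes.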
A review of these methods and associated challenges can be found in [6-8].

*Correspondence: [email protected]
2Department of Bioengineering, University of California-San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0412 USA
Full list of author information is available at the end of the article

© 2012 Schellenberger et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

[Figure 1 panels:
a) Isotopomer network with reactions v_rxn,1 to v_rxn,5 and their carbon fates.
b) Balance equations, including S·v_rxn = 0.
c) Example isotopomer distribution (isotopomers #0 to #3: 0.25, 0.25, 0.20, 0.30) and mass distribution (M, M+1, M+2: simulated 0.25, 0.45, 0.30; measured 0.20, 0.51, 0.29; differences 0.05, 0.06, 0.01; total error 0.12).
d) Forward calculation (simulation) from v_rxn to IDV to MDV to error, versus inverse calculation (data fitting).]

Figure 1 Isotopomer overview. a) Definition of the network, including carbon fates. b) Isotopomer balance equations; solving these equations yields the isotopomer distribution vector (IDV). c) Experimental data are compared to computed mass distribution vectors (MDV), yielding the experimental fit. d) Two types of possible computations. The forward computation uses a flux distribution as input to compute the MDV, while the inverse problem attempts to find the flux distribution that minimizes the experimental discrepancy.

There are several distinct sources of variability in a 13C experiment that limit the confidence with which particular reactions can be determined. First, experimental accuracy limitations and biological variability introduce uncertainty into the experimentally measured MDV. Second, because metabolic networks contain alternate pathways, the mass balance equations underlying a metabolic steady state are significantly underdetermined [9]. A given experiment may not resolve the full network flux distribution at high confidence. Even so, certain labeling patterns may resolve fluxes through certain pathways with greater confidence than other labeling patterns, as has previously been shown [4,10].

For a given n-carbon compound, there are 2^n possible 13C labeling states (as well as mixtures), and the choice of label is known to affect the ability to determine reaction fluxes [10]. Because 13C methods are based on computational modeling of isotopomer distributions, it is possible to optimize the choice of substrate labeling pattern computationally, enhancing the information gained from an experiment. Two primary motivations drive such an endeavor. First, 13C experiments are expensive, so choosing the best experiment a priori is desirable. Second, we can assess, in an unbiased manner, the capability of the steady-state 13C labeling approach to determine reaction fluxes. The optimization of 13C labeling experiments has been addressed in the literature [4,10,11]. However, flux sampling has not previously been used to predict optimal isotopomer experiments, and this approach presents several unique advantages over previous methods.

We describe a Monte Carlo sampling-based method for choosing the optimal substrate label, based on the Constraint-Based Reconstruction and Analysis (COBRA) computational platform [12,13]. COBRA methods start from manually curated biochemical network reconstructions of known reaction stoichiometries. Combined with measurable nutrient uptake and secretion rates, these reconstructions define feasible ranges for internal reaction fluxes.
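The constraints just described define a convex polytope of feasible flux vectors: the steady state S·v = 0 plus lower and upper flux bounds. The sketch below samples that polytope with plain hit-and-run, one simple Markov chain Monte Carlo scheme. It assumes NumPy and a hypothetical four-reaction toy network. The paper's own sampler is described in its Methods and may differ in detail.

```python
import numpy as np


def null_space(S, tol=1e-10):
    """Orthonormal basis (as columns) for {v : S v = 0}, via SVD."""
    _, s, vt = np.linalg.svd(S)
    rank = int((s > tol).sum())
    return vt[rank:].T


def hit_and_run(S, lb, ub, v0, n_samples, rng):
    """Hit-and-run Markov chain over {v : S v = 0, lb <= v <= ub}.

    v0 must already be feasible. Each step draws a random direction inside
    the null space of S, then moves to a uniformly chosen point on the chord
    that stays within the bounds; the stationary distribution is uniform.
    """
    N = null_space(S)
    v = np.asarray(v0, dtype=float)
    samples = np.empty((n_samples, v.size))
    for k in range(n_samples):
        d = N @ rng.standard_normal(N.shape[1])
        d /= np.linalg.norm(d)
        nz = np.abs(d) > 1e-12              # coordinates that move along d
        t1 = (lb[nz] - v[nz]) / d[nz]
        t2 = (ub[nz] - v[nz]) / d[nz]
        t_lo = np.minimum(t1, t2).max()     # most negative step allowed
        t_hi = np.maximum(t1, t2).min()     # most positive step allowed
        v = v + rng.uniform(t_lo, t_hi) * d
        samples[k] = v
    return samples


# Hypothetical toy network: uptake -> A; A -> B by two alternative routes; B -> out.
S = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 1.0, -1.0]])
lb, ub = np.zeros(4), np.full(4, 10.0)
rng = np.random.default_rng(0)
X = hit_and_run(S, lb, ub, v0=[4.0, 2.0, 2.0, 4.0], n_samples=2000, rng=rng)
print(X.mean(axis=0))
```

Hit-and-run needs only a feasible starting point and a null-space basis. Samplers for genome-scale models typically add refinements to improve mixing.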
Many of these reconstructions have been generated [14], and the procedure is well established [13,15]. These models can be used for computing growth rates [16,17], predicting the effects of gene knockouts [16,18,19], predicting the endpoint of adaptive evolution [20], and designing strains for industrial production [21,22]. Reviews of these methods can be found in [12,13,23].

Monte Carlo sampling of constraint-based metabolic models can generate sets of biochemically feasible flux distributions that obey measured uptake and secretion rate constraints [24]. Feeding these flux distributions through an isotopomer model yields IDVs, which can be compared against simulated 13C data to evaluate how well the experiment determines reaction fluxes. Monte Carlo sampling takes advantage of the speed with which IDVs can be simulated from putative flux distributions, making this approach suitable for large-scale analysis of in silico experiments.

A Monte Carlo sampling approach was implemented using a newly developed isotopomer model. It was used to evaluate how efficiently different carbon labeling patterns determine reaction fluxes in E. coli. For each substrate labeling pattern, the dimensionality of the simulated 13C data was calculated using singular value decomposition (SVD) and compared to the number of undetermined dimensions in the network. 13C experiments were performed for three substrate labeling patterns to validate the theoretical analysis. The methods developed represent a flexible computational analysis that can be applied to various biological systems and experimental setups to estimate, a priori, how efficiently isotopomer experiments determine reaction fluxes.

Results and Discussion

Expanded Isotopomer Model

An isotopomer model was constructed in two phases. First, a central metabolic isotopomer model was derived from the iJR904 E. coli reconstruction [16]. It accounts for 85 reactions, including glycolysis, the TCA cycle, the pentose phosphate pathway, oxidative phosphorylation, pyruvate metabolism, and anaplerotic reactions. This initial model was equivalent in reaction content to commonly used isotopomer models for E. coli [25,26].

An expanded model was then constructed that includes both central and biosynthetic pathways. The iMC1010 metabolic network [19] was evaluated to determine which reactions can sustain non-zero fluxes during growth on glucose, acetate, or lactate. Only certain by-products were allowed to be secreted: acetate, formate, D-lactate, pyruvate, succinate, glycerol, CO2, and ethanol.

Blocked reactions, which must have zero net flux at steady state, were omitted from consideration. Groups of reactions that could be merged without affecting model results (e.g. linear pathways) were combined to reduce the number of variables. Large sets of biosynthesis reactions that produce phospholipids, nucleotides, and co-factors were also combined, since no experimental measurements existed for these high-carbon metabolites. However, by-products of high-carbon metabolite production that could re-enter the metabolic network were tracked (e.g. CO2, formate, succinate, fumarate, and pyruvate). Of the 932 reactions in the complete iMC1010 metabolic network, nearly a third were represented in the biosynthetic isotopomer model, either individually or as grouped reactions.

The final isotopomer model accounts for a total of 313 irreversible reactions, 278 of which track carbon. Including these additional pathways is likely important for an accurate assessment of the flux-resolving power of 13C experiments both within and beyond central metabolism [7]. A complete listing of the reactions and metabolites in the biosynthetic network can be found in Additional File 1.

Monte Carlo Sampling Approach

To compute possible flux distributions of the E. coli model, the network was sampled using a Markov chain Monte Carlo (MCMC) sampling algorithm (see Methods). The steady-state mass balance and uptake rate constraints for the metabolic network define a convex space that contains all biochemically feasible steady-state flux distributions [27]. Monte Carlo sampling generates a set of flux distributions spread uniformly throughout this feasible space.

Including 13C experimental data shrinks the feasible space in which the true flux state must lie. It does so by requiring that the IDV calculated from a putative flux distribution match the experimental data within error. The space of feasible flux distributions depends only on reaction stoichiometry, but the space of resulting simulated IDVs differs depending on the input substrate labeling pattern. Hence, different labeling patterns can differ in their ability to resolve each reaction flux.

Here, we used Monte Carlo sampling of flux distributions to analyze the degree to which steady-state 13C labeling experiments can determine reaction fluxes, for several possible experimental objectives. For example, one possible objective is to determine whether a particular reaction has a flux above or below a specified value. For this objective, a well-designed labeling pattern is one that readily separates two groups of flux distributions: those whose objective reaction flux is greater than the specified value, and those whose flux is less. As seen in Figure 2, hypothetical experiment 1 produces measurement distributions that overlap, whereas experiment 2 shows greater separation. If one were interested in differentiating between the two partitions, experiment 2 would be much preferable. This method allows any label to be scored for any given experimental objective without first knowing the true cellular flux distribution v.

Generating and Evaluating 13C Experimental Hypotheses

An experimental hypothesis is defined as a partition of the sampled flux distribution set. While many possible hypotheses could be considered, two rational hypotheses were studied.

The first case attempts to elucidate whether a reaction has high or low flux (hi-lo). The solution space is partitioned into all points with v_j > threshold versus v_j < threshold, generating a different hypothesis for each reaction j. The threshold was chosen to be the median of all v_j, so that half of all points fall in each of the two partitions.

The second set of hypotheses consisted of biologically relevant flux ratios. For each point, the ratio of two reactions, v_i/v_j, was classified as above or below a threshold that defined the partition.

Intuitively, a hypothesis score should be high if the isotopomer distributions coming from one partition are distinguishable from distributions in the other partition. While there are several ways of doing this, we chose a
flux v) i Figure2MethodOverview.a)Thespaceoffluxdistributionsispartitionedintwopartscorrespondingto‘high’fluxversus‘low’flux.Auniform randomsampleisdrawnfromthespaceandisalsopartitionedintopartition1andpartition2.b)Foreachpointinthespacethedistribution ofexperimentalmeasurementsissimulated.Hypotheticalexperiment1andexperiment2withdifferentglucoselabelmixturesproducedifferent measurementdistributions.Thedistributionsfromexperiment2aremoreseparated,indicatingparametersofexperiment2aremoreconducive fordifferentiatingbetweenthehighandlowpartition. Schellenbergeretal.BMCSystemsBiology2012,6:9 Page5of14 http://www.biomedcentral.com/1752-0509/6/9 heuristic metric based on a Z-score, which is commonly associated with the set of flux distributions. Raw and used to determine the difference between two samples. normalized Z-scores are given in the Additional File 1. A Z-score was calculated for each fragment (element) of Z-scores varied from the level of noise to a maximum of the calculated MDV for each simulated flux distribution: >20-fold the level of noise. To illustrate the differences in label-dependent reac- |x¯ −x¯ | Z = (cid:2) hi lo tion resolving capacity, two sets of Z-scores correspond- i s2 +s2 +σ2 ing to [1-13C] glucose and [6-13C] glucose are plotted in hi lo Figure 3. Lighter colors indicate higher Z-scores and where x¯ and x¯ are the average fragment enrich- ease of measurement. In this case, [6-13C] glucose scores hi lo ments for the upper and lower partitions, respectively, higher at measuring the pentose phosphate pathway and s2 and s2 are the variances of fragment enrichments most of lower glycolysis, whereas [1-13C] glucose glu- hi lo cose scores much higher at measuring the glyoxylate for the upper and lower partitions, respectively, and the a is a constant equal to 0.014. a is on the order of mag- shunt. The results suggest that there is no single label that yields a high score for all experimental objectives. nitude of the uncertainty in measurements. 
This slight For example, the exchange of formate (EX_for) could be modification to the standard Z-score puts a lower easiest measured with a [1,2-13C] glucose; however, this bound on the expected experimental variation. The Z- labeling pattern is bested by [1-13C] glucose for the score of each fragment is added together to give the Z- measurement of reaction formyltetrahydrofolate defor- score of the experiment. mylase (FTHFD) (Figure 4). This non-universality of (cid:3) labels is in line with expectations, as it has been pre- Z= Z i viously shown that the choice of labels can affect the i∈fragment flux resolution. For many reactions, the best experiment Using this approach, candidate flux states were that could be performed involves hypothetical (non- sampled uniformly and experimental hypotheses tested. commercially available) labels. One example is the ratio Z-scores were calculated for the hi-lo hypothesis corre- of phosphofructokinase (PFK) flux to fructose bispho- sponding to 1) individual reactions 2) reaction ratios sphate aldolase (FBA) flux. The best label for determin- and 3) two ‘random’ reactions or ratios. Random ing this ratio is [1,2,3-13C] glucose (Z = 28.0), which hypotheses were tested to estimate the level of noise gives a much higher Z-score than the best commercially a b Scale: Linear; Z-score: 0 50+ Figure3SimulatedZ-Scores.Twopossibleglucoselabelpatternsshowdifferentstrengthsinevaluatingdifferentpartsofthenetwork.Brighter colorsindicatemoreeasilydeterminedfluxes.a)[1-13C]glucoseZ-scoresillustratesfluxdeterminabilitywith100%[1-13C]glucose.b)[6-13C] glucoseZ-scoresshowsthesamenetworkevaluatedwith[6-13C]glucose.Itisobserved,forexample,that[6-13C]glucoseispredictedto elucidatethepentosephosphatepathwaymoreeasily,while[1-13C]glucosebetterelucidatestheglyoxylateshunt. 
Schellenbergeretal.BMCSystemsBiology2012,6:9 Page6of14 http://www.biomedcentral.com/1752-0509/6/9 Absolute Z-scores Relative Z-scores (noise corrected) C] C] 13% [U-C] 130% [U- 13% [U-C] 130% [U- C] 20 + 2 C] 20 + 2 MAX 13C][1,2,3- 13C][1,2,5- 13C][2,3- 13C][5,6- 13[2,3,4,5,6- 12C] + 80% [ 13C][1- 13C] 80% [1- 13C][2- 13C][3- 13C][4- 13C][5- 13C][6- 13[1,2-C] 13C][1,2,3- 13C][1,2,5- 13C][2,3- 13C][5,6- 13[2,3,4,5,6- 12C] + 80% [ 13C][1- 13C] 80% [1- 13C][2- 13C][3- 13C][4- 13C][5- 13C][6- 13[1,2-C] EX_nh4 24.018.324.016.313.113.411.513.412.1 9.711.0 9.722.5 8.916.7 0.6 0.9 0.5 0.4 0.4 0.3 0.4 0.4 0.3 0.3 0.3 0.8 0.2 0.6 EX_for 27.215.121.521.9 9.421.5 9.421.518.510.422.9 3.6 4.4 6.027.2 0.4 0.7 0.7 0.2 0.7 0.2 0.7 0.6 0.3 0.7 0 0 0.1 0.9 CS 54.444.038.754.443.928.415.428.422.839.736.912.120.643.147.9 0.7 0.7 0.9 0.7 0.5 0.2 0.5 0.4 0.7 0.6 0.2 0.3 0.7 0.8 EDA 29.328.616.720.123.319.6 5.419.616.121.512.221.418.411.229.3 0.9 0.5 0.6 0.7 0.6 0.1 0.6 0.4 0.6 0.3 0.6 0.5 0.3 0.9 FTHFD 30.812.213.221.7 9.330.8 5.030.825.416.6 9.9 2.6 8.6 5.516.0 0.3 0.3 0.6 0.2 0.9 0.1 0.9 0.7 0.4 0.2 0 0.2 0.1 0.4 GLYK 13.0 2.4 9.610.0 1.810.0 1.010.0 8.1 1.413.0 0.9 1.2 1.011.2 0 0.5 0.5 0 0.5 0 0.5 0.4 0 0.8 0 0 0 0.6 MALS 47.736.321.434.517.836.621.636.632.023.617.6 8.610.3 7.147.7 0.7 0.4 0.7 0.3 0.7 0.4 0.7 0.6 0.4 0.3 0.1 0.1 0.1 0.9 PGI 68.159.844.052.468.134.111.434.124.845.432.713.631.960.554.0 0.8 0.6 0.7 1 0.5 0.1 0.5 0.3 0.6 0.4 0.2 0.4 0.8 0.7 PGK 61.252.834.941.861.232.5 8.532.524.139.425.415.330.349.347.2 0.8 0.5 0.6 0.9 0.5 0.1 0.5 0.3 0.6 0.4 0.2 0.4 0.8 0.7 POX 3.4 3.0 2.6 2.8 2.1 2.2 2.5 2.2 1.8 2.2 2.3 1.5 0.8 1.8 3.4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PFL 22.813.317.221.3 7.221.4 7.821.418.3 9.918.5 4.1 3.9 2.622.8 0.4 0.6 0.8 0.2 0.8 0.2 0.8 0.7 0.3 0.7 0 0 0 0.9 PPC 58.841.635.139.822.749.321.949.341.924.830.7 9.512.7 8.658.8 0.7 0.5 0.6 0.3 0.8 0.3 0.8 0.7 0.4 0.5 0.1 0.2 0.1 0.9 rFUM 55.040.640.655.033.635.018.235.029.435.441.211.216.132.552.2 0.7 0.7 0.9 0.6 0.6 0.3 
0.6 0.5 0.6 0.7 0.1 0.2 0.5 0.9 PFK / FBP 28.028.0 7.612.2 9.2 7.3 4.6 7.3 6.0 4.510.3 4.7 4.3 2.618.5 0.9 0.2 0.3 0.2 0.1 0.1 0.1 0.1 0 0.3 0.1 0 0 0.5 GAPD / G6PDH2r 69.160.343.752.169.134.611.034.625.345.732.513.832.660.654.3 0.8 0.6 0.7 1 0.5 0.1 0.5 0.3 0.6 0.4 0.2 0.4 0.8 0.7 PYK / PPS 25.119.617.824.617.317.110.117.115.316.716.0 5.7 6.517.925.1 0.7 0.6 0.9 0.6 0.6 0.3 0.6 0.5 0.5 0.5 0.1 0.1 0.6 0.9 PPC / PPCK 11.110.8 4.1 7.5 4.1 7.4 5.6 7.4 6.4 6.2 3.4 1.8 2.0 2.211.1 0.7 0.1 0.4 0.1 0.4 0.2 0.4 0.3 0.3 0 0 0 0 0.7 rFUM / ENO 50.737.239.050.028.035.617.735.629.830.339.011.514.526.250.7 0.7 0.7 0.9 0.5 0.6 0.3 0.6 0.5 0.5 0.7 0.2 0.2 0.5 0.9 rACONT 54.444.038.754.443.928.415.428.422.839.736.912.120.643.147.9 0.7 0.7 0.9 0.7 0.5 0.2 0.5 0.4 0.7 0.6 0.2 0.3 0.7 0.8 PDH / rFUM 20.711.517.020.7 8.116.2 5.216.213.6 9.819.7 3.6 3.7 6.719.0 0.4 0.7 0.8 0.2 0.6 0.1 0.6 0.5 0.3 0.8 0 0 0.2 0.8 MALS / rACONT 44.335.618.928.120.534.819.134.830.319.814.2 7.811.4 6.644.3 0.7 0.4 0.6 0.4 0.7 0.4 0.7 0.6 0.4 0.2 0.1 0.2 0.1 0.9 GLUDy / GLUSy 2.1 1.4 1.4 2.1 1.2 1.8 0.6 1.8 1.5 1.6 1.0 1.4 0.9 1.1 1.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 LDHD / PDH 7.9 6.2 7.9 4.8 6.0 5.6 1.5 5.6 4.5 5.3 4.8 2.5 3.7 5.5 7.5 0.4 0.6 0.2 0.4 0.3 0 0.3 0.2 0.3 0.2 0 0.1 0.3 0.5 PYK / PDH 20.915.014.920.612.016.4 7.616.414.111.316.3 4.6 5.111.120.9 0.6 0.6 0.8 0.4 0.6 0.2 0.6 0.5 0.4 0.6 0.1 0.1 0.4 0.8 FRD2 / SUCD1i 3.8 3.2 2.7 3.4 1.4 3.4 2.0 3.4 3.1 2.2 2.3 0.9 1.5 1.6 3.8 0 0 0.1 0 0 0 0 0 0 0 0 0 0 0.2 random1 3.1 3.0 2.6 1.7 2.2 2.5 1.0 2.5 2.0 1.5 1.8 1.4 1.0 1.3 3.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 random2 2.5 1.8 2.5 1.6 1.3 2.1 1.3 2.1 2.0 0.9 1.3 1.5 2.1 1.4 2.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 random noise level Figure 4 Computational Evaluation of Glucose. 
Potential glucose labels are evaluated based on both absolute Z-scores and Z-scores normalizedwithrespecttothelabelingpatternwiththehighestscore.Glucoselabelsarelistedontopincludinghypotheticallabelingpatterns (e.g.[1,2,3-13C])andcommerciallyavailablelabels(e.g.[1-13C]).[U-13C]=uniformlabeledand[12C]=unlabeled.Reactionandreactionratio hypothesesarelistedontheleft.The‘random’hypotheses,asdescribedinthemethods,showsthelevelofnoise. available label, [1,2-13C] glucose (Z = 18.5). Thus, there each subsystem (Figure 5b). Reactions that were stoi- may be motivation to synthesize compounds with more chiometrically fixed by the measured constraints on complex labeling patterns than commonly used. Addi- acetate, glucose, D-lactate, oxygen, growth rate, and tionally, there are certain reactions which are predicted ATP maintenance were also categorized by subsystem. to be difficult to measure with any labeling pattern. For Stoichiometrically fixed (i.e. constraint-determined) example, the Z-scores for each possible labeled glucose reactions have a confidence interval of zero, and thus substrate for the reaction pyruvate oxidase (POX) all lie are label-independent and receive no additional knowl- within the level of noise, as determined by the compari- edge from 13C experiments. It was found that histidine, son with random hi-lo experiment Z-scores. valine, leucine, and isoleucine metabolism fluxes are In addition to label-specific reaction flux elucidation completely identified solely based on the flux con- properties, 13C experiments show a clear pathway bias straints. On the other hand, prior constraints fix none regardless of labeling pattern. 
The maximum Z-score of of the fluxes in central carbon metabolic systems such all labeling patterns was found for each reaction, giving as glycolysis, citric acid cycle, pentose phosphate path- a metric for the maximum potential for reaction flux way, and anaplerotic reactions; however, fluxes in these determination using 13C-labeled glucose (Figure 5A). pathways are all predicted to be identifiable with a 13C Then, the fraction of reactions that had a maximum experiment using optimal labeling patterns for each potential at least twice the noise level was found for reaction. This result is expected as these identifiable Schellenbergeretal.BMCSystemsBiology2012,6:9 Page7of14 http://www.biomedcentral.com/1752-0509/6/9 a 60 50 s n o cti 40 a e R of 30 r e b 20 m u N 10 0 >20x 10-20x 5-10x 2-5x 1-2x within noise(Z<3) Max Z-score b 1 13C Determinable (Max Z > 2x noise) e 0.9 bl Constraint Determined (a priori) a n 0.8 mi er 0.7 et d 0.6 s n o 0.5 cti a e 0.4 r of 0.3 n o cti 0.2 a Fr 0.1 0 Figure5PathwayAnalysisofOptimal13CExperiments.a)ThemaximumZ-scoreofalllabelingpatternsforeachnon-fixedreactionwas found.Notably,greaterthan50%ofreactionshavenoexperimentwithapredictedZ-scoregreaterthantwicethenoiselevel.Thisindicates thatmanyreactionswillbedifficulttoelucidate.b)Reactionsseparatedbysubsystem.Reactionswithfluxesfixedbyconstraintsaretermed “constraintdetermined.”ThereactionswithmaximumZ-scoresatleasttwicethenoiselevelweretermed“13Cdeterminable.”Thedifference betweendeterminable/determinedfluxesand1isthefractionofreactionsthatcannotbedeterminedthrougheither13Cexperimentsor constraints. Schellenbergeretal.BMCSystemsBiology2012,6:9 Page8of14 http://www.biomedcentral.com/1752-0509/6/9 pathways are the typical pathways being studied using the free dimensions inherent in a network structure, 13C analysis. Other pathways, such as cysteine, threo- given a set experimental error. 
In an extreme case, if all nine, and lysine metabolism, are completely identifiable the data falls on one point (zero dimensions), no addi- through a combination of prior stoichiometric con- tional information is given from the data. Similarly, a straints combined with well-chosen 13C experiments. dimensionality of one indicates that the data can specify However, many of the remaining subsystems have a one degree of freedom. Singular value decomposition fraction of reactions that cannot be determined using (SVD) is a data reduction technique that allows the esti- any 13C-labeling pattern of glucose. In particular, no mation of data dimensionality (Figure 6a). A data matrix additional information can be obtained from 13C-labeled M of size (n x n ), consisting of all sample fragments points glucose experiments about certain biosynthetic path- points generated from Monte Carlo Sampling, is decom- ways, nucleotide salvage pathways, reductive citric acid posed into M = U · Σ · V T where U and V are ortho- cycle reactions, and certain alternate pathways periph- normal bases and Σ is a diagonal matrix containing eral to glycolysis, such as an alternate pathways from singular values in descending order. The singular values DHAP to D-lactate. Measuring metabolites other than are effectively weightings that describe the information amino acids may give more information on these path- content of the corresponding vectors in U and V ways. Note that in this discussion of identifiability, the towards reconstructing the full matrix M. A partial Z-score metric indicates that an experiment can signifi- reconstruction of M is possible by taking only a subset cantly reduce the confidence interval of a particular of the singular values greater than some threshold. reaction but does not specifically predict the value of These thresholds have a direct interpretation as the the confidence interval. Confidence intervals are directly uncertainty with which a data point can be measured. 
calculated for experimental data sets in a later section For example, a threshold cutoff of 0.01 indicates that and compared to the Z-scores for the same labeling the remaining uncertainty of the data falls within 0.01 patterns. or 1% error in the measurement of isotope enrichment. To determine the dimensionality of the isotopomer Dimensionality of Isotopomer Data data, SVD was performed on 13C fragments derived The Monte Carlo sampling approach enables the deter- from uniformly sampled flux distribution sets for several mination of the dimensionality of simulated 13C experi- glucose labels. The results are summarized in Figure 6b. ments for a particular substrate labeling pattern. The Globally, the choice of glucose labels affects the dimen- dimensionality gives an indication of the degree to sionality of the resulting isotopomer data set. At the 1% which a particular substrate labeling pattern can specify (0.01) threshold, the label with the highest a b 0.03 0.01 0.003 0.001 Experiment Space random (186 dim.) #000011 54 71 90 100 sample #110010 53 73 91 99 Model #111000 48 64 84 95 133395 d rx.on.sf. SimCu1la3 tion p ocilnotu d 80%CC1+1220%CU 4447 6646 8822 9934 #011000 50 66 82 88 Singular Value C5 41 57 69 81 Decomposition C4 41 56 70 80 C6 39 53 68 77 nd error Point Cloud Instrument precision #01C11111 3399 5533 6655 7777 a e C2 38 49 58 66 s noi Measurement (<100 d.o..f) C3 26 33 42 50 80%C0+20%CU 27 35 42 49 Figure6DataDimensionalitywithSVD.ThelineardimensionalityofexperimentaldataspaceismeasuredwithSingularValueDecomposition. 
a) The E. coli model has 313 reactions and 139 degrees of freedom. The isotopomer fragments were computed for a random sample of flux distributions and plotted in the 186-dimensional space of simulated measurements. The upper bound on the number of degrees of freedom in this space was determined by singular value decomposition on the samples. The number of singular values was counted until the magnitude of the next singular value fell below the instrument threshold. b) The number of significant singular values at different levels of experimental error is tabulated for various labeling patterns, with values ranging from 26 to 100.

dimensionality was hypothetical [1,2,5-13C] glucose with 73 dimensions. The three labels for which experimental data were measured in the subsequent section, [1-13C] glucose, [6-13C] glucose, and 20% [U-13C] glucose, had dimensions of 53, 53, and 35, respectively, at this cutoff. These values are all significantly lower than that of the best label; in particular, the uniformly labeled experiment produces only about half the dimensionality of the optimal experiment. This result is significant. While 139 dimensions (the number of undetermined dimensions for the model used) are required to specify a unique flux vector, the dimensionality of the 13C data for each label is significantly lower. The best labeling experiment specifies just over half of the required degrees of freedom (73/139 ≈ 0.53), and 20% [U-13C] glucose specifies only about one fourth of them (35/139 ≈ 0.25). It is worth noting that SVD is a linear operation used to approximate properties of a non-linear system, and the true degrees of freedom may be even lower than reported. SVD serves as a useful upper bound on the dimensionality of data for non-linear systems, but the difference between SVD dimensionality and true dimensionality may grow to be unacceptable for large systems. For the system studied here, SVD was found to be of practical use.

Experimental Validation

In order to assess the agreement of computationally predicted flux elucidation capacity with experimental data, we took fluxomic measurements for three labeling patterns in E. coli. Flux distributions that best explain each set of 13C data were calculated by solving a non-linear optimization problem:

$$\min_{v}\ \mathrm{Error}(v) \quad \text{subject to} \quad v_{\min} < v < v_{\max}, \quad S \cdot v = 0$$

The function Error(v) is a score of how well a given flux distribution fits the experimental data. It is defined as:

$$\mathrm{Error}(v) = \sum_{i \in \mathrm{fragments}} \frac{\left(\mathrm{fragment}_i(v) - \mathrm{measured}_i\right)^2}{\sigma^2}$$

where $\mathrm{measured}_i$ is the measured fractional enrichment of fragment i, $\mathrm{fragment}_i(v)$ is the computed fractional enrichment of fragment i as a function of the flux distribution v, and $\sigma = 0.014$ is the standard deviation of the fragments as calculated from experimental replicates.

Reaction flux confidence ranges were then computed for all reactions using all three sets of 13C data and all combinations thereof. Confidence intervals for reaction rates were computed by maximizing and minimizing the value of each reaction in turn, subject to a slightly relaxed score:

$$\min_{\alpha} / \max_{\alpha}\ c_i^{T} \cdot N \cdot \alpha \quad \text{subject to} \quad v_{\min} \le N \cdot \alpha \le v_{\max}, \quad \mathrm{Error}(N \cdot \alpha) \le \mathrm{Error}_{\max}$$

where $c_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$ is a vector of all zeros with a 1 in position i, and $\mathrm{Error}_{\max}$ was set based on the confidence value. Because different data sets provide different levels of consistency, $\mathrm{Error}_{\max}$ was chosen to be 30 units greater than the minimum error found.

These intervals were compared with Z-scores calculated through Monte Carlo methods to assess the ability of the Z-scores to predict the size of experimental reaction ranges in a label-specific manner (Figure 7a). The Z-scores were found to be correlated with the relative flux ranges in a statistically significant manner (Student's t-test, p < 8.6 × 10⁻³⁴). A receiver operating characteristic (ROC) curve suggests that the Z-scores can identify, with both sensitivity and specificity, the reactions that can be elucidated in a label-specific manner, with better performance for ranges that are restricted more strongly by the data (Figure 7b). These findings indicate that the Z-score is indeed a useful predictor of the degree to which a flux range will be constrained by a particular 13C experiment, and they provide experimental support for the computational approach taken.

[Figure 7 image: panel a) table of Z-scores ([1-13C], [6-13C], [U-13C] glucose), FVA flux ranges (none, [1-13C], [6-13C], [U-13C]), and range reduction for 12 reactions (EX_for, CS, GLYK, MALS, EDA, FTHFD, PGI, PGK, POX, PFL, PPC, rFUM); panel b) ROC curve, True Positive Rate vs. False Positive Rate, for range-reduction thresholds > 0.2, > 0.4, > 0.6, and > 0.8, with the Z cutoff increasing from Z = 0 to Z = 70]

Figure 7 Comparison of Calculated Z-scores and Experimental Flux Ranges. a) For several reactions, the computed Z-scores are compared to the resulting measured flux ranges. The Z-score columns show (color coded) Z-scores for each of the 12 reactions and three glucose labels. FVA indicates the absolute allowable flux ranges for the three glucose labels ([1-13C], [6-13C], [U-13C]) as well as the range if no 13C data is imposed ('none'). A normalized version of this table is also presented, in which all flux ranges are divided by the FVA range, thus showing the fraction of flux range remaining. This quantity ranges from 0 (range fully specified) to 1 (no additional information). b) A receiver operating characteristic (ROC) curve for the ability of Z-scores to predict confidence interval reduction by different fractions. Z-scores are observed to predict larger interval reductions (e.g. ratio < 0.2) with higher specificity and sensitivity than smaller reductions (e.g. ratio < 0.8).

The number of reactions elucidated at particular confidence intervals was then found (Figure 8). Using different labels provides different levels of reaction confidence (Additional File 1). Including no 13C data generates the largest flux ranges (lower black line), while adding 13C data reduces the ranges and shifts the curve left. With almost no exception, including one experiment yields larger confidence intervals than any combination of two carbon sources, which in turn yields larger ranges than including all three sets. Of the single-experiment curves, the 20% [U-13C] glucose curve provides notably worse ranges than the other two experiments, consistent with the finding that 20% [U-13C] substrate provides data with the smallest number of dimensions. At a reaction confidence of 1 mmol·gDW⁻¹·h⁻¹ (a relatively non-stringent cutoff), 85 reactions are specified simply from uptake rate data without any 13C data. Performing the least informative 13C experiment, using 20% [U-13C] substrate, yields 105 reactions that meet the confidence criterion, whereas the combination of all three 13C experiments yields 125 reactions that meet the criterion. In other words, performing all three experiments increases the number of elucidated reactions by 40 (from 85 to 125), or about 50%.
As the model used contains 278 carbon-tracking reactions and reaction groups, the increase in knowledge at 1 mmol·gDW⁻¹·h⁻¹ confidence from 85 to 125 reactions from using 13C data (still only about 45% of the 278) indicates that a large gap in knowledge remains. It seems apparent that other methods must be developed to obtain flux information at the genome scale from single experiments, as would ultimately be desirable. However, as noted in the above section, the majority of reactions that are elucidated by 13C-labeled glucose experiments lie in central metabolic pathways, which tend to be both of high interest and not specifiable by constraints alone.

Conclusions

We introduce a new framework for calculating the uncertainty inherent to 13C experiments using Monte Carlo sampling. This allows us to predict the success of experiments before performing them. The method used here 1) does not require experimental identification of the 'real' flux state a priori [10] and 2) reports scores for the resolution capability of each reaction, as opposed to Boolean identification calls [11]. This framework reveals several key findings:

• The choice of input label is important, as some labels perform better than others. In particular, the commonly used mixture of 20% uniform label + 80% natural label was shown computationally and experimentally to resolve significantly fewer reaction fluxes than either [1-13C] glucose or [6-13C] glucose. Thus, the amount of information likely to be obtained by a 13C experiment can be predicted in a reaction-specific manner before having to carry out an experiment.
• There is no universally best label. The best label depends on the experimental objective. Certain reactions are more precisely measured with some labels than others, and no label is best at elucidating all reactions. Certain hypothetical 13C labels of glucose, for example [1,2,3-13C] glucose and [1,2,5-13C] glucose, are predicted to perform better than commercially available single labels for many reactions.
• The 13C data dimensionality is less than anticipated. Whereas each 13C experiment can measure 186 pieces of information at a time, there is a high degree of interdependence. We estimated the data dimensionality to be between 35 and 50 dimensions for commercially available labels and as high as 73 for exotic labels. This high data redundancy can partially explain why 13C experiments yield many reaction rates with high uncertainties.

This study suggests limitations of steady-state 13C analysis using solely amino acids, due to the lower-than-expected dimensionality of the isotopomer data. However, steady-state 13C analysis is clearly still useful for elucidating reaction fluxes in E. coli metabolism. Notably, as the study was conducted using only protein-derived amino acids, it would be of immediate interest to determine the additional benefit of measuring other classes
