ebook img

Volatile Biomarkers. Non-Invasive Diagnosis in Physiology and Medicine PDF

564 Pages·2013·12.968 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Volatile Biomarkers. Non-Invasive Diagnosis in Physiology and Medicine

PART A Interpretation of Breath Analysis Data CHAPTER 1 Mathematical and Statistical Approaches for Interpreting Biomarker Compounds in Exhaled Human Breath JoachimD.PleilandJonR.Sobus NationalExposureResearchLaboratory,OfficeofResearchandDevelopment, USEnvironmentalProtectionAgency,ResearchTrianglePark,NC27711,USA 1.1 INTRODUCTION Thevariousinstrumentaltechniques,humanstudies,anddiagnosticteststhatproduce datafromsamplesofexhaledbreathhaveonethingincommon:theyallneedtobe putintoacontextwhereinaposedquestioncanactuallybeanswered.Exhaledbreath containsnumerouscompounds;justthevolatileorganicfractionalonehasbeenesti- matedtorepresentinexcessof500differentchemicalspecies.Inaddition,theaerosol fractioncontainsproteins,signalingmolecules,dissolvedinorganiccompounds,and evenbacteriaandvirusesaddingtothecomplexityofthetotalsample. Nosingletechniquecandetecteverythinginbreath,infact,eventhemostbroadly designedbreathmeasurementsresultinsuitesofcompoundsrestrictedbythemethods used.Forexample,reactiveoxygenspeciesmaybeobservedusingreal-timesensors orreal-timemassspectrometry(MS),butnotbygaschromatography-MS(GC-MS), whereas GC-MS can discriminate among a variety of hydrocarbons, alcohols, and ketones that may overlap completely in a real-time MS instrument without benefit ofchromatographicseparation.Furthermore,thefractionofthebreath(gas-phaseor aerosolphase)alsodetermineswhatmeasurementscanbemade;forexampleproteins andsignalingmoleculescouldbedetectedinexhaledbreathcondensateviaenzyme- linkedimmunosorbentassay(ELISA),nuclearmagneticresonance(NMR),orliquid chromatography(LC)MS,butnotwithanygas-phaseinstrumentssuchasthosebased onopticalspectroscopyorgaschromatography. Thefirstissuethatthedataanalystfacesisthatalldataheorshesees,regardless howcomplexordetailed,isnotcomprehensivebutalwaysstratified(restricted)bythe chemistry, instrumentation, and thermodynamics of the choices made for sampling andanalysis.Thesecondissueisthatasuiteofcompounds measuredinanygiven sampledoesnotrepresentanindependentsetofvariables;sub-groupsofbiomarker compoundsoftenhaveappreciablecovariancereflectingsimilarmetabolicpathways orexogenoussources.Thethirdissuerelatestovariability;anygivensampleisunique anditcanneverbetakenforgrantedthattheconstituentsofthebreatharethesame betweenpeoplenorwithinonepersonovertime.Finally,anyindividualcompound VolatileBiomarkers.http://dx.doi.org/10.1016/B978-0-44-462613-4.00001-5 3 2013PublishedbyElsevierB.V. 4 CHAPTER 1 Mathematical and Statistical Approaches inbreathcanhaveawiderangeofconcentrationsthatareallconsidered“normal”or “unremarkable”intheapparentlyhealthygeneralpopulation.Thishasimplications for assessing health or exposure status based on just a few data points; under such constraints,theanalystcanonlyinterpretameasurementasastatisticalprobability thattheconcentrationisprobative. 1.2 DATA INTERPRETATION Inthischapter,wedescribeaseriesofmathematicalandstatisticalapproachesgeared specifically to the interpretation of volatile organics in exhaled breath that can be implementedtoaddressthefourissuesoutlinedabove.Althoughallmethodsinteract, wehaveassignedthemtofivecategoriesforthepurposesofthisdiscussionasfollows: 1. Datavisualizationandsummarystatistics. 2. Variableindependenceandclustering. 3. Populationstatisticsandvariancecomponents. 4. Stochasticmodelsandmeta-data. 5. Dynamicmodelsandlongitudinaldata. Eachoftheseapproachescanprovidedistinctinformationaboutabreathdataset, generallyincreasingindetailintheorderlisted.Wenotethatthebroaderinterpreta- tionsgleanedfromcategories1–3feedthemodelingprocessesin4and5.Wefurther note that not all of these procedures need to be performed; often it is sufficient to answeraquestionwithasimpleanalysisbasedonagraphorasummarytable.Inthe following,wedescribethesecategoriesandshowhowtheycanprogressivelytella storyaboutacquireddata.Weassumethatmultiplecompoundshavebeenmeasured acrossanumberofpeople,subjectsmaybegroupedascasesandcontrols,hostfactor meta-data have been collected, and measurements may have been repeated, either withorwithoutinterventionortreatment. 1.2.1 Data visualization and summary statistics The first step in any analysis of newly acquired data is to get a “feel” for how the experimentturnedout.Generally,wemakegraphsandcalculateaverages,standard deviations, data ranges, etc. We have found that it is very helpful to get all of the data into some form of visual representation, either as bar graphs, cluster plots, or heatmaps,dependingoncomplexityofthedatastructure.Forexample,inarecent publication, we described measurements of a series of polar volatile organic com- pounds (PVOCs) in exhaled breath condensate (EBC) made during an intervention study of diesel exhaust exposure.1 Human subjects (3 males and 6 females) were eachexposedfor2htoadilutedieselexhaustatmosphereandtoapurifiedairatmo- sphereonseparateoccasions.2 Avarietyofsamples,includingEBC,werecollected immediately before and after the exposures, and again 24h later; EBC means data foreightofthemostprevalentcompoundswereplottedbyninesubjectsgroupedby Pleil and Sobus 5 FIGURE1.1 Bar graph visualization of balanced exhaled breath data from an intervention study of dieselexhaustexposure.Summarybreathdataforninesubjectsandeightcompounds. gender(Figure1.1).Wenotedaslightgenderbiasforsomecompounds,butotherwise the results were unremarkable and we went on to assess the data from a statistical perspective. In a subsequent publication, however, we developed methodology for datavisualizationusingheatmapstylegraphics.3Briefly,heatmapsarevisualrepre- sentationsofquantitativedataontwoaxes;thex-axisreflectsindividualsamplesand they-axisconsistsofgroupsofmeasuredparameters.Thefieldbetweentheaxesis comprisedofanarraycontiguousboxescolorcodedtoreflectquantitation.Theterm “heatmap”isderivedfromtheconventionthatthehigherlevelstendtowardred,the lowerlevelstendtowardblue. WerevisitedthedatasetfromHubbardetal.1 andfoundthatheatmapvisualiza- tionwascapableofshowingmorepatterndetail.Thistechniquehadtoberestricted to seven subjects with complete data to avoid blank spots on the heat map as we lostafewsamplestofollow-up.Figure1.2showsthisalternateapproach;notethat we now have access to results from individual subjects’ samples, for all 16 mea- sured PVOCs, and the flexibility for grouping samples by gender and longitudi- nal parameters. Here, the gender effect becomes obvious; males were expressing muchhigherlevelsofmanyofthemeasuredPVOCs,especially2-methylpropanal, 1-heptanol,butanal,pentanal,1-hexanol,and3-methyl-3-pentanol.Onlyhexanaland heptanalreversethistrend.Wefurtherseethatthereisnoapparenttreatmenteffect orlongitudinaltimeeffect,thatis,therearenoobviouspatterndifferencesbetween diesel exposures and air exposures, nor pre- and post-exposure data. These simple 6 CHAPTER 1 Mathematical and Statistical Approaches FIGURE1.2 Heat map visualization of balanced exhaled breath data from an intervention study of diesel exhaust exposure. Individual samples grouped by longitudinal time frame (pre-exposure,post-exposure,and24hpost-exposure)aswellasbygender. visualization approaches, coupled with summary statistics, can provide hints as to howmoredetailedmathematicalapproachescouldbeimplementedtoquantifythese observations.Also,wenotethatthevisualizationmethodsforcomplexdatamakethe subsequentstatisticalcalculationsmoreaccessibletothereadership. 1.2.2 Variable independence and clustering Oneofthemostvexingproblemsencounteredindatainterpretationproceduresisthe determinationofactualindependenceofwhatwegenerallydenoteas“independent variables.” Subsequent statistical evaluations rely on the notion that measurements that are used in models are not overly correlated. For example, consider the height and weight parameters of human subjects as host-factor data in a complex breath experiment. We know that taller people tend to be heavier, and so there is signifi- cantcorrelationexpected.Ifoneweretotreatheightandweightasindependentand place them both into a model for predicting a health outcome (along with a series of breath biomarkers), one can get completely different interpretations as to their respectiveimportancedependinguponwhichwasenteredfirst.Thisiswhyoneoften seesbody-massindex(BMI)usedasacompositeparameterinstead.Nowconsider unexpectedcorrelationsamongvariables,forexample,amongcompoundsmeasured in the exhaled breath of subjects in an intervention or case-control study. If certain Pleil and Sobus 7 compoundsarerepeatedlybehavingthesameway,thentheirinclusioninevensimple multivariateregressionmodelswillresultinmathematicalinstability. Wecannotknowaprioriifdifferentanalytesaretrackingthesameoutcome,how- ever,wecanperformvariouscorrelationtestsamongpresumedindependentvariables toassessthedegreeofindependence.Themostfundamentalmethodisthecorrelation matrixwhereintheregressionbetweenpairsofvariables(V,V )iscalculatedasthe i k Pearsoncorrelationcoefficient“r”whichhaspossiblevaluesrangingfrom−1to1. Positiver-valuesindicatethatalargerV isassociatedwithalargerV ,whereasaneg- i k ativerindicatesthatalargerV isassociatedwithasmallerV .Thecloserther-value i k isto0,themoreindependentthetwovariables.Forexample,inastudymeasuring height(H),weight(W),andpercentbodyfat(BF),r=0.486forWvs.H,r=0.074for Hvs.BF,andr=0.613forWvs.BF.4Basedontheresultsofthissimplecorrelation matrix, one would probably not include both parameters H and W, nor both W and BF,butwouldkeepHandBFashostfactorsinaresultingmodel. The correlation matrix approach is considered a semi-quantitative measure for overall data independence interpretation because it treats variables two at a time; one cannot discern directly how multiple variables, or combinations of variables, correlate.AmoresophisticatedapproachisavailablewhereinEigenvectorprojections arecalculatedforallvariablesinn-dimensionalspace.Usingstatisticalsoftwaresuch as proc VARCLUS from SAS Inc. (Cary, NC, USA), it is possible to develop a “dendrite”diagramthatcanbeusedtocreateclustersofvariablesthathavecertain levels of co-linearity. This is similar to principle components analysis (PCA), but rather than grouping samples, this approach groups variables. Variable clustering servestwopurposes,itimprovesmodelstabilitybyidentifying/removingcollinearity, andreducesthetotalnumberofvariablesforamoreparsimoniousmodel.5 Asanexample,considerthedendritediagram(dendrogram)inFigure1.3where weshowagenericexampleofbothhostfactorandmeasurementdataclustering.5In thiscase,weuseahypotheticaldatastructurewithsixhostfactors(gender,BMI,age, etc.)andsevenenvironmental/biomarkervariables(e.g.concentrationmeasurements in blood or urine). This methodology has been applied to complex environmental dioxincongenerdata6 andtostudiesofjetfuelexposureintheUSAirForce.7 By collapsing the more highly correlated variables into clusters, we do not lose muchexplanatory power,butimprovetheinterpretivepower ofsubsequent models by increasing the ratio (n/m) between number of samples “n” and the number of independentvariables“m.”Wehaveobservedthatn/m(cid:2)10isagoodruleofthumb forassessingtheimportanceofindividualvariables.6Wecautionthatthisprocedure requiresacertainamountofjudgmentonthepartoftheinvestigator.Thefirstissue is choosing the variance level vs. number of clusters. This is very dependent on the eventual n/m parameter. The second issue is how to combine information from variablescollapsedintoclusters.Thereareanumber ofchoices thatdependonthe natureofthevariables;intheaboveexample,weuseasimplesum,butotherstrategies canbeemployedaswell.Forexample,givenhighlycorrelatedvariablessuchasHF3 andHF4(especiallyiftheyarenotcontinuous),onecanjustdiscardoneortheother. 8 CHAPTER 1 Mathematical and Statistical Approaches FIGURE1.3 Example dendrogram of variable cluster analysis. Starting with a total of m=13 inde- pendentvariables(p=6hostfactorsandq=7environmentalvariables),forming8clus- ters(HF1,HF2,HF3+HF4,HF5+HF6,RV1,RV2,RV3+RV4+RV5+RV6,andRV7) resultsinexplainedvarianceof∼93%.Collapsingtheclustersfurtherto2Groupsexplains ∼83%ofthevariance,butnowisdifficulttointerpret. Formeasuredconcentrations,itissometimesbettertonormalizeeachmeasurement toatotaltoavoidhavingaparticularvariableoverwhelmthesum. 1.2.3 Population statistics and variance components Traditionalmathematicalanalysesofbreathbiomarkers,bothforclinicalandenviron- mentalresearchapplications,utilizemeasurementstatistics(e.g.meanandmedian) toevaluatedifferencesbetweengroups.Forexample,itiscommontoevaluatemea- surements statistics to compare cases vs. controls, exposed vs. unexposed subjects, ormalesvs.females(asshowninourearlierexampleofadieselinterventionstudy). Furthermore,itiscommontoemploystratifieddataanalyses—thatis,analysesusing multiplelevelsoforganization—toreducetheimpactsofconfoundingvariables.Con- sidertheevaluationofsmokingeffectsonbreathbiomarkerlevels;onecouldperform asingleevaluationusingallsubjects(smokersvs.non-smokers),twoevaluationsafter stratifyingbysex(malesmokersvs.malenon-smokers;femalesmokersvs.female non-smokers),fourevaluationsafterstratifyingbysexanddiseasestatus(malesmoker [case]vs.malesmoker[control];malenon-smoker[case]vs.malenon-smoker[con- trol]; female smoker [case] vs. female smoker [control]; female non-smoker [case] vs. female non-smoker [control]), and so on. These stratified analyses can narrow Pleil and Sobus 9 downthepotentialoriginsofanobservedeffect,butrequirelargersamplenumbers witheachlevelofstratification.Thus,aninvestigatormustbalancethedesiredout- come of a given analysis with costs required to achieve sufficient statistical power andsensitivity.6 Givenadequatesamplenumbersforstratifiedgroupanalyses,itisprudenttofirst investigatetheunderlyingdistribution(s)ofthebiomarkerdatainquestion.Ourear- lierexamplesfordataanalysis(i.e.datavisualizationandvariableclustering)canbe consideredqualitativeorsemi-quantitative,andgenerallyhavenoaprioriconditions fordatastructure.Hypothesistestingontheotherhand,isentirelyquantitative,and inmanycasesreliesondistributionsofmeasurementdatabeingapproximatelyGaus- sian,or“normal.”Simplediagnosticproceduresexistinmostsoftwarepackagesto evaluatedatadistributions.Oftentimes,simplehistograms,normalprobabilityplots, or quantile-quantile plots can be visually inspected to evaluate normality assump- tions. More advanced statistical packages offer statistical tests (e.g. Shapiro-Wilk andKolmogorov-Smirnov)toconfirmresultsfromvisualinspection. Measurementsofanalytesinbiologicalmediaareoften“lognormally”distributed; thatis,thedistributionoftheloggedvaluesisapproximatelynormal.Lognormaldata canbeidentifiedbyaright-skeweddistributionoftheoriginal(non-transformed)data, reflectingfewvaluesatexceedinglyhighlevels,andmanyvaluesnear,butnotbelow, alowerthreshold(generallyzero).Additionalsignsoflognormallydistributeddata include:(1)anobservedarithmeticmeanvaluethatislargerthanthemedian(reflect- ingthedifferentialeffectsofextremelyhighvaluesonthesestatisticalparameters)and (2)anobservedconfidenceintervalthatincludesnegativevalues(itisimpossibleto havenegativeamountsofabiomarker).Whenoneencountersthesesigns,itisbest to evaluate data transformation approaches before pursuing quantitative hypothesis testing. “Parametric”testingproceduresaregenerallyusedtoevaluatenormallydistributed dataandlognormallydistributeddatathathavebeenlog-transformed.Commonlyused parametrictestsincludetheStudent’st-test,thepairedt-test,andanalysisofvariance (ANOVA).AStudent’st-testevaluatesdifferencesinbiomarkermeasurementsacross twogroups(e.g.menvs.women);apairedt-testevaluatesdifferencesinbiomarker measurements between paired samples from individuals (e.g. pre-intervention vs. post-intervention);andANOVAcomparesbiomarkermeasurementsacrossmultiple groups(e.g.childrenvs.adolescentsvs.adults).Intheeventthatbiomarkerdataarenot Gaussian,orcannotbemathematicallytransformedtoapproximatenormality,“non- parametric” methods exist for hypothesis testing. For example, the Mann-Whitney test,Wilcoxontest,andKruskal-WallistestarecommonsubstitutesfortheStudent’s t-test,pairedt-test,andANOVA,respectively.Thesetestsgenerallyapproximatethe resultsoftheparametrictestsgivenalargeenoughsamplesize.However,theexact datarequirements(e.g.samplenumber,sampleindependence,randomselection)for anygiventestshouldalwaysbeconsideredpriortohypothesistesting. The statistical tests discussed thus far generally utilize a single observation for eachsubject.(Whilethepairedt-testusestwoobservationspersubject,thedifference betweenthetwovalues[i.e.asinglevalue]isusedforhypothesistesting.)Therefore, 10 CHAPTER 1 Mathematical and Statistical Approaches differentapproachestodataanalysisarerequiredwhenmultiplemeasurementsexist for each subject. At the most basic level, it is of interest to evaluate the spread of thedata,orthe“variance,”betweenandwithinsubjects.Todothis,thetotalvariance acrossallmeasurementsisfirstpartitionedintothatwhichisobservedacrossrepeated measurementsofindividualsubjects,knownas“within-person(intra-individual)vari- ance,” and that which is observed across average levels of all subjects known as “between-person(inter-individual)variance.”Thesebetween-andwithin-personvari- ancecomponentscanbecalculatedusinganumberoftechniques.8 Thesimplestof thesemethodsisanalysisofvariance(ANOVA)whichissuitableforbalanceddata setswherethesamenumberofmeasurementsexistsforeachsubject.Morecomplex methods,suchasrestrictedmaximumlikelihoodestimation(REML),mayberequired whenworkingwithunbalanceddatasets;thesemethodsaretypicallyavailableonly inmoreadvancedstatisticalsoftwarepackages. Oncethevariancecomponentsareestablished,theybecomeinstrumentalforiden- tifyingtheoriginsofexposureordisease,andinturn,thebestopportunitiesformitiga- tionorintervention.Largewithin-personvarianceandsmallbetween-personvariance indicateslittledifferencebetweensubjectsonaverage,butlargedifferencesovertime foreachsubject.Thisresultmaypointtoatemporalevent,randomorotherwise,that affects each subject equally. Alternatively, small within-person variance and large between-person variance indicates little change over time, but marked differences betweensubjects.Thisresultmaypointtoahost-specificparameter(e.g.genotype, fitness-level) thatdictates biomarker response. Inthefirstexample, ageneral inter- vention strategy, equally applicable to all subjects, might be suitable to reduce an exposure or limit a biological response. In the second example, a targeted strategy wouldlikelybenecessarytofirstidentifythecauseofelevatedbiomarkerlevelsfor certainindividuals,andthenforintervention. 1.2.4 Stochastic models and meta-data After identifying group-based differences in biomarkers levels, and within- and between-subjectvariancecomponents(forrepeated-measuresstudiesonly),thenext stepistodevelop statisticalmodelsusingstudymeta-data. Statisticalmodels serve twofunctionsforbreathresearch:(1)theyallowinvestigatorstosimultaneouslyiden- tifymultiplesignificantpredictorsofbreathbiomarkerlevelsand(2)theyprovidea platformforpredictingbreathbiomarkerlevelsinotherstudieswhereonlymeta-data exist. Animportantdecisionformodeldevelopmentisidentifyingadependentvariable; this is not as easy as it sounds. Often, the dependent variable is obvious by design and represents the “outcome” for a subject; it can be binomial, that is, cancer/not cancer,infected/notinfected,oritcanbeacontinuoushealthoutcomevariablesuch ascholesterollevel,totalurinaryprotein,pulmonaryfunction(e.g.forcedexpiratory volume in 1s (FEV1), forced vital capacity (FVC)), or DNA damage (e.g. sister chromatidexchange(SCE),strandbreaks),amongothers. Pleil and Sobus 11 However,therearemanyoccasionswhenthechoiceofthedependentvariableis notobvious,especiallyinenvironmentalorcross-sectionalpublichealthstudiesfor whichtheanalystwasnotconsultedinsamplingdesign.Forexample,supposeone hasmeasuredasuiteofexogenouschemicalsandmetabolitesinexhaledbreath,and hasacquiredmeta-dataconcerningrecentactivity,jobtype,height,weight,gender, ethnicity,etc.,butnohealtheffectsinformationwascollectedorobserved.Whatcould thedependentvariablebeforinterpretationpurposes?Insuchcases,thefirstquestion tobeaddressedis:Whatdowewanttoknow?Generally,wewouldliketoexplore the linkages between suspected exposure sources and the resulting internal dose in humansubjects.Supposethatinthetotalityofallmeasurements,weobservebenzene, toluene,ethyl-benzene,andxylenes(BTEX),plusmanyotherorganiccompoundsin breath. If one of the suspected sources for all exposures is automobile exhaust, we couldsumtheBTEXnumbersintoonecompositeparameterandusethistorepresent the dependent variable for overall fuels exposure. This is a bit of a bootstrapping approach, especially if we keep the individual BTEX compounds as independent continuousvariables,butoftenthisistheonlywaytobuildastochasticmodelinthe absenceofadesigneddependentvariable. Giventhatwehaveanumberof“independentvariable”measurementsandrelated meta-dataandsomeconsensusdependent(outcome)variable,thenthenextstepisto determine which independent variables and data actually tell a story relating to the changes intheidentified outcome variable. Thisrequires someformof amodeling approach,andifthedataincludebothhostfactorsandcontinuousvariables,thebest approachisageneralizedlinearmodel,ora“mixedeffects”model.8,9 Ageneralformofsuchamodelis: (cid:2) (cid:3) Y = β X +β X +···+β X +α +b +(cid:4) , hij 1 1hij 2 2hij p phij h hi hij whereY isthevalueofsomerelevantbiologicalparameterforthejthobservation hij oftheithsubjectinthehthgroup; X ,X ,...,X arethevaluesofthefixed 1hij 2hij phij effect variables such as environmental chemical concentrations (in air, water, food, dust,etc.),andhostfactorssuchasage,healthstate,gender,geneticpolymorphisms, ethnicity,etc.;pisthetotalnumberoffixedeffects(note:thehostfactorsmaybefixed foralljwithinagiveni);β ,β ,…,β arethecorrespondingmodeledcoefficients 1 2 p for the fixed effects and host factors; α is the random effect for the hth group; b h hi is the random effect for the ith subject from the hth group; and (cid:4) is the residual hij (unexplained)errorforthejthobservationoftheithsubjectfromthehthgroup. Softwareapplicationsforthisstyleofapproacharecommerciallyavailable(e.g. procMIXED,SAS).Uponcalculation,thecoefficientsandtheirp-valuescanbeinter- pretedtodeterminetheeffectofincludingtheparticularfixedeffectorrandomeffect variableinthefinalmodelforexplainingthevarianceinthebiologicalparameters’ values.Thiscanbedonewithiterativestepsofforwardadditionorreverseelimination withtheeventualobjectivebeingaparsimoniousmodelwithoutappreciablelossof modelingpower.Oncethefinalmodelisestablished,wecanobservewhichexposure parameters and fixed effects are more likely to cause perturbations to the systems biology.5

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.