Multiple Predictor Smoothing Methods For Sensitivity Analysis

132 Pages·2006·7.12 MB·English
SANDIA REPORT
SAND2006-4693
Unlimited Release
Printed September 2006

Multiple Predictor Smoothing Methods For Sensitivity Analysis

Curtis B. Storlie and Jon C. Helton

Prepared by
Sandia National Laboratories
Albuquerque, New Mexico 87185 and Livermore, California 94550

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under Contract DE-AC04-94AL85000.

Approved for public release; further dissemination unlimited.

Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation.

NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors.

Printed in the United States of America. This report has been reproduced directly from the best available copy.

Available to DOE and DOE contractors from
U.S. Department of Energy
Office of Scientific and Technical Information
P.O. Box 62
Oak Ridge, TN 37831
Telephone: (865)576-8401
Facsimile: (865)576-5728
E-Mail: [email protected]
Online ordering: http://www.doe.gov/bridge

Available to the public from
U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Rd.
Springfield, VA 22161
Telephone: (800)553-6847
Facsimile: (703)605-6900
E-Mail: [email protected]
Online ordering: http://www.ntis.gov/ordering.htm

Multiple Predictor Smoothing Methods for Sensitivity Analysis

Curtis B. Storlie (a) and Jon C. Helton (b)

(a) Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203 USA
(b) Department of Mathematics and Statistics, Arizona State University, Tempe, AZ 85287-1804 USA

Abstract

The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
Key Words: Additive models, Epistemic uncertainty, Locally weighted regression, Nonparametric regression, Projection pursuit regression, Recursive partitioning regression, Scatterplot smoothing, Sensitivity analysis, Stepwise selection, Uncertainty analysis.

Acknowledgements

Work performed for Sandia National Laboratories (SNL), which is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. Review at SNL provided by L. Swiler and C. Sallaberry. Editorial support provided by F. Puffer, K. Best, M. Spielman, and J. Ripple of Tech Reps, a division of Ktech Corporation.

Contents

1. Introduction
2. Traditional Parametric Regression Models
2.1 Linear Regression
2.2 Rank Regression
2.3 Quadratic Regression
2.4 Nonlinear Regression
3. Nonparametric Regression
3.1 Univariate Scatterplot Smoothers
3.1.1 Running Means
3.1.2 Locally Weighted Means: Kernel Smoothers
3.1.3 Locally Weighted Regression
3.1.4 Smoothing Splines
3.2 Equivalent Degrees of Freedom and Smoothing Parameters
3.3 Multivariate Smoothers
3.3.1 Locally Weighted Regression: LOESS
3.3.2 Additive Models
3.3.3 Projection Pursuit Regression
3.3.4 Recursive Partitioning Regression
3.4 Hypothesis Testing for Variable Importance
4. Implementation of Smoothing Methods for Sensitivity Analysis
4.1 Stepwise Variable Selection
4.2 Traditional Regression: Linear Regression (LIN_REG), Rank Regression (RANK_REG) and Quadratic Regression (QUAD_REG)
4.3 Locally Weighted Regression (LOESS)
4.4 Generalized Additive Models (GAMs)
4.5 Projection Pursuit Regression (PP_REG)
4.6 Recursive Partitioning Regression (RP_REG)
5. Example Sensitivity Analysis Results
5.1 Example Results: Analytic Test Models
5.1.1 Monotonic Relationships: $y_1 = f_1(x_1, x_2)$
5.1.2 Monotonic Relationships: $y_2 = f_2(x_1, x_2)$
5.1.3 Nonmonotonic Relationships: $y_3 = f_3(x_1, x_2, \ldots, x_8)$
5.1.4 Nonmonotonic Relationship: $y_4 = f_4(x_1, x_2, x_3)$
5.2 Example Results: Two-Phase Fluid Flow
5.2.1 Cumulative Brine Flow at 1000 yr (BRNREPTC.1K)
5.2.2 Cumulative Brine Flow at 10,000 yr (BRNREPTC.10K)
5.2.3 Brine Saturation at 1000 yr (REP_SATB.1K)
5.2.4 Brine Saturation at 10,000 yr (REP_SATB.10K)
5.2.5 Pressure at 1000 yr (WAS_PRES.1K)
5.2.6 Pressure at 10,000 yr (WAS_PRES.10K)
6. Observations and Insights
7. References
Appendix A: R Code

Figures

Fig. 1. Linear regression on results generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 2. Rank regression on an example monotonic relationship.
Fig. 3. Rank regression on a nonlinear and nonmonotonic relationship generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 4. Quadratic regression on a nonlinear and nonmonotonic relationship generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 5. Running means with $r = 20$ on results generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 6. Locally weighted means with kernel function $k(z; h)$ in Eq. (3.5) and bandwidth $h = 0.6$ on results generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 7. Analysis with LOESS for kernel function $k(z; h)$ in Eq. (3.11) and $r = 60$ (i.e., a span of 0.20) on results generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 8. Analysis with smoothing spline with $a = x_{(1)}$, $b = x_{(nS)}$ and $df = 8$ (see Eq. (3.13)) on results generated in a sensitivity analysis of a two-phase fluid flow model.
Fig. 9. Example of LOESS surface constructed for $y = f(x_1, x_2) = (1/2\pi) \exp\{-[(x_1 - 5)^2 + (x_2 - 5)^2]/2\}$; see Eq. (3.27).
Fig. 10. Example of additive model surface constructed for $y = f(x_1, x_2) = \sin(x_1) + (x_2 - 5)^3$; see Eq. (3.35).
Fig. 11. Recursive partitioning regression on results generated in a sensitivity analysis of a two-phase fluid flow model: (a) Individual regression lines generated with traditional least squares regression, and (b) Individual regression lines generated with robust regression in which the sum of squares is minimized over the middle two quartiles of the deviations from the regression line.
Fig. 12. Recursive partitioning regression on results generated in a sensitivity analysis of a two-phase fluid flow model with individual regression lines constrained to meet continuously: (a) Individual regression lines generated with traditional least squares regression, and (b) Individual regression lines generated with robust regression in which the sum of squares is minimized over the middle two quartiles of the deviations from the regression line.
Fig. 13. Recursive partitioning regression constructed for $y = f(x_1, x_2) = \sin x_1 + (x_2 - 5)^3$ with individual regression surfaces constrained to meet continuously; see Eq. (3.35).
Fig. 14. Analytic test model $y_1 = f_1(x_1, x_2) = 5x_1 + (5x_2)^2$.
Fig. 15. Analytic test model $y_2 = f_2(x_1, x_2) = (x_2 + 0.5)^4/(x_1 + 0.5)^2$.
Fig. 16. Analytic test model $y_3 = f_3(x_1, x_2, \ldots, x_8)$ (see Eq. (5.9)) with surface averaged over $x_3, x_4, \ldots, x_8$.
Fig. 17. Scatterplots for $x_1$ and $x_2$ for analytic test model $y_4 = f_4(x_1, x_2, x_3)$ (see Eq. (5.10)).
Fig. 18. Time-dependent two-phase fluid flow results obtained with replicate R1 for a drilling intrusion at 1000 yr that penetrates the repository and an underlying region of pressurized brine (i.e., an E1 intrusion at 1000 yr).
Fig. 19. Scatterplots for cumulative brine flow at 1000 yr into repository (BRNREPTC.1K) for undisturbed conditions.
Fig. 20. Scatterplots for cumulative brine flow at 10,000 yr into repository (BRNREPTC.10K) for an E1 intrusion at 1000 yr.
Fig. 21. Scatterplots for average brine saturation at 1000 yr in waste panels not penetrated by a drilling intrusion (REP_SATB.1K) for undisturbed conditions.
Fig. 22. Scatterplots for average brine saturation at 10,000 yr in waste panels not penetrated by a drilling intrusion (REP_SATB.10K) for an E1 intrusion at 1000 yr.
Fig. 23. Scatterplots for pressure at 1000 yr in waste panel penetrated by a drilling intrusion (WAS_PRES.1K) for undisturbed conditions.
Fig. 24. Scatterplots for pressure at 10,000 yr in waste panel penetrated by a drilling intrusion (WAS_PRES.10K) for an E1 intrusion at 1000 yr.

Tables

Table 1. Forward Stepwise Variable Selection Algorithm for Sensitivity Analysis with LIN_REG, RANK_REG and QUAD_REG
Table 2. Forward Stepwise Variable Selection Algorithm for Sensitivity Analysis with LOESS
Table 3. Forward Stepwise Variable Selection Algorithm for Sensitivity Analysis with GAMs
Table 4. Forward Stepwise Variable Selection Algorithm for Sensitivity Analysis with PP_REG
Table 5. Forward Stepwise Variable Selection Algorithm for Sensitivity Analysis with RP_REG
Table 6. Sensitivity Analyses for Analytic Test Model $y_1 = f_1(x_1, x_2)$
Table 7. Sensitivity Analysis for Analytic Test Model $y_2 = f_2(x_1, x_2)$
Table 8. Sensitivity Analyses for Analytic Test Model $y_3 = f_3(x_1, x_2, \ldots, x_8)$
Table 9. Sensitivity Analyses for Analytic Test Model $y_4 = f_4(x_1, x_2, x_3)$
Table 10. Independent (i.e., sampled) Variables Considered in Example Sensitivity Analyses for Two-Phase Fluid Flow (Source: Table 1, Ref. 103, and Table 1, Ref. 140)
Table 11. Time-Dependent Two-Phase Fluid Flow Results for a Drilling Intrusion at 1000 yr that Penetrates the Repository and an Underlying Region of Pressurized Brine (i.e., an E1 intrusion at 1000 yr) Used to Illustrate Sensitivity Analysis Results
Table 12. Sensitivity Analyses for Cumulative Brine Flow at 1000 yr into Repository (BRNREPTC.1K) for Undisturbed Conditions
Table 13. Sensitivity Analyses for Cumulative Brine Flow at 10,000 yr into Repository (BRNREPTC.10K) for an E1 Intrusion at 1000 yr
Table 14. Sensitivity Analyses for Average Brine Saturation at 1000 yr in Waste Panels Not Penetrated by a Drilling Intrusion (REP_SATB.1K) for Undisturbed Conditions
Table 15.
Sensitivity Analyses for Average Brine Saturation at 10,000 yr in Waste Panels Not Penetrated by a Drilling Intrusion (REP_SATB.10K) for an E1 Intrusion at 1000 yr
Table 16. Sensitivity Analysis for Pressure at 1000 yr in Waste Panel Penetrated by a Drilling Intrusion (WAS_PRES.1K) for Undisturbed Conditions
Table 17. Sensitivity Analyses for Pressure at 10,000 yr in Waste Panel Penetrated by a Drilling Intrusion (WAS_PRES.10K) for an E1 Intrusion at 1000 yr

1. Introduction

The importance of uncertainty analysis and sensitivity analysis as components of analyses for complex systems is almost universally recognized, where uncertainty analysis designates the determination of the uncertainty in analysis results that derives from the uncertainty in analysis inputs and sensitivity analysis designates the determination of the contributions of individual uncertain analysis inputs to the uncertainty in analysis results (Refs. 1-11). A number of approaches to uncertainty and sensitivity analysis have been developed, including differential analysis (Refs. 12-17), response surface methodology (Refs. 18-26), Monte Carlo analysis (Refs. 27-38), and variance decomposition procedures (Refs. 39-43). Overviews of these approaches are available in several reviews (Refs. 44-52).

The focus of this presentation is on Monte Carlo (i.e., sampling-based) approaches to uncertainty and sensitivity analysis. Such analyses involve the consideration of models of the form

$\mathbf{y} = f(\mathbf{x})$,  (1.1)

where

$\mathbf{y} = [y_1, y_2, \ldots, y_{nY}]$  (1.2)

is a vector of analysis results and

$\mathbf{x} = [x_1, x_2, \ldots, x_{nX}]$  (1.3)

is a vector of imprecisely known analysis inputs. In general, the model $f$ can be quite large and involved (e.g., a system of nonlinear partial differential equations requiring numerical solution (see Ref. 53) or possibly a sequence of complex, linked models as is the case in a probabilistic risk assessment for a nuclear power plant (see Refs. 54, 55) or a performance assessment for a radioactive waste disposal facility (see Refs. 56, 57)); the vector $\mathbf{y}$ of analysis results can be of high dimension and complex structure (e.g., the elements of $\mathbf{y}$ might be several hundred temporally or spatially dependent functions); and the vector $\mathbf{x}$ of analysis inputs can also be of high dimension and complex structure (e.g., several hundred variables, with some variables corresponding to physical properties of the system under study and other variables corresponding to parameters in probability distributions or perhaps to designators for alternative models).

The uncertainty in the elements of $\mathbf{x}$ is characterized by a sequence of probability distributions

$D_1, D_2, \ldots, D_{nX}$,  (1.4)

where $D_j$ is a probability distribution characterizing the uncertainty in $x_j$. Correlations and other restrictions involving the relations between the $x_j$ are also possible. Such distributions and any associated restrictions are intended to numerically capture the existing knowledge about the elements of $\mathbf{x}$ and are often developed through an expert review process (Refs. 58-73).

The uncertainty characterized by the distributions $D_1, D_2, \ldots, D_{nX}$ in Eq. (1.4) is often referred to as epistemic uncertainty. Alternate designations for epistemic uncertainty include state of knowledge, subjective, reducible, and type B (Refs. 74-82). In particular, epistemic uncertainty derives from a lack of knowledge about the appropriate value to use for a quantity that is assumed to have a fixed value in the context of a particular analysis. In the conceptual and computational organization of an analysis, epistemic uncertainty is generally considered to be distinct from aleatory uncertainty, which arises from an inherent randomness in the behavior of the system under study (Refs. 74-83).

Sampling-based uncertainty and sensitivity analyses are based on a sample

$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{i,nX}]$, $i = 1, 2, \ldots, nS$,  (1.5)

from the possible values for $\mathbf{x}$ generated in consistency with the distributions in Eq. (1.4) and any associated restrictions. Random sampling is one possibility for the generation of this sample. However, owing to its efficient stratification properties, Latin hypercube sampling is widely used in analyses of this type, especially when computationally intensive models are involved (Refs. 27, 37, 38).

The analysis evaluations

$\mathbf{y}_i = \mathbf{y}(\mathbf{x}_i) = f(\mathbf{x}_i)$, $i = 1, 2, \ldots, nS$,  (1.6)

provide a mapping between analysis inputs (i.e., $\mathbf{x}_i$) and analysis results (i.e., $\mathbf{y}_i$) that forms the basis for both uncertainty analysis and sensitivity analysis. Once the preceding mapping is available, the determination of uncertainty analysis results is generally straightforward and involves the generation of summary results such as histograms, density functions, cumulative distribution functions (CDFs), complementary cumulative distribution functions (CCDFs), and box plots for individual elements of $\mathbf{y}$ (Sect. 6.5, Ref. 29). The determination of sensitivity analysis results involves the exploration of the preceding mapping with techniques such as examination of scatterplots, regression analysis, correlation and partial correlation analysis, and searches for nonrandom patterns (Sect. 6.6, Ref. 29).

The determination of sensitivity analysis results is generally more demanding than the determination of uncertainty analysis results. In particular, the popular regression and correlation based techniques can fail to appropriately identify the effects of the individual elements of $\mathbf{x}$ on the elements of $\mathbf{y}$ when nonlinear and nonmonotonic relations are present (Sect. 6.6, Ref. 29). Possible approaches to sensitivity analysis to use in such situations include grid-based statistical analyses of scatterplots (Refs. 30, 84), distance-based statistical analyses of scatterplots (Refs. 85-98), multidimensional Kolmogorov-Smirnov tests (Refs. 99-102), rank-concordance tests (Refs. 103, 104), and classification trees (Refs. 105, 106). However, the preceding approaches lack the intuitive appeal of regression-based approaches to sensitivity analysis. In particular, regression-based sensitivity analysis can be carried out in a sequential manner with variable importance being indicated by the order in which variables enter the regression model and by the fraction of total variance that can be accounted for as successive variables enter the regression model.

The purpose of this presentation is to describe regression-based techniques for sensitivity analysis that are based on multiple predictor smoothing methods. Such methods are conceptually consistent with regression-based methods that have been widely used in the past in sensitivity analysis (Sect. 6.6, Ref. 29), but have the important advantage that they are capable of incorporating local changes in the relationship between a dependent variable (i.e., an element of $\mathbf{y}$) and multiple independent variables (i.e., elements of $\mathbf{x}$). As a result, these methods can be successfully applied in situations involving nonlinear relationships between analysis inputs and analysis results where more traditional regression-based approaches would fail to appropriately capture these relationships.

The presentation is organized as follows. First, traditional approaches to regression-based sensitivity analysis are briefly described (Sect. 2), and then nonparametric approaches to regression analysis based on local data smoothing are introduced (Sect. 3). Next, technical details related to the implementation of the techniques described in Sect. 3 to a sequence of example sensitivity analyses are described (Sect. 4), and the results of these examples are presented (Sect. 5). The presentation concludes with a summary discussion (Sect. 6).

Although analyses for real systems almost always involve multiple output variables as indicated in conjunction with Eqs. (1.1) - (1.3), the following discussions assume that a single real-valued result of the form

$y = f(\mathbf{x})$  (1.7)

is under consideration. Similarly,

$y_i = f(\mathbf{x}_i)$, $i = 1, 2, \ldots, nS$,  (1.8)

is used to represent the result of evaluating $y$ with the sample in Eq. (1.5). This simplifies the notation and results in no loss in generality as the results under discussion are valid for individual elements of $\mathbf{y}$. All statistical analyses in this presentation are carried out within the R statistical computing environment (Ref. 107), which is an open source equivalent to the S-Plus statistical package (Ref. 108).
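
As an informal illustration of the sampling-based workflow in Eqs. (1.5) and (1.8), the following minimal sketch draws a Latin hypercube sample, evaluates a scalar model on it, summarizes the results with an empirical CCDF, and computes a squared linear correlation for each input. It is written in Python rather than the R environment used in the report, and it is not taken from Appendix A. The test function f, the uniform distributions on [0, 1), the sample size, and all helper names are assumptions introduced only for this example.

```python
import random

def latin_hypercube(n_samples, n_inputs, rng):
    """Return n_samples points in [0, 1)^n_inputs with one point per stratum in each input."""
    columns = []
    for _ in range(n_inputs):
        # One draw inside each of the n_samples equal-probability strata, then shuffle
        # so that strata are paired at random across inputs.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that each row is one sample element x_i.
    return [list(row) for row in zip(*columns)]

def f(x):
    """Hypothetical model: linear in x[0], nonmonotonic in x[1], no dependence on x[2]."""
    return 2.0 * x[0] + 8.0 * (x[1] - 0.5) ** 2

def ccdf(values, threshold):
    """Empirical complementary CDF: fraction of results that exceed threshold."""
    return sum(v > threshold for v in values) / len(values)

def r_squared(u, v):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
nS, nX = 500, 3                         # assumed sample size and number of inputs
xs = latin_hypercube(nS, nX, rng)       # sample in the sense of Eq. (1.5)
ys = [f(x) for x in xs]                 # evaluations in the sense of Eq. (1.8)
r2 = [r_squared([x[j] for x in xs], ys) for j in range(nX)]

if __name__ == "__main__":
    print("P(y > 2):", ccdf(ys, 2.0))
    for j, value in enumerate(r2):
        print(f"linear R^2 for x{j + 1}: {value:.3f}")
```

Under these assumptions, the linear measure should credit the first input with a substantial share of the variance while reporting a value near zero for the second input, even though the second input contributes a comparable amount of variance to y through a symmetric, nonmonotonic term. This is the kind of failure of correlation-based measures described above, and it is the situation the smoothing methods are intended to address.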
