Variable Selection in Bayesian Smoothing Spline ANOVA Models

22 Pages·2009·0.2 MB·English

Variable Selection in Bayesian Smoothing Spline ANOVA Models: Application to Deterministic Computer Codes

By: Brian J. Reich¹, Curtis B. Storlie², and Howard D. Bondell¹
¹ North Carolina State University
² University of New Mexico
E-mail: [email protected]

Brian Reich · Bayesian Variable Selection for SS-ANOVA Models

Motivation

- A common problem for statisticians is sensitivity analysis of a complex computer model.
- For example, we consider a two-phase fluid flow simulation study carried out by Sandia National Labs.
- The computer model simulates the waste panel's condition 10,000 years after the waste panel has been penetrated by a drilling intrusion.
- The simulation model uses several input variables describing various environmental conditions.
- The objectives are to predict waste pressure for new sets of environmental conditions and to determine which environmental factors have the largest effect on the response.

NP regression for complex computer models

- Because computer models are governed by complex differential equations, linear regression is not an adequate approach for these data.
- The standard approach is nonparametric regression: $y = f(x_1, \ldots, x_p) + \epsilon$.
- $f : \mathbb{R}^p \to \mathbb{R}$ is an unknown function that relates the predictors to the mean of $y$.
- Typically $f$ is modeled as a Gaussian process.
- $\epsilon \sim N(0, \sigma^2)$ is the error.

SS-ANOVA models

- Modeling $f$ as a Gaussian process is very flexible, but from this model it can be difficult to identify the effect of an individual covariate.
- When the goal is to identify the effect of each covariate on the response, a more structured model may be preferred.
- The smoothing spline ANOVA (SS-ANOVA) model decomposes the $p$-dimensional surface $f$ into the sum of one-dimensional main effect curves $f_j$, two-dimensional interaction curves $f_{lk}$, etc.:
  $$f(x_1, \ldots, x_p) = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{l<k} f_{lk}(x_l, x_k) + \ldots$$
- Our goal is to develop a computationally efficient method for selecting important main effects and interactions when $p$ is fairly large.

Outline

In this talk we will...

- Develop a stochastic search variable selection model to identify important main effect and interaction terms in the SS-ANOVA framework.
- Propose a method for selecting the prior to give desirable long-run properties.
- Apply our approach to the Sandia Labs data.

Model for the main effect curves

- Each regression function $f_j$ is typically restricted to a particular class of functions.
- We consider the subset of the $M$th-order Sobolev space that includes only functions that integrate to zero and have $M$ proper derivatives, i.e., $f_j \in F_M$, where
  $$F_M = \left\{ g \mid g, \ldots, g^{(M-1)} \text{ are absolutely continuous}, \ \int_0^1 g(s)\,ds = 0, \ g^{(M)} \in L^2[0,1] \right\}.$$
- We restrict the main effects to integrate to zero to identify the overall intercept $\beta_0$.
- We select $M = 1$ so that draws from the prior are continuously differentiable with path properties of integrated Brownian motion.

Prior for $f_j$

- We assume $f_j$ is a Gaussian process with $E(f_j(s)) = 0$ and covariance
  $$\mathrm{Cov}(f_j(s), f_j(t)) = \sigma^2 \tau_j^2 \left[ \sum_{m=1}^{2} c_m B_m(s) B_m(t) + \frac{1}{4!} B_5(|s - t|) \right],$$
  where $c_m > 0$ are known constants and $B_m$ is the $m$th Bernoulli polynomial.
- The first term, $\sum_{m=1}^{2} c_m B_m(s) B_m(t)$, controls the covariance of the quadratic trend.
- The second term controls the covariance of the deviation from the quadratic trend.
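
The quadratic-trend term of the covariance above is built from the first two Bernoulli polynomials, $B_1(x) = x - 1/2$ and $B_2(x) = x^2 - x + 1/6$. The following is a minimal sketch, not the authors' code, that evaluates this first term only and checks numerically that both polynomials integrate to zero on $[0,1]$, which is consistent with the integrate-to-zero restriction on the main effects. The function names and the Simpson-rule helper are illustrative assumptions; the value $c_m = 100$ is the one quoted later in the slides.

```python
def bernoulli_1(x):
    """First Bernoulli polynomial, B_1(x) = x - 1/2."""
    return x - 0.5


def bernoulli_2(x):
    """Second Bernoulli polynomial, B_2(x) = x^2 - x + 1/6."""
    return x * x - x + 1.0 / 6.0


def simpson(func, a=0.0, b=1.0, n=100):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * func(a + i * h)
    return total * h / 3.0


def trend_covariance(s, t, c=(100.0, 100.0)):
    """First covariance term only: sum over m = 1, 2 of c_m B_m(s) B_m(t).

    c = (100, 100) follows the slides' choice c_m = 100 (assumed here for
    both m); the sigma^2 tau_j^2 scale factor is left out.
    """
    return (c[0] * bernoulli_1(s) * bernoulli_1(t)
            + c[1] * bernoulli_2(s) * bernoulli_2(t))


# Both polynomials integrate to zero on [0, 1] (up to rounding error).
integral_b1 = simpson(bernoulli_1)
integral_b2 = simpson(bernoulli_2)
print(abs(integral_b1) < 1e-12, abs(integral_b2) < 1e-12)

# The trend term is symmetric in (s, t) and non-negative on the diagonal.
print(trend_covariance(0.2, 0.7), trend_covariance(0.7, 0.2))
print(trend_covariance(0.3, 0.3) >= 0.0)
```

Because each $B_m$ integrates to zero, any curve of the form $a B_1(x) + b B_2(x)$ does too, so the trend part of a prior draw does not compete with the intercept $\beta_0$.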

Prior for $f_j$ (cont.)

- Complex models often have categorical variables that represent different states or point to different submodels to be used in the analysis.
- Assume $x_j \in \{1, 2, \ldots, G\}$ is categorical and $f(x_j) = \theta_{x_j}$, where $\theta_g \overset{iid}{\sim} N(0, \sigma^2 \tau_j^2)$, $g = 1, \ldots, G$.
- To identify the intercept we enforce the sum-to-zero constraint $\sum_{g=1}^{G} \theta_g = 0$.
- This model can also be written in the kernel framework by taking $f$ to be a mean-zero Gaussian process with singular covariance
  $$\mathrm{cov}(f(s), f(t)) = \sigma^2 \tau_j^2 \left[ \frac{G-1}{G} I(s = t) - \frac{1}{G} I(s \neq t) \right].$$

Prior for $f_j$ (cont.)

- We take $c_m = 100$ to give a vague prior for the quadratic trend.
- $\tau_j^2$ controls the overall variance, relative to the error variance $\sigma^2$.
- Interactions are modeled similarly.
  - $f_{lk}$ has mean zero and covariance proportional to $\sigma^2 \tau_{lk}^2$.
  - $f_{lk}$'s covariance places a vague prior on the second-order interaction trends.
  - We do not include higher-order interactions.
  - Interactions with categorical predictors are handled no differently than other interactions.

Stochastic search variable selection (SSVS)

- One way to do variable selection would be to fit all possible models and pick the one with the smallest AIC, BIC, DIC, etc.
- If $p = 11$ there are 11 main effects, 55 interactions, and $2^{66}$ possible models.
- We use stochastic search variable selection to avoid enumerating all possible models.
- SSVS treats the model as a random variable and uses MCMC to search for models that fit the data well.
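
The model count quoted for $p = 11$ follows from simple combinatorics: each of the $p$ main effects and each of the $\binom{p}{2}$ two-way interactions is either in or out of the model. The sketch below reproduces that arithmetic; the function names are illustrative assumptions, and it counts every in/out combination without imposing any hierarchy between interactions and their parent main effects.

```python
from math import comb


def count_terms(p):
    """Return (number of main effects, number of two-way interactions)."""
    main_effects = p
    interactions = comb(p, 2)  # unordered pairs l < k
    return main_effects, interactions


def count_models(p):
    """Number of candidate models when every term is either in or out."""
    main_effects, interactions = count_terms(p)
    return 2 ** (main_effects + interactions)


main_effects, interactions = count_terms(11)
n_models = count_models(11)
print(main_effects, interactions)  # 11 55
print(n_models == 2 ** 66)         # True
print(n_models)                    # 73786976294838206464
```

With roughly $7.4 \times 10^{19}$ candidates, fitting every model and comparing information criteria is not practical, which is the motivation given for a stochastic search.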

Description:
Main effect: P(f_j = 0)
Anhydrite permeability: 1.00
Borehole permeability: 1.00
Bulk compressibility of brine pocket: 1.00
Halite porosity: 1.00
Microbial degradation of cellulose: 1.00
Residual brine saturation in shaft: 0.66
Halite permeability: 0.46
Permeability of asphalt component of s