Variable Selection in Bayesian Smoothing Spline ANOVA Models: Application to Deterministic Computer Codes

By: Brian J. Reich (1), Curtis B. Storlie (2), and Howard D. Bondell (1)
(1) North Carolina State University
(2) University of New Mexico
E-mail: [email protected]

Brian Reich: Bayesian Variable Selection for SS-ANOVA Models

Motivation

• A common problem for statisticians is a sensitivity analysis of a complex computer model.
• For example, we consider a two-phase fluid flow simulation study carried out by Sandia National Labs.
• The computer model simulates the waste panel's condition 10,000 years after the waste panel has been penetrated by a drilling intrusion.
• The simulation model uses several input variables describing various environmental conditions.
• The objectives are to predict waste pressure for new sets of environmental conditions and to determine which environmental factors have the largest effect on the response.

NP regression for complex computer models

• Because computer models are governed by complex differential equations, linear regression is not an adequate approach for these data.
• The standard approach is nonparametric regression:

  $$y = f(x_1,\ldots,x_p) + \epsilon.$$

• $f : \mathbb{R}^p \to \mathbb{R}$ is an unknown function that relates the predictors to y's mean.
• Typically $f$ is modeled as a Gaussian process.
• $\epsilon \sim N(0,\sigma^2)$ is error.

SS-ANOVA models

• Modeling $f$ as a Gaussian process is very flexible, but from this model it can be difficult to identify the effect of an individual covariate.
• When the goal is to identify the effect of each covariate on the response, a more structured model may be preferred.
• The smoothing spline ANOVA (SS-ANOVA) model decomposes the p-dimensional surface $f$ into the sum of one-dimensional main effect curves $f_j$, two-dimensional interaction surfaces $f_{lk}$, etc.:

  $$f(x_1,\ldots,x_p) = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{l<k} f_{lk}(x_l,x_k) + \cdots$$

• Our goal is to develop a computationally efficient method for selecting important main effects and interactions when p is fairly large.

Outline

In this talk we will...

• Develop a stochastic search variable selection model to identify important main effect and interaction terms in the SS-ANOVA framework.
• Propose a method for selecting the prior to give desirable long-run properties.
• Apply our approach to the Sandia Labs data.

Model for the main effect curves

• Each regression function $f_j$ is typically restricted to a particular class of functions.
• We consider the subset of the Mth-order Sobolev space that includes only functions that integrate to zero and have M proper derivatives, i.e., $f_j \in F_M$, where

  $$F_M = \left\{ g \,\middle|\, g,\ldots,g^{(M-1)} \text{ are absolutely continuous},\ \int_0^1 g(s)\,ds = 0,\ g^{(M)} \in L^2[0,1] \right\}.$$

• We restrict the main effects to integrate to zero to identify the overall intercept $\beta_0$.
• We select M = 1 so that draws from the prior are continuously differentiable with path properties of integrated Brownian motion.

Prior for $f_j$

• We assume $f_j$ is a Gaussian process with $E(f_j(s)) = 0$ and covariance

  $$\mathrm{Cov}(f_j(s),f_j(t)) = \sigma^2\tau_j^2\left[\sum_{m=1}^{2} c_m B_m(s)B_m(t) + \frac{1}{4!}B_5(|s-t|)\right],$$

  where $c_m > 0$ are known constants and $B_m$ is the mth Bernoulli polynomial.
• The first term, $\sum_{m=1}^{2} c_m B_m(s)B_m(t)$, controls the covariance of the quadratic trend.
• The second term controls the covariance of the deviation from the quadratic trend.
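As a rough illustration of the first covariance term only, the sketch below evaluates $\sum_{m=1}^{2} c_m B_m(s)B_m(t)$ on a grid using the standard Bernoulli polynomials $B_1(x) = x - 1/2$ and $B_2(x) = x^2 - x + 1/6$. The function names, the grid, and the default weights are assumptions made for this example (the value 100 follows the talk's later choice of $c_m$). The second term of the covariance is deliberately left out, so this is not the full prior.

```python
import numpy as np


def bernoulli_1(x):
    """First Bernoulli polynomial, B_1(x) = x - 1/2."""
    return x - 0.5


def bernoulli_2(x):
    """Second Bernoulli polynomial, B_2(x) = x^2 - x + 1/6."""
    return x ** 2 - x + 1.0 / 6.0


def trend_covariance(s, t, c=(100.0, 100.0)):
    """Trend part of the covariance: c_1 B_1(s)B_1(t) + c_2 B_2(s)B_2(t).

    Hypothetical helper; it omits the second (non-trend) term of the prior.
    """
    s = np.asarray(s, dtype=float)[:, None]  # column vector of s values
    t = np.asarray(t, dtype=float)[None, :]  # row vector of t values
    return (c[0] * bernoulli_1(s) * bernoulli_1(t)
            + c[1] * bernoulli_2(s) * bernoulli_2(t))


# Covariance matrix of the trend component on an evenly spaced grid in [0, 1].
grid = np.linspace(0.0, 1.0, 51)
K_trend = trend_covariance(grid, grid)
eigvals = np.linalg.eigvalsh(K_trend)

# Check the integrate-to-zero property with 3-point Gauss-Legendre quadrature,
# which is exact for these low-degree polynomials after mapping [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(3)
x_quad, w_quad = 0.5 * (nodes + 1.0), 0.5 * weights
integral_b1 = float(np.sum(w_quad * bernoulli_1(x_quad)))
integral_b2 = float(np.sum(w_quad * bernoulli_2(x_quad)))

print("smallest eigenvalue:", eigvals.min())
print("integrals of B_1 and B_2 over [0, 1]:", integral_b1, integral_b2)
```

Because the trend term is a sum of two outer products, its matrix has rank two and no negative eigenvalues beyond rounding error, and both basis functions integrate to zero on [0, 1], which is consistent with the constraint used to identify the intercept.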
Prior for $f_j$ (cont.)

• Complex models often have categorical variables that represent different states or point to different submodels to be used in the analysis.
• Assume $x_j \in \{1,2,\ldots,G\}$ is categorical and $f(x_j) = \theta_{x_j}$, where $\theta_g \overset{iid}{\sim} N(0,\sigma^2\tau_j^2)$, $g = 1,\ldots,G$.
• To identify the intercept we enforce the sum-to-zero constraint $\sum_{g=1}^{G}\theta_g = 0$.
• This model can also be written in the kernel framework by taking $f$ to be a mean-zero Gaussian process with singular covariance

  $$\mathrm{cov}(f(s),f(t)) = \sigma^2\tau_j^2\left[\frac{G-1}{G}\,I(s = t) - \frac{1}{G}\,I(s \neq t)\right].$$

Prior for $f_j$ (cont.)

• We take $c_m = 100$ to give a vague prior for the quadratic trend.
• $\tau_j^2$ controls the overall variance, relative to the error variance $\sigma^2$.
• Interactions are modeled similarly.
• $f_{lk}$ has mean zero and covariance proportional to $\sigma^2\tau_{lk}^2$.
• $f_{lk}$'s covariance places a vague prior on the second-order interaction trends.
• We do not include higher-order interactions.
• Interactions with categorical predictors are handled no differently than other interactions.

Stochastic search variable selection (SSVS)

• One way to do variable selection would be to fit all possible models and pick the one with the smallest AIC, BIC, DIC, etc.
• If p = 11 there are 11 main effects, 55 interactions, and $2^{66}$ possible models.
• We use stochastic search variable selection to avoid enumerating all possible models.
• SSVS treats the model as a random variable and uses MCMC to search for models that fit the data well.
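The model count on the SSVS slide can be checked with a few lines of arithmetic. This sketch reads the count as $2^{66}$ and assumes that every main effect and every two-way interaction can be included or excluded independently, with no hierarchy constraint between them. The function name is hypothetical.

```python
from math import comb


def count_candidate_models(p):
    """Count main effects, two-way interactions, and candidate models.

    Assumes each of the p main effects and C(p, 2) interactions is
    independently in or out of the model (an assumption of this sketch).
    """
    n_main = p
    n_inter = comb(p, 2)          # number of unordered pairs l < k
    n_terms = n_main + n_inter    # total terms that can be switched on or off
    return n_main, n_inter, 2 ** n_terms


n_main, n_inter, n_models = count_candidate_models(11)
print(n_main, n_inter, n_models)
```

With p = 11 this gives 66 candidate terms, so exhaustive fitting would require on the order of $7 \times 10^{19}$ model fits, which is why a stochastic search over models is used instead.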