Post-selection Inference for Forward Stepwise and Least Angle Regression
Ryan & Rob Tibshirani
Carnegie Mellon University & Stanford University
Joint work with Jonathan Taylor, Richard Lockhart
September 2014

Matching results from picadilo.com
[Photos: Ryan Tibshirani (CMU; PhD student of Taylor, Stanford 2011) and Rob Tibshirani (Stanford), with top face-matching scores of 81%, 71%, and 69% from picadilo.com]

Conclusion
Confidence (the strength of evidence) matters!

Outline
• Setup and basic question
• Quick review of least angle regression and the covariance test
• A new framework for inference after selection
• Application to forward stepwise and least angle regression
• Application of these and related ideas to other problems

Setup and basic question
• Given an outcome vector y ∈ R^n and a predictor matrix X ∈ R^{n×p}, we consider the usual linear regression setup:
      y = Xβ* + σε,
  where β* ∈ R^p are unknown coefficients to be estimated, and the components of the noise vector ε ∈ R^n are i.i.d. N(0,1).
• Main question: if we apply least angle or forward stepwise regression, how can we compute valid p-values and confidence intervals?

Forward stepwise regression
• This procedure enters predictors one at a time, choosing the predictor that most decreases the residual sum of squares at each stage.
• Defining RSS to be the residual sum of squares for the model containing k predictors, and RSS_null the residual sum of squares before the kth predictor was added, we can form the usual statistic
      R_k = (RSS_null − RSS)/σ²
  (with σ assumed known), and compare it to a χ²_1 distribution.

Simulated example: naive forward stepwise
Setup: n = 100, p = 10, true model null.
[Figure: forward stepwise test statistics plotted against the chi-squared distribution on 1 df]
Test is too liberal: for nominal size 5%, the actual type I error is 39%.
(Yes, Larry, one can get proper p-values by sample splitting: but that is messy and loses power.)
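To make the inflation concrete, here is a minimal Python simulation sketch of the naive first-step test under the global null. The simulation design (standard normal X, nsim replications, σ = 1 known) is an assumption for illustration, not the authors' original experiment.

# Naive forward stepwise first-step test under the global null
# (n = 100, p = 10, y independent of X, sigma = 1 assumed known).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p, nsim, sigma = 100, 10, 2000, 1.0
rejections = 0

for _ in range(nsim):
    X = rng.standard_normal((n, p))
    y = sigma * rng.standard_normal(n)          # true model is null
    # Drop in RSS from entering predictor j alone: (x_j' y)^2 / ||x_j||^2
    drops = (X.T @ y) ** 2 / (X ** 2).sum(axis=0)
    R1 = drops.max() / sigma ** 2               # statistic for the selected predictor
    pval = chi2.sf(R1, df=1)                    # naive comparison to chi-squared on 1 df
    rejections += pval < 0.05

print("estimated type I error at nominal 5%:", rejections / nsim)
# The estimate comes out near 0.4 rather than 0.05: because the best of p
# predictors is selected, the naive chi-squared reference is far too liberal.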
Quick review of LAR and the covariance test
Least angle regression (LAR) is a method for constructing the path of solutions for the lasso:
      min_{β_0, β} Σ_i (y_i − β_0 − Σ_j x_ij β_j)² + λ · Σ_j |β_j|
LAR is a more democratic version of forward stepwise regression:
• Find the predictor most correlated with the outcome
• Move the parameter vector in the least squares direction until some other predictor has as much correlation with the current residual
• This new predictor is added to the active set, and the procedure is repeated
• Optional ("lasso mode"): if a non-zero coefficient hits zero, that predictor is dropped from the active set, and the process is restarted
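The path just described is available in standard software. As a rough illustration (an assumption about tooling, not the code used in the talk), here is a minimal Python sketch using scikit-learn's lars_path on simulated data; passing method='lasso' instead of method='lar' enables the optional drop step.

# LAR path on simulated data with three truly active predictors.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [4.0, -3.0, 2.0]                     # true nonzero coefficients
y = X @ beta + rng.standard_normal(n)

# alphas: correlation levels (knots) at each step; active: order of entry;
# coefs: coefficient values at each knot along the path.
alphas, active, coefs = lars_path(X, y, method='lar')
print("order of entry:", active)
print("knots:", np.round(alphas, 2))
print("coefficients at the end of the path:", np.round(coefs[:, -1], 2))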