Belloni, Alexandre; Chernozhukov, Victor; Kato, Kengo (2014): Uniform post selection inference for LAD regression and other Z-estimation problems. cemmap working paper No. CWP51/14, Centre for Microdata Methods and Practice (cemmap), The Institute for Fiscal Studies, Department of Economics, UCL, London. Provided in cooperation with the Institute for Fiscal Studies (IFS), London. https://doi.org/10.1920/wp.cem.2014.5114. This version is available at: http://hdl.handle.net/10419/130013

UNIFORM POST SELECTION INFERENCE FOR LAD REGRESSION AND OTHER Z-ESTIMATION PROBLEMS

A. BELLONI, V. CHERNOZHUKOV, AND K. KATO

ABSTRACT. We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against non-regular estimation of the nuisance part of the median regression function by using Neyman's orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semi-parametrically efficient. We also generalize our method to a general non-smooth Z-estimation framework with the number of target parameters $p_1$ being possibly much larger than the sample size $n$. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over $p_1$-dimensional rectangles, constructing simultaneous confidence bands on all of the $p_1$ target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models.

Keywords: Instrument; Post-selection inference; Sparsity; Neyman's orthogonal score test; Uniformly valid inference; Z-estimation.

Publication: Biometrika, 2014, doi:10.1093/biomet/asu056

Date: first version May 2012; this version May 2014. We would like to thank the participants of the Luminy conference on Nonparametric and High-Dimensional Statistics (December 2012), the Oberwolfach workshop on Frontiers in Quantile Regression (November 2012), the 8th World Congress in Probability and Statistics (August 2012), and a seminar at the University of Michigan (October 2012). This paper was first presented at the 8th World Congress in Probability and Statistics in August 2012. We would like to thank the editor, an associate editor, and anonymous referees for their careful review. We are also grateful to Sara van de Geer, Xuming He, Richard Nickl, Roger Koenker, Vladimir Koltchinskii, Enno Mammen, Steve Portnoy, Philippe Rigollet, Richard Samworth, and Bin Yu for useful comments and discussions. Research support from the National Science Foundation and the Japan Society for the Promotion of Science is gratefully acknowledged.

1. INTRODUCTION

We consider independent and identically distributed data vectors $(y_i, x_i^T, d_i)^T$ that obey the regression model

(1) $y_i = d_i\alpha_0 + x_i^T\beta_0 + \epsilon_i \quad (i = 1, \dots, n)$,

where $d_i$ is the main regressor and the coefficient $\alpha_0$ is the main parameter of interest. The vector $x_i$ denotes other high-dimensional regressors or controls.
The regression error $\epsilon_i$ is independent of $d_i$ and $x_i$ and has median zero, that is, $\mathrm{pr}(\epsilon_i \leq 0) = 1/2$. The distribution function of $\epsilon_i$ is denoted by $F_\epsilon$ and admits a density function $f_\epsilon$ such that $f_\epsilon(0) > 0$. The assumption motivates the use of the least absolute deviation or median regression, suitably adjusted for use in high-dimensional settings. The framework (1) is of interest in program evaluation, where $d_i$ represents the treatment or policy variable known a priori and whose impact we would like to infer [27, 21, 15]. We shall also discuss a generalization to the case where there are many parameters of interest, including the case where the identity of a regressor of interest is unknown a priori.

The dimension $p$ of the controls $x_i$ may be much larger than $n$, which creates a challenge for inference on $\alpha_0$. Although the unknown nuisance parameter $\beta_0$ lies in this large space, the key assumption that will make estimation possible is its sparsity, namely $T = \mathrm{supp}(\beta_0)$ has $s < n$ elements, where the notation $\mathrm{supp}(\delta) = \{j \in \{1,\dots,p\} : \delta_j \neq 0\}$ denotes the support of a vector $\delta \in \mathbb{R}^p$. Here $s$ can depend on $n$, as we shall use array asymptotics. Sparsity motivates the use of regularization or model selection methods.

A non-robust approach to inference in this setting would be first to perform model selection via the $\ell_1$-penalized median regression estimator

(2) $(\widehat\alpha, \widehat\beta) \in \arg\min_{\alpha,\beta} \mathbb{E}_n(|y_i - d_i\alpha - x_i^T\beta|) + \frac{\lambda}{n}\|\Psi(\alpha, \beta^T)^T\|_1$,

where $\lambda$ is a penalty parameter and $\Psi^2 = \mathrm{diag}\{\mathbb{E}_n(d_i^2), \mathbb{E}_n(x_{i1}^2), \dots, \mathbb{E}_n(x_{ip}^2)\}$ is a diagonal matrix of normalization weights; the notation $\mathbb{E}_n(\cdot)$ denotes the average $n^{-1}\sum_{i=1}^n$ over the index $i = 1, \dots, n$. Then one would use the post-model-selection estimator

(3) $(\widetilde\alpha, \widetilde\beta) \in \arg\min_{\alpha,\beta}\big\{\mathbb{E}_n(|y_i - d_i\alpha - x_i^T\beta|) : \beta_j = 0,\ j \notin \mathrm{supp}(\widehat\beta)\big\}$

to perform inference for $\alpha_0$.

This approach is justified if (2) achieves perfect model selection with probability approaching unity, so that the estimator (3) has the oracle property. However, conditions for perfect selection are very restrictive in this model and, in particular, require strong separation of the non-zero coefficients away from zero. If these conditions do not hold, the estimator $\widetilde\alpha$ does not converge to $\alpha_0$ at the $n^{-1/2}$ rate uniformly with respect to the underlying model, and so the usual inference breaks down [19]. We shall demonstrate the breakdown of such naive inference in Monte Carlo experiments where the non-zero coefficients in $\beta_0$ are not significantly separated from zero.

The breakdown of standard inference does not mean that the aforementioned procedures are not suitable for prediction.
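For concreteness, here is a minimal Python sketch of this naive procedure. It is an illustration under stated assumptions, not the paper's implementation: scikit-learn's `QuantileRegressor` (version 1.0+) and statsmodels' `QuantReg` serve as off-the-shelf stand-ins, so the penalty normalization differs from the weight matrix $\Psi$ in (2); `y`, `d`, `X` are data arrays and `lam` is a user-chosen penalty level.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import QuantileRegressor

def naive_post_selection(y, d, X, lam):
    """Naive (non-robust) approach: l1-penalized median regression, an
    analogue of (2), then an unpenalized refit on the selected support
    as in (3). Illustration only: sklearn penalizes lam * ||coef||_1 on
    the mean pinball loss, not the weighted penalty of the paper."""
    W = np.column_stack([d, X])                       # column 0 is d
    pen = QuantileRegressor(quantile=0.5, alpha=lam,
                            fit_intercept=False, solver="highs")
    pen.fit(W, y)
    support = np.flatnonzero(np.abs(pen.coef_) > 1e-8)
    support = np.union1d(support, [0])                # always keep d
    refit = sm.QuantReg(y, W[:, support]).fit(q=0.5)  # post-selection step (3)
    alpha_tilde = refit.params[np.searchsorted(support, 0)]
    return alpha_tilde, support
```

The discussion above implies that confidence intervals built on `alpha_tilde` can severely under-cover exactly when $\beta_0$ contains small non-zero coefficients, which is the failure mode the Monte Carlo experiments are designed to expose.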
Indeed, the estimators (2) and (3) attain essentially optimal rates $\{(s\log p)/n\}^{1/2}$ of convergence for estimating the entire median regression function [3, 33]. This property means that while these procedures will not deliver perfect model recovery, they will make only moderate selection mistakes, that is, they omit controls only if the coefficients are local to zero.

In order to provide uniformly valid inference, we propose a method whose performance does not require perfect model selection, allowing potential moderate model selection mistakes. The latter feature is critical in achieving uniformity over a large class of data-generating processes, similarly to the results for instrumental regression and mean regression studied in [34] and [2, 4, 5]. This allows us to overcome the impact of moderate model selection mistakes on inference, avoiding in part the criticisms in [19], who prove that the oracle property achieved by the naive estimators implies the failure of uniform validity of inference and their semiparametric inefficiency [20].

In order to achieve robustness with respect to moderate selection mistakes, we shall construct an orthogonal moment equation that identifies the target parameter. The following auxiliary equation,

(4) $d_i = x_i^T\theta_0 + v_i, \quad E(v_i \mid x_i) = 0 \quad (i = 1, \dots, n)$,

which describes the dependence of the regressor of interest $d_i$ on the other controls $x_i$, plays a key role. We shall assume the sparsity of $\theta_0$, that is, $T_d = \mathrm{supp}(\theta_0)$ has at most $s < n$ elements, and estimate the relation (4) via lasso or post-lasso least squares methods described below.

We shall use $v_i$ as an instrument in the following moment equation for $\alpha_0$:

(5) $E\{\varphi(y_i - d_i\alpha_0 - x_i^T\beta_0)v_i\} = 0 \quad (i = 1, \dots, n)$,

where $\varphi(t) = 1/2 - 1\{t \leq 0\}$. We shall use the empirical analog of (5) to form an instrumental median regression estimator of $\alpha_0$, using a plug-in estimator of $x_i^T\beta_0$. The moment equation (5) has the orthogonality property

(6) $\left.\frac{\partial}{\partial\beta}E\{\varphi(y_i - d_i\alpha_0 - x_i^T\beta)v_i\}\right|_{\beta=\beta_0} = 0 \quad (i = 1, \dots, n)$,

so the estimator of $\alpha_0$ will be unaffected by estimation of $x_i^T\beta_0$ even if $\beta_0$ is estimated at a rate slower than $n^{-1/2}$; that is, a rate of $o(n^{-1/4})$ would suffice. This slow rate of estimation of the nuisance function permits the use of non-regular estimators of $\beta_0$, such as post-selection or regularized estimators that are not $n^{-1/2}$ consistent uniformly over the underlying model. The orthogonalization ideas can be traced back to [22] and also play an important role in doubly robust estimation [26].

Our estimation procedure has three steps: (i) estimation of the confounding function $x_i^T\beta_0$ in (1); (ii) estimation of the instruments $v_i$ in (4); and (iii) estimation of the target parameter $\alpha_0$ via the empirical analog of (5). Each step is computationally tractable, involving solutions of convex problems and a one-dimensional search.

Step (i) estimates the nuisance function $x_i^T\beta_0$ via either the $\ell_1$-penalized median regression estimator (2) or the associated post-model-selection estimator (3).

Step (ii) provides estimates $\widehat v_i$ of $v_i$ in (4) as $\widehat v_i = d_i - x_i^T\widehat\theta$ or $\widehat v_i = d_i - x_i^T\widetilde\theta$ $(i = 1, \dots, n)$. The first is based on the heteroscedastic lasso estimator $\widehat\theta$, a version of the lasso of [30] designed to address non-Gaussian and heteroscedastic errors [2],

(7) $\widehat\theta \in \arg\min_\theta \mathbb{E}_n\{(d_i - x_i^T\theta)^2\} + \frac{\lambda}{n}\|\widehat\Gamma\theta\|_1$,

where $\lambda$ and $\widehat\Gamma$ are the penalty level and the data-driven penalty loadings defined in the Supplementary Material. The second is based on the associated post-model-selection estimator $\widetilde\theta$, called the post-lasso estimator:

(8) $\widetilde\theta \in \arg\min_\theta\big[\mathbb{E}_n\{(d_i - x_i^T\theta)^2\} : \theta_j = 0,\ j \notin \mathrm{supp}(\widehat\theta)\big]$.
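Step (ii) can be sketched with a plain lasso standing in for the heteroscedastic lasso (7), whose data-driven loadings $\widehat\Gamma$ are defined in the Supplementary Material and are not reproduced here; `lam` is again a placeholder penalty level, so this is a sketch under simplifying assumptions rather than the paper's exact estimator.

```python
import numpy as np
from sklearn.linear_model import Lasso

def instrument_residuals(d, X, lam, post=True):
    """Step (ii): lasso of d on x, an analogue of (7) without the loadings
    Gamma_hat; optionally refit by least squares on the selected support
    (post-lasso, as in (8)). Returns the residuals v_hat = d - x'theta."""
    # sklearn's Lasso minimizes (1/(2n))||d - X theta||^2 + lam*||theta||_1,
    # so lam plays the role of lambda/n up to a constant.
    lasso = Lasso(alpha=lam, fit_intercept=False)
    lasso.fit(X, d)
    support = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
    if post and support.size > 0:
        # Post-lasso (8): unpenalized least squares on the selected columns.
        theta_sel, *_ = np.linalg.lstsq(X[:, support], d, rcond=None)
        fitted = X[:, support] @ theta_sel
    else:
        fitted = X @ lasso.coef_
    return d - fitted
```

The `post=True` branch reflects the refitting idea of (8): the penalized fit is used only to select the support, and the coefficients are then re-estimated without shrinkage.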
Step (iii) constructs an estimator $\check\alpha$ of the coefficient $\alpha_0$ via an instrumental median regression [11], using $(\widehat v_i)_{i=1}^n$ as instruments, defined by

(9) $\check\alpha \in \arg\min_{\alpha\in\widehat A} L_n(\alpha), \qquad L_n(\alpha) = \frac{4|\mathbb{E}_n\{\varphi(y_i - x_i^T\widehat\beta - d_i\alpha)\widehat v_i\}|^2}{\mathbb{E}_n(\widehat v_i^2)}$,

where $\widehat A$ is a possibly stochastic parameter space for $\alpha_0$. We suggest $\widehat A = [\widehat\alpha - 10/b, \widehat\alpha + 10/b]$ with $b = \{\mathbb{E}_n(d_i^2)\}^{1/2}\log n$, though we allow for other choices.

Our main result establishes that under homoscedasticity, provided that $(s^3\log^3 p)/n \to 0$ and other regularity conditions hold, despite possible model selection mistakes in Steps (i) and (ii), the estimator $\check\alpha$ obeys

(10) $\sigma_n^{-1}n^{1/2}(\check\alpha - \alpha_0) \to N(0, 1)$

in distribution, where $\sigma_n^2 = 1/\{4f_\epsilon^2 E(v_i^2)\}$ with $f_\epsilon = f_\epsilon(0)$ is the semi-parametric efficiency bound for regular estimators of $\alpha_0$. In the low-dimensional case, if $p^3 = o(n)$, the asymptotic behavior of our estimator coincides with that of the standard median regression without selection or penalization, as derived in [13], which is also semi-parametrically efficient in this case. However, the behaviors of our estimator and the standard median regression differ dramatically otherwise, with the standard estimator even failing to be consistent when $p > n$. Of course, this improvement in performance comes at the cost of assuming sparsity.

An alternative, more robust expression for $\sigma_n^2$ is given by

(11) $\sigma_n^2 = J^{-1}\Omega J^{-1}, \qquad \Omega = E(v_i^2)/4, \qquad J = E(f_\epsilon d_i v_i)$.

We estimate $\Omega$ by the plug-in method and $J$ by Powell's [25] method. Furthermore, we show that the Neyman-type projected score statistic $nL_n(\alpha)$ can be used for testing the null hypothesis $\alpha = \alpha_0$, and converges in distribution to a $\chi_1^2$ variable under the null hypothesis, that is,

(12) $nL_n(\alpha_0) \to \chi_1^2$

in distribution. This allows us to construct a confidence region with asymptotic coverage $1-\xi$ based on inverting the score statistic $nL_n(\alpha)$:

(13) $\widehat A_\xi = \{\alpha \in \widehat A : nL_n(\alpha) \leq q_{1-\xi}\}, \qquad \mathrm{pr}(\alpha_0 \in \widehat A_\xi) \to 1-\xi$,

where $q_{1-\xi}$ is the $(1-\xi)$-quantile of the $\chi_1^2$ distribution.

The robustness with respect to moderate model selection mistakes, which is due to (6), allows (10) and (12) to hold uniformly over a large class of data-generating processes. Throughout the paper, we use array asymptotics, asymptotics where the model changes with $n$, to better capture finite-sample phenomena such as small coefficients that are local to zero. This ensures the robustness of conclusions with respect to perturbations of the data-generating process along various model sequences. This robustness, in turn, translates into uniform validity of confidence regions over many data-generating processes.

The second set of main results addresses a more general setting by allowing $p_1$-dimensional target parameters defined via Huber's Z-problems to be of interest, with dimension $p_1$ potentially much larger than the sample size $n$, and also allowing for approximately sparse models instead of exactly sparse models. This framework covers a wide variety of semi-parametric models, including those with smooth and non-smooth score functions. We provide sufficient conditions to derive a uniform Bahadur representation, and establish uniform asymptotic normality, using central limit theorems and bootstrap results of [9], for the entire $p_1$-dimensional vector. The latter result holds uniformly over high-dimensional rectangles of dimension $p_1 \gg n$ and over an underlying approximately sparse model, thereby extending previous results from the setting with $p_1 \ll n$ [14, 23, 24, 13] to that with $p_1 \gg n$.
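Since Step (iii) is a one-dimensional search and the confidence region (13) inverts the same objective $nL_n(\alpha)$, both can be sketched with a simple grid scan. This assumes the Step (i) fitted values `xb_hat` and the Step (ii) residuals `v_hat` have already been computed, for example by the sketches above; the grid resolution and the reporting of the accepted region as an interval are simplifications.

```python
import numpy as np
from scipy.stats import chi2

def phi(t):
    """Score function phi(t) = 1/2 - 1{t <= 0} from (5)."""
    return 0.5 - (t <= 0)

def Ln(alpha, y, d, xb_hat, v_hat):
    """Objective (9): 4|E_n{phi(y - x'beta - d*alpha) v}|^2 / E_n(v^2)."""
    num = np.mean(phi(y - xb_hat - d * alpha) * v_hat)
    return 4.0 * num ** 2 / np.mean(v_hat ** 2)

def step_three(y, d, xb_hat, v_hat, alpha_init, xi=0.05, grid_size=2001):
    """Grid version of Step (iii) plus the score-inversion region (13)."""
    n = len(y)
    b = np.sqrt(np.mean(d ** 2)) * np.log(n)   # b = {E_n(d^2)}^{1/2} log n
    grid = np.linspace(alpha_init - 10 / b, alpha_init + 10 / b, grid_size)
    vals = np.array([Ln(a, y, d, xb_hat, v_hat) for a in grid])
    alpha_check = grid[np.argmin(vals)]        # minimizer of L_n over A_hat
    # Invert n L_n(alpha) at the chi2(1) quantile, as in (13); report the
    # hull of accepted grid points as interval endpoints.
    accepted = grid[n * vals <= chi2.ppf(1 - xi, df=1)]
    ci = (accepted.min(), accepted.max()) if accepted.size else (np.nan, np.nan)
    return alpha_check, ci
```

A normal-based interval via (10) and (11) would additionally require a plug-in estimate of $\Omega$ and a Powell-type estimate of $J$; a sketch of the latter appears after Comment 2.1 below.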
In what follows, the $\ell_2$ and $\ell_1$ norms are denoted by $\|\cdot\|$ and $\|\cdot\|_1$, respectively, and the $\ell_0$-norm, $\|\cdot\|_0$, denotes the number of non-zero components of a vector. We use the notation $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. Denote by $\Phi(\cdot)$ the distribution function of the standard normal distribution. We assume that quantities such as $p$, $s$, and hence $y_i$, $x_i$, $\beta_0$, $\theta_0$, $T$ and $T_d$, all depend on the sample size $n$, and allow for the case where $p = p_n \to \infty$ and $s = s_n \to \infty$ as $n \to \infty$. We shall omit the dependence of these quantities on $n$ when it does not cause confusion. For a class of measurable functions $\mathcal{F}$ on a measurable space, let $\mathrm{cn}(\epsilon, \mathcal{F}, \|\cdot\|_{Q,2})$ denote its $\epsilon$-covering number with respect to the $L^2(Q)$ seminorm $\|\cdot\|_{Q,2}$, where $Q$ is a finitely discrete measure on the space, and let $\mathrm{ent}(\varepsilon, \mathcal{F}) = \log\sup_Q \mathrm{cn}(\varepsilon\|F\|_{Q,2}, \mathcal{F}, \|\cdot\|_{Q,2})$ denote the uniform entropy number, where $F = \sup_{f\in\mathcal{F}}|f|$.

2. THE METHODS, CONDITIONS, AND RESULTS

2.1. The methods. Each of the steps outlined in Section 1 could be implemented by several estimators. Two possible implementations are the following.

Algorithm 1. The algorithm is based on post-model-selection estimators.

Step (i). Run post-$\ell_1$-penalized median regression (3) of $y_i$ on $d_i$ and $x_i$; keep the fitted value $x_i^T\widetilde\beta$.
Step (ii). Run the post-lasso estimator (8) of $d_i$ on $x_i$; keep the residual $\widehat v_i = d_i - x_i^T\widetilde\theta$.
Step (iii). Run instrumental median regression (9) of $y_i - x_i^T\widetilde\beta$ on $d_i$ using $\widehat v_i$ as the instrument. Report $\check\alpha$ and perform inference based upon (10) or (13).

Algorithm 2. The algorithm is based on regularized estimators.

Step (i). Run $\ell_1$-penalized median regression (2) of $y_i$ on $d_i$ and $x_i$; keep the fitted value $x_i^T\widehat\beta$.
Step (ii). Run the lasso estimator (7) of $d_i$ on $x_i$; keep the residual $\widehat v_i = d_i - x_i^T\widehat\theta$.
Step (iii). Run instrumental median regression (9) of $y_i - x_i^T\widehat\beta$ on $d_i$ using $\widehat v_i$ as the instrument. Report $\check\alpha$ and perform inference based upon (10) or (13).

In order to perform $\ell_1$-penalized median regression and lasso, one has to choose the penalty levels suitably. We record our penalty choices in the Supplementary Material. Algorithm 1 relies on the post-selection estimators that refit the non-zero coefficients without the penalty term to reduce the bias, while Algorithm 2 relies on the penalized estimators. In Step (ii), instead of the lasso or post-lasso estimators, the Dantzig selector [8] and Gauss-Dantzig estimators could be used. Step (iii) of both algorithms relies on instrumental median regression (9).

Comment 2.1. Alternatively, in this step, we can use a one-step estimator $\check\alpha$ defined by

(14) $\check\alpha = \widehat\alpha + [\mathbb{E}_n\{f_\epsilon(0)\widehat v_i^2\}]^{-1}\mathbb{E}_n\{\varphi(y_i - d_i\widehat\alpha - x_i^T\widehat\beta)\widehat v_i\}$,

where $\widehat\alpha$ is the $\ell_1$-penalized median regression estimator (2). Another possibility is to use the post-double-selection median regression estimator, which is simply the median regression of $y_i$ on $d_i$ and the union of the controls selected in Steps (i) and (ii), as $\check\alpha$. The Supplementary Material shows that these alternative estimators also solve (9) approximately.
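The one-step estimator (14) avoids the search entirely. A sketch follows, with $f_\epsilon(0)$ estimated by a Powell-type uniform-kernel method; the default bandwidth below is a generic rule-of-thumb assumption, not the paper's choice.

```python
import numpy as np

def powell_f0(resid, h=None):
    """Powell-type uniform-kernel estimate of f_eps(0) from median-regression
    residuals: f0_hat = #{|resid_i| <= h} / (2 n h)."""
    n = len(resid)
    if h is None:
        h = 1.06 * np.std(resid) * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    return np.mean(np.abs(resid) <= h) / (2.0 * h)

def one_step_alpha(y, d, alpha_hat, xb_hat, v_hat):
    """One-step estimator (14): correct the penalized estimate alpha_hat
    by one Newton step on the orthogonal score."""
    resid = y - d * alpha_hat - xb_hat
    score = np.mean((0.5 - (resid <= 0)) * v_hat)      # E_n{phi(.) v_hat}
    jacobian = powell_f0(resid) * np.mean(v_hat ** 2)  # E_n{f_eps(0) v_hat^2}
    return alpha_hat + score / jacobian
```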
2.2. Regularity conditions. We state regularity conditions sufficient for the validity of the main estimation and inference results. The behavior of sparse eigenvalues of the population Gram matrix $E(\widetilde x_i\widetilde x_i^T)$ with $\widetilde x_i = (d_i, x_i^T)^T$ plays an important role in the analysis of $\ell_1$-penalized median regression and lasso. Define the minimal and maximal $m$-sparse eigenvalues of the population Gram matrix as

(15) $\bar\phi_{\min}(m) = \min_{1\leq\|\delta\|_0\leq m}\frac{\delta^T E(\widetilde x_i\widetilde x_i^T)\delta}{\|\delta\|^2}, \qquad \bar\phi_{\max}(m) = \max_{1\leq\|\delta\|_0\leq m}\frac{\delta^T E(\widetilde x_i\widetilde x_i^T)\delta}{\|\delta\|^2}$,

where $m = 1, \dots, p$. Assuming that $\bar\phi_{\min}(m) > 0$ requires that all population Gram submatrices formed by any $m$ components of $\widetilde x_i$ are positive definite.

The main condition, Condition 1, imposes sparsity of the vectors $\beta_0$ and $\theta_0$ as well as other more technical assumptions. Below let $c_1$ and $C_1$ be given positive constants, and let $\ell_n \uparrow \infty$, $\delta_n \downarrow 0$, and $\Delta_n \downarrow 0$ be given sequences of positive constants.

Condition 1. Suppose that (i) $\{(y_i, d_i, x_i^T)^T\}_{i=1}^n$ is a sequence of independent and identically distributed random vectors generated according to models (1) and (4), where $\epsilon_i$ has distribution function $F_\epsilon$ such that $F_\epsilon(0) = 1/2$ and is independent of the random vector $(d_i, x_i^T)^T$; (ii) $E(v_i^2 \mid x_i) \geq c_1$ and $E(|v_i|^3 \mid x_i) \leq C_1$ almost surely; moreover, $E(d_i^4) + E(v_i^4) + \max_{j=1,\dots,p}\{E(x_{ij}^2 d_i^2) + E(|x_{ij}v_i|^3)\} \leq C_1$; (iii) there exists $s = s_n \geq 1$ such that $\|\beta_0\|_0 \leq s$ and $\|\theta_0\|_0 \leq s$; (iv) the error distribution $F_\epsilon$ is absolutely continuous with continuously differentiable density $f_\epsilon(\cdot)$ such that $f_\epsilon(0) \geq c_1$ and $f_\epsilon(t) \vee |f_\epsilon'(t)| \leq C_1$ for all $t \in \mathbb{R}$; (v) there exist constants $K_n$ and $M_n$ such that $K_n \geq \max_{j=1,\dots,p}|x_{ij}|$ and $M_n \geq 1 \vee |x_i^T\theta_0|$ almost surely, and they obey the growth condition $\{K_n^4 + (K_n^2 \vee M_n^4)s^2 + M_n^2 s^3\}\log^3(p\vee n) \leq n\delta_n$; (vi) $c_1 \leq \bar\phi_{\min}(\ell_n s) \leq \bar\phi_{\max}(\ell_n s) \leq C_1$.

Condition 1 (i) imposes the setting discussed in the previous section with the zero conditional median of the error distribution. Condition 1 (ii) imposes moment conditions on the structural errors and regressors to ensure good model selection performance of the lasso applied to equation (4). Condition 1 (iii) imposes sparsity of the high-dimensional vectors $\beta_0$ and $\theta_0$. Condition 1 (iv) is a set of standard assumptions in median regression [16] and in instrumental quantile regression. Condition 1 (v) restricts the sparsity index, namely $s^3\log^3(p\vee n) = o(n)$ is required; this is analogous to the restriction $p^3(\log p)^2 = o(n)$ made in [13] in the low-dimensional setting. The uniformly bounded regressors condition can be relaxed with minor modifications provided the bound holds with probability approaching unity. Most importantly, no assumptions on the separation from zero of the non-zero coefficients of $\theta_0$ and $\beta_0$ are made. Condition 1 (vi) is quite plausible for many designs of interest. Conditions 1 (iv) and (v) imply the equivalence between the norms induced by the empirical and population Gram matrices over $s$-sparse vectors by [29].

2.3. Results. The following result is derived as an application of a more general Theorem 2 given in Section 3; the proof is given in the Supplementary Material.

Theorem 1. Let $\check\alpha$ and $L_n(\alpha_0)$ be the estimator and statistic obtained by applying either Algorithm 1 or 2. Suppose that Condition 1 is satisfied for all $n \geq 1$. Moreover, suppose that with probability at least $1-\Delta_n$, $\|\widehat\beta\|_0 \leq C_1 s$. Then, as $n \to \infty$, $\sigma_n^{-1}n^{1/2}(\check\alpha - \alpha_0) \to N(0,1)$ and $nL_n(\alpha_0) \to \chi_1^2$ in distribution, where $\sigma_n^2 = 1/\{4f_\epsilon^2 E(v_i^2)\}$.
Theorem 1 shows that Algorithms 1 and 2 produce estimators $\check\alpha$ that perform equally well, to the first order, with asymptotic variance equal to the semi-parametric efficiency bound; see the Supplementary Material for further discussion. Both algorithms rely on the sparsity of $\widehat\beta$ and $\widehat\theta$. Sparsity of the latter follows immediately under sharp penalty choices for optimal rates. Sparsity of the former potentially requires a higher penalty level, as shown in [3]; alternatively, sparsity of the estimator in Step (i) can also be achieved by truncating the smallest components of $\widehat\beta$. The Supplementary Material shows that suitable truncation leads to the required sparsity while preserving the rate of convergence.

An important consequence of these results is the following corollary. Here $\mathcal{P}_n$ denotes a collection of distributions for $\{(y_i, d_i, x_i^T)^T\}_{i=1}^n$, and for $P_n \in \mathcal{P}_n$ the notation $\mathrm{pr}_{P_n}$ means that under $\mathrm{pr}_{P_n}$, $\{(y_i, d_i, x_i^T)^T\}_{i=1}^n$ is distributed according to the law determined by $P_n$.

Corollary 1. Let $\check\alpha$ be the estimator of $\alpha_0$ constructed according to either Algorithm 1 or 2, and for every $n \geq 1$, let $\mathcal{P}_n$ be the collection of all distributions of $\{(y_i, d_i, x_i^T)^T\}_{i=1}^n$ for which Condition 1 holds and $\|\widehat\beta\|_0 \leq C_1 s$ with probability at least $1-\Delta_n$. Then for $\widehat A_\xi$ defined in (13),

$\sup_{P_n\in\mathcal{P}_n}\Big|\mathrm{pr}_{P_n}\big\{\alpha_0 \in [\check\alpha \pm \sigma_n n^{-1/2}\Phi^{-1}(1-\xi/2)]\big\} - (1-\xi)\Big| \to 0,$

$\sup_{P_n\in\mathcal{P}_n}\Big|\mathrm{pr}_{P_n}(\alpha_0 \in \widehat A_\xi) - (1-\xi)\Big| \to 0, \qquad n \to \infty.$

Corollary 1 establishes the second main result of the paper. It highlights the uniform validity of the results, which hold despite possible imperfect model selection in Steps (i) and (ii). Condition 1 explicitly characterizes regions of data-generating processes for which the uniformity result holds. Simulations presented below provide additional evidence that these regions are substantial. Here we rely on exactly sparse models, but these results extend to approximately sparse models in what follows.

Both of the proposed algorithms exploit the homoscedasticity of model (1) with respect to the error term $\epsilon_i$. The generalization to the heteroscedastic case can be achieved, but we need to consider the density-weighted version of the auxiliary equation (4) in order to achieve the semiparametric efficiency bound. The analysis of the impact of estimation of the weights is delicate and is developed in our working paper "Robust Inference in High-Dimensional Approximate Sparse Quantile Regression Models" (arXiv:1312.7186).

2.4. Generalization to many target coefficients. We consider the following generalization of the previous model:

$y = \sum_{j=1}^{p_1} d_j\alpha_j + g(u) + \epsilon, \qquad \epsilon \sim F_\epsilon, \qquad F_\epsilon(0) = 1/2,$

where $d$, $u$ are regressors, and $\epsilon$ is the noise with distribution function $F_\epsilon$ that is independent of the regressors and has median zero, that is, $F_\epsilon(0) = 1/2$. The coefficients $\alpha_1, \dots, \alpha_{p_1}$ are now the high-dimensional parameter of interest.

We can rewrite this model as $p_1$ models of the previous form:

$y = \alpha_j d_j + g_j(z_j) + \epsilon, \qquad d_j = m_j(z_j) + v_j, \qquad E(v_j \mid z_j) = 0 \quad (j = 1, \dots, p_1),$

where $\alpha_j$ is the target coefficient,

$g_j(z_j) = \sum_{k\neq j} d_k\alpha_k + g(u), \qquad m_j(z_j) = E(d_j \mid z_j),$

and where $z_j = (d_1, \dots, d_{j-1}, d_{j+1}, \dots, d_{p_1}, u^T)^T$. We would like to estimate and perform inference on each of the $p_1$ coefficients $\alpha_1, \dots, \alpha_{p_1}$ simultaneously.
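Computationally, this reduces to $p_1$ runs of the single-coefficient procedure, with $d_j$ as the regressor of interest and $z_j = (d_{-j}, u^T)^T$ as the controls. The sketch below composes the hypothetical helpers `instrument_residuals` and `step_three` from the earlier sketches, and so inherits their simplifying assumptions (generic penalty levels, scikit-learn penalty normalization).

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def alpha_check_one_target(y, d, Z, lam_qr, lam_lasso):
    """Orthogonal three-step procedure for one target coefficient,
    composed from instrument_residuals and step_three defined above."""
    W = np.column_stack([d, Z])
    pen = QuantileRegressor(quantile=0.5, alpha=lam_qr,
                            fit_intercept=False, solver="highs")
    pen.fit(W, y)                                   # Step (i)
    gz_hat = Z @ pen.coef_[1:]                      # fitted confounding part
    v_hat = instrument_residuals(d, Z, lam_lasso)   # Step (ii)
    return step_three(y, d, gz_hat, v_hat, alpha_init=pen.coef_[0])  # Step (iii)

def all_targets(y, D, U, lam_qr, lam_lasso):
    """Treat each d_j in turn as the target, with z_j = (d_{-j}, u) as controls."""
    results = []
    for j in range(D.shape[1]):
        Zj = np.column_stack([np.delete(D, j, axis=1), U])
        results.append(alpha_check_one_target(y, D[:, j], Zj, lam_qr, lam_lasso))
    return results
```

Note that the per-coefficient intervals from this loop are marginal; the simultaneous confidence bands discussed above require the multiplier-bootstrap critical values of [9] rather than per-$j$ $\chi_1^2$ quantiles.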
Moreover, we would like to allow the regression functions $h_j = (g_j, m_j)^T$ to be of infinite dimension, that is, they could be written only as infinite linear combinations of some dictionary with respect to $z_j$. However, we assume that there are sparse estimators $\widehat h_j = (\widehat g_j, \widehat m_j)^T$ that can estimate $h_j = (g_j, m_j)^T$ at sufficiently fast $o(n^{-1/4})$ rates in the mean square error sense, as stated precisely in Section 3. Examples of functions $h_j$ that permit such estimation by sparse methods include the standard Sobolev spaces as well as the more general rearranged Sobolev spaces [7, 6]. Here sparsity of the estimators $\widehat g_j$ and $\widehat m_j$ means that they are formed by $O_P(s)$-sparse linear combinations chosen from $p$ technical regressors generated from $z_j$, with coefficients estimated from the data. This framework is general; in particular, it contains as a special case the traditional linear sieve/series framework for estimation of $h_j$, which uses a small number $s = o(n)$ of predetermined series functions as a dictionary.

Given suitable estimators for $h_j = (g_j, m_j)^T$, we can then identify and estimate each of the target parameters $(\alpha_j)_{j=1}^{p_1}$ via the empirical version of the moment equations

$E[\psi_j\{w, \alpha_j, h_j(z_j)\}] = 0 \quad (j = 1, \dots, p_1),$

where $\psi_j(w, \alpha, t) = \varphi(y - d_j\alpha - t_1)(d_j - t_2)$ and $w = (y, d_1, \dots, d_{p_1}, u^T)^T$. These equations have the orthogonality property

$[\partial E\{\psi_j(w, \alpha_j, t) \mid z_j\}/\partial t]\big|_{t=h_j(z_j)} = 0 \quad (j = 1, \dots, p_1).$

The resulting estimation problem is subsumed as a special case in the next section.

3. INFERENCE ON MANY TARGET PARAMETERS IN Z-PROBLEMS

In this section we generalize the previous example to a more general setting, where $p_1$ target parameters defined via Huber's Z-problems are of interest, with dimension $p_1$ potentially much larger than the sample size. This framework covers median regression, its generalization discussed above, and many other semi-parametric models.

The interest lies in $p_1 = p_{1n}$ real-valued target parameters $\alpha_1, \dots, \alpha_{p_1}$. We assume that each $\alpha_j \in \mathcal{A}_j$, where each $\mathcal{A}_j$ is a non-stochastic bounded closed interval. The true parameter $\alpha_j$ is identified as the unique solution of the moment condition:

(16) $E[\psi_j\{w, \alpha_j, h_j(z_j)\}] = 0.$

Here $w$ is a random vector taking values in $\mathcal{W}$, a Borel subset of a Euclidean space, which contains the vectors $z_j$ $(j = 1, \dots, p_1)$ as subvectors, and each $z_j$ takes values in $\mathcal{Z}_j$; here $z_j$ and $z_{j'}$ with $j \neq j'$ may overlap. The vector-valued function $z \mapsto h_j(z) = \{h_{jm}(z)\}_{m=1}^M$ is a measurable map from $\mathcal{Z}_j$ to $\mathbb{R}^M$, where $M$ is fixed, and the function $(w, \alpha, t) \mapsto \psi_j(w, \alpha, t)$ is a measurable map from an open neighborhood of $\mathcal{W}\times\mathcal{A}_j\times\mathbb{R}^M$ to $\mathbb{R}$. The former map is a possibly infinite-dimensional nuisance parameter.

Suppose that the nuisance function $h_j = (h_{jm})_{m=1}^M$ admits a sparse estimator $\widehat h_j = (\widehat h_{jm})_{m=1}^M$ of the form

$\widehat h_{jm}(\cdot) = \sum_{k=1}^p f_{jmk}(\cdot)\widehat\theta_{jmk}, \qquad \|(\widehat\theta_{jmk})_{k=1}^p\|_0 \leq s \quad (m = 1, \dots, M),$

where $p = p_n$ may be much larger than $n$, while $s = s_n$, the sparsity level of $\widehat h_j$, is small compared to $n$, and $f_{jmk} : \mathcal{Z}_j \to \mathbb{R}$ are given approximating functions.

The estimator $\widehat\alpha_j$ of $\alpha_j$ is then constructed as a Z-estimator, which solves the sample analogue of equation (16):

(17) $|\mathbb{E}_n[\psi_j\{w, \widehat\alpha_j, \widehat h_j(z_j)\}]| \leq \inf_{\alpha\in\widehat{\mathcal{A}}_j}|\mathbb{E}_n[\psi_j\{w, \alpha, \widehat h_j(z_j)\}]| + \epsilon_n,$

where $\epsilon_n = o(n^{-1/2}b_n^{-1})$ is a numerical tolerance parameter and $b_n = \{\log(ep_1)\}^{1/2}$; $\widehat{\mathcal{A}}_j$ is a possibly stochastic interval contained in $\mathcal{A}_j$ with high probability. Typically, $\widehat{\mathcal{A}}_j = \mathcal{A}_j$ or can be constructed by using a preliminary estimator of $\alpha_j$.
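A generic sketch of the approximate Z-solve (17) for a single $j$, given a user-supplied empirical score $\alpha \mapsto \mathbb{E}_n[\psi_j\{w, \alpha, \widehat h_j(z_j)\}]$ and an interval $\widehat{\mathcal{A}}_j$; the constant in the tolerance $\epsilon_n$ below is arbitrary, chosen only to respect the stated $o(n^{-1/2}b_n^{-1})$ rate.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def z_estimate(psi_bar, A_j, n, p1):
    """Approximate Z-solve (17): return alpha_hat_j in A_j whose empirical
    score |E_n psi_j| is within eps_n of the infimum over A_j.
    psi_bar(alpha) must return E_n[psi_j{w, alpha, h_hat_j(z_j)}]."""
    b_n = np.sqrt(np.log(np.e * p1))
    eps_n = 0.1 / (np.sqrt(n) * b_n * np.log(n))  # o(n^{-1/2} b_n^{-1})
    res = minimize_scalar(lambda a: abs(psi_bar(a)), bounds=A_j, method="bounded")
    # The numerical minimizer trivially satisfies the eps_n-relaxed criterion (17).
    return res.x
```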
