The Annals of Statistics
2016, Vol. 44, No. 1, 288–317
DOI: 10.1214/15-AOS1367
© Institute of Mathematical Statistics, 2016
arXiv:1601.06000v1 [math.ST] 22 Jan 2016

PARTIALLY LINEAR ADDITIVE QUANTILE REGRESSION IN ULTRA-HIGH DIMENSION

By Ben Sherwood and Lan Wang¹
Johns Hopkins University and University of Minnesota

We consider a flexible semiparametric quantile regression model for analyzing high dimensional heterogeneous data. This model has several appealing features: (1) By considering different conditional quantiles, we may obtain a more complete picture of the conditional distribution of a response variable given high dimensional covariates. (2) The sparsity level is allowed to be different at different quantile levels. (3) The partially linear additive structure accommodates nonlinearity and circumvents the curse of dimensionality. (4) It is naturally robust to heavy-tailed distributions. In this paper, we approximate the nonlinear components using B-spline basis functions. We first study estimation under this model when the nonzero components are known in advance and the number of covariates in the linear part diverges. We then investigate a nonconvex penalized estimator for simultaneous variable selection and estimation. We derive its oracle property for a general class of nonconvex penalty functions in the presence of ultra-high dimensional covariates under relaxed conditions. To tackle the challenges of nonsmooth loss function, nonconvex penalty function and the presence of nonlinear components, we combine a recently developed convex-differencing method with modern empirical process techniques. Monte Carlo simulations and an application to a microarray study demonstrate the effectiveness of the proposed method. We also discuss how the method for a single quantile of interest can be extended to simultaneous variable selection and estimation at multiple quantiles.

1. Introduction.
In this article, we study a flexible partially linear additive quantile regression model for analyzing high dimensional data. For the $i$th subject, we observe $\{Y_i, x_i, z_i\}$, where $x_i = (x_{i1}, \ldots, x_{ip_n})'$ is a $p_n$-dimensional vector of covariates and $z_i = (z_{i1}, \ldots, z_{id})'$ is a $d$-dimensional vector of covariates, $i = 1, \ldots, n$. The $\tau$th ($0 < \tau < 1$) conditional quantile of $Y_i$ given $x_i$, $z_i$ is defined as $Q_{Y_i|x_i,z_i}(\tau) = \inf\{t : F(t \mid x_i, z_i) \ge \tau\}$, where $F(\cdot \mid x_i, z_i)$ is the conditional distribution function of $Y_i$ given $x_i$ and $z_i$. The case $\tau = 1/2$ corresponds to the conditional median. We consider the following semiparametric model for the conditional quantile function

$$Q_{Y_i|x_i,z_i}(\tau) = x_i'\beta_0 + g_0(z_i), \tag{1.1}$$

where $g_0(z_i) = g_{00} + \sum_{j=1}^{d} g_{0j}(z_{ij})$, with $g_{00} \in \mathbb{R}$. It is assumed that the $g_{0j}$ satisfy $E(g_{0j}(z_{ij})) = 0$ for identification purposes. Let $\varepsilon_i = Y_i - Q_{Y_i|x_i,z_i}(\tau)$; then $\varepsilon_i$ satisfies $P(\varepsilon_i \le 0 \mid x_i, z_i) = \tau$ and we may also write $Y_i = x_i'\beta_0 + g_0(z_i) + \varepsilon_i$. In the rest of the paper, we will drop the dependence on $\tau$ in the notation for simplicity.

Received September 2014; revised July 2015.
¹Supported in part by NSF Grant DMS-13-08960.
AMS 2000 subject classifications. Primary 62G35; secondary 62G20.
Key words and phrases. Quantile regression, high dimensional data, nonconvex penalty, partial linear, variable selection.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2016, Vol. 44, No. 1, 288–317. This reprint differs from the original in pagination and typographic detail.

Modeling conditional quantiles in high dimension is of significant importance for several reasons. First, it is well recognized that high dimensional data are often heterogeneous. How the covariates influence the center of the conditional distribution can be very different from how they influence the tails. As a result, focusing on the conditional mean function alone can be misleading.
By estimating conditional quantiles at different quantile levels, we are able to gain a more complete picture of the relationship between the covariates and the response variable. Second, in the high dimensional setting, the quantile regression framework also allows a more realistic interpretation of the sparsity of the covariate effects, which we refer to as quantile-adaptive sparsity. That is, we assume a small subset of covariates influence the conditional distribution. However, when we estimate different conditional quantiles, we allow the subsets of active covariates to be different [Wang, Wu and Li (2012); He, Wang and Hong (2013)]. Furthermore, the conditional quantiles are often of direct interest to the researchers. For example, for the birth weight data we analyze in Section 5, low birth weight, which corresponds to the low tail of the conditional distribution, is of direct interest to the doctors. Another advantage of quantile regression is that it is naturally robust to outlier contamination associated with heavy-tailed errors. For high dimensional data, identifying outliers can be difficult. The robustness of quantile regression provides a certain degree of protection.

Linear quantile regression with high dimensional covariates was investigated by Belloni and Chernozhukov [(2011), Lasso penalty] and Wang, Wu and Li [(2012), nonconvex penalty]. The partially linear additive structure we consider in this paper is useful for incorporating nonlinearity in the model while circumventing the curse of dimensionality. We are interested in the case where $p_n$ is of a similar order as $n$ or much larger than $n$. For applications in microarray data analysis, the vector $x_i$ often contains the measurements on thousands of genes, while the vector $z_i$ contains the measurements of clinical or environment variables, such as age and weight.
For example, in the birth weight example of Section 5, mother's age is modeled nonparametrically as exploratory analysis reveals a possible nonlinear effect. In general, model specification can be challenging in high dimension; see Section 7 for some further discussion.

We approximate the nonparametric components using B-spline basis functions, which are computationally convenient and often accurate. First, we study the asymptotic theory of estimating the model (1.1) when $p_n$ diverges. In our setting, this corresponds to the oracle model, that is, the one we obtain if we know which covariates are important in advance. This is along the line of the work of Welsh (1989), Bai and Wu (1994) and He and Shao (2000) for M-regression with diverging number of parameters and possibly nonsmooth objective functions, which, however, were restricted to linear regression. Lam and Fan (2008) derived the asymptotic theory of the profile kernel estimator for general semiparametric models with diverging number of parameters while assuming a smooth quasi-likelihood function. Second, we propose a nonconvex penalized regression estimator when $p_n$ is of an exponential order of $n$ and the model has a sparse structure. For a general class of nonsmooth penalty functions, including the popular SCAD [Fan and Li (2001)] and MCP [Zhang (2010)] penalty, we derive the oracle property of the proposed estimator under relaxed conditions. An interesting finding is that solving the nonconvex penalized estimator can be achieved via solving a series of weighted quantile regression problems, which can be conveniently implemented using existing software packages.

Deriving the asymptotic properties of the penalized estimator is very challenging as we need to simultaneously deal with the nonsmooth loss function, the nonconvex penalty function, approximation of nonlinear functions and very high dimensionality.
To tackle these challenges, we combine a recently developed convex-differencing method with modern empirical process techniques. The method relies on a representation of the penalized loss function as the difference of two convex functions, which leads to a sufficient local optimality condition [Tao and An (1997), Wang, Wu and Li (2012)]. Empirical process techniques are introduced to derive various error bounds associated with the nonsmooth objective function which contains both high dimensional linear covariates and approximations of nonlinear components. It is worth pointing out that our approach is different from what was used in the recent literature for studying the theory of high dimensional semiparametric mean regression and is able to considerably weaken the conditions required in the literature. In particular, we do not need moment conditions for the random error and allow it to depend on the covariates.

Existing work on penalized semiparametric regression has been largely limited to mean regression with fixed $p$; see, for example, Bunea (2004), Liang and Li (2009), Wang and Xia (2009), Liu, Wang and Liang (2011), Kai, Li and Zou (2011) and Wang et al. (2011). Important progress in the high dimensional $p$ setting has been recently made by Xie and Huang [(2009), still assumes $p < n$] for partially linear regression, Huang, Horowitz and Wei (2010) for additive models, Li, Xue and Lian [(2011), $p = o(n)$] for semivarying coefficient models, among others. When $p$ is fixed, the semiparametric quantile regression model was considered by He and Shi (1996), He, Zhu and Fung (2002), Wang, Zhu and Zhou (2009), among others. Tang et al. (2013) considered a two-step procedure for a nonparametric varying coefficients quantile regression model with a diverging number of nonparametric functional coefficients. They required two separate tuning parameters and quite complex design conditions.

The rest of this article is organized as follows.
In Section 2, we present the partially linear additive quantile regression model and discuss the properties of the oracle estimator. In Section 3, we present a nonconvex penalized method for simultaneous variable selection and estimation and derive its oracle property. In Section 4, we assess the performance of the proposed penalized estimator via Monte Carlo simulations. We analyze a birth weight data set while accounting for gene expression measurements in Section 5. In Section 6, we consider an extension to simultaneous estimation and variable selection at multiple quantiles. Section 7 concludes the paper with a discussion of related issues. The proofs are given in the Appendix. Some of the technical details and additional numerical results are provided in online supplementary material [Sherwood and Wang (2015)].

2. Partially linear additive quantile regression with diverging number of parameters.
For high dimensional inference, it is often assumed that the vector of coefficients $\beta_0 = (\beta_{01}, \beta_{02}, \ldots, \beta_{0p_n})'$ in model (1.1) is sparse, that is, most of its components are zero. Let $A = \{1 \le j \le p_n : \beta_{0j} \ne 0\}$ be the index set of nonzero coefficients and $q_n = |A|$ be the cardinality of $A$. The set $A$ is unknown and will be estimated. Without loss of generality, we assume that the first $q_n$ components of $\beta_0$ are nonzero and the remaining $p_n - q_n$ components are zero. Hence, we can write $\beta_0 = (\beta_{01}', 0_{p_n-q_n}')'$, where $0_{p_n-q_n}$ denotes the $(p_n - q_n)$-vector of zeros. Let $X$ be the $n \times p_n$ matrix of linear covariates and write it as $X = (X_1, \ldots, X_{p_n})$. Let $X_A$ be the submatrix consisting of the first $q_n$ columns of $X$ corresponding to the active covariates. For technical simplicity, we assume $x_i$ is centered to have mean zero; and $z_{ij} \in [0,1]$, $\forall i, j$.

2.1. Oracle estimator.
We first study the estimator we would obtain when the index set $A$ is known in advance, which we refer to as the oracle estimator. Our asymptotic framework allows $q_n$, the size of $A$, to increase with $n$.
This resonates with the perspective that a more complex statistical model can be fit when more data are collected.

We use a linear combination of B-spline basis functions to approximate the unknown nonlinear functions $g_0(\cdot)$. To introduce the B-spline functions, we start with two definitions.

Definition. Let $r \equiv m + v$, where $m$ is a positive integer and $v \in (0,1]$. Define $\mathcal{H}_r$ as the collection of functions $h(\cdot)$ on $[0,1]$ whose $m$th derivative $h^{(m)}(\cdot)$ satisfies the Hölder condition of order $v$. That is, for any $h(\cdot) \in \mathcal{H}_r$, there exists some positive constant $C$ such that

$$|h^{(m)}(z') - h^{(m)}(z)| \le C|z' - z|^{v} \qquad \forall\, 0 \le z', z \le 1. \tag{2.1}$$

Assume for some $r \ge 1.5$, the nonparametric component $g_{0k}(\cdot) \in \mathcal{H}_r$. Let $\pi(t) = (b_1(t), \ldots, b_{k_n+l+1}(t))'$ denote a vector of normalized B-spline basis functions of order $l+1$ with $k_n$ quasi-uniform internal knots on $[0,1]$. Then $g_{0k}(\cdot)$ can be approximated using a linear combination of B-spline basis functions in $\Pi(z_i) = (1, \pi(z_{i1})', \ldots, \pi(z_{id})')'$. We refer to Schumaker (1981) for details of the B-spline construction, and the result that there exists $\xi_0 \in \mathbb{R}^{L_n}$, where $L_n = d(k_n + l + 1) + 1$, such that $\sup_{z_i} |\Pi(z_i)'\xi_0 - g_0(z_i)| = O(k_n^{-r})$. For ease of notation and simplicity of proofs, we use the same number of basis functions for all nonlinear components in model (1.1). In practice, such restrictions are not necessary.

Now consider quantile regression with the oracle information that the last $(p_n - q_n)$ elements of $\beta_0$ are all zero. Let

$$(\hat\beta_1, \hat\xi) = \operatorname*{argmin}_{(\beta_1, \xi)} \frac{1}{n} \sum_{i=1}^{n} \rho_\tau(Y_i - x_{A_i}'\beta_1 - \Pi(z_i)'\xi), \tag{2.2}$$

where $\rho_\tau(u) = u(\tau - I(u < 0))$ is the quantile loss function and $x_{A_1}', \ldots, x_{A_n}'$ denote the row vectors of $X_A$. The oracle estimator for $\beta_0$ is $(\hat\beta_1', 0_{p_n-q_n}')'$. Write $\hat\xi = (\hat\xi_0, \hat\xi_1', \ldots, \hat\xi_d')'$ where $\hat\xi_0 \in \mathbb{R}$ and $\hat\xi_j \in \mathbb{R}^{k_n+l+1}$, $j = 1, \ldots, d$. The estimator for the nonparametric function $g_{0j}$ is

$$\hat g_j(z_{ij}) = \pi(z_{ij})'\hat\xi_j - n^{-1} \sum_{i=1}^{n} \pi(z_{ij})'\hat\xi_j,$$

for $j = 1, \ldots, d$; for $g_{00}$ it is $\hat g_0 = \hat\xi_0 + n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{d} \pi(z_{ij})'\hat\xi_j$. The centering of $\hat g_j$ is the sample analog of the identifiability condition $E[g_{0j}(z_{ij})] = 0$. The estimator of $g_0(z_i)$ is $\hat g(z_i) = \hat g_0 + \sum_{j=1}^{d} \hat g_j(z_{ij})$.

2.2. Asymptotic properties.
We next present the asymptotic properties of the oracle estimators as $q_n$ diverges.

Definition. Given $z = (z_1, \ldots, z_d)'$, the function $g(z)$ is said to belong to the class of functions $\mathcal{G}$ if it has the representation $g(z) = \alpha + \sum_{k=1}^{d} g_k(z_k)$, $\alpha \in \mathbb{R}$, $g_k \in \mathcal{H}_r$ and $E[g_k(z_k)] = 0$.

Let

$$h_j^*(\cdot) = \operatorname*{arg\,inf}_{h_j(\cdot) \in \mathcal{G}} \sum_{i=1}^{n} E[f_i(0)(x_{ij} - h_j(z_i))^2],$$

where $f_i(\cdot)$ is the probability density function of $\varepsilon_i$ given $(x_i, z_i)$. Let $m_j(z) = E[x_{ij} \mid z_i = z]$, then it can be shown that $h_j^*(\cdot)$ is the weighted projection of $m_j(\cdot)$ into $\mathcal{G}$ under the $L_2$ norm, where the weights $f_i(0)$ are included to account for the possibly heterogeneous errors. Furthermore, let $x_{A_{ij}}$ be the $(i,j)$th element of $X_A$. Define $\delta_{ij} \equiv x_{A_{ij}} - h_j^*(z_i)$, $\delta_i = (\delta_{i1}, \ldots, \delta_{iq_n})' \in \mathbb{R}^{q_n}$ and $\Delta_n = (\delta_1, \ldots, \delta_n)' \in \mathbb{R}^{n \times q_n}$. Let $H$ be the $n \times q_n$ matrix with the $(i,j)$th element $H_{ij} = h_j^*(z_i)$, then $X_A = H + \Delta_n$.

The following technical conditions are imposed for analyzing the asymptotic behavior of $\hat\beta_1$ and $\hat g$.

Condition 1 (Conditions on the random error). The random error $\varepsilon_i$ has the conditional distribution function $F_i$ and continuous conditional density function $f_i$, given $x_i$, $z_i$. The $f_i$ are uniformly bounded away from 0 and infinity in a neighborhood of zero, its first derivative $f_i'$ has a uniform upper bound in a neighborhood of zero, for $1 \le i \le n$.

Condition 2 (Conditions on the covariates). There exist positive constants $M_1$ and $M_2$ such that $|x_{ij}| \le M_1$, $\forall\, 1 \le i \le n$, $1 \le j \le p_n$ and $E[\delta_{ij}^4] \le M_2$, $\forall\, 1 \le i \le n$, $1 \le j \le q_n$. There exist finite positive constants $C_1$ and $C_2$ such that with probability one

$$C_1 \le \lambda_{\max}(n^{-1} X_A X_A') \le C_2, \qquad C_1 \le \lambda_{\max}(n^{-1} \Delta_n \Delta_n') \le C_2.$$

Condition 3 (Condition on the nonlinear functions). For $r = m + v > 1.5$, $g_0 \in \mathcal{G}$.

Condition 4 (Condition on the B-spline basis). The dimension of the spline basis $k_n$ has the following rate $k_n \approx n^{1/(2r+1)}$.

Condition 5 (Condition on model size). $q_n = O(n^{C_3})$ for some $C_3 < \frac{1}{3}$.

Condition 1 is considerably more relaxed than what is usually imposed on the random error for the theory of high dimensional mean regression, which often requires Gaussian or sub-Gaussian tail condition. Condition 2 is about the behavior of the covariates and the design matrix under the oracle model, which is not restrictive. Condition 3 is typical for the application of B-splines. Stone (1985) showed that B-spline basis functions can be used to effectively approximate functions satisfying Hölder's condition. Condition 4 provides the rate of $k_n$ needed for the optimal convergence rate of $\hat g$. Condition 5 is standard for linear models with diverging number of parameters.

The following theorem summarizes the asymptotic properties of the oracle estimators.

Theorem 2.1. Assume Conditions 1–5 hold. Then

$$\|\hat\beta_1 - \beta_{01}\| = O_p\big(\sqrt{n^{-1} q_n}\big),$$

$$n^{-1} \sum_{i=1}^{n} (\hat g(z_i) - g_0(z_i))^2 = O_p(n^{-1}(q_n + k_n)).$$

An interesting observation is that since we allow $q_n$ to diverge with $n$, it influences the rates for estimating both $\beta$ and $g$. As $q_n$ diverges, to investigate the asymptotic distribution of $\hat\beta_1$, we consider estimating an arbitrary linear combination of the components of $\beta_{01}$.

Theorem 2.2. Assume the conditions of Theorem 2.1 hold. Let $A_n$ be an $l \times q_n$ matrix with $l$ fixed and $A_n A_n' \to G$, a positive definite matrix, then

$$\sqrt{n}\, A_n \Sigma_n^{-1/2} (\hat\beta_1 - \beta_{01}) \to N(0_l, G)$$

in distribution, where $\Sigma_n = K_n^{-1} S_n K_n^{-1}$ with $K_n = n^{-1} \Delta_n' B_n \Delta_n$, $S_n = n^{-1} \tau(1-\tau) \Delta_n' \Delta_n$, and $B_n = \operatorname{diag}(f_1(0), \ldots, f_n(0))$ is an $n \times n$ diagonal matrix with $f_i(0)$ denoting the conditional density function of $\varepsilon_i$ given $(x_i, z_i)$ evaluated at zero.

If we consider the case where $q_n$ is fixed and finite, then we have the following result regarding the behavior of the oracle estimator.

Corollary 1.
Assume $q_n$ is a fixed positive integer, $n^{-1} \Delta_n' B_n \Delta_n \to \Sigma_1$ and $n^{-1} \tau(1-\tau) \Delta_n' \Delta_n \to \Sigma_2$, where $\Sigma_1$ and $\Sigma_2$ are positive definite matrices. If Conditions 1–4 hold, then

$$\sqrt{n}(\hat\beta_1 - \beta_{01}) \xrightarrow{d} N(0_{q_n}, \Sigma_1^{-1} \Sigma_2 \Sigma_1^{-1}),$$

$$n^{-1} \sum_{i=1}^{n} (\hat g(z_i) - g_0(z_i))^2 = O_p(n^{-2r/(2r+1)}).$$

In the case $q_n$ is fixed, the rates reduce to the classical $n^{-1/2}$ rate for estimating $\beta$ and $n^{-2r/(2r+1)}$ for estimating $g$, the latter of which is consistent with Stone (1985) for the optimal rate of convergence.

3. Nonconvex penalized estimation for partially linear additive quantile regression with ultra-high dimensional covariates.

3.1. Nonconvex penalized estimator.
In real data analysis, we do not know which of the $p_n$ covariates in $x_i$ are important. To encourage sparse estimation, we minimize the following penalized objective function for estimating $(\beta_0, \xi_0)$,

$$Q^P(\beta, \xi) = n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - x_i'\beta - \Pi(z_i)'\xi) + \sum_{j=1}^{p_n} p_\lambda(|\beta_j|), \tag{3.1}$$

where $p_\lambda(\cdot)$ is a penalty function with tuning parameter $\lambda$. The $L_1$ penalty or Lasso [Tibshirani (1996)] is a popular choice for penalized estimation. However, the $L_1$ penalty is known to over-penalize large coefficients, tends to be biased and requires strong conditions on the design matrix to achieve selection consistency. This is usually not a concern for prediction, but can be undesirable if the goal is to identify the underlying model. In comparison, an appropriate nonconvex penalty function can effectively overcome this problem [Fan and Li (2001)]. In this paper, we consider two such popular choices of penalty functions: the SCAD [Fan and Li (2001)] and MCP [Zhang (2010)] penalty functions. For the SCAD penalty function,

$$p_\lambda(|\beta|) = \lambda|\beta|\, I(0 \le |\beta| < \lambda) + \frac{a\lambda|\beta| - (\beta^2 + \lambda^2)/2}{a - 1}\, I(\lambda \le |\beta| \le a\lambda) + \frac{(a+1)\lambda^2}{2}\, I(|\beta| > a\lambda) \qquad \text{for some } a > 2,$$

and for the MCP penalty function,

$$p_\lambda(|\beta|) = \lambda\left(|\beta| - \frac{\beta^2}{2a\lambda}\right) I(0 \le |\beta| < a\lambda) + \frac{a\lambda^2}{2}\, I(|\beta| \ge a\lambda) \qquad \text{for some } a > 1.$$

For both penalty functions, the tuning parameter $\lambda$ controls the complexity of the selected model and goes to zero as $n$ increases to $\infty$.

3.2. Solving the penalized estimator.
We propose an effective algorithm to solve the above penalized estimation problem. The algorithm is largely based on the idea of the local linear approximation (LLA) [Zou and Li (2008)]. We employ a new trick based on the observation $|\beta_j| = \rho_\tau(\beta_j) + \rho_\tau(-\beta_j)$ to transform the approximated objective function to a quantile regression objective function based on an augmented data set, so that the penalized estimator can be obtained by iteratively solving unpenalized weighted quantile regression problems.

More specifically, we initialize the algorithm by setting $\beta = 0$ and $\xi = 0$. Then for each step $t \ge 1$, we update the estimator by

$$(\hat\beta^t, \hat\xi^t) = \operatorname*{argmin}_{(\beta, \xi)} \left\{ n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - x_i'\beta - \Pi(z_i)'\xi) + \sum_{j=1}^{p_n} p_\lambda'(|\hat\beta_j^{t-1}|)\,|\beta_j| \right\}, \tag{3.2}$$

where $\hat\beta_j^{t-1}$ is the value of $\beta_j$ at step $t - 1$.

By observing that we can write $|\beta_j|$ as $\rho_\tau(\beta_j) + \rho_\tau(-\beta_j)$, the above minimization problem can be framed as an unpenalized weighted quantile regression problem with $n + 2p_n$ augmented observations. We denote these augmented observations by $(Y_i^*, x_i^*, z_i^*)$, $i = 1, \ldots, (n + 2p_n)$. The first $n$ observations are those in the original data, that is, $(Y_i^*, x_i^*, z_i^*) = (Y_i, x_i, z_i)$, $i = 1, \ldots, n$; for the next $p_n$ observations, we have $(Y_i^*, x_i^*, z_i^*) = (0, 1, 0)$, $i = n+1, \ldots, n+p_n$; and the last $p_n$ observations are given by $(Y_i^*, x_i^*, z_i^*) = (0, -1, 0)$, $i = n+p_n+1, \ldots, n+2p_n$. We fit a weighted linear quantile regression model with the observations $(Y_i^*, x_i^*, z_i^*)$ and corresponding weights $w_i^{t*}$, where $w_i^{t*} = 1$, $i = 1, \ldots, n$; $w_{n+j}^{t*} = p_\lambda'(|\hat\beta_j^{t-1}|)$, $j = 1, \ldots, p_n$; and $w_{n+p_n+j}^{t*} = p_\lambda'(|\hat\beta_j^{t-1}|)$, $j = 1, \ldots, p_n$.

The above new algorithm is simple and convenient, as weighted quantile regression can be implemented using many existing software packages. In our simulations, we used the quantreg package in R and continued the iterative procedure until $\|\hat\beta^t - \hat\beta^{t-1}\|_1 < 10^{-7}$.

3.3. Asymptotic theory.
In addition to Conditions 1–5, we impose an additional condition on how quickly a nonzero signal can decay, which is needed to identify the underlying model.

Condition 6 (Condition on the signal). There exist positive constants $C_4$ and $C_5$ such that $2C_3 < C_4 < 1$ and $n^{(1-C_4)/2} \min_{1 \le j \le q_n} |\beta_{0j}| \ge C_5$.

Due to the nonsmoothness and nonconvexity of the penalized objective function $Q^P(\beta, \xi)$, the classical KKT condition is not applicable to analyzing the asymptotic properties of the penalized estimator. To investigate the asymptotic theory of the nonconvex estimator for the ultra-high dimensional partially linear additive quantile regression model, we explore the sufficient condition for the local minimizer of a convex differencing problem [Tao and An (1997); Wang, Wu and Li (2012)] and extend it to the setting involving nonparametric components.

Our approach concerns a nonconvex objective function that can be expressed as the difference of two convex functions. Specifically, we consider objective functions belonging to the class

$$\mathcal{F} = \{q(\eta) : q(\eta) = k(\eta) - l(\eta),\ k(\cdot), l(\cdot) \text{ are both convex}\}.$$

This is a very general formulation that incorporates many different forms of penalized objective functions. The subdifferential of $k(\eta)$ at $\eta = \eta_0$ is defined as

$$\partial k(\eta_0) = \{t : k(\eta) \ge k(\eta_0) + (\eta - \eta_0)'t,\ \forall \eta\}.$$

Similarly, we can define the subdifferential of $l(\eta)$. Let $\operatorname{dom}(k) = \{\eta : k(\eta) < \infty\}$ be the effective domain of $k$. A sufficient condition for $\eta^*$ to be a local minimizer of $q(\eta)$ is that $\eta^*$ has a neighborhood $U$ such that $\partial l(\eta) \cap \partial k(\eta^*) \ne \emptyset$, $\forall \eta \in U \cap \operatorname{dom}(k)$ (see Lemma 7 in the Appendix).

To appeal to the above sufficient condition for the convex differencing problem, it is noted that $Q^P(\beta, \xi)$ can be written as

$$Q^P(\beta, \xi) = k(\beta, \xi) - l(\beta, \xi),$$

where the two convex functions are $k(\beta, \xi) = n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - x_i'\beta - \Pi(z_i)'\xi) + \lambda \sum_{j=1}^{p_n} |\beta_j|$ and $l(\beta, \xi) = \sum_{j=1}^{p_n} L(\beta_j)$. The specific form of $L(\beta_j)$ depends on the penalty function being used. For the SCAD penalty function,

$$L(\beta_j) = \frac{\beta_j^2 - 2\lambda|\beta_j| + \lambda^2}{2(a-1)}\, I(\lambda \le |\beta_j| \le a\lambda) + \left[\lambda|\beta_j| - \frac{(a+1)\lambda^2}{2}\right] I(|\beta_j| > a\lambda);$$

while for the MCP penalty function,

$$L(\beta_j) = \frac{\beta_j^2}{2a}\, I(0 \le |\beta_j| < a\lambda) + \left[\lambda|\beta_j| - \frac{a\lambda^2}{2}\right] I(|\beta_j| \ge a\lambda).$$

Building on the convex differencing structure, we show that, with probability approaching one, the oracle estimator $(\hat\beta', \hat\xi')'$, where $\hat\beta = (\hat\beta_1', 0_{p_n-q_n}')'$, is a local minimizer of $Q^P(\beta, \xi)$. To study the sufficient optimality condition, we formally define $\partial k(\beta, \xi)$ and $\partial l(\beta, \xi)$, the subdifferentials of $k(\beta, \xi)$ and $l(\beta, \xi)$, respectively. First, the function $l(\beta, \xi)$ does not depend on $\xi$ and is differentiable everywhere. Hence, its subdifferential is simply the regular derivative. For any value of $\beta$ and $\xi$,

$$\partial l(\beta, \xi) = \left\{ \mu = (\mu_1, \mu_2, \ldots, \mu_{p_n+L_n})' \in \mathbb{R}^{p_n+L_n} : \mu_j = \frac{\partial l(\beta)}{\partial \beta_j},\ 1 \le j \le p_n;\ \mu_j = 0,\ p_n + 1 \le j \le p_n + L_n \right\}.$$

For $1 \le j \le p_n$, for the SCAD penalty function,

$$\frac{\partial l(\beta)}{\partial \beta_j} = \begin{cases} 0, & 0 \le |\beta_j| < \lambda, \\ (\beta_j - \lambda\,\operatorname{sgn}(\beta_j))/(a-1), & \lambda \le |\beta_j| \le a\lambda, \\ \lambda\,\operatorname{sgn}(\beta_j), & |\beta_j| > a\lambda, \end{cases}$$

while for the MCP penalty function,

$$\frac{\partial l(\beta)}{\partial \beta_j} = \begin{cases} \beta_j/a, & 0 \le |\beta_j| < a\lambda, \\ \lambda\,\operatorname{sgn}(\beta_j), & |\beta_j| \ge a\lambda. \end{cases}$$
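The difference-of-convex structure used here, $p_\lambda(|\beta|) = \lambda|\beta| - L(\beta)$, can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names, the grid, and the values $\lambda = 0.5$, $a = 3.7$ (SCAD) and $a = 3$ (MCP) are chosen for the example and do not come from the paper. It also evaluates the LLA weight $p_\lambda'(|\beta|) = \lambda - L'(|\beta|)$, which follows from the same decomposition.

```python
def sign(x):
    """Return -1.0, 0.0 or 1.0 according to the sign of x."""
    return (x > 0) - (x < 0) + 0.0

def scad_penalty(beta, lam, a):
    """SCAD penalty p_lambda(|beta|), a > 2."""
    b = abs(beta)
    if b < lam:
        return lam * b
    if b <= a * lam:
        return (a * lam * b - (b * b + lam * lam) / 2.0) / (a - 1.0)
    return (a + 1.0) * lam * lam / 2.0

def scad_L(beta, lam, a):
    """Convex part L(beta) with SCAD = lam*|beta| - L(beta)."""
    b = abs(beta)
    if b < lam:
        return 0.0
    if b <= a * lam:
        return (b * b - 2.0 * lam * b + lam * lam) / (2.0 * (a - 1.0))
    return lam * b - (a + 1.0) * lam * lam / 2.0

def scad_dL(beta, lam, a):
    """Derivative of the SCAD convex part L with respect to beta."""
    b = abs(beta)
    if b < lam:
        return 0.0
    if b <= a * lam:
        return (beta - lam * sign(beta)) / (a - 1.0)
    return lam * sign(beta)

def mcp_penalty(beta, lam, a):
    """MCP penalty p_lambda(|beta|), a > 1."""
    b = abs(beta)
    if b < a * lam:
        return lam * (b - b * b / (2.0 * a * lam))
    return a * lam * lam / 2.0

def mcp_L(beta, lam, a):
    """Convex part L(beta) with MCP = lam*|beta| - L(beta)."""
    b = abs(beta)
    if b < a * lam:
        return b * b / (2.0 * a)
    return lam * b - a * lam * lam / 2.0

def mcp_dL(beta, lam, a):
    """Derivative of the MCP convex part L with respect to beta."""
    if abs(beta) < a * lam:
        return beta / a
    return lam * sign(beta)

def lla_weight(beta_prev, lam, a, dL):
    """LLA weight p'_lambda(|beta_prev|) = lam - L'(|beta_prev|)."""
    return lam - dL(abs(beta_prev), lam, a)

def max_decomposition_gap(penalty, L, lam, a):
    """Largest |p(|b|) - (lam*|b| - L(b))| over a grid on [-3, 3]."""
    grid = [k / 100.0 for k in range(-300, 301)]
    return max(abs(penalty(b, lam, a) - (lam * abs(b) - L(b, lam, a)))
               for b in grid)

if __name__ == "__main__":
    lam = 0.5
    print("SCAD gap:", max_decomposition_gap(scad_penalty, scad_L, lam, 3.7))
    print("MCP gap:", max_decomposition_gap(mcp_penalty, mcp_L, lam, 3.0))
    # At the initial value beta = 0 the weight equals lam, so the first
    # LLA step is an L1-penalized fit; large coefficients get weight 0.
    print("SCAD weights:", lla_weight(0.0, lam, 3.7, scad_dL),
          lla_weight(3.0, lam, 3.7, scad_dL))
```

In this sketch the weight returned by `lla_weight` plays the role of $p_\lambda'(|\hat\beta_j^{t-1}|)$ in (3.2); the weighted quantile regression fit itself is not implemented here.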
