
arXiv:1601.08065v2 [math.ST] 13 Jul 2016

Noname manuscript No. (will be inserted by the editor)

Adaptive group LASSO selection in quantile models

GABRIELA CIUPERCA

Received: date / Accepted: date

Abstract The paper considers a linear model with grouped explanatory variables. If the model errors do not have zero mean and bounded variance, or if the model contains outliers, then the least squares framework is not appropriate and quantile regression is an interesting alternative. In order to automatically select the relevant variable groups, we propose and study here the adaptive group LASSO quantile estimator. We establish the sparsity and asymptotic normality of the proposed estimator in two cases: fixed number and divergent number of variable groups. A numerical study by Monte Carlo simulations confirms the theoretical results and illustrates the performance of the proposed estimator.

Keywords group selection · quantile model · adaptive LASSO · selection consistency · oracle properties

Mathematics Subject Classification (2010) 62J05 · 62J07

GABRIELA CIUPERCA
Université de Lyon, Université Lyon 1, CNRS, UMR 5208, Institut Camille Jordan, Bat. Braconnier, 43, blvd du 11 novembre 1918, F-69622 Villeurbanne Cedex, France.
E-mail: [email protected], tel: 33(0)4.26.23.45.57, fax: 33(0)4.72.43.16.87

1 Introduction

Classically, for the regression model, the errors are assumed to be independent, with mean zero and bounded variance. The model is then estimated by the least squares (LS) method, possibly with a LASSO-type penalty when automatic detection of the significant variables is performed. If the assumptions on the first two moments of the model error are not satisfied, the LS framework breaks down. In this case, an alternative is to consider quantile regression with a LASSO-type penalty; this is one of the interests of this paper. Quantile regression is robust and relaxes the conditions on the first two moments of the model error.

Quite often in practice, for example in the analysis of variance, linear regression models with grouped variables are considered. For models with grouped explanatory variables, it is more meaningful to identify the relevant variable groups than the individual variables. If the errors have a Normal distribution, the F-test is used to detect the relevant variable groups. If the errors are not Gaussian and, moreover, the number of groups is large, the F-test is inappropriate. Hence another interest of this paper: we consider the quantile process with a LASSO-type penalty in order to automatically detect the irrelevant variable groups.

The automatic selection of grouped variables using LASSO penalties was introduced by Yuan and Lin (2008) for Gaussian errors, by proposing the group LASSO penalty for the sum-of-squared-errors process. Several recent papers have considered group selection using LASSO-type penalties. For a fixed parameter space and i.i.d. model errors with zero mean and finite second moment, Nardi and Rinaldo (2008) established the model selection consistency and the asymptotic normality of the nonzero group LASSO estimators. The same estimator is studied by Nardi and Rinaldo (2008) when the number of covariates is larger, for the particular case of normal errors. For Gaussian errors, Xu and Ghosh (2015) perform Bayesian variable selection by penalizing the sum of squared errors with a Bayesian group LASSO.
For this estimation method, the posterior median estimator satisfies the sparsity property. The adaptive group LASSO estimator, when the number p of groups is fixed, was studied by Wang and Leng (2008). For a high-dimensional model, Wei and Huang (2010) studied the selection and estimation properties of the adaptive group LASSO, but under the assumption that the errors are Gaussian. Still for the sum-of-squared-errors process penalized with an adaptive LASSO penalty, Zhang and Xiang (2015) consider the case where the number of groups p_n converges to infinity as n → ∞, for i.i.d. errors ε such that E[ε] = 0 and Var[ε] < ∞; the consistency and asymptotic normality of the parameter estimator are established. A paper that considers not the penalized LS process but a process associated to a twice differentiable convex function, with a LASSO penalty, for the case of p large and n small, is Wang et al. (2015). When the number of groups can grow at a certain polynomial rate, the automatic selection property of variable groups for an LS process with a SCAD penalty has been proven by Guo et al. (2015). Automatic selection of the relevant variable groups when p converges to infinity has also been considered by Zou and Zhang (2009), who penalize the LS process with an adaptive elastic-net penalty. For a review of group selection methods and several applications of these methods, the reader may see Huang et al. (2012).

In this paper we consider the model selection and estimation problem in a linear model with p groups of explanatory variables. We propose and study the asymptotic properties of the adaptive group LASSO quantile estimator in two cases: p fixed and p → ∞ as n → ∞. This estimator is the minimizer of the quantile process penalized by an adaptive group LASSO penalty. The oracle properties, i.e. the automatic selection of the significant variable groups and their asymptotic distribution, are proved.

The remainder of the paper is organized as follows. In Section 2 we present the model and introduce some notations used throughout the paper. Oracle properties for the adaptive group LASSO quantile estimator are proved for p fixed in Section 3 and for p → ∞ as n → ∞ in Section 4. Section 5 reports simulation results which illustrate the interest of the method; we compare the performance of the adaptive group LASSO quantile estimation with that of the adaptive group LASSO least squares estimation proposed by Zhang and Xiang (2015). All proofs are given in Section 6.

2 Model and notations

In this section we present the statistical model and introduce some notations used throughout the paper.

We begin with some general notation. All vectors and matrices are denoted by bold symbols and all vectors are written as column vectors. For a vector v, we denote by v^t its transpose and by ‖v‖ its Euclidean norm. The notations →_L and →_P represent convergence in distribution and in probability, respectively, as n → ∞. For a positive definite matrix M, we denote by λ_min(M) and λ_max(M) its smallest and largest eigenvalues, respectively.

We will also use the following notation: if (V_n) and (U_n) are sequences of random variables, V_n = o_P(U_n) means that lim_{n→∞} P[|V_n/U_n| > ε] = 0 for any ε > 0, and V_n = O_P(U_n) means that there exists a finite C > 0 such that P[|V_n/U_n| > C] < ε for any n and ε. If (V_n) and (U_n) are deterministic sequences, V_n = o(U_n) means that V_n/U_n → 0 as n → ∞, and V_n = O(U_n) means that the sequence V_n/U_n is bounded for sufficiently large n.
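As a simple illustration of this notation (a standard example, added here for the reader): for i.i.d. errors ε_1, ···, ε_n with E[ε_i] = 0 and Var[ε_i] = σ² < ∞, the central limit theorem gives

\[
\sqrt n\,\bar\varepsilon_n \xrightarrow{\ \mathcal L\ } \mathcal N(0,\sigma^2),
\qquad\text{so that}\qquad
\bar\varepsilon_n = O_P(n^{-1/2})
\quad\text{and}\quad
\bar\varepsilon_n = o_P(n^{-1/2+\delta})\ \text{for every } \delta>0.
\]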
Throughout this paper, C denotes a generic constant, not depending on the size n, which may take different values in different formulas or even in different parts of the same formula; the value of C is not of interest. We will also use the notation 0_k for the zero k-vector.

We consider the following linear model with p groups of explanatory variables:

\[
Y_i = \sum_{j=1}^{p} \mathbf X_{ij}^t \boldsymbol\beta_j + \varepsilon_i = \mathbf X_i^t \boldsymbol\beta + \varepsilon_i, \qquad i=1,\cdots,n, \tag{1}
\]

with Y_i, ε_i random variables. For each group j = 1, ···, p, the parameter vector is β_j ≡ (β_{j1}, ···, β_{jd_j}) ∈ R^{d_j} and the design for observation i is X_{ij}, a column vector of size d_j. The vector with all coefficients is β ≡ (β_1, ···, β_p) and, for observation i, the vector with all explanatory variables is X_i = (X_{i1}, ···, X_{ip}). Denote by β^0_j = (β^0_{j1}, ···, β^0_{jd_j}) the (unknown) true value of the parameter β_j. For observation i, we denote by X_{ij,k} the kth variable of the jth group. We emphasize that for the ith sample we observe (Y_i, X_i), i = 1, ···, n.

The relevant groups of explanatory variables correspond to the nonzero vectors. Without loss of generality, we suppose that the first p_0 (p_0 ≤ p) groups of explanatory variables are relevant:

‖β^0_j‖ ≠ 0 for all j ≤ p_0, and ‖β^0_j‖ = 0 for all j > p_0,

where ‖·‖ is the Euclidean norm. Let r be the total number of explanatory variables, so r = Σ_{j=1}^p d_j, and denote r^0 = Σ_{j=1}^{p_0} d_j. Thus p_0 is the number of nonzero true parameter vectors and r^0 is the total number of parameters in these nonzero vectors. The multi-factor ANOVA model is an example of this model.

We now introduce the quantile framework. For a fixed quantile index τ ∈ (0,1), the check function ρ_τ(·): R → R_+ is defined by ρ_τ(u) = u(τ − 1_{u<0}). The quantile estimator of β is the minimizer of the quantile process associated to model (1):

\[
\widetilde{\boldsymbol\beta}_n \equiv \operatorname*{arg\,min}_{\boldsymbol\beta \in \mathbb R^r}\ \sum_{i=1}^n \rho_\tau(Y_i - \mathbf X_i^t \boldsymbol\beta). \tag{2}
\]

For the particular case τ = 1/2 we obtain median regression and (2) becomes the least absolute deviations estimator. A great advantage of the quantile framework is that, compared to classical estimation methods that are sensitive to outliers, the quantile method provides more robust estimators; moreover, the required assumptions on the error moments are relaxed. The estimator β̃_n = (β̃_{n;1}, β̃_{n;2}, ···, β̃_{n;p}) has a d_j-subvector β̃_{n;j} for each group j = 1, ···, p.

The quantile estimation method does not perform automatic variable selection: to find the zero vectors, i.e. the irrelevant groups of variables, hypothesis tests are required. However, when model (1) has a large number of groups p, it is useful to estimate the group parameters and eliminate the irrelevant groups simultaneously, without going through a hypothesis test each time. The adaptive LASSO penalties have the advantage of combining automatic selection with parameter estimation (see for example Zhang and Xiang (2015), Wei and Huang (2010), Wang and Leng (2008)).

In order to introduce and study the adaptive LASSO estimator, we consider the index set

A ≡ {j; ‖β^0_j‖ ≠ 0} = {1, ···, p_0}

and its complement A^c ≡ {j; ‖β^0_j‖ = 0} = {p_0+1, ···, p}. The set A contains the indices of the groups with nonzero true parameters. For an r-vector of parameters β, we denote by β_A the r^0-subvector of β which contains the β_j for j = 1, ···, p_0. Similarly, the (r − r^0)-vector β_{A^c} contains the β_j for j = p_0+1, ···, p. In practice the set A is unknown; we must therefore find the set A and estimate the corresponding parameters.
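To fix ideas, here is a minimal numerical sketch in base R of the check function ρ_τ and of the unpenalized estimator (2); the simulated data and the use of optim are illustrative choices of ours, not the paper's code.

```r
rho_tau <- function(u, tau) u * (tau - (u < 0))   # check function rho_tau(u)

quantile_process <- function(beta, Y, X, tau) {
  sum(rho_tau(Y - X %*% beta, tau))               # objective minimized in (2)
}

set.seed(1)
n <- 200; r <- 3; tau <- 0.5
X <- matrix(rnorm(n * r), n, r)
Y <- X %*% c(1, -1, 0) + rt(n, df = 2)            # heavy-tailed errors: LS is fragile here

## Nelder-Mead (optim's default) copes with the non-smooth objective for small r
beta_tilde <- optim(rep(0, r), quantile_process, Y = Y, X = X, tau = tau)$par
```

For τ = 1/2 this is the least absolute deviations fit; in practice one would rather use a dedicated solver such as quantreg::rq.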
In Sections 3 and 4 we introduce an estimator, denoted β̂*_n, which minimizes the quantile process penalized with an adaptive group LASSO penalty, for two cases: p fixed and p → ∞ as n → ∞. We generalize the adaptive LASSO quantile estimator proposed by Ciuperca (2016) for individual variable selection to the case of group selection. We call this estimator the adaptive group LASSO quantile (ag LASSO Q) estimator. We say that β̂*_n satisfies the oracle properties if:

(i) asymptotic normality: √n(β̂*_n − β^0)_A converges in law to a centred Normal distribution;
(ii) sparsity: lim_{n→∞} P[{j = 1, ···, p; ‖β̂*_{n;j}‖ ≠ 0} = A] = 1.

3 Fixed p case

In this section we propose and study the asymptotic properties of the ag LASSO Q estimator of the parameter β of model (1) when the number of groups p is fixed. We define the ag LASSO Q estimator by

\[
\widehat{\boldsymbol\beta}^*_n \equiv \operatorname*{arg\,min}_{\boldsymbol\beta \in \mathbb R^r} Q(\boldsymbol\beta),
\]

where Q(β) is the quantile process penalized with the adaptive group LASSO penalty:

\[
Q(\boldsymbol\beta) \equiv \sum_{i=1}^n \rho_\tau\Big(Y_i - \sum_{j=1}^p \mathbf X_{ij}^t \boldsymbol\beta_j\Big) + \mu_n \sum_{j=1}^p \widehat\omega_{n;j}\,\|\boldsymbol\beta_j\\|, \tag{3}
\]

with the weights ω̂_{n;j} ≡ ‖β̃_{n;j}‖^{−γ}, γ > 0. The estimator β̂*_n is written as β̂*_n = (β̂*_{n;1}, ···, β̂*_{n;p}), where β̂*_{n;j} is a subvector of size d_j, for j = 1, ···, p. For the particular case of a quantile model with non-grouped variables, d_j = 1 for all j = 1, ···, p, we obtain the adaptive LASSO quantile estimator proposed and studied by Ciuperca (2016).

Before presenting the main results for β̂*_n in the fixed p case, we give the required assumptions. The tuning parameter μ_n and the constant γ are such that, as n → ∞,

\[
\mu_n \to \infty, \qquad \frac{\mu_n}{\sqrt n} \to 0, \qquad n^{(\gamma-1)/2}\mu_n \to \infty. \tag{4}
\]

For the design (X_i)_{1≤i≤n} we consider the following assumption:

(A1) n^{−1} max_{1≤i≤n} X_i^t X_i → 0 and n^{−1} Σ_{i=1}^n X_i X_i^t → Υ as n → ∞, with Υ an r × r positive definite matrix.

For the errors ε_i we suppose:

(A2) (ε_i)_{1≤i≤n} are independent, identically distributed, with distribution function F: B → [0,1] and a continuous positive density f in a neighborhood of 0. The τth quantile of ε_i is zero: τ = F(0). Moreover, for every e ∈ int(B) and 1_r ∈ R^r, we have

\[
\lim_{n\to\infty} n^{-1}\sum_{i=1}^n \int_0^{\mathbf X_i^t \mathbf 1_r} \sqrt n\,\big[F(e+v/\sqrt n)-F(e)\big]\,dv = \frac12\, f(e)\, \mathbf 1_r^t\, \Upsilon\, \mathbf 1_r, \tag{5}
\]

where 1_r is the r-vector with all components 1. The set B is a real set, with 0 ∈ B.

Assumption (A1) is standard for LASSO methods and (A2) is classic for quantile regression (see Ciuperca (2016), Koenker (2005), Zou and Yuan (2008), Wu and Liu (2009)). Assumption (A1) requires that the design matrix has reasonably good behaviour. For the tuning parameter μ_n, the same conditions (4) are required in Ciuperca (2016) for the adaptive LASSO quantile model, but with ungrouped explanatory variables.

We remark that for the ANOVA model, since in the analysis of variance there is a constraint for each level of a factor, we take as constraint that the effect of one level is zero; this zero level is then not included in the model, so that assumption (A1) is satisfied.

In order to study the asymptotic properties of the estimator β̂*_n, let us consider the index set of the groups selected by the adaptive group LASSO quantile method:

Â*_n ≡ {j ∈ {1, ···, p}; ‖β̂*_{n;j}‖ ≠ 0}

and its complement Â*_n^c.
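To make the objective (3) concrete, here is a minimal sketch of the penalized process, reusing rho_tau and the pilot estimate beta_tilde from the previous sketch; the group index list and the choice of μ_n are illustrative assumptions of ours.

```r
Q_ag_lasso <- function(beta, Y, X, tau, groups, beta_tilde, mu_n, gamma) {
  fit <- sum(rho_tau(Y - X %*% beta, tau))        # quantile process
  pen <- sum(sapply(groups, function(idx) {
    w_j <- sqrt(sum(beta_tilde[idx]^2))^(-gamma)  # adaptive weight omega_hat_{n;j}
    w_j * sqrt(sum(beta[idx]^2))                  # times the group norm ||beta_j||
  }))
  fit + mu_n * pen
}

groups <- list(1:2, 3)        # e.g. p = 2 groups with sizes d_1 = 2, d_2 = 1
mu_n   <- sqrt(n) / log(n)    # one choice satisfying (4) when gamma = 1
Q_ag_lasso(beta_tilde, Y, X, tau, groups, beta_tilde, mu_n, gamma = 1)
```

Minimizing this function over β (in practice with a group coordinate descent algorithm, as in Section 5) yields the ag LASSO Q estimator.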
The following theorem shows that the ag LASSO Q estimators with index in the set A are asymptotically Gaussian. Hence, the estimators of the nonzero parameter vectors have the same asymptotic distribution they would have if the zero parameter vectors were known.

Theorem 1 Under assumptions (A1), (A2) and condition (4), we have

\[
\sqrt n\,(\widehat{\boldsymbol\beta}^*_n - \boldsymbol\beta^0)_{\mathcal A} \xrightarrow{\ \mathcal L\ } \mathcal N\big(\mathbf 0_{r^0},\ \tau(1-\tau) f^{-2}(0)\, \Upsilon_{\mathcal A}^{-1}\big),
\]

with Υ_A the submatrix of Υ with row and column indices in {1, ···, d_1, d_1+1, ···, d_1+d_2, ···, Σ_{j=1}^{p_0} d_j}.

We now give the Karush-Kuhn-Tucker (KKT) optimality conditions, needed to prove the sparsity property of β̂*_n. For all j ∈ Â*_n we have, with probability one, the following d_j equalities:

\[
\tau \sum_{i=1}^n \mathbf X_{ij} - \sum_{i=1}^n \mathbf X_{ij}\, \mathbb 1_{Y_i < \mathbf X_i^t \widehat{\boldsymbol\beta}^*_n} = \mu_n\, \widehat\omega_{n;j}\, \frac{\widehat{\boldsymbol\beta}^*_{n;j}}{\|\widehat{\boldsymbol\beta}^*_{n;j}\|}. \tag{6}
\]

For all j ∉ Â*_n and all k = 1, ···, d_j we have, with probability one, the following inequality:

\[
\Big|\tau \sum_{i=1}^n X_{ij,k} - \sum_{i=1}^n X_{ij,k}\, \mathbb 1_{Y_i < \mathbf X_i^t \widehat{\boldsymbol\beta}^*_n}\Big| \le \mu_n\, \widehat\omega_{n;j}. \tag{7}
\]

The following theorem shows the sparsity property of the ag LASSO Q estimator. This result states that the adaptive group LASSO quantile estimators of the nonzero parameter vectors are exactly nonzero with a probability converging to one as n diverges to infinity.

Theorem 2 Under the assumptions of Theorem 1 and under the condition n^{γ/2−1}λ_n → ∞ as n → ∞, we have lim_{n→∞} P[Â*_n = A] = 1.

Theorems 1 and 2 establish the asymptotic normality and the sparsity of the ag LASSO Q estimator, which means that this estimator enjoys the oracle properties in the case of fixed p.

Remark 1 For the weight ω̂_{n;j} associated to the jth group, we considered the norm of the quantile estimator to the power −γ. In view of the proofs of Theorems 1 and 2, these two theorems remain true when β̃_{n;j} is replaced by any estimator of β_j with convergence rate n^{−1/2}, under assumptions (A1), (A2).
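As an illustration, inequality (7) can be checked numerically on a candidate fit. The sketch below assumes a fitted beta_star, the pilot beta_tilde, a list of group indices and a pair (mu_n, gamma) are available; the function name and tolerance are ours.

```r
kkt_check_zero_groups <- function(beta_star, Y, X, tau, groups,
                                  beta_tilde, mu_n, gamma) {
  ind <- as.numeric(Y < X %*% beta_star)   # indicator 1_{Y_i < X_i^t beta*_n}
  sapply(seq_along(groups), function(j) {
    idx <- groups[[j]]
    if (sqrt(sum(beta_star[idx]^2)) > 1e-8)
      return(NA)                           # (7) only concerns groups estimated as zero
    w_j <- sqrt(sum(beta_tilde[idx]^2))^(-gamma)       # adaptive weight
    ## |tau*sum_i X_ij,k - sum_i X_ij,k 1_{...}| = |sum_i X_ij,k (tau - 1_{...})|
    lhs <- max(abs(colSums(X[, idx, drop = FALSE] * (tau - ind))))
    lhs <= mu_n * w_j                      # inequality (7), over all k = 1..d_j
  })
}
```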
4 The case of p depending on n

Consider now the same model (1) with grouped variables, but with the number p of groups depending on n: p = p_n and p_n → ∞ as n → ∞. More precisely, we consider p = O(n^c), with the constant c ∈ (0,1). For readability, we keep the notation p instead of p_n. Similarly, we have r = Σ_{j=1}^p d_j, with r depending on n. Also for simplicity of notation, we do not put a subscript n on the design X_i or on the parameter β, even though their dimensions depend on n.

We will first find the convergence rate of the quantile estimator β̃_n of β. Afterwards, we will propose an adaptive group LASSO quantile estimator of β. Even though the number p of groups diverges as n → ∞, this estimator keeps the oracle properties.

Since the design size depends on n, we need to reconsider the assumptions on X_i. Let us then consider the following assumptions on the errors (ε_i), the design (X_i) and the number p of groups:

(A3) (ε_i)_{1≤i≤n} are i.i.d. Let F be the distribution function and f the density function of ε_i. The density f is continuous, strictly positive in a neighbourhood of zero, and has a bounded first derivative in the neighbourhood of 0. The τth quantile of ε_i is zero: τ = F(0).

(A4) There exist two constants 0 < m_0 ≤ M_0 < ∞ such that m_0 ≤ λ_min(n^{−1} Σ_{i=1}^n X_i X_i^t) ≤ λ_max(n^{−1} Σ_{i=1}^n X_i X_i^t) ≤ M_0.

(A5) (p/n)^{1/2} max_{1≤i≤n} ‖X_i‖ → 0, as n → ∞.

(A6) p is such that p = O(n^c), with 0 < c < 1.

Since p → ∞, condition (5) of assumption (A2) for the case of fixed p is now replaced by f′ bounded in a neighborhood of 0. This assumption has also been considered for a high-dimensional quantile model with seamless-L_0 penalty by Ciuperca (2015), where assumptions (A4) and (A5) are also required. Assumption (A6) was considered by Zhang and Xiang (2015) for a high-dimensional linear model where the objective function is the sum of squared errors penalized with an adaptive group LASSO penalty. Assumptions (A4), (A5), (A6) are also required for a high-dimensional linear model by Zou and Zhang (2009), who penalize the LS process with an adaptive elastic-net penalty. With respect to the case of fixed p, assumptions (A4) and (A5) are the analogues of (A1).

We start by finding the convergence rate of the quantile estimator (2) in the case p → ∞ as n → ∞. For this, consider the quantile process:

\[
G_n(\boldsymbol\beta) \equiv \sum_{i=1}^n \rho_\tau(Y_i - \mathbf X_i^t \boldsymbol\beta).
\]

For the quantile estimator to exist, we assume that the total number r of parameters is strictly less than n. We recall that in the case of fixed p, the convergence rate of the quantile estimator β̃_n is of order n^{−1/2} (see for example Koenker (2005)). We will show that the quantile estimator has a convergence rate of order (p/n)^{1/2} when the number of explanatory variable groups diverges with the sample size. In view of the proof of Lemma 1, the convergence rate of β̃_n depends only on p and n, and not on the total number r of parameters, thanks to assumption (A5). The convergence rate of the quantile estimator is necessary for studying the asymptotic behaviour of the penalty which appears in the adaptive group LASSO quantile process.

Lemma 1 Under assumptions (A3)-(A6), we have ‖β̃_n − β^0‖ = O_P(√(p/n)).

Consider now the following adaptive group LASSO quantile (ag LASSO Q) estimator:

\[
\widehat{\boldsymbol\beta}^*_n \equiv \operatorname*{arg\,min}_{\boldsymbol\beta \in \mathbb R^r}\ \Big[\frac1n G_n(\boldsymbol\beta) + \lambda_n \sum_{j=1}^p \widehat\omega_{n;j}\, \|\boldsymbol\beta_j\|\Big],
\]

where λ_n is a positive tuning parameter and the weights of the LASSO penalty are ω̂_{n;j} ≡ ‖β̃_{n;j}‖^{−γ}, with γ > 0. The relation between the tuning parameter μ_n of relation (3) for the case of fixed p and λ_n for the case of p depending on n is λ_n = μ_n/n. We prefer these forms of tuning parameter and objective process in order to have a similarity with the adaptive group LASSO LS (ag LASSO LS) case considered by Zhang and Xiang (2015).

In order to study the asymptotic normality of β̂*_n, we need to impose an additional condition on the total number of nonzero parameters. More precisely, r^0 is assumed to be of the same order as p_0. This controls the penalty, so that it is smaller than the quantile process. Concerning the size of the nonzero parameter vectors, we take the following assumption:

(A7) r^0 = O(p_0).

For the smallest nonzero vector norm and for the constant c of assumption (A6), we assume:

(A8) Let h_0 ≡ min_{1≤j≤p_0} ‖β^0_j‖. There exists a constant M > 0 such that M n^{−α} ≤ h_0, with α > (c−1)/2.

These two assumptions are also found in the paper of Zhang and Xiang (2015), for the ag LASSO LS method in a high-dimensional linear model, but with a supplementary condition on r: r = O(p). Here we do not need this requirement, since assumption (A5) is imposed. On the other hand, in Zhang and Xiang (2015), instead of assumption (A5) the condition n^{−1/2} max_{1≤i≤n} ‖X_{i,A}‖² → 0, as n → ∞, is required.

The following theorem gives the convergence rate of the ag LASSO Q estimator when p → ∞. We obtain the same convergence rate as that of the quantile estimator when the number of groups diverges. This convergence rate is also obtained by Zhang and Xiang (2015) for the ag LASSO LS estimator, but for errors (ε_i)_{1≤i≤n} with mean zero and bounded variance.

Theorem 3 Under assumptions (A3)-(A6), (A8) and with the tuning parameter (λ_n)_{n∈N} satisfying λ_n n^{(1+c)/2−αγ} → 0 as n → ∞, we have ‖β̂*_n − β^0‖ = O_P(√(p/n)).
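To make the rate condition of Theorem 3 concrete (a worked reading added here, not part of the original statement): take λ_n = n^{−δ} with δ > 0. Then

\[
\lambda_n\, n^{(1+c)/2-\alpha\gamma} = n^{(1+c)/2-\alpha\gamma-\delta} \longrightarrow 0
\iff \delta > \frac{1+c}{2} - \alpha\gamma .
\]

With α = 0 (nonzero group norms bounded away from zero, see Remark 3 below) and the value c = 0.43 used in the simulations of Section 5, any δ > 0.715 is admissible.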
The following theorem shows the oracle properties of the ag LASSO Q estimator when the number p of groups diverges. We denote by X_{i,A} the r^0-vector which contains the sub-vectors X_{ij}, for j ∈ {1, ···, p_0}.

Theorem 4 Suppose that assumptions (A3)-(A6), (A8) are satisfied and that the tuning parameter satisfies λ_n n^{(1−c)(1+γ)/2} → ∞ and λ_n n^{(c+1)/2−αγ} → 0, as n → ∞. Then:
(i) P[Â*_n = A] → 1, as n → ∞.
(ii) If moreover assumption (A7) holds, then, for any vector u of size r^0 such that ‖u‖ = 1, with the notation Υ_{n,A} ≡ n^{−1} Σ_{i=1}^n X_{i,A} X_{i,A}^t, we have

\[
\sqrt n\, (\mathbf u^t \Upsilon_{n,\mathcal A}^{-1} \mathbf u)^{-1/2}\, \mathbf u^t (\widehat{\boldsymbol\beta}^*_n - \boldsymbol\beta^0)_{\mathcal A} \xrightarrow{\ \mathcal L\ } \mathcal N\big(0,\ \tau(1-\tau) f^{-2}(0)\big).
\]

For the tuning parameter λ_n, the same conditions are required in Zhang and Xiang (2015) for the ag LASSO LS estimator in a high-dimensional linear model to satisfy the oracle properties.

Remark 2 As for the case of fixed p, we considered the weight ω̂_{n;j} = ‖β̃_{n;j}‖^{−γ}, with β̃_{n;j} the quantile estimator of the d_j-vector β_j, for any j = 1, ···, p. In view of the proof of Theorem 4, the oracle properties of the ag LASSO Q estimator remain true when β̃_{n;j} is replaced by any (p/n)^{1/2}-consistent estimator of β_j, under assumptions (A3)-(A6).

Remark 3 If h_0, defined in assumption (A8), does not depend on n, then α = 0. In this case, the conditions required on (λ_n)_{n∈N} in Theorems 3 and 4 imply γ > 2c/(1−c): combining λ_n n^{(1+c)/2} → 0 with λ_n n^{(1−c)(1+γ)/2} → ∞ forces (1−c)(1+γ) > 1+c. This lower bound on γ increases with c ∈ (0,1); for example, if c = 1/2 then γ > 2.

5 Simulations

In order to evaluate the performance of the proposed estimation method, Monte Carlo simulations are carried out in this section. To assess this performance we compare the ag LASSO Q and ag LASSO LS estimation methods.

The design X is generated in the same way as in the paper of Wei and Huang (2010): X = (X_1, ···, X_p), with the group explanatory variables X_j = (X_{5(j−1)+1}, ···, X_{5j}), for all j = 1, ···, p. We first generate r = 5p independent random variables R_1, ···, R_r with standard normal distribution. We also generate variables Z_j with multivariate normal distribution with mean zero and covariance Cov(Z_{j1}, Z_{j2}) = 0.9^{|j1−j2|}. Finally, the variables X_1, ···, X_r are generated as:

\[
X_{5(j-1)+k} = \frac{Z_j + R_{5(j-1)+k}}{\sqrt 2}, \qquad 1 \le j \le p,\ 1 \le k \le 5.
\]

Two model error distributions are considered: Normal N(0, 3²) and Cauchy C(0, 3²). For the parameters we take: β^0_1 = (0.5, 1, 1.5, 1, 0.5), β^0_2 = (1, 1, 1, 1, 1), β^0_3 = (−1, 0, 1, 2, 1.5), β^0_4 = (−1.5, 1, 0.5, 0.5, 0.5), and all other parameter vectors are zero. These nonzero vectors were also considered in Example 2 of Wei and Huang (2010) for errors N(0, 3²) and p = 10, where the parameters were estimated by the LS method with an adaptive group LASSO penalty.

The constant c of assumption (A6) is c = 0.43. We then consider the following value couples for (n, p): (30,5), (60,5), (60,10), (100,10), (200,10), (400,15), (1000,25) and (1000,100). On the other hand, p_0 is always equal to 4. The response variable Y is generated as Y_i = Σ_{j=1}^p X_{ij}^t β^0_j + ε_i, for i = 1, ···, n.

We compare the results obtained by the adaptive group LASSO quantile method proposed in this paper with those obtained by the adaptive group LASSO LS method proposed by Wei and Huang (2010) and Zhang and Xiang (2015).

For the simulations, we used the R language (a sketch of the design generation is given below). After a scale transformation, we can use the group LASSO methods instead of the adaptive group LASSO methods.
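For reference, here is a minimal sketch of this design generation in R; the paper does not list its code, and MASS::mvrnorm is an assumed implementation choice of ours.

```r
library(MASS)

gen_design <- function(n, p) {
  r <- 5 * p
  ## AR(1)-type covariance between group factors: Cov(Z_j1, Z_j2) = 0.9^|j1 - j2|
  Sigma <- 0.9^abs(outer(1:p, 1:p, "-"))
  Z <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)   # n x p group factors
  R <- matrix(rnorm(n * r), n, r)                  # r = 5p independent N(0,1)
  X <- matrix(0, n, r)
  for (j in 1:p)
    for (k in 1:5)
      X[, 5 * (j - 1) + k] <- (Z[, j] + R[, 5 * (j - 1) + k]) / sqrt(2)
  X
}

## the four nonzero group parameter vectors of Section 5
beta0 <- c(0.5, 1, 1.5, 1, 0.5,     # beta^0_1
           1, 1, 1, 1, 1,           # beta^0_2
           -1, 0, 1, 2, 1.5,        # beta^0_3
           -1.5, 1, 0.5, 0.5, 0.5)  # beta^0_4

set.seed(42)
n <- 100; p <- 10
X   <- gen_design(n, p)
eps <- 3 * rnorm(n)     # N(0, 3^2); Cauchy case: rcauchy(n, scale = 3), our reading of C(0, 3^2)
Y   <- X %*% c(beta0, rep(0, 5 * p - 20)) + eps
```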
Then, in order to compute the adaptive group LASSO LS estimations, we used the function grpreg of the package grpreg, the tuning parameter being chosen on a grid of values using the AIC criterion. In order to compute the adaptive group LASSO quantile estimations, we used the function groupQICD of the package rqPen, the tuning parameter varying on a grid of values. For each considered case, 1000 Monte Carlo replications were made.

In Table 1 we report how the two estimation methods identify the parameter vectors (zero or nonzero), for the part that contains the four nonzero parameter vectors β^0_j, j = 1, ···, 4, and for the part with the p − 4 zero vectors. We present the minimum, the three quartiles (Q1, median, Q3), the mean and the maximum of the number of vectors found nonzero (j = 1, ···, 4), respectively zero (j = 5, ···, p), by the two estimation methods.

For n large (equal to 100, 200, 400, 1000), we observe that for errors with N(0, 3²) law, the two estimation methods identify the zero and nonzero parameter vectors well. However, for Cauchy errors, the ag LASSO LS method identifies the nonzero vectors (the four groups of significant variables) poorly; the zero vectors are very well identified by both methods. For n small (equal to 30 or 60), the two estimation methods identify the four relevant variable groups well, whether the errors are Normal or Cauchy (except for ag LASSO Q in the case n = 60, p = 5, ε ~ C(0, 3²)). However, the p − 4 irrelevant variable groups are not well identified by the ag LASSO LS method.

Conclusion. For Gaussian errors, the ag LASSO LS method identifies the two kinds of variable groups (relevant and irrelevant) well for n large; for n small, the irrelevant variable groups are not well identified. For Cauchy errors, this method fails to identify either the relevant or the irrelevant variable groups, regardless of the value of n. Hence, for Cauchy errors, the ag LASSO LS estimations do not have the sparsity property. The ag LASSO Q method, for the two types of errors, identifies both kinds of variable groups (significant and irrelevant), the precision increasing with n; hence the ag LASSO Q estimations have the sparsity property. We conclude that the simulations confirm the theoretical results for the ag LASSO Q estimators.

6 Proofs

In this section we provide the proofs of all results presented in Sections 3 and 4.

6.1 Proofs for results of Section 3

Proof of Theorem 1. The proof is similar to that of Theorem 4.1 of Zou and Yuan (2008). We denote √n(β̂*_n − β^0) ≡ û_n and, in general, √n(β − β^0) ≡ u ≡ (u_1, ···, u_p), with u_j = (u_{j,1}, ···, u_{j,d_j}), for j = 1, ···, p. Since Y_i = X_i^t β^0 + ε_i, we have Y_i − X_i^t β = ε_i − X_i^t u/√n. Let us consider the following random variables:

\[
D_i \equiv (1-\tau)\,\mathbb 1_{\varepsilon_i<0} - \tau\,\mathbb 1_{\varepsilon_i\ge 0}, \tag{8}
\]

\[
v_n \equiv \frac1{\sqrt n}\sum_{i=1}^n D_i, \qquad
B_n(\mathbf u) \equiv \sum_{i=1}^n \int_0^{\mathbf X_i^t \mathbf u/\sqrt n} \big[\mathbb 1_{\varepsilon_i<t} - \mathbb 1_{\varepsilon_i<0}\big]\,dt,
\]

and the random vector

\[
\mathbf z_n \equiv \frac1{\sqrt n}\sum_{i=1}^n D_i\,\mathbf X_i.
\]

Obviously, E[D_i] = 0 and E[z_n] = 0_r. By the CLT, using assumptions (A1) and (A2), we have

\[
\mathbf z_n \xrightarrow{\ \mathcal L\ } \mathcal N\big(\mathbf 0_r,\ \tau(1-\tau)\Upsilon\big), \qquad
v_n \xrightarrow{\ \mathcal L\ } \mathcal N\big(0,\ \tau(1-\tau)\big). \tag{9}
\]

The vector û_n is the minimizer of the following random process:

\[
L_n(\mathbf u) \equiv \sum_{i=1}^n \Big[\rho_\tau\Big(\varepsilon_i - \frac{\mathbf X_i^t \mathbf u}{\sqrt n}\Big) - \rho_\tau(\varepsilon_i)\Big] + \mu_n \sum_{j=1}^p \widehat\omega_{n;j}\Big[\Big\|\boldsymbol\beta^0_j + \frac{\mathbf u_j}{\sqrt n}\Big\| - \|\boldsymbol\beta^0_j\|\Big],
\]

which can be written in the following form:

\[
L_n(\mathbf u) = \big[\mathbf z_n^t \mathbf u + B_n(\mathbf u)\big] + \mu_n \sum_{j=1}^p \widehat\omega_{n;j}\Big[\Big\|\boldsymbol\beta^0_j + \frac{\mathbf u_j}{\sqrt n}\Big\| - \|\boldsymbol\beta^0_j\|\Big]. \tag{10}
\]
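The equality (10) can be verified with Knight's identity, a standard tool in quantile regression recalled here for convenience:

\[
\rho_\tau(x - y) - \rho_\tau(x) = -y\big(\tau - \mathbb 1_{x<0}\big) + \int_0^y \big(\mathbb 1_{x\le s} - \mathbb 1_{x<0}\big)\,ds .
\]

Applied with x = ε_i and y = X_i^t u/√n and summed over i, the first term gives z_n^t u, since τ − 1_{ε_i<0} = −D_i, and the integral term gives B_n(u).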
We first study the last sum on the right hand side of (10). For all j ≤ p_0 (thus ‖β^0_j‖ ≠ 0) we have, since the quantile estimators are consistent,

\[
\widehat\omega_{n;j} \xrightarrow{\ P\ } \|\boldsymbol\beta^0_j\|^{-\gamma} \neq 0. \tag{11}
\]
