Weighted envelope estimation to handle variability in model selection
Daniel J. Eck and R. Dennis Cook
January 5, 2017
arXiv:1701.00856v1 [stat.ME] 3 Jan 2017

Abstract
Envelope methodology can provide substantial efficiency gains in multivariate statistical problems, but in some applications the estimation of the envelope dimension can induce selection volatility that will mitigate those gains. Current envelope methodology does not account for the added variance that can result from this selection volatility. In this article, we circumvent dimension selection volatility through the development of a weighted envelope estimator. Theoretical justification is given for our weighted envelope estimator and validity of the residual bootstrap approximation for the multivariate regression model is established. A simulation study and an analysis on a real data set illustrate the utility of our weighted envelope estimator.

Keywords: Dimension Reduction; Envelope Models; Model Selection; Residual Bootstrap; Variance Reduction.
1 Introduction
Envelope methodology was developed originally in the context of the multivariate linear regression model (Cook, et al., 2010),

$$Y = \alpha + \beta X + \varepsilon, \tag{1}$$

where $\alpha \in \mathbb{R}^r$, the random response vector is $Y \in \mathbb{R}^r$, the fixed predictor vector $X \in \mathbb{R}^p$ is centered to have mean zero, and the error vector $\varepsilon \sim N(0, \Sigma)$. It was shown by Cook, et al. (2010) that the envelope estimator of the unknown coefficient matrix $\beta \in \mathbb{R}^{r \times p}$ in (1) has the potential to yield massive efficiency gains relative to the standard estimator of $\beta$. These efficiency gains can arise when the dimension $u$ of the envelope, defined in the next section, is less than $r$. In most practical applications, $u$ is unknown and has to be estimated. This estimation can be problematic since the estimated variance of the envelope estimator is typically calculated conditional on the estimate of $u$. Variation associated with model selection is therefore not considered in the current envelope paradigm.
In this article, we propose a weighted envelope estimator of β that smooths out model selection volatility.
The weighting is across all possible envelope models under (1). The weights corresponding to each envelope
estimator are functions of the Bayesian Information Criterion (BIC) value corresponding to that particular envelope model. Weighting in this manner is similar to the model averaging techniques discussed by Buckland, et al. (1997) and Burnham and Anderson (2004) who provided a philosophical justification for the use of such weighted estimators without giving any theoretical properties. Hjort and Claeskens (2003) and Liang, et al. (2011) built on the framework of Buckland, et al. (1997) and Burnham and Anderson (2004) by deriving the asymptotic properties for weighted estimators of generalized linear regression parameters with weighting conducted across submodels under consideration.
2 The Envelope Model
The original motivation for envelope methodology comes from the observation that, in the multivariate regression model (1), some linear combinations of $Y$ may have a distribution that does not depend on $X$, while other linear combinations of $Y$ do depend on $X$. The envelope model separates out these immaterial and material parts of $Y$, and thereby allows for efficiency gains (Cook, et al., 2010; Su and Cook, 2011).
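More carefully, suppose that we can find a subspace $\mathcal{S} \subseteq \mathbb{R}^r$ so that

$$Q_{\mathcal{S}} Y \perp\!\!\!\perp P_{\mathcal{S}} Y \mid X, \quad \text{and} \quad Q_{\mathcal{S}} Y \mid X = x_1 \sim Q_{\mathcal{S}} Y \mid X = x_2, \quad \text{for all } x_1, x_2, \tag{2}$$

where $\sim$ means identically distributed, $P_{(\cdot)}$ projects onto the subspace indicated by its argument and $Q = I_r - P$. For any $\mathcal{S}$ with the properties (2), $P_{\mathcal{S}} Y$ carries all of the material information and perhaps some of the immaterial information, while $Q_{\mathcal{S}} Y$ contains just immaterial information. Let $\mathcal{B} = \mathrm{span}(\beta)$. Then (2) holds if and only if $\mathcal{B} \subseteq \mathcal{S}$ and $\Sigma = \Sigma_{\mathcal{S}} + \Sigma_{\mathcal{S}^\perp}$, where $\Sigma_{\mathcal{S}} = \mathrm{var}(P_{\mathcal{S}} Y)$ and $\Sigma_{\mathcal{S}^\perp} = \mathrm{var}(Q_{\mathcal{S}} Y)$. The envelope is defined as the intersection of all subspaces $\mathcal{S}$ that satisfy (2) and is denoted by $\mathcal{E}_\Sigma(\mathcal{B})$ with dimension $u = \dim\{\mathcal{E}_\Sigma(\mathcal{B})\}$.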
The envelope model can be represented in terms of coordinates by parameterizing model (1) to incorporate conditions (2). Define $\Gamma \in \mathbb{R}^{r \times u}$ to be a semi-orthogonal basis matrix for $\mathcal{E}_\Sigma(\mathcal{B})$ and let $\Gamma_o \in \mathbb{R}^{r \times (r-u)}$ be a completion of $\Gamma$ so that $(\Gamma, \Gamma_o) \in \mathbb{R}^{r \times r}$ is an orthogonal matrix. Then the envelope model with respect to model (1) is parameterized as

$$Y = \alpha + \Gamma \eta X + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma), \tag{3}$$

where $\Sigma = \Gamma \Omega \Gamma^T + \Gamma_o \Omega_o \Gamma_o^T$, $\Omega \in \mathbb{R}^{u \times u}$ and $\Omega_o \in \mathbb{R}^{(r-u) \times (r-u)}$ are positive definite, and $\eta \in \mathbb{R}^{u \times p}$ is $\beta$ in the coordinates of $\Gamma$. We see from (3) that $\mathcal{E}_\Sigma(\mathcal{B})$ links the mean and covariance structures of the regression problem and it is this link that provides the efficiency gains. The gains can be massive when the immaterial information is large relative to the material information; for instance, when $\|\Omega\| \ll \|\Omega_o\|$, where $\|\cdot\|$ is a matrix norm (Cook, et al., 2010). An illuminating schematic showing how an envelope increases efficiency was given by Su and Cook (2011).
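The structure imposed by (3) can be checked numerically. The following is a minimal sketch, assuming arbitrary illustrative values for $r$, $u$, $p$, $\Omega$ and $\Omega_o$ (none of these numbers come from the paper): it builds $\beta = \Gamma\eta$ and $\Sigma = \Gamma\Omega\Gamma^T + \Gamma_o\Omega_o\Gamma_o^T$ and verifies that $\mathrm{span}(\beta)$ lies in $\mathrm{span}(\Gamma)$ and that $\Sigma$ splits into material and immaterial parts with no cross term.

```python
import numpy as np

rng = np.random.default_rng(0)
r, u, p = 5, 2, 3  # illustrative dimensions (assumed, not from the paper)

# Random orthogonal matrix: first u columns act as Gamma, the rest as Gamma_o.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Gamma, Gamma_o = Q[:, :u], Q[:, u:]

# Positive definite coordinate matrices; Omega_o is scaled up so that the
# immaterial variation is large relative to the material variation.
A = rng.standard_normal((u, u))
Omega = A @ A.T + np.eye(u)
A_o = rng.standard_normal((r - u, r - u))
Omega_o = 50.0 * (A_o @ A_o.T + np.eye(r - u))

eta = rng.standard_normal((u, p))
beta = Gamma @ eta
Sigma = Gamma @ Omega @ Gamma.T + Gamma_o @ Omega_o @ Gamma_o.T

P = Gamma @ Gamma.T      # projection onto span(Gamma)
Q_S = np.eye(r) - P      # projection onto the orthogonal complement

beta_in_span = np.allclose(Q_S @ beta, 0.0)               # span(beta) within span(Gamma)
no_cross_cov = np.allclose(P @ Sigma @ Q_S, 0.0)          # cov(P Y, Q Y | X) = 0
decomposes = np.allclose(Sigma, P @ Sigma @ P + Q_S @ Sigma @ Q_S)
print(beta_in_span, no_cross_cov, decomposes)
```

Under normality, the zero cross-covariance corresponds to the conditional independence in (2), and $Q_{\mathcal{S}}\beta = 0$ corresponds to the distribution of $Q_{\mathcal{S}}Y$ not depending on $X$.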
Candidate envelope estimators of $\beta$ at dimension $j$ and sample size $n$, denoted $\hat\beta_j$, are found via maximum likelihood estimation of model (3) with $\hat\beta_j = \hat\Gamma \hat\eta$. The envelope estimator of $\beta$ is found by comparing all candidate envelope estimators using a model selection criterion such as BIC, or likelihood ratio tests or perhaps cross validation. The estimated dimension, $\hat u$, obtained from any one of these selection criteria is a variable quantity dependent on the observed data. Traditional envelope methodology does not address this extra variability. In the next three sections, we develop new envelope methodology that takes this extra variability into account.
3 BIC Weighted Estimators
We develop a solution to the problem of potential volatility in envelope model selection by building on the ideas in Buckland, et al. (1997) and Burnham and Anderson (2004), who suggested combining estimators over different models by weighting. Bootstrapping was then suggested for stochastic weighting schemes, but no theoretical properties were given by the authors.
We consider weighted estimators of the form

$$\hat\beta_w = \sum_{j=1}^{r} w_j \hat\beta_j, \tag{4}$$

where $\sum_{j=1}^{r} w_j = 1$ and $w_j \ge 0$, for $j = 1, \ldots, r$. The weights $w_j$ depend on the BIC values for all of the candidate envelope models under consideration. Let the BIC value for the envelope model with dimension $j$ be denoted by $b_j = -2 l(\hat\beta_j) + k(j) \log(n)$, where $l(\hat\beta_j)$ is the log likelihood evaluated at the envelope estimator $\hat\beta_j$ and $k(j)$ is the number of parameters of the envelope model of dimension $j$. The weights for envelope model $j$ are constructed as

$$w_j = \frac{\exp(-b_j)}{\sum_{k=1}^{r} \exp(-b_k)}. \tag{5}$$
It follows from the Supplement that $\hat\beta_w$ is a $\sqrt{n}$-consistent estimator of $\beta$, but assessing the variance of $\hat\beta_w$ is not so straightforward. In the next section we show that the residual bootstrap provides a consistent estimator of $\mathrm{var}(\hat\beta_w)$.
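The construction in (4) and (5) can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, the log likelihood values are made up, and the parameter count $k(j) = r + pj + r(r+1)/2$ is an assumption taken from the usual envelope parameterization rather than stated in this article. The maximum of $-b_j$ is subtracted before exponentiating, which leaves the weights unchanged but avoids numerical underflow.

```python
import numpy as np

def envelope_param_count(j, r, p):
    # Assumed count: r intercepts + p*j entries of eta + r(r+1)/2 for
    # (Gamma, Omega, Omega_o) jointly. This formula is an assumption here.
    return r + p * j + r * (r + 1) // 2

def bic_values(loglik, r, p, n):
    # b_j = -2 l(beta_hat_j) + k(j) log(n) for j = 1, ..., r.
    j = np.arange(1, r + 1)
    return -2.0 * np.asarray(loglik, dtype=float) + envelope_param_count(j, r, p) * np.log(n)

def bic_weights(b, scale=1.0):
    # scale=1.0 gives weights proportional to exp(-b_j), as in (5);
    # scale=0.5 gives weights proportional to exp(-b_j / 2).
    z = -scale * np.asarray(b, dtype=float)
    z -= z.max()  # stabilize: ratios of weights are unaffected
    w = np.exp(z)
    return w / w.sum()

def weighted_estimator(betas, w):
    # betas has shape (r, r, p): one r x p candidate estimate per dimension j.
    return np.tensordot(w, betas, axes=1)

# Hypothetical maximized log likelihoods for j = 1, ..., 4 with r = 4, p = 2, n = 100.
loglik = [-520.0, -505.0, -504.2, -504.0]
b = bic_values(loglik, r=4, p=2, n=100)
w = bic_weights(b)
w_half = bic_weights(b, scale=0.5)
print(np.round(w, 4), np.round(w_half, 4))
```

In this made-up example the weights from (5) concentrate more sharply on the BIC-minimizing dimension than the half-scaled weights do.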
Similar weights corresponding to Akaike's Information Criterion (AIC) do not have the nice asymptotic properties that weights corresponding to BIC enjoy. In particular, the analogous AIC weight at $j = u$ is not guaranteed to converge to 1 asymptotically. Additionally, the weights in (5) differ slightly from those mentioned in Burnham and Anderson (2004) which were also advocated by Kass and Raftery (1995) and Tsague (2014). These weights are of the form

$$\tilde w_j = \frac{\exp(-b_j/2)}{\sum_{k=1}^{r} \exp(-b_k/2)} \tag{6}$$

and they correspond to an approximation of the posterior probability for model $j$ given the observed data under the prior which places equal weight on all candidate models. Weights of the form (6) do not have the same asymptotic properties as the weights given by (5). A more thorough discussion of this is given after Theorem 1.
4 Bootstrap for $\hat\beta_w$
The residual bootstrap used to estimate the variability for the envelope estimator at the true dimension $u$ uses the starred responses,

$$Y^* = X \hat\beta_u^T + \varepsilon^*, \tag{7}$$

to obtain $\hat\beta_u^*$, where $X \in \mathbb{R}^{n \times p}$ is the fixed design matrix with rows $X_i$ and the rows of $\varepsilon^*$ are the realizations of $n$ resamples of the residuals from the original model fit with replacement. The envelope estimator $\hat\beta_u$ is $\sqrt{n}$-consistent and asymptotically normal (Cook, et al., 2010; Cook and Zhang, 2015). The techniques used to verify the consistency and asymptotic normality of $\hat\beta_u$ require the asymptotics of extremum estimation as in Amemiya (1985, Theorems 4.1.1-4.1.3). The setup in Andrews (2002, Section 2, pp. 122-124 and Theorem 2) confirms that the residual bootstrap, with responses (7), provides a $\sqrt{n}$-consistent estimator of the asymptotic variability of $\hat\beta_u$. The problem with this approach, as it currently stands, is that $u$ is unknown. The current implementation of the residual bootstrap implicitly assumes that $\hat u = u$ where $\hat u$ is obtained via some selection criterion. Therefore, variability introduced by model selection uncertainty is ignored. This issue is resolved by using $\hat\beta_w$ in place of $\hat\beta_u$ in (7). The next theorem formalizes our asymptotic justification for the use of the weighted envelope estimator $\hat\beta_w$ in practical problems. Its proof is given in the Supplement.
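Theorem 1. Assume the regression model (1) and suppose that an envelope subspace of dimension $u = 1, \ldots, r$ exists. Assume that $\frac{1}{n} X^T X \to \Sigma_X > 0$. Let $\hat\beta_w$ be the weighted envelope estimator of $\beta$ defined in (4) and let $\hat\beta_w^*$ be the weighted envelope estimator of $\beta$ obtained from resampled data. Then, as $n$ tends to $\infty$,

$$\begin{aligned}
\sqrt{n}\{\mathrm{vec}(\hat\beta_w^*) - \mathrm{vec}(\hat\beta_w)\} &= \sqrt{n}\{\mathrm{vec}(\hat\beta_u^*) - \mathrm{vec}(\hat\beta_u)\} \\
&\quad + O_p\{n^{(1/2 - p)}\} + 2(u - 1)\, O_p(1)\, \sqrt{n}\, e^{-n|O_p(1)|}.
\end{aligned} \tag{8}$$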
Theorem 1 shows the utility of the weighted envelope estimator $\hat\beta_w$. In (8), we see that the asymptotic distribution of the residual bootstrap with respect to $\hat\beta_w$ is the same as the asymptotic distribution of the residual bootstrap at $\hat\beta_u$, the envelope estimator at the true dimension. The difference between the two bootstrap procedures is that the bootstrap given in Theorem 1 does not require the conditioning on $\hat u$ as a prerequisite for its implementation. We instead bootstrap with respect to a tangible estimator that does not ignore key elements of variability that are apparent in practical problems.
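The mechanics of the residual bootstrap in (7) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and ordinary least squares is used as a stand-in for the estimator because fitting envelope models is beyond a short example. Any function mapping $(X, Y)$ to an $r \times p$ coefficient estimate, such as a weighted envelope fit, could be passed as `fit`. The design matrix is assumed to be column-centered, as in model (1).

```python
import numpy as np

def ols_fit(X, Y):
    # Stand-in estimator (assumption): OLS coefficients as an r x p matrix.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (p, r)
    return coef.T

def residual_bootstrap_se(X, Y, fit, n_boot=200, seed=0):
    # Bootstrap standard errors of fit(X, Y) using starred responses
    # Y* = X beta_hat^T + eps*, with eps* resampled from the fitted residuals.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Yc = Y - Y.mean(axis=0)              # remove the intercept (X is centered)
    beta_hat = fit(X, Yc)
    resid = Yc - X @ beta_hat.T
    draws = np.empty((n_boot, beta_hat.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample residual rows with replacement
        Y_star = X @ beta_hat.T + resid[idx]
        draws[b] = fit(X, Y_star).ravel()
    return draws.std(axis=0, ddof=1).reshape(beta_hat.shape)

# Small synthetic illustration (all values made up).
rng = np.random.default_rng(1)
n, p, r = 200, 2, 3
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta = np.arange(6.0).reshape(r, p)
Y = X @ beta.T + rng.standard_normal((n, r))
se = residual_bootstrap_se(X, Y, ols_fit, n_boot=200, seed=2)
print(np.round(se, 3))
```

The point of Theorem 1 is that when `fit` returns the weighted estimator $\hat\beta_w$, the dimension is reweighted inside every resample, so no single $\hat u$ has to be fixed in advance.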
The orders in (8) result from model selection variability that arises from four sources. The $O_p\{n^{(1/2 - p)}\}$ term corresponds to the rate at which $\sqrt{n}\, w_j$ and $\sqrt{n}\, w_j^*$ vanish for $j = u + 1, \ldots, r$. This rate is a cost of overestimation of the envelope space. It decreases quite fast, particularly when $p$ is not small, because models with $j > u$ are true and thus have no systematic bias due to choosing the wrong dimension.
The $2(u - 1)\sqrt{n}\, e^{-n|O_p(1)|}$ term corresponds to the rate at which $\sqrt{n}\, w_j$ and $\sqrt{n}\, w_j^*$ vanish for $j = 1, \ldots, u - 1$. This rate arises from underestimating the envelope space and it is affected by systematic bias arising from choosing the wrong dimension. To gain intuition about this rate, let $B_j = (G_o^T \Sigma G_o)^{-1/2} G_o^T \beta \Sigma_X^{1/2}$, where $G_o \in \mathbb{R}^{r \times (r - j)}$ is the population basis matrix for the complement of the envelope space of dimension $j$. This quantity is a standardized version of $G_o^T \beta$ that reflects bias, since $G_o^T \beta \ne 0$ when $j < u$, but $G_o^T \beta = 0$ when $j \ge u$. Let $\hat B_{j,n}$ denote the $\sqrt{n}$-consistent estimator of $B_j$ obtained by plugging in the sample version of $\Sigma_X$ and the estimators of $G_o$, $\Sigma$ and $\beta$ that arise by maximizing the likelihood with dimension $j < u$. Then the $-n|O_p(1)|$ term appearing in the exponent of $2(u - 1)\sqrt{n}\, e^{-n|O_p(1)|}$ is the rate at which $-n \log(|I_p + \hat B_{j,n}^T \hat B_{j,n}|)$ approaches $-\infty$. Additionally, this term is 0 when $u = 1$. That arises because we consider only regressions in which $\beta \ne 0$ and thus $u \ge 1$. When $u = 1$ underestimation is not possible in our context and thus $2(u - 1)\sqrt{n}\, e^{-n|O_p(1)|}$ vanishes.
We now revisit the origins of the construction of the weights used in Theorem 1. In Section 3, we mentioned that our construction is similar to, but not the same as, those mentioned in Burnham and Anderson (2004). In the case when $p = 1$, the term $\sqrt{n}\, \tilde w_j$ at $j = u + 1$, with $\tilde w_j$ defined by (6), does not vanish as $n \to \infty$. We therefore would not have the same asymptotic result given by (8) in Theorem 1. Instead, there would be non-zero weight placed on the envelope model with dimension $j = u + 1$ asymptotically. This weighting scheme would therefore lead to higher estimated variability than is necessary in practice. However, this issue is no longer problematic when $p > 1$. When $p > 1$, the weights (6) can be used and changes to (8) would result. The $O_p\{n^{1/2 - p}\}$ term in (8) would become $O_p\{n^{(1 - p)/2}\}$ when the weights (6) are used in place of the weights (5). When $p$ is large, one may proceed with weighting according to (6) at relatively little cost to efficiency.
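The change in rate can be traced through a short calculation. The sketch below is our reading rather than part of the original argument; it assumes, as in the Supplement, that $k(u + 1) - k(u) = p$ and that the likelihood ratio factor between the two true models of dimension $u$ and $u + 1$ is $O_p(1)$.

```latex
\begin{aligned}
\sqrt{n}\,\tilde w_{u+1}
  &\le \sqrt{n}\, e^{(b_u - b_{u+1})/2} \\
  &= \sqrt{n}\; n^{-\{k(u+1) - k(u)\}/2}\; e^{\,l(\hat\beta_{u+1}) - l(\hat\beta_u)} \\
  &= n^{(1 - p)/2}\, O_p(1).
\end{aligned}
```

With $p = 1$ the bound is $O_p(1)$ and so gives no decay, whereas the same steps applied to the weights (5) give $\sqrt{n}\, e^{b_u - b_{u+1}} = n^{1/2 - p}\, O_p(1)$, which vanishes for every $p \ge 1$.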
5 Examples
We now provide examples which show that our weighted envelope estimator performs better than the standard estimator and compares favorably with other envelope estimators at reasonable values of $u$. The first two are simulated examples in which we know $\beta$, $\Sigma$, $u$, and $P_{\mathcal{E}_\Sigma(\mathcal{B})}$. Their role is only to illustrate the theory developed in the previous sections. The third example is a real data example in which we do not know any quantities of interest.
5.1 Simulated examples
Example 1: For this example, we create a setting in which $Y \in \mathbb{R}^3$ is generated according to the model

$$Y_i = \beta X_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{ind}}{\sim} N(0, \Sigma), \tag{9}$$

$i = 1, \ldots, n$, where $X_i \in \mathbb{R}^2$ is a continuous predictor with entries generated independently from a normal distribution with mean 4 and variance 1. The covariance matrix $\Sigma$ was generated using three orthonormal vectors and has eigenvalues of 50, 10, and 0.01. The matrix $\beta \in \mathbb{R}^{3 \times 2}$ is an element in the space spanned by the second and third eigenvectors of $\Sigma$. We know that the dimension of $\mathcal{E}_\Sigma(\mathcal{B})$ is $u = 2$. Three datasets were simulated using model (9) at different sample sizes, as given in Table 1. The multivariate residual bootstrap was then used to compare the efficiencies of our weighted envelope estimator $\hat\beta_w$ to the oracle envelope estimator $\hat\beta_{u=2}$. The ratios of the bootstrapped standard errors of the maximum likelihood estimator (MLE) from the full model to those of each envelope estimator, for example $\mathrm{se}(\hat\beta_r)/\mathrm{se}^*(\hat\beta_w)$, are seen in Table 1. Ratios greater than 1 indicate that the envelope estimator is more efficient than the standard estimator. There are two conclusions that are apparent from Table 1. We see that envelope estimation is more efficient than the estimation using the full model and we see that the efficiency of the weighted envelope estimator approaches that of the oracle estimator, $\hat\beta_{u=2}$, as $n$ increases.
Example 2: For this example, we illustrate the effect that $p$ has on the performance of the weighted envelope estimator. We generated data according to model (9) with $Y \in \mathbb{R}^5$. In this example $u = 1$ and $\Sigma$ is compound symmetric with diagonal entries set to 1 and off-diagonal entries set to 0.5, $\beta = 1_r c_p^T$, where $1_r$ is the $r \times 1$ vector of ones, $c_p$ is a $p \times 1$ vector where every entry is 10. We generate the predictors according to $X \sim N(0, I_p)$, where $I_p$ is the $p$-dimensional identity matrix. We set $n = 250$.

The results of our simulation study are seen in Table 2. For each value of $p$ that is considered, we display the number of estimated dimensions $\hat u$ as determined by BIC. From Table 2, we see that the distribution of $\hat u$ approaches a point mass at the truth as $p$ increases. This implies that the bias terms in Theorem 1 vanish as $p$ increases just as (8) states.
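The claim that $u = 1$ in Example 2 can be checked directly from the stated population quantities. The sketch below uses hypothetical function names and reproduces only the data-generating setup, not the envelope fits or the BIC selection behind Table 2. It verifies that $1_r$ is an eigenvector of the compound symmetric $\Sigma$ and that $\beta = 1_r c_p^T$ has rank one, so $\mathrm{span}(\beta)$ is itself a one-dimensional reducing subspace of $\Sigma$.

```python
import numpy as np

def example2_population(r=5, p=2, rho=0.5, c=10.0):
    # Compound symmetric Sigma (unit diagonal, off-diagonal rho) and beta = 1_r c_p^T.
    Sigma = (1.0 - rho) * np.eye(r) + rho * np.ones((r, r))
    beta = np.outer(np.ones(r), np.full(p, c))
    return beta, Sigma

def simulate_example2(n=250, r=5, p=2, seed=0):
    # Draw (X, Y) from model (9) with X ~ N(0, I_p) and errors N(0, Sigma).
    rng = np.random.default_rng(seed)
    beta, Sigma = example2_population(r, p)
    X = rng.standard_normal((n, p))
    eps = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
    return X, X @ beta.T + eps

beta, Sigma = example2_population()
ones = np.ones(5) / np.sqrt(5.0)

# 1_r is an eigenvector of Sigma with eigenvalue 1 + (r - 1) * rho = 3.
is_eigvec = np.allclose(Sigma @ ones, 3.0 * ones)
rank_beta = np.linalg.matrix_rank(beta)

X, Y = simulate_example2()
print(is_eigvec, rank_beta, X.shape, Y.shape)
```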
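| $\hat\beta_w$ ($n = 250$) | $\hat\beta_{u=2}$ ($n = 250$) | $\hat\beta_w$ ($n = 500$) | $\hat\beta_{u=2}$ ($n = 500$) | $\hat\beta_w$ ($n = 2000$) | $\hat\beta_{u=2}$ ($n = 2000$) |
|---|---|---|---|---|---|
| 1.88 | 2.40 | 2.34 | 2.98 | 2.71 | 2.81 |
| 1.39 | 1.79 | 1.65 | 1.78 | 1.79 | 1.81 |
| 2.67 | 3.60 | 2.57 | 3.52 | 3.51 | 3.71 |
| 2.33 | 2.66 | 2.18 | 2.99 | 2.67 | 2.79 |
| 1.87 | 1.86 | 1.67 | 1.81 | 1.73 | 1.77 |
| 3.39 | 3.75 | 2.52 | 3.70 | 3.36 | 3.74 |

Table 1: Ratios of estimated standard errors obtained from the multivariate residual bootstrap for different sample sizes $n$.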
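| | $n(\hat u = 1)$ | $n(\hat u = 2)$ | $n(\hat u = 3)$ |
|---|---|---|---|
| $p = 2$ | 128 | 111 | 11 |
| $p = 5$ | 214 | 34 | 2 |
| $p = 10$ | 249 | 1 | 0 |
| $p = 25$ | 250 | 0 | 0 |

Table 2: Simulation results for Example 2.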
5.2 Cattle data
The data for this illustration resulted from an experiment to compare two treatments for the control of an intestinal parasite in cattle: thirty animals were randomly assigned to each of the two treatments and their weights (in kilograms) were recorded at weeks 2, 4, ..., 18 and 19 after treatment (Kenward, 1987). Because of the nature of a cow's digestive system, the treatments were not expected to have an immediate measurable effect on weight. The objectives of the study were to find if the treatments had differential effects on weight and, if so, about when they were first manifested. We begin by considering the multivariate linear model (1), where $Y_i \in \mathbb{R}^{10}$ is the vector of cattle weights from week 2 to week 19, and the binary predictor $X_i$ is either 0 or 1 indicating the two treatments. Then $\alpha = E(Y \mid X = 0)$ is the mean profile for one treatment and $\beta = E(Y \mid X = 1) - E(Y \mid X = 0)$ is the mean profile difference between treatments.
Turning to a fit of the envelope model (3), likelihood ratio testing selects $\hat u = 1$ and BIC selects $\hat u = 3$ as the dimension of the envelope model. Further complicating matters, when BIC is used to determine $u$ at every iteration of the multivariate residual bootstrap, we see high variability in model selection as seen in Table 3. From Table 3, it appears that the true dimension of the envelope subspace is anywhere from 1 to 5 with the highest likelihood that it is between 2 and 4. Model selection volatility of this variety is precisely the reason why the weighted envelope estimator is advocated; it would not be safe to perform a bootstrap procedure that makes a uniform selection of a particular dimension at every iteration. Such a procedure ignores the model selection variability seen in Table 3.
From Table 4, we see the ratios of the bootstrapped standard errors of the MLE from the full model to those of each envelope estimator, for example $\mathrm{se}(\hat\beta_r)/\mathrm{se}^*(\hat\beta_w)$. Ratios greater than 1 indicate that the envelope estimator is more efficient than the standard estimator. We see that $\hat\beta_w$ is comparable to $\hat\beta_{u=3}$. Similar conclusions are drawn from the other elements of estimates of $\beta$. The findings displayed in Table 4 show that the weighted envelope estimator can provide useful efficiency gains while protecting against underestimation of $u$ that may not be properly accounted for by the standard envelope estimator.
| $\hat u$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $n(\hat u)$ | 10 | 10 | 24 | 12 | 4 |

Table 3: Counts of the selected envelope dimension at every iteration of a multivariate residual bootstrap for 60 resamples.
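The selection volatility in Table 3 can be summarized as empirical proportions. The short sketch below only re-expresses the counts reported in the table; the variable names are ours.

```python
# Counts of the BIC-selected envelope dimension from Table 3 (60 bootstrap resamples).
counts = {1: 10, 2: 10, 3: 24, 4: 12, 5: 4}

total = sum(counts.values())
proportions = {u_hat: c / total for u_hat, c in counts.items()}
mode = max(counts, key=counts.get)                       # most frequently selected dimension
share_2_to_4 = sum(proportions[u_hat] for u_hat in (2, 3, 4))

print(total, mode, round(proportions[mode], 3), round(share_2_to_4, 3))
```

The modal choice $\hat u = 3$ is selected in only 40% of resamples, and dimensions 2 through 4 together account for about 77%, which is the kind of spread that a bootstrap conditioning on a single $\hat u$ would ignore.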
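| $d$ | $B$ | $\hat\beta_w$ | $\hat\beta_{u=1}$ | $\hat\beta_{u=2}$ | $\hat\beta_{u=3}$ | $\hat\beta_{u=4}$ | $\hat\beta_{u=5}$ |
|---|---|---|---|---|---|---|---|
| 5 | 60 | 1.93 | 4.65 | 3.89 | 1.85 | 1.54 | 1.27 |
| 5 | 100 | 1.38 | 3.97 | 1.49 | 1.14 | 1.14 | 1.07 |
| 5 | 200 | 1.62 | 4.26 | 3.14 | 1.69 | 1.32 | 1.19 |
| 5 | 500 | 1.61 | 4.58 | 2.43 | 1.59 | 1.29 | 1.15 |
| 5 | 1000 | 1.56 | 4.10 | 2.48 | 1.55 | 1.29 | 1.15 |
| 5 | 2000 | 1.57 | 4.43 | 2.30 | 1.53 | 1.28 | 1.16 |
| 6 | 60 | 1.75 | 2.30 | 2.35 | 1.79 | 1.38 | 1.24 |
| 6 | 100 | 1.25 | 2.15 | 1.26 | 1.05 | 1.05 | 1.00 |
| 6 | 200 | 1.50 | 2.27 | 2.47 | 1.55 | 1.20 | 1.11 |
| 6 | 500 | 1.50 | 2.22 | 2.05 | 1.55 | 1.24 | 1.10 |
| 6 | 1000 | 1.52 | 2.24 | 1.99 | 1.48 | 1.26 | 1.14 |
| 6 | 2000 | 1.53 | 2.32 | 1.91 | 1.46 | 1.26 | 1.16 |

Table 4: Ratios of estimated standard errors obtained from the multivariate residual bootstrap at a different number of resamples $B$ for the fifth and sixth elements (indicated by the $d$ column) of estimates of $\beta$.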
6 Discussion
Efron (2014) proposed an estimator motivated by bagging (Breiman, 1996) that aims to reduce variability and smooth out discontinuities resulting from model selection volatility. Variability of the model averaged estimator of Efron (2014) is assessed via a double bootstrap. These techniques have been applied to envelope methodology in Eck, et al. (2016) and useful variance reduction was found empirically. The problem of interest in Eck, et al. (2016) falls outside the scope of the multivariate linear regression model, and general envelope methodology (Cook and Zhang, 2015) was required to obtain efficiency gains. In the context of the multivariate linear regression model, we show that only a single level of bootstrapping is necessary to assess the variability of our weighted envelope estimator and that bootstrapping in this way guarantees a consistent estimator of the variability of the weighted envelope estimator.
7 Supplementary material
Supplementary material available at Biometrika online includes the proof of Theorem 1.
References
Amemiya, T. (1985). Advanced Econometrics. Harvard University Press, Cambridge, MA.
Andrews, D. W. K. (2002). Higher-Order Improvements of a Computationally Attractive k-Step Bootstrap for Extremum Estimators. Econometrica, 70, 1, 119–162.
Breiman, L. (1996). Bagging Predictors. Machine Learning, 24, 123–140.
Buckland, S. T., Burnham, K. P., and Augustin, N. H. (1997). Model Selection: An Integral Part of Inference. Biometrics, 53, 603–618.
Burnham, K. P. and Anderson, D. R. (2004). Multimodel Inference. Sociological Methods and Research, 33, 261–304.
Cook, R. D., Li, B., and Chiaromonte, F. (2010). Envelope models for parsimonious and efficient multivariate linear regression. Statistica Sinica, 20, 927–1010.
Cook, R. D., Forzani, L., and Su, Z. (2016). A note on fast envelope estimation. J. Mult. Anal., 150, 42–54.
Cook, R. D. and Zhang, X. (2015). Foundations for Envelope Models and Methods. J. Am. Statist. Assoc., 110:510, 599–611.
Eck, D. J., Geyer, C. J., and Cook, R. D. (2016). An Application of Envelope and Aster Models. Submitted.
Efron, B. (2014). Estimation and Accuracy After Model Selection. J. Am. Statist. Assoc., 109:507, 991–1007.
Hjort, N. L. and Claeskens, G. (2003). Frequentist Model Average Estimators. J. Am. Statist. Assoc., 98:464, 879–899.
Kass, R. E. and Raftery, A. E. (1995). Bayes Factors. J. Am. Statist. Assoc., 90:430, 775–795.
Kenward, M. G. (1987). A method for comparing profiles of repeated measurements. J. R. Statist. Soc. C, 36, 296–308.
Liang, H., Zou, G., Wan, A. T. K., and Zhang, X. (2011). Optimal Weight Choice for Frequentist Model Average Estimators. J. Am. Statist. Assoc., 106:495, 1053–1066.
Su, Z. and Cook, R. D. (2011). Partial envelopes for efficient estimation in multivariate linear regression. Biometrika, 98, 133–146.
Tsague, G. N. (2014). On Optimal Weighting Scheme in Model Averaging. American Journal of Applied Mathematics and Statistics, 2, No. 3, 150–156.
Supplementary material for 'Weighted envelope estimation to handle variability in model selection'

This Supplementary Materials section contains the proof of Theorem 1 in Eck and Cook (2017).
Proof. We go through the steps showing that (8) in Eck and Cook (2017) holds. Recall that $u = \dim(\mathcal{E})$. Define $l(\hat\beta_j)$ to be the log likelihood of the envelope model evaluated at the envelope estimator $\hat\beta_j$, fitting with $\dim(\mathcal{E}) = j$, and define $k(j)$ to be the number of parameters of the envelope model of dimension $j$. From the construction of $b_j$ and the above calculations we see that

$$e^{b_u - b_j} = e^{-2\{l(\hat\beta_u) - l(\hat\beta_j)\}}\, n^{-\{k(j) - k(u)\}}.$$
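This identity is used in what follows as an upper bound on $w_j$. A brief sketch of why, assuming $b_j = -2 l(\hat\beta_j) + k(j)\log(n)$ and the weights (5) of the main text; the step of dropping every term except $k = u$ from the denominator is our reading and is not spelled out in the source.

```latex
\begin{aligned}
w_j &= \frac{e^{-b_j}}{\sum_{k=1}^{r} e^{-b_k}}
     \le \frac{e^{-b_j}}{e^{-b_u}} = e^{b_u - b_j}, \\
b_u - b_j &= -2\{l(\hat\beta_u) - l(\hat\beta_j)\} - \{k(j) - k(u)\}\log n .
\end{aligned}
```

Exponentiating the second line gives the displayed expression, since $e^{-\{k(j) - k(u)\}\log n} = n^{-\{k(j) - k(u)\}}$.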
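Let $b_j^*$ be the BIC value of the envelope model of dimension $j$ fit to the starred data and define

$$w_j^* = \frac{e^{-b_j^*}}{\sum_{k=1}^{r} e^{-b_k^*}}.$$

Let $\|\cdot\|$ be the Euclidean norm. We show that $\sqrt{n}\{w_j^* \mathrm{vec}(\hat\beta_j^*) - w_j \mathrm{vec}(\hat\beta_j)\} \to 0$ for $j \ne u$ by showing that

$$\sqrt{n}\, \|w_j^* \mathrm{vec}(\hat\beta_j^*) - w_j \mathrm{vec}(\hat\beta_j)\| \le \sqrt{n}\, w_j^* \|\mathrm{vec}(\hat\beta_j^*)\| + \sqrt{n}\, w_j \|\mathrm{vec}(\hat\beta_j)\| \to 0$$

as $n \to \infty$ for all $j \ne u$. Now,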
$$\begin{aligned}
\sqrt{n}\, w_j \|\mathrm{vec}(\hat\beta_j)\| &\le \sqrt{n}\, |O_p(1)|\, e^{b_u - b_j} \\
&= |O_p(1)|\, n^{\{k(u) - k(j) + 1/2\}}\, e^{-2\{l(\hat\beta_u) - l(\hat\beta_j)\}} \\
&= |O_p(1)|\, n^{\{k(u) - k(j) + 1/2\}}\, e^{-2\{l(\hat\beta_u) - l(\hat\beta_r)\} + 2\{l(\hat\beta_j) - l(\hat\beta_r)\}}.
\end{aligned} \tag{10}$$

The first inequality in (10) follows from the fact that $\|\mathrm{vec}(\hat\beta_j)\| \le \|\mathrm{vec}(\hat\beta_r)\|$ and $\|\mathrm{vec}(\hat\beta_r)\| = O_p(1)$. We first consider the case where $j = u + 1, \ldots, r$. In this setting, models with envelope dimensions $u$ and $j$ are both true and nested within the full model with envelope dimension $r$. Consequently, $-2\{l(\hat\beta_u) - l(\hat\beta_r)\}$ and $-2\{l(\hat\beta_j) - l(\hat\beta_r)\}$ are asymptotically distributed as $\chi^2_{p(r-u)}$ and $\chi^2_{p(r-j)}$ by Wilks' Theorem. Therefore $e^{-2\{l(\hat\beta_u) - l(\hat\beta_j)\}} = O_p(1)$ since it is the exponentiation of the difference between two $\chi^2$ random variables. We see that

$$\sqrt{n}\, w_j \|\mathrm{vec}(\hat\beta_j)\| \le |O_p(1)|\, n^{\{k(u) - k(j) + 1/2\}} = O_p\left[n^{\{k(u) - k(j) + 1/2\}}\right].$$

Since $j > u$, we have that $k(u) - k(j) = p(u - j) \le -p$. Thus,

$$\sqrt{n}\, w_j \|\mathrm{vec}(\hat\beta_j)\| \le O_p\{n^{(1/2 - p)}\}$$

for $j = u + 1, \ldots, r$. Following the same steps as (10), applied to the starred data, yields

$$\sqrt{n}\, w_j^* \|\mathrm{vec}(\hat\beta_j^*)\| \le |O_p(1)|\, n^{\{k(u) - k(j) + 1/2\}}\, e^{-2\{l^*(\hat\beta_u^*) - l^*(\hat\beta_r^*)\} + 2\{l^*(\hat\beta_j^*) - l^*(\hat\beta_r^*)\}} \tag{11}$$

where $l^*(\cdot)$ is the log likelihood function corresponding to the starred data. Both $-2\{l^*(\hat\beta_u^*) - l^*(\hat\beta_r^*)\}$ and $2\{l^*(\hat\beta_j^*) - l^*(\hat\beta_r^*)\}$ in (11) are $O_p(1)$. Thus,

$$\sqrt{n}\, w_j^* \|\mathrm{vec}(\hat\beta_j^*)\| \le |O_p(1)|\, n^{\{k(u) - k(j) + 1/2\}} = O_p\left[n^{\{k(u) - k(j) + 1/2\}}\right],$$