Munich Personal RePEc Archive

Does BIC Estimate and Forecast Better than AIC?*

Medel, Carlos A. and Salgado, Sergio C.
Central Bank of Chile
25 October 2012

Online at https://mpra.ub.uni-muenchen.de/42235/
MPRA Paper No. 42235, posted 04 Nov 2012 18:29 UTC

Carlos A. Medel, Central Bank of Chile
Sergio C. Salgado, University of Minnesota
October 26, 2012

Abstract

We test two questions: (i) Is the Bayesian Information Criterion (BIC) more parsimonious than the Akaike Information Criterion (AIC)? and (ii) Is BIC better than AIC for forecasting purposes? Using simulated data, we provide statistical inference on both hypotheses, first individually and then jointly, with a multiple hypothesis testing procedure that better controls for type-I error. Both testing procedures deliver the same result: BIC shows in- and out-of-sample superiority over AIC only in a long-sample context.

Keywords: AIC, BIC, time-series models, overfitting, forecast comparison, joint hypothesis testing.

JEL Codes: C22, C51, C52, C53.

* We thank Yan Carrière-Swallow, Mario Giarda, Michael Pedersen, Pablo Pincheira, and Felipe Saffie for their kind help and comments. We also thank seminar participants at the Central Bank of Chile for their comments. Any errors or omissions are the responsibility of the authors. The views and ideas expressed in this paper do not necessarily represent those of the Central Bank of Chile or its authorities. E-mails: [email protected] (corresponding author); [email protected].

1 Introduction

The success of many economic decisions relies on the forecast accuracy of certain key variables. Often, economic theory is not clear about the relation between two or more variables, and a data snooping analysis is performed prior to modeling.
A useful model-building procedure when little is known about the fundamental variables behind the dynamics of the true data generating process is the use of the so-called information criteria: measures of goodness of fit based on the log-likelihood function (ℓ), the number of regressors (p), and the sample size (T). However, it is not clear when (especially in terms of sample size, given their different asymptotic behavior) the forecasts from the model chosen by one criterion dominate those from the model chosen by the other.

The aim of this paper is to test two questions: (i) Is the Bayesian Information Criterion (BIC) more parsimonious than the Akaike Information Criterion (AIC)? and (ii) Is BIC better than AIC for forecasting purposes?¹ We provide statistical inference on both hypotheses individually, with a significance test based on Diebold and Mariano (1995) and West (1996), and jointly, with a multiple hypothesis test following the approach of White (2000) with some considerations from Hansen's (2005) superior predictive ability test.²

The exercise consists of the simulation of a large stationary dataset containing 1,000 series generated by an autoregressive (AR) process of order p = 6. We then compute and compare the order determined by each criterion, which often differs from the true order. Then, for each series, we generate 1-step-ahead forecasts and evaluate their accuracy based on the root of the squared forecast error (RSFE). We perform this exercise several times, each time considering a different sample size of the same 1,000 series, to account for the different asymptotic behavior of each information criterion.

The AIC is defined as $T \cdot \log \ell + 2 \cdot p_{AIC}$, while the BIC is defined as $T \cdot \log \ell + p_{BIC} \cdot \log T$. A lower score reflects a better fit. The difference in the chosen lag length comes exclusively from the penalty term imposed on the number of regressors of the fitted model.
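Because the fit term $T \cdot \log \ell$ is shared by both criteria for a given lag length, the comparison reduces to the two penalties. The short Python sketch below (helper names are illustrative, not from the paper) evaluates them and finds the smallest integer sample size at which an additional regressor is penalized more heavily by BIC than by AIC, which happens once $\log T > 2$.

```python
import math


def aic_penalty(p: int) -> float:
    """Penalty term of the AIC as written in the text: 2 * p."""
    return 2.0 * p


def bic_penalty(p: int, T: int) -> float:
    """Penalty term of the BIC as written in the text: p * log(T)."""
    return p * math.log(T)


# Smallest sample size at which one extra regressor costs more under BIC than
# under AIC. log(T) > 2 exactly when T > e**2 (about 7.39), so this is T = 8.
crossover = next(T for T in range(2, 100) if bic_penalty(1, T) > aic_penalty(1))

if __name__ == "__main__":
    print("crossover sample size:", crossover)
    # Penalties for the true order p = 6 at the four sample sizes used in the paper
    for T in (50, 100, 1_000, 5_000):
        print(T, aic_penalty(6), round(bic_penalty(6, T), 2))
```

From T = 8 onward each extra lag is strictly more expensive under BIC, so with the same fit term BIC can never settle on a longer lag than AIC.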
As shown in Granger and Jeon (2004), for a sample size $T \geq 8$ and a given value of $\ell$, one expects $p_{BIC} \leq p_{AIC}$.

The results reveal the existence of (in-sample) overfitting by AIC compared with BIC across different estimation sample sizes. From a predictive point of view, BIC beats AIC, yielding a smaller RSFE on average, only in a long-sample context. When we test both hypotheses together, controlling better for type-I error, our results support this long-sample BIC superiority.

The remainder of the paper proceeds as follows. In section 2, we describe our dataset and discuss some asymptotic properties of information criteria. In section 3, we report univariate in- and out-of-sample test results. In section 4, we describe and analyze the results of the joint test; we also provide some intuition about the different type-I error control delivered by our testing approaches. Finally, section 5 concludes.

¹ More details on the derivation of, and comparison between, both criteria can be found in Akaike (1974), Shibata (1976), Rissanen (1978), Schwarz (1978), Stone (1979), Lütkepohl (1985), Koehler and Murphree (1988), Zucchini (2000), Kuha (2004), and Weakliem (2004).

² These procedures are related to those used in Wolak (1987, 1989) and Sullivan, Timmermann, and White (1999). We use a version closer to that used in Pincheira (2011a, 2011b, 2012). A recent survey can be found in Corradi and Distaso (2011).

2 Estimation setup

2.1 Data

The simulated stationary data are generated as realizations of the AR(6) process

$$y_t = 0.09\,y_{t-1} + 0.08\,y_{t-2} + 0.07\,y_{t-3} + 0.06\,y_{t-4} + 0.05\,y_{t-5} + 0.04\,y_{t-6} + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid } \mathcal{N}(0, 2\%),$$

using a random number generator. The number of replications is I = 1,000, and the complete sample size is T = 5,000, plus one observation for forecast evaluation. We perform the same exercise four times, each with a different sample size $\tau \in \{50, 100, 1{,}000, 5{,}000\}$.
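A minimal Python sketch of this data generating process follows. It assumes that "N(0, 2%)" denotes an innovation variance of 0.02, and it discards a burn-in of 500 draws started from zero; both choices, like the rule used to cut the τ + 1 subsamples, are assumptions for illustration rather than details reported in the paper. Since all six coefficients are nonnegative and sum to 0.39 < 1, the process is stationary.

```python
import math
import random

# Coefficients of the AR(6) data generating process in section 2.1
PHI = (0.09, 0.08, 0.07, 0.06, 0.05, 0.04)
# Assumption: "N(0, 2%)" is read as an innovation variance of 0.02
SIGMA = math.sqrt(0.02)


def simulate_ar6(n, sigma=SIGMA, burn_in=500, seed=None):
    """Return n draws of the AR(6) process after discarding burn_in draws.

    Zero starting values and the burn-in length are illustrative choices.
    """
    rng = random.Random(seed)
    y = [0.0] * len(PHI)  # zero initial conditions
    for _ in range(burn_in + n):
        ar_part = sum(phi * y[-k] for k, phi in enumerate(PHI, start=1))
        y.append(ar_part + rng.gauss(0.0, sigma))
    return y[-n:]


if __name__ == "__main__":
    # The paper uses I = 1,000 series of length 5,000 + 1; 20 keeps this quick.
    panel = [simulate_ar6(5_001, seed=i) for i in range(20)]
    # Hypothetical windowing: keep the first tau + 1 observations of each series.
    subsamples = {tau: [y[: tau + 1] for y in panel] for tau in (50, 100, 1_000, 5_000)}
    print({tau: len(s[0]) for tau, s in subsamples.items()})
```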
By doing this, we analyze the behavior of each process $\{y_t^i\}_{t=1}^{\tau+1}$, $i \in I$, four times, gaining empirical insight into the asymptotic behavior of both information criteria. Since I = 1,000 replications may not be enough to describe population parameters, we carry out a backup simulation with I′ = 10,000 for the most sensitive case (τ = 50). This gives a measure of how far we are from a case closer to population parameters. As the results are maintained both numerically and qualitatively, we keep I = 1,000 for the sake of computational efficiency.³

2.2 Asymptotic properties

Both criteria have different asymptotic properties: AIC is not consistent while BIC is, and when k > 1 BIC chooses the correct model almost surely (becoming strongly consistent).⁴ As pointed out by Canova (2007), intuitively AIC is not consistent because its penalty function does not simultaneously go to infinity as $T \to \infty$ and to zero when scaled by T. This leads us to use different values of τ, and underlies our conclusions from the univariate tests.⁵ Note that consistency is not a must for forecast accuracy; the true model may underperform out of sample against a nested benchmark. Hansen (2009) finds that a model with an autoregressive order smaller than the true one can be expected to win out of sample, as a consequence of underfitting.

The asymptotic properties of AIC and BIC are derived in Shibata (1976, 1980, 1981), Bhansali and Downham (1977), Sawa (1978), Stone (1979), Geweke and Meese (1981), Pötscher and Srinivasan (1991), Markon and Krueger (2004), and Karagrigoriou, Mattheou, and Vonta (2011). Recently, Xu and McLeod (2012) derive the asymptotic properties of the Generalized Information Criterion (GIC), which nests the criteria considered in this paper. In appendix A we show the asymptotic properties of AIC and BIC based on Nishii (1984).⁶

³ We perform our simulations using ad hoc Matlab code for I = 1,000. We then perform our backup simulation using the more specific commands provided in Econometrics Toolbox 2.1. The latter estimates take a prohibitive debugging time with I′ = 10,000 and four values of τ. Another tool used was EViews 7.2, but its pseudo-random number generator was not as powerful as Matlab's. We provide statistical inference for each comparison to check the robustness of our results.

⁴ See more details in Bozdogan (1987), Bickel and Zhang (1992), and Wasserman (2000). Some authors have proposed modifications to AIC to improve its long-sample behavior, such as Hurvich and Tsai (1993) and Burnham and Anderson (1998).

⁵ There is no specific definition of a short sample. For example, 45 observations are used in Sargent and Sims (1977), 14 in Miller, Supel, and Turner (1980), 15 in Nickelsburg, 23 in Sims (1980), 68 in Fischer (1981), and 56 in Gordon and King (1982), among many other candidates.

3 Univariate results

3.1 In-sample results

As pointed out by Lütkepohl (1985), Nickelsburg (1985), Yi and Judge (1988), Clark (2004), Granger and Jeon (2004), Raffalovich et al. (2008), and Shittu and Asemota (2009), AIC is prone to selecting more dynamic models than BIC, a fact that is supported theoretically. In figure 1, we report the frequency of the number of regressors chosen by each criterion for different sample sizes, which reproduces this common finding. These lag orders are chosen as the lowest score achieved by each criterion when fitting an AR(p) model to the AR(6) process, with $p \in \{1, \dots, 24\}$. The results of figure 1 are summarized in table 1, which reflects a consistent overfitting by AIC and the convergence of BIC toward the true order as the sample size increases.
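The lag-selection step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' Matlab code: each AR(p) is fitted by OLS without an intercept (the DGP has none), all candidate orders p = 1, ..., 24 share a common estimation sample so that their scores are comparable, and the fit term is the textbook Gaussian $n \log \hat\sigma^2$ (residual variance) in place of the paper's $T \log \ell$. With a shared fit term and $\log n > 2$, the BIC choice can never exceed the AIC choice.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def residual_variance(y, p, start):
    """OLS fit of an AR(p) without intercept on y[start:]; returns sigma^2 hat."""
    X = [[y[t - k] for k in range(1, p + 1)] for t in range(start, len(y))]
    target = y[start:]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * v for row, v in zip(X, target)) for i in range(p)]
    beta = solve(XtX, Xty)
    resid = [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, target)]
    return sum(e * e for e in resid) / len(resid)


def select_orders(y, max_p=24):
    """Return the AR order minimizing each criterion over p = 1..max_p."""
    start = max_p  # common estimation sample for all candidate orders
    n = len(y) - start
    best = {}
    for p in range(1, max_p + 1):
        fit = n * math.log(residual_variance(y, p, start))  # shared fit term
        for name, penalty in (("AIC", 2.0 * p), ("BIC", p * math.log(n))):
            score = fit + penalty
            if name not in best or score < best[name][0]:
                best[name] = (score, p)
    return {name: p for name, (score, p) in best.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0.0]
    for _ in range(299):  # hypothetical AR(1) series with phi = 0.5
        y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))
    print(select_orders(y))
```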
Table 1: Statistics of the number of regressors chosen by each criterion

                      τ=50          τ=100         τ=1,000       τ=5,000
                      AIC    BIC    AIC    BIC    AIC    BIC    AIC    BIC
Median                19     17     10     1      12     4      12     6
Maximum               24     24     24     10     24     9      24     11
Minimum               1      1      1      1      2      1      5      4
Standard deviation    6.36   9.67   7.80   1.31   6.88   1.35   6.67   0.59
Skewness              -1.49  -0.04  0.29   1.92   0.22   0.08   0.26   0.84
Kurtosis              4.19   1.11   1.58   7.12   1.55   3.17   1.52   13.21
Source: Authors' computations.

For inference purposes, we define the variable $\Delta N_{i|\tau}$ for the i-th replication as the difference between the number of regressors chosen by AIC and by BIC given a sample size τ: $\Delta N_{i|\tau} = NReg^{AIC}_{i|\tau} - NReg^{BIC}_{i|\tau}$. Naturally, the variable $\Delta N_{i|\tau}$ has a fixed sample size of 1,000 observations (the number of replications). We estimate the regression $\Delta N_{i|\tau} = c_\tau + \upsilon_{i|\tau}$, where $\upsilon_{i|\tau} \sim \text{iid } \mathcal{N}(0, \sigma^2_\upsilon)$, and test the one-sided null hypothesis (NH) $NH^{In\text{-}sample}_\tau: E[c_\tau] \leq 0$, following the Diebold and Mariano (1995) and West (1996) approach. Rejecting the NH confirms the statistical significance of AIC's overfitting compared with BIC.⁷ The OLS estimates are presented in table 2.

⁶ Throughout this paper we keep the variance of the data generating process fixed. Other cases of asymptotic properties, besides $T \to \infty$, are derived for instance in Stone (1979) and Shibata (1981). Empirically, Yang (2003) and Chen, Giannakouros, and Yang (2007) analyze some cases where the variance becomes larger.

⁷ This finding is not necessarily bad for AIC. There is an extensive empirical literature finding that AIC outperforms BIC in many contexts. Moreover, Kilian (2001) finds that it is a better criterion for identifying the true impulse response function.
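Since the regression contains only a constant, the OLS estimate of $c_\tau$ is the sample mean of $\Delta N_{i|\tau}$ and its standard error is the sample standard deviation divided by $\sqrt{I}$. A minimal Python sketch of the resulting one-sided test follows. It uses a normal approximation for the p-value, which is an assumption (reasonable with I = 1,000 independent replications), and the input values in the example are hypothetical. The same function applies unchanged to the $\Delta SFE_{i|\tau}$ series used out of sample.

```python
import math
import statistics


def one_sided_mean_test(diffs):
    """Regress diffs on a constant and test H0: E[c] <= 0 against H1: E[c] > 0.

    With only a constant, the OLS coefficient is the sample mean and its standard
    error is stdev / sqrt(n). The p-value uses a normal approximation (assumption).
    """
    n = len(diffs)
    c_hat = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = c_hat / se
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return c_hat, se, t_stat, p_value


if __name__ == "__main__":
    # Hypothetical Delta N values (AIC order minus BIC order) for six replications
    print(one_sided_mean_test([3, 5, 0, 7, 2, 4]))
```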
Table 2: Estimates of differences in number of regressors

                      τ=50    τ=100   τ=1,000   τ=5,000
c_τ                   6.30    9.75    8.94      7.81
Standard deviation    0.28    0.25    0.22      0.20
One-sided p-value     0.00    0.00    0.00      0.00
Source: Authors' computations.

The statistic $t_{\Delta N} = \overline{\Delta N} / [\hat\sigma_{\Delta N} / \sqrt{Obs.}]$ is statistically significant at traditional levels of significance. This implies that AIC consistently chooses more dynamic models than BIC does.

Figure 1: Histograms of in-sample autoregressive order estimates (panels A: τ = 50, B: τ = 100, C: τ = 1,000, D: τ = 5,000; each panel plots the frequency of the AIC-based and BIC-based chosen order against the autoregressive order, 1 to 24). Source: Authors' computations.

3.2 Out-of-sample results

Lütkepohl (1985) shows that BIC outperforms AIC, among other criteria, in a 1-step-ahead out-of-sample simulation exercise with vector autoregressions. Other authors, such as Koehler and Murphree (1988) and Granger and Jeon (2004), also find BIC to be superior to AIC when using macroeconomic data, and at multiple horizons. We replicate this finding in our setup by performing 1-step-ahead forecasts for each replication $\{y_t^i\}_{t=1}^{T=\tau+1}$, $i \in I$. The results for each criterion are shown in table 3, where BIC-based forecasts show a better fit with τ = 50, and less volatile errors only with τ = 5,000.
The columns of table 3 correspond to descriptive statistics of the root squared forecast error (RSFE), defined as:

$$RSFE = \left[ \left( y^i_{t|t} - \hat{y}^{i|\tau,criterion}_{t|t-1} \right)^2 \right]^{1/2},$$

where $\hat{y}^{i|\tau,criterion}_{t|t-1}$ is the 1-step-ahead forecast of $y^i_{t|t}$ based on a model estimated with a sample size τ and the criterion AIC or BIC.

We then evaluate accuracy by computing the statistical significance of the difference between the squared forecast errors (SFE) achieved by both criteria, using the series

$$\Delta SFE_{i|\tau} = SFE^{AIC}_{i|\tau} - SFE^{BIC}_{i|\tau} = \left( y^i_{t|t} - \hat{y}^{i|\tau,AIC}_{t|t-1} \right)^2 - \left( y^i_{t|t} - \hat{y}^{i|\tau,BIC}_{t|t-1} \right)^2.$$

We test the one-sided null hypothesis $NH^{Out\text{-}of\text{-}sample}_\tau: E[\hat{d}_\tau] \leq 0$ in the regression $\Delta SFE_{i|\tau} = d_\tau + \xi_{i|\tau}$, with $\xi_{i|\tau} \sim \text{iid } \mathcal{N}(0, \sigma^2_\xi)$. OLS estimates are presented in table 4. There is evidence of predictive BIC superiority only with long-sample estimates. For short samples we cannot discriminate between the predictive fit of the two information criteria; moreover, with τ = 100 the estimate $\hat{d}_\tau$ is negative but not significant.

Table 3: Statistics of the forecast evaluation series (RSFE)

                      τ=50          τ=100         τ=1,000       τ=5,000
                      AIC    BIC    AIC    BIC    AIC    BIC    AIC    BIC
Mean                  0.65   0.64   0.65   0.66   0.68   0.66   0.99   0.91
Median                0.56   0.53   0.56   0.57   0.57   0.58   0.45   0.42
Maximum               9.00   9.24   10.50  8.52   8.61   7.94   10.31  9.69
Minimum               0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
Standard deviation    0.76   0.77   0.76   0.75   0.74   0.76   1.33   1.21
Skewness              5.48   5.21   6.04   5.14   4.32   5.07   2.40   2.31
Kurtosis              44.68  40.12  57.07  40.12  29.94  38.31  10.75  10.21
Source: Authors' computations.
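The forecast-evaluation quantities defined above can be sketched in a few lines of Python. The coefficient vectors and observations in the example are hypothetical; in the exercise they would come from the AR models selected by each criterion on the first τ observations. Note that the RSFE of a single forecast is simply the absolute forecast error.

```python
import math


def one_step_forecast(history, beta):
    """1-step-ahead AR forecast: sum over k of beta[k-1] * y_{t-k}."""
    return sum(b * history[-k] for k, b in enumerate(beta, start=1))


def rsfe(actual, forecast):
    """Root of the squared forecast error; equals abs(actual - forecast)."""
    return math.sqrt((actual - forecast) ** 2)


def delta_sfe(actual, forecast_aic, forecast_bic):
    """SFE(AIC) minus SFE(BIC); positive values favour the BIC-based forecast."""
    return (actual - forecast_aic) ** 2 - (actual - forecast_bic) ** 2


if __name__ == "__main__":
    history = [0.1, -0.2, 0.3]  # hypothetical observations up to t-1
    f_aic = one_step_forecast(history, [0.1, 0.05])  # hypothetical AR(2) picked by AIC
    f_bic = one_step_forecast(history, [0.1])        # hypothetical AR(1) picked by BIC
    actual = 0.5                                     # hypothetical realization y_t
    print(rsfe(actual, f_aic), rsfe(actual, f_bic), delta_sfe(actual, f_aic, f_bic))
```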
Table 4: Estimates of differences in SFE

                      τ=50    τ=100   τ=1,000   τ=5,000
d_τ                   0.01    -0.01   0.02      0.08
Standard deviation    0.01    0.02    0.02      0.02
One-sided p-value     0.16    0.27    0.12      0.00
Source: Authors' computations.

4 A joint test

4.1 A reality check

We now test the two null hypotheses together, in a standardized version, for each sample size τ:

$$NH_\tau = \begin{bmatrix} NH^{In\text{-}sample}_\tau \\ NH^{Out\text{-}of\text{-}sample}_\tau \end{bmatrix} = E\begin{bmatrix} NReg^{AIC}_{\tau,Standardized} - NReg^{BIC}_{\tau,Standardized} \\ SFE^{AIC}_{\tau,Standardized} - SFE^{BIC}_{\tau,Standardized} \end{bmatrix} = E[Z_\tau] \leq 0.$$

It is expected that a vector x containing all the NHs has nonpositive values, implying that BIC is the best in estimation and forecasting. When the number of replications (I) goes to infinity, we have $\sqrt{I}\,(Z - E[Z]) \xrightarrow{A} \mathcal{N}(0, \Omega)$, where Z is the standardized vector x ($Z = [x - \bar{x}]'\,\Sigma_x^{-1}$, with $\Sigma_x$ the covariance matrix of x), and Ω is the long-run covariance matrix. As I goes to infinity, we are able to build the following statistic:

$$\max_{m \in \{1,\dots,H\}} \left[ \sqrt{I} \left( \frac{1}{I} \sum_{i=1}^{I} \left( Z_{mi} - E[Z_{mi}] \right) \right) \right]_{H \times 1},$$

where m indexes the m-th row of the vector Z that contains all the hypotheses to be tested. Nevertheless, as the maximum of a Gaussian process is not Gaussian, we need a methodology able to deliver asymptotically valid p-values for the least favorable configuration (LFC). As White (2000) pointed out, there are two ways to compute the p-values for the LFC: (i) a simulation-based approach, and (ii) a bootstrap-based approach. We use the former, but in a less conservative manner, as in Hansen (2005).⁸

Consider the diagonal matrix D, defined as $D_{mm} = \sigma_m^{-1}$, $m = 1, \dots, H$, in which $\sigma_m^2 = \Omega_{mm}$. Then it must hold that $\sqrt{I}\,D(Z - E[Z]) \xrightarrow{A} \mathcal{N}(0, D\Omega D)$, with the advantage that now $[D\Omega D]_{mm} = 1$, $m = 1, \dots, H$. However, the terms $E[Z]$, $D$, and
However, the terms E[Z], D, and mm 8 (cid:10) are unknown. Regarding the (cid:133)rst unknown term, note that the NH can be written as NH : E[Z] 0; and, as the number of vectors that are coherent with this NH goes (cid:20) to in(cid:133)nity, we can pick the LFC, E[Z] = 0, and work in a bounded test that allows for the identi(cid:133)cation of unknown terms. For the remaining two, we can use the Newey and West (1987) method to obtain a positive de(cid:133)nite consistent estimator of (cid:10), generating 1 an estimation of D using D = (cid:10)(cid:0)2 .9 mm mm Embedding all the identi(cid:133)ebd termsb, under the NH we have pIDZ A (0;(cid:10)); where (cid:0)! N (cid:10) D(cid:10)D. Then, the statistic can be written as, (cid:17) b b max pIDZ; b b b b m2f1;:::;Hg 8A brief review about divergences of both methods abre discussed in Corradi and Distaso (2011). 9As (cid:10) is a positive semide(cid:133)nite matrix, at least one hypothesis has to be nonnested. There is no available test for multiple nested hypotheses with m > 2 at the time. However, the test proposed in Clark and McCracken (2001) can be used for pairwise comparisons (m=2). 6 where m-elements represent the components of the vector DZ. The critical values of the statistic are derived from Monte Carlo simulations according b toWhite(cid:146)s(2000)procedure, followingthesesteps: (i)calculatetheCholeskydecompo- sition of D(cid:10)D= G0G, with G being a superior triangular matrix, (ii) de(cid:133)ne a number of replications, representing the number of realizations of the experiment, in this case, 1,000,000b, (biiib) for each replication, calculate an independent realization (cid:23) of a multi- variate normal distribution (0;I ), (iv) de(cid:133)ne ! as ! = G0(cid:23), such that ! is an H(cid:2)H N independent realization of (0;D(cid:10)D), (v) de(cid:133)ne s as: N s = max ! 
; b b b m m2f1;:::;Hgf g and (cid:133)nally, (vi) sort the m terms and de(cid:133)ne the critical values according to the corre- sponding quantiles. 4.2 Estimates results The estimates of Z and (cid:10) with the Newey-West estimator gives the next pairwise results, b 8:03 10(cid:0)17 1:00 0:06 Z = (cid:0) (cid:2) ;(cid:10) = ; (cid:28)=50 1:44 10(cid:0)16 (cid:28)=50 0:06 1:00 (cid:20) (cid:0) (cid:2) (cid:21) (cid:20) (cid:21) 1:84 10(cid:0)17 1:00 0:07 Z = (cid:0) (cid:2) ;(cid:10)b = ; (cid:28)=100 1:90 10(cid:0)16 (cid:28)=100 0:07 1:00 (cid:20) (cid:0) (cid:2) (cid:21) (cid:20) (cid:21) 1:31 10(cid:0)16 1:00 0:10 Z = (cid:0) (cid:2) ;(cid:10)b = ; (cid:28)=1;000 2:15 10(cid:0)16 (cid:28)=1;000 0:10 1:00 (cid:20) (cid:0) (cid:2) (cid:21) (cid:20) (cid:21) 1:65 10(cid:0)16 1:00 0:08 Z = (cid:2) ;(cid:10)b = : (cid:28)=5;000 2:74 10(cid:0)16 (cid:28)=5;000 0:08 1:00 (cid:20) (cid:2) (cid:21) (cid:20) (cid:21) After 1,000,000 of replications of each G0(cid:23) mbatrix, we have the following estimations of D(cid:10)D, 23:78 2:79 24:48 3:34 b b b D(cid:10)D = ;D(cid:10)D = ; (cid:28)=50 0:00 24:19 (cid:28)=100 0:00 24:73 (cid:20) (cid:21) (cid:20) (cid:21) 23:25 4:47 22:25 3:52 D(cid:10)bDb b = ;Db(cid:10)bDb = : (cid:28)=1;000 0:00 21:56 (cid:28)=5;000 0:00 24:56 (cid:20) (cid:21) (cid:20) (cid:21) Given thabtbthbe results of tabulated (m90% , (cid:28)b b (cid:28)b) and calculated critical value of the (cid:28)=(cid:28)0 0 2 maximum element of Z (t(cid:28)=(cid:28)0 = max pIZ ) are: (cid:28) Zm m=1;:::;H mj(cid:28) (cid:28) m90% t(cid:28)=(cid:28)0 (cid:28)=(cid:28)0 Zm 50 1:13 10(cid:0)16 2:93 10(cid:0)17 (cid:0) (cid:2) (cid:0) (cid:2) 100 1:49 10(cid:0)16 3:82 10(cid:0)17 (cid:0) (cid:2) (cid:0) (cid:2) 1,000 1:69 10(cid:0)16 4:63 10(cid:0)17 (cid:0) (cid:2) (cid:0) (cid:2) 5,000 2:14 10(cid:0)16 5:52 10(cid:0)17 (cid:2) (cid:2) 7 the NH : E[Z ] 0 is not rejected at typical signi(cid:133)cance levels for (cid:28) = 50;100;1;000 . 
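The estimation and simulation steps of section 4.1 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' Matlab code: the Bartlett-kernel lag truncation (the paper does not report one) and the bivariate input data are arbitrary choices, `standardize` returns the diagonal of D as a list, and `simulate_critical_value` defaults to 200,000 replications instead of the paper's 1,000,000 to keep pure Python fast. Steps (iii) to (vi) are marked in the comments; the usual decision rule rejects the NH when the observed statistic exceeds the simulated quantile.

```python
import math
import random


def newey_west(obs, lags):
    """Bartlett-kernel (Newey-West) long-run covariance of H-dimensional observations."""
    n, h = len(obs), len(obs[0])
    means = [sum(o[j] for o in obs) / n for j in range(h)]
    dev = [[o[j] - means[j] for j in range(h)] for o in obs]

    def autocov(k):
        # Gamma_k = (1/n) * sum_{t >= k} dev_t dev_{t-k}'
        return [[sum(dev[t][a] * dev[t - k][b] for t in range(k, n)) / n
                 for b in range(h)] for a in range(h)]

    omega = autocov(0)
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)  # Bartlett weights keep omega positive semidefinite
        gamma = autocov(k)
        for a in range(h):
            for b in range(h):
                omega[a][b] += weight * (gamma[a][b] + gamma[b][a])
    return omega


def standardize(omega):
    """Return the diagonal of D (D_mm = omega_mm^(-1/2)) and the matrix D omega D."""
    h = len(omega)
    d = [1.0 / math.sqrt(omega[m][m]) for m in range(h)]
    dod = [[d[a] * omega[a][b] * d[b] for b in range(h)] for a in range(h)]
    return d, dod


def cholesky_lower(s):
    """Lower-triangular L with L L' = s; L plays the role of G' when s = G'G."""
    n = len(s)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def simulate_critical_value(cov, level=0.90, reps=200_000, seed=0):
    """Empirical quantile of s = max_m omega_m, with omega = G' nu and nu ~ N(0, I_H)."""
    rng = random.Random(seed)
    L = cholesky_lower(cov)
    h = len(cov)
    draws = []
    for _ in range(reps):
        nu = [rng.gauss(0.0, 1.0) for _ in range(h)]                            # step (iii)
        omega = [sum(L[i][k] * nu[k] for k in range(i + 1)) for i in range(h)]  # step (iv)
        draws.append(max(omega))                                                # step (v)
    draws.sort()                                                                # step (vi)
    return draws[min(reps - 1, math.ceil(level * reps) - 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bivariate observations standing in for the two hypotheses
    data = []
    for _ in range(1_000):
        x = rng.gauss(0.0, 2.0)
        data.append((x, 0.1 * x + rng.gauss(0.0, 1.0)))
    omega_hat = newey_west(data, lags=4)  # lag truncation chosen arbitrarily
    d_hat, dod = standardize(omega_hat)
    print("D Omega D:", dod)
    print("90% critical value:", simulate_critical_value(dod))
```

Computing the Cholesky factor once and reusing it across replications is what makes the simulation cheap: each replication costs only H normal draws and a triangular matrix-vector product.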
When τ = 5,000, however, the results lead us to conclude that BIC is the dominant criterion for modeling stationary autoregressive processes for forecasting purposes.

4.3 Type-I error control analysis

According to White (2000), Hansen (2005), Corradi and Distaso (2011), and Pincheira (2011a, 2012), when interest centers on testing more than one univariate hypothesis jointly, there are generally two strategies for statistical inference. On the one hand, we may determine the in- and out-of-sample superiority of BIC over AIC by deciding, from the results of both individual tests, whether or not to reject both NHs.¹⁰ On the other hand, we can perform a joint test that better controls for type-I error (that is, rejecting a true null hypothesis), as summarized in the derivation of asymptotically valid p-values for the LFC statistic. Obviously, both strategies will have the same outcome when the hypotheses are fully independent.

The first strategy (in this case, the one based on separate regressions) may have shortcomings in handling type-I error, that is, the rejection of a true NH. To illustrate, we closely follow the example proposed in Pincheira (2011a, 2012). Assume that $NH: E[Y] = 0_{L \times L}$, $L \in \mathbb{N}$, and that the alternative hypothesis (AH) states that at least one component of Y is positive, $AH: \exists\, l \in \{1,\dots,L\} : E[Y_l] > 0$. Suppose now that we have a collection of tests $T_l$ that depend on the sample size (Ψ), where $T_l$ is assigned to test $NH(l): E[Y_l] = 0$ against the one-sided $AH(l): E[Y_l] > 0$, so that $T_l$ rejects NH(l) at a given significance level $0 \leq \alpha \leq 1$ when $T_l(\Psi) > \delta$. In this case, δ represents a tabulated value from the distribution under which the NH is evaluated.
If the elements of $\vec{T} = (T_1, \dots, T_L)'$ are orthogonal, we have that

$$\Pr\left(\exists\, l \in \{1,\dots,L\} : T_l(\Psi) > \delta \mid NH\right) = \Pr\left(\sum_{l=1}^{L} \Upsilon_l > 0 \,\middle|\, NH\right),$$

in which $\Upsilon_l = 1$ if $T_l(\Psi) > \delta$, and 0 otherwise. Then $\Upsilon_l$ is a random variable that follows a Bernoulli distribution with parameter $p \leq \alpha$, $0 \leq p \leq 1$. Under the NH, $\sum_{l=1}^{L} \Upsilon_l$ follows a binomial distribution with parameters L and p. Using these terms, we have

$$\begin{aligned} \Pr\left(\sum_{l=1}^{L} \Upsilon_l > 0 \,\middle|\, NH\right) &= 1 - \Pr\left(\sum_{l=1}^{L} \Upsilon_l = 0 \,\middle|\, NH\right) \\ &= 1 - \Pr\left(T_l(\Psi) \leq \delta \;\; \forall\, l \in \{1,\dots,L\} \mid NH\right) \\ &= 1 - (1 - p)^L \longrightarrow 1 \quad \text{when } L \to \infty, \end{aligned}$$

for any p > 0.

¹⁰ In this class of tests we find approaches such as Bonferroni bounds and the one proposed by Holm (1979).
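The closed form $1 - (1-p)^L$ is easy to check numerically. The Python sketch below computes it, compares it with a Monte Carlo count of independent Bernoulli(p) rejection indicators, and shows that running each test at size p/L (the Bonferroni bound mentioned in footnote 10) keeps the family-wise rate at or below p under independence. The nominal size 0.05 is an illustrative choice.

```python
import random


def familywise_rejection_prob(p, L):
    """Pr(at least one of L independent size-p tests rejects | NH) = 1 - (1 - p)^L."""
    return 1.0 - (1.0 - p) ** L


def simulated_familywise_rate(p, L, reps=100_000, seed=0):
    """Monte Carlo share of replications in which at least one of L independent
    Bernoulli(p) rejection indicators equals one."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < p for _ in range(L)) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    size = 0.05  # illustrative nominal size of each individual test
    for L in (1, 2, 10, 100):
        print(L,
              round(familywise_rejection_prob(size, L), 4),
              round(simulated_familywise_rate(size, L, reps=20_000), 4),
              round(familywise_rejection_prob(size / L, L), 4))  # Bonferroni: size / L per test
```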