ERIC ED341728: Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?


DOCUMENT RESUME

ED 341 728                                                    TM 017 923

AUTHOR:      Mittag, Kathleen Cage
TITLE:      Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?
PUB DATE:   Jan 92
NOTE:       31p.; Paper presented at the Annual Meeting of the Southwest Educational Research Association (Houston, TX, January-February 1992).
PUB TYPE:   Reports - Evaluative/Feasibility (142) -- Speeches/Conference Papers (150)
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: College Mathematics; Computer Software; *Equations (Mathematics); *Estimation (Mathematics); Higher Education; *Mathematical Models; Methods Courses; Monte Carlo Methods; Research Methodology; Research Needs; *Sampling; *Statistical Bias; Statistics
IDENTIFIERS: *Population Parameters; *Variance (Statistical)

ABSTRACT
An important topic presented in introductory statistics courses is the estimation of population parameters using samples. Students learn that when estimating population variances using sample data, we always get an underestimate of the population variance if we divide by n rather than n-1. One implication of this correction is that the degree of bias gets smaller as the sample gets larger. This paper explains the nature of bias and correction in the estimated variance and discusses the properties of a good estimator (unbiasedness, consistency, efficiency, and sufficiency). A BASIC computer program based on Monte Carlo methods is introduced, which can be used to teach students the concept of bias in estimating variance; the program is included in this paper. This type of treatment is needed because surprisingly few students or researchers understand this bias and why a correction for bias is needed. One table and three graphs summarize the analyses. A 10-item list of references is included, and two appendices present the computer program and five examples of its use.
(Author/SLD)

Correcting for Systematic Bias in Sample Estimates of Population Variances: Why Do We Divide by n-1?

Kathleen Cage Mittag
Texas A&M University, 77843-4232

Paper presented at the annual meeting of the Southwest Educational Research Association, Houston, TX, January 31, 1992.

An important topic presented in introductory statistics courses is the estimation of population parameters using samples.
In many statistical studies it may be too costly, too time-consuming, or simply impossible to gather data from the entire population. Methods have been developed to estimate these population parameters, and this paper will explain the nature of bias and correction in the estimated variance and discuss the properties of a "good" estimator. A BASIC computer program for an IBM PC is included in Appendix 1; this program can be used to teach students the concept of bias in estimating variance.

Variance is what is called a point estimate. A point estimate is computed from a given sample and has a single numerical value that acts as an approximation of the population parameter. Interval estimates (not discussed in this paper) specify limits between which population parameters fall with a given probability. These interval estimates are called confidence intervals.

Before discussing estimates, a review of the basic computational statistics for the population mean, variance, and standard deviation parameters will be presented (Harnett, 1970; Ott, 1988). The mean, a measure of central tendency, is defined as:

    μ = E[x] = (1/N) Σ x_i,  where N = number in the population    (1)

The variance, a measure of variability, is defined as:

    σ² = E[(x - μ)²] = (1/N) Σ (x_i - μ)²    (2)

The standard deviation, a measure of variability, is defined as the square root of the variance. It is used because the variance is in a squared metric, and people are more comfortable thinking in units of dollars rather than squared dollars, or IQ rather than squared IQ, and so forth.

    σ = √(σ²)    (3)

Since population data are seldom available, it is often necessary to estimate parameters using sample data. There are four criteria which are considered when deciding if an estimator is a "good" estimator. These criteria are unbiasedness, consistency, efficiency, and sufficiency (Harnett, 1970; Khazanie, 1990).
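Formulas (1) through (3) can be checked directly in code. The paper's teaching program is written in BASIC (Appendix 1); the short Python sketch below is an illustration of my own, not the author's program, computing the same three population statistics:

```python
import math

def population_stats(x):
    """Population mean, variance, and standard deviation,
    following formulas (1)-(3): divide by N, the population size."""
    N = len(x)
    mean = math.fsum(x) / N                                 # formula (1)
    variance = math.fsum((xi - mean) ** 2 for xi in x) / N  # formula (2)
    std_dev = math.sqrt(variance)                           # formula (3)
    return mean, variance, std_dev

# A small illustrative "population" of five scores
mu, var, sd = population_stats([2, 4, 4, 4, 6])
print(mu, var, sd)   # 4.0, 1.6, and the square root of 1.6
```

Note that the divisor here is N, the full population size; the question the paper addresses arises only when the same divide-by-n formula is applied to a *sample*.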
The cost of making incorrect estimates from sample data should be minimized; therefore, it is very important to choose the correct estimation procedure. In this paper, parameters will be referred to as θ, and parameter estimates will be referred to as θ̂. For example, if θ = σ², then θ̂ = s².

UNBIASEDNESS

Unbiasedness is the first property of a "good" estimator. Carl Gauss is given credit for first presenting this concept. Unbiasedness is defined by Harnett (1970, p. 188) as follows: an estimator is said to be unbiased if the expected value of the estimator is equal to the parameter being estimated, or if

    E[θ̂] = θ    (4)

A biased estimator will either underestimate (Figure 1(c)) or overestimate (Figure 1(b)) the parameter θ. Ideally, the bias should be zero. If an estimator is "good" and several samples are taken from a population, then the mean value of the estimates from these samples should be close to the parameter value (Figure 1(a)). Khazanie (1990) illustrated these unbiasedness concepts as follows:

[Figure 1. Curves representing sampling distributions of θ̂: (a) θ̂ is an unbiased estimator of θ; (b) θ̂ overestimates θ; (c) θ̂ underestimates θ.]

The sample mean (M), the most widely used estimator, is an unbiased estimator of μ. This fact can be shown as follows (Harnett, 1970, p. 159):

    Define M = (1/n)(x_1 + x_2 + ... + x_n).

    E[M] = E[(1/n)(x_1 + x_2 + ... + x_n)]
         = (1/n)(E[x_1] + E[x_2] + ... + E[x_n])
         = (1/n)(μ + μ + ... + μ)
         = (1/n)(nμ)
         = μ    (6)

It would be nice if the sample variance (s² = (1/n) Σ (x - M)²), the second most widely used estimator, were also an unbiased estimator of the population variance (σ²), but s² is not an unbiased estimator of σ² (Harnett, 1970). It is a fact that s² always underestimates σ² by a factor of (n-1)/n. The following relationship results from that fact:

    E[s²] = σ² {(n-1)/n}    (7)

or, by rewriting (7),

    E[s²] = σ² - σ²/n    (8)
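Formula (7) can be demonstrated empirically with a Monte Carlo simulation in the spirit of the paper's BASIC program. The Python sketch below is an illustrative stand-in of mine, not the listing from Appendix 1; it averages the divide-by-n estimate s² over many samples drawn from a normal population with σ² = 50 and n = 5:

```python
import random

def mean_biased_s2(sigma2, n, trials, seed=1):
    """Average the divide-by-n variance estimate s^2 over `trials`
    samples of size n from a normal population with variance sigma2."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(x) / n
        total += sum((xi - m) ** 2 for xi in x) / n  # divide by n, not n-1
    return total / trials

avg_s2 = mean_biased_s2(50.0, 5, trials=20000)
print(avg_s2)   # close to 50 * (4/5) = 40, as formula (7) predicts
```

With 20,000 trials the average lands near 40 rather than 50, making the 20% underestimate for n = 5 visible rather than merely asserted.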
From formula (8) it can be seen that the bias is equal to σ²/n. If n is large, then σ²/n becomes very small. That fact reinforces the idea that it is important to have as large a sample size as possible. The value of σ²/n can be important, as illustrated in the next example. Assume that the population variance is σ² = 50, and consider the expected variance estimate, E[s²], for samples of size n=5, n=10, and n=20. The estimate from n=5 will be 20% too low, since E[s²] = 50 - (50/5) = 40. The estimate from n=10 would be 10% too low, and the estimate from n=20 would be 5% too low. This illustrates how sample size affects the underestimation of variance.

It is very easy to correct for this bias in the variance formula (7). All that needs to be done is to multiply both sides of formula (7) by the reciprocal of (n-1)/n, which is n/(n-1) (Harnett, 1970):

    {n/(n-1)} E[s²] = {n/(n-1)} {(n-1)/n} σ²
    E[{n/(n-1)} (1/n) Σ (x - M)²] = σ²
    E[{1/(n-1)} Σ (x - M)²] = σ²    (9)

The estimator in formula (9) will be referred to as the unbiased estimate of σ², denoted by S²:

    S² = {1/(n-1)} Σ (x - M)²

Formula (7) is essential in deriving the unbiased variance estimate; therefore, the proof of formula (7) is included in this paper. The proof is as follows (Harnett, 1970). In the first step, (x - μ) - (M - μ) is substituted for the term (x - M), since they are equivalent mathematically:

    E[s²] = E[(1/n) Σ (x - M)²]
          = (1/n) E[Σ {(x - μ) - (M - μ)}²]

Since (a + b)² = a² + 2ab + b², the next step follows:

          = (1/n) E[Σ (x - μ)²] - (2/n) E[Σ (x - μ)(M - μ)] + (1/n) E[Σ (M - μ)²]

In the second term, (M - μ) is constant over the sum, so it can be factored out, and Σ (x - μ) = n(M - μ); in the third term, Σ (M - μ)² = n(M - μ)². Therefore,

          = (1/n) Σ E[(x - μ)²] - 2 E[(M - μ)²] + E[(M - μ)²]

Using E[(x - μ)²] = σ² and E[(M - μ)²] = σ²/n,

          = (1/n)(nσ²) - 2(σ²/n) + σ²/n
          = σ² - σ²/n
          = σ² {(n-1)/n} = E[s²]

Unbiasedness has one weakness in that it requires only that the average value of θ̂ equal θ. The values of θ̂ can be very far from θ and still average θ. The next property, consistency, takes the variability of θ̂ into consideration.
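The effect of the n-1 correction in formula (9) can likewise be checked by simulation. The Python sketch below (again an illustration of mine, not the paper's BASIC listing) averages both estimators over many samples from a population with σ² = 50; only the n-1 version recovers the true variance:

```python
import random

def average_variance_estimates(sigma2, n, trials, seed=2):
    """Average the biased (divide by n) and unbiased (divide by n-1)
    variance estimates over `trials` samples of size n."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    sum_s2 = 0.0   # s^2: divides by n      (biased, formula (7))
    sum_S2 = 0.0   # S^2: divides by n - 1  (unbiased, formula (9))
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(x) / n
        ss = sum((xi - m) ** 2 for xi in x)
        sum_s2 += ss / n
        sum_S2 += ss / (n - 1)
    return sum_s2 / trials, sum_S2 / trials

s2_avg, S2_avg = average_variance_estimates(50.0, 5, trials=20000)
print(s2_avg, S2_avg)   # roughly 40 and 50: only S^2 centers on sigma^2
```

Note that unbiasedness is a statement about the long-run average: any single S² from one sample of five scores can still be far from 50, which is exactly the weakness the consistency property addresses next.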
CONSISTENCY

The definition and properties of consistency given by Harnett (1970, p. 191) are:

    Definition: An estimator is said to be consistent if it yields estimates which approach the population parameter being estimated as n becomes larger.

    Properties: 1) Var(θ̂) → 0 as n → ∞;  2) θ̂ is unbiased (E[θ̂] = θ).

Rahman (1968, p. 301) emphasized that estimates can be both consistent and unbiased, neither, or one or the other, in the following quotation:

    Nevertheless, it is to be emphasized that consistency is a very different concept from unbiasedness, and it is also derived from a different theory of estimation. (Unbiasedness is derived from the theory of least squares.) As such, a consistent estimate may or may not be unbiased. Conversely, an unbiased estimate may or may not be consistent. Despite this, there exist estimates (such as the sample mean) which are both unbiased and consistent.

R. A. Fisher introduced the consistency property in the 1920's.

EFFICIENCY

The third property, efficiency, concerns the reliability of the estimate of θ for a given sample size. Khazanie (1990, p. 303) defined efficiency and illustrated the concept as follows:

    If θ̂_1 and θ̂_2 are two unbiased estimators of θ, then θ̂_1 is more efficient than θ̂_2 if the variance of the sampling distribution of θ̂_1 is less than the variance of the sampling distribution of θ̂_2.

[Figures 2 and 3. Probability density functions of the sampling distributions of θ̂_1 and θ̂_2.]
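Both properties can be illustrated numerically. The Python sketch below is my own illustration using a standard normal population: the sampling variance of the mean shrinks roughly as 1/n (consistency), and for normal data the sample mean has a smaller sampling variance than the sample median (efficiency), since the median's sampling variance is approximately 1.57σ²/n versus σ²/n for the mean.

```python
import random
import statistics

def sampling_variance(estimator, n, trials, seed=3):
    """Monte Carlo estimate of the variance of the sampling distribution
    of `estimator` for samples of size n from a standard normal population."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(trials)]
    return statistics.pvariance(estimates)

# Consistency: Var(M) behaves like 1/n and shrinks toward 0 as n grows.
for n in (10, 40, 160):
    print(n, sampling_variance(statistics.mean, n, trials=5000))

# Efficiency: for a normal population the sample mean beats the sample median.
v_mean = sampling_variance(statistics.mean, 25, trials=5000)
v_median = sampling_variance(statistics.median, 25, trials=5000)
print(v_mean < v_median)   # the mean is the more efficient estimator
```

The same comparison corresponds to Figures 2 and 3: the more efficient estimator is the one whose sampling distribution curve is the narrower of the two.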
