Variance Approximations for Assessments of Classification Accuracy

Historic, Archive Document

Do not assume content reflects current scientific knowledge, policies, or practices.

United States Department of Agriculture
Forest Service
Rocky Mountain Forest and Range Experiment Station
Fort Collins, Colorado 80526

Research Paper RM-316

Abstract

Czaplewski, R. L. 1994. Variance approximations for assessments of classification accuracy. Res. Pap. RM-316. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Forest and Range Experiment Station. 29 p.

Variance approximations are derived for the weighted and unweighted kappa statistics, the conditional kappa statistic, and conditional probabilities. These statistics are useful to assess classification accuracy, such as accuracy of remotely sensed classifications in thematic maps when compared to a sample of reference classifications made in the field. Published variance approximations assume multinomial sampling errors, which implies simple random sampling where each sample unit is classified into one and only one mutually exclusive category with each of two classification methods. The variance approximations in this paper are useful for more general cases, such as reference data from multiphase or cluster sampling. As an example, these approximations are used to develop variance estimators for accuracy assessments with a stratified random sample of reference data.

Keywords: Kappa, remote sensing, photo-interpretation, stratified random sampling, cluster sampling, multiphase sampling, multivariate composite estimation, reference data, agreement.
USDA Forest Service
Research Paper RM-316
September 1994

Variance Approximations for Assessments of Classification Accuracy

Raymond L. Czaplewski
USDA Forest Service
Rocky Mountain Forest and Range Experiment Station¹

¹ Headquarters is in Fort Collins, in cooperation with Colorado State University.

Contents

Introduction
Kappa Statistic ($\kappa_w$)
Estimated Weighted Kappa ($\hat{\kappa}_w$)
Taylor Series Approximation for Var($\hat{\kappa}_w$)
Partial Derivatives for Var($\hat{\kappa}_w$) Approximation
First-Order Approximation of Var($\hat{\kappa}_w$)
Var$_0$($\hat{\kappa}_w$) Assuming Chance Agreement
Unweighted Kappa ($\hat{\kappa}$)
Matrix Formulation of $\hat{\kappa}$ Variance Approximations
Verification with Multinomial Distribution
Conditional Kappa ($\hat{\kappa}_{i\cdot}$) for Row $i$
Conditional Kappa ($\hat{\kappa}_{\cdot i}$) for Column $i$
Matrix Formulation for Var($\hat{\kappa}_{i\cdot}$) and Var($\hat{\kappa}_{\cdot i}$)
Conditional Probabilities ($p_{i|\cdot i}$ and $p_{i|i\cdot}$)
Matrix Formulation for Var($\hat{p}_{i|\cdot i}$) and Var($\hat{p}_{i|i\cdot}$)
Test for Conditional Probabilities Greater than Chance
Covariance Matrices for $E[\varepsilon_{ij}\varepsilon_{rs}]$ and vec$\hat{P}$
Covariances Under Independence Hypothesis
Matrix Formulation for $E_0[\varepsilon_{ij}\varepsilon_{rs}]$
Stratified Sample of Reference Data
Accuracy Assessment Statistics Other Than $\hat{\kappa}_w$
Summary
Acknowledgments
Literature Cited
Appendix A: Notation

INTRODUCTION

Assessments of classification accuracy are important to remote sensing applications, as reviewed by Congalton and Mead (1983), Story and Congalton (1986), Rosenfield and Fitzpatrick-Lins (1986), Campbell (1987, pp. 334-365), Congalton (1991), and Stehman (1992). Monserud and Leemans (1992) consider the related problem of comparing different vegetation maps. Recent literature favors the kappa statistic as a method for assessing classification accuracy or agreement.

The kappa statistic, which is computed from a square contingency table, is a scalar measure of agreement between two classifiers. If one classifier is considered a reference that is without error, then the kappa statistic is a measure of classification accuracy. Kappa equals 1 for perfect agreement, and zero for agreement expected by chance alone. Figure 1 provides interpretations of the magnitude of the kappa statistic that have appeared in the literature. In addition to kappa, Fleiss (1981) suggests that conditional probabilities are useful when assessing the agreement between two different classifiers, and Bishop et al. (1975) suggest statistics that quantify the disagreement between classifiers.

Existing variance approximations for kappa assume multinomial sampling errors for the proportions in the contingency table; this implies simple random sampling where each sample unit is classified into one and only one mutually exclusive category with each of the two methods (Stehman 1992). This paper considers more general cases, such as reference data from stratified random sampling, multiphase sampling, cluster sampling, and multistage sampling.

KAPPA STATISTIC ($\kappa_w$)

The weighted kappa statistic ($\kappa_w$) was first proposed by Cohen (1968) to measure the agreement between two different classifiers or classification protocols. Let $p_{ij}$ represent the probability that a member of the population will be assigned into category $i$ by the first classifier and category $j$ by the second. Let $k$ be the number of categories in the classification system, which is the same for both classifiers. $\kappa_w$ is a scalar statistic that is a nonlinear function of all $k^2$ elements of the $k \times k$ contingency table, where $p_{ij}$ is the $ij$th element of the contingency table. Note that the sum of all $k^2$ elements of the contingency table equals 1:

$$\sum_{i=1}^{k} \sum_{j=1}^{k} p_{ij} = 1 \tag{1}$$

Define $w_{ij}$ as the value which the user places on any partial agreement whenever a member of the population is assigned to category $i$ by the first classifier and category $j$ by the second classifier (Cohen 1968).
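
As a numerical illustration of the contingency table of proportions and the role of the weights, the following is a minimal Python sketch rather than a procedure taken from this paper. It assumes Cohen's (1968) weighted form, in which observed agreement and chance agreement are weighted sums over the table, and it defaults to identity weights (only exact agreement is valued). The function names `proportions` and `weighted_kappa` and the example counts are hypothetical.

```python
def proportions(counts):
    """Convert a k x k table of sample counts into joint proportions p[i][j]."""
    n = sum(sum(row) for row in counts)
    return [[c / n for c in row] for row in counts]


def weighted_kappa(p, w=None):
    """Weighted kappa from a k x k table of joint proportions.

    p[i][j]: proportion assigned to category i by the first classifier
             and category j by the second; all k*k entries must sum to 1.
    w[i][j]: value placed on agreement between categories i and j.
             Defaults to identity weights (assumption: unweighted kappa).
    """
    k = len(p)
    if abs(sum(sum(row) for row in p) - 1.0) > 1e-9:
        raise ValueError("table proportions must sum to 1")
    if w is None:
        w = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

    # Marginal proportions for each classifier.
    row = [sum(p[i][j] for j in range(k)) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed agreement and weighted chance agreement.
    p_o = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_c = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_o - p_c) / (1.0 - p_c)


# Perfect agreement: every unit falls on the diagonal, so kappa is 1.
perfect = weighted_kappa(proportions([[30, 0], [0, 70]]))

# Chance agreement: each cell equals the product of its margins, so kappa is 0.
chance = weighted_kappa(proportions([[20, 30], [20, 30]]))
```

The two example tables reproduce the limiting values stated in the Introduction: kappa equals 1 for perfect agreement and zero when agreement is no better than that expected from the marginal proportions alone. Passing a different `w` matrix, for example with partial credit between adjacent size classes, changes only the two weighted sums.
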
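Typically, the weights range from $0 \le w_{ij} \le 1$, with $w_{ii} = 1$ (Landis and Koch 1977, p. 163). For example, $w_{ij}$ might equal 0.67 if category $i$ represents the large size class and $j$ is the medium size class; if $r$ represents the small size class, then $w_{ir}$ might equal 0.33; and $w_{is}$ might equal 0.0 if $s$ represents any other classification. The unweighted kappa statistic uses $w_{ii} = 1$ and $w_{ij} = 0$ for $i \ne j$ (Fleiss 1981, p. 225), which means that the agreement must be exact to be valued by the user.

Using the notation of Fleiss et al. (1969), let:

$$p_{i\cdot} = \sum_{j=1}^{k} p_{ij} \tag{2}$$

$$p_{\cdot j} = \sum_{i=1}^{k} p_{ij} \tag{3}$$

$$p_o = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij}\, p_{ij} \tag{4}$$

$$p_c = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij}\, p_{i\cdot}\, p_{\cdot j} \tag{5}$$

Using this notation, the weighted kappa statistic ($\kappa_w$) as defined by Cohen (1968) is given as:

$$\kappa_w = \frac{p_o - p_c}{1 - p_c} \tag{6}$$

[Figure 1 (image not reproduced): interpretation scales of Landis and Koch (1977), Fleiss (1981), and Monserud and Leemans (1992).]

Figure 1. Interpretations of kappa statistic as proposed in past literature. Landis and Koch (1977) characterize their interpretations as useful benchmarks, although they are arbitrary; they use clinical diagnoses from the epidemiological literature as examples. Fleiss (1981, p. 218) bases his interpretations on Landis and Koch (1977), and suggests that these interpretations are suitable for "most purposes." Monserud and Leemans (1992) use their interpretations for global vegetation maps.

Estimated Weighted Kappa ($\hat{\kappa}_w$)

The true proportions $p_{ij}$ are not known in practice, and the true $\kappa_w$ must be estimated with estimated proportions in the contingency table ($\hat{p}_{ij}$):

$$\hat{\kappa}_w = \frac{\hat{p}_o - \hat{p}_c}{1 - \hat{p}_c} \tag{7}$$

where $\hat{p}_o$ and $\hat{p}_c$ are defined as in Eqs. 2, 3, 4, and 5 using $\hat{p}_{ij}$ in place of $p_{ij}$.

The true $\kappa_w$ equals the estimated $\hat{\kappa}_w$ plus an unknown random error $\varepsilon_k$:

$$\kappa_w = \hat{\kappa}_w + \varepsilon_k \tag{8}$$

If $\hat{\kappa}_w$ is an unbiased estimate of $\kappa_w$, then $E[\varepsilon_k] = 0$ and $E[\hat{\kappa}_w] = \kappa_w$. By definition, $E[\varepsilon_k^2] = E[(\kappa_w - \hat{\kappa}_w)^2]$, and the variance of $\hat{\kappa}_w$ is:

$$\operatorname{Var}(\hat{\kappa}_w) = E[\varepsilon_k^2] - E^2[\varepsilon_k] = E[\varepsilon_k^2] \tag{9}$$

Taylor Series Approximation for Var($\hat{\kappa}_w$)

$\kappa_w$ is a nonlinear, multivariate function of the $k^2$ elements ($p_{ij}$) in the contingency table (Eqs. 2, 3, 4, 5, and 6). The multivariate Taylor series approximation is used to produce an estimated variance $\widehat{\operatorname{Var}}(\hat{\kappa}_w)$. Let $\varepsilon_{ij} = (p_{ij} - \hat{p}_{ij})$, and let $(\partial \kappa_w / \partial p_{ij})|_{p_{ij}=\hat{p}_{ij}}$ be the partial derivative of $\kappa_w$ with respect to $p_{ij}$ evaluated at $p_{ij} = \hat{p}_{ij}$. The multivariate Taylor series expansion (Deutch 1965, pp. 70-72) of $\kappa_w$ is:

$$
\begin{aligned}
\kappa_w = \hat{\kappa}_w
&+ \varepsilon_{11} \left.\frac{\partial \kappa_w}{\partial p_{11}}\right|_{p_{ij}=\hat{p}_{ij}} + \cdots + \varepsilon_{1k} \left.\frac{\partial \kappa_w}{\partial p_{1k}}\right|_{p_{ij}=\hat{p}_{ij}} \\
&+ \varepsilon_{21} \left.\frac{\partial \kappa_w}{\partial p_{21}}\right|_{p_{ij}=\hat{p}_{ij}} + \cdots + \varepsilon_{2k} \left.\frac{\partial \kappa_w}{\partial p_{2k}}\right|_{p_{ij}=\hat{p}_{ij}} \\
&+ \cdots \\
&+ \varepsilon_{k1} \left.\frac{\partial \kappa_w}{\partial p_{k1}}\right|_{p_{ij}=\hat{p}_{ij}} + \cdots + \varepsilon_{kk} \left.\frac{\partial \kappa_w}{\partial p_{kk}}\right|_{p_{ij}=\hat{p}_{ij}} + R
\end{aligned}
\tag{10}
$$

where $R$ is the remainder. In addition, assume that $\hat{p}_{ij}$ is nearly equal to $p_{ij}$ (i.e., $\hat{p}_{ij} \approx p_{ij}$); hence, $\varepsilon_{ij} \approx 0$ because $\varepsilon_{ij} = (p_{ij} - \hat{p}_{ij})$; the higher-order products of $\varepsilon_{ij}$ in the Taylor series expansion are assumed to be much smaller than $\varepsilon_{ij}$, and the $R$ in Eq. 10 is assumed to be negligible. Eq. 10 is linear with respect to all $\varepsilon_{ij} = (p_{ij} - \hat{p}_{ij})$.

The Taylor series expansion in Eq. 10 provides the following linear approximation after ignoring the remainder $R$:

$$(\kappa_w - \hat{\kappa}_w) = \varepsilon_k \approx \sum_{i=1}^{k} \sum_{j=1}^{k} \varepsilon_{ij} \left.\frac{\partial \kappa_w}{\partial p_{ij}}\right|_{p_{ij}=\hat{p}_{ij}} \tag{11}$$

From Eq. 11, the squared random error $\varepsilon_k^2$ approximately equals:

$$
\begin{aligned}
\varepsilon_k^2 = (\kappa_w - \hat{\kappa}_w)^2
&\approx \left[ \sum_{i=1}^{k} \sum_{j=1}^{k} \varepsilon_{ij} \left.\frac{\partial \kappa_w}{\partial p_{ij}}\right|_{p_{ij}=\hat{p}_{ij}} \right] \left[ \sum_{r=1}^{k} \sum_{s=1}^{k} \varepsilon_{rs} \left.\frac{\partial \kappa_w}{\partial p_{rs}}\right|_{p_{rs}=\hat{p}_{rs}} \right] \\
&= \sum_{i=1}^{k} \sum_{j=1}^{k} \sum_{r=1}^{k} \sum_{s=1}^{k} \varepsilon_{ij}\, \varepsilon_{rs} \left.\frac{\partial \kappa_w}{\partial p_{ij}}\right|_{p_{ij}=\hat{p}_{ij}} \left.\frac{\partial \kappa_w}{\partial p_{rs}}\right|_{p_{rs}=\hat{p}_{rs}}
\end{aligned}
\tag{12}
$$

From Eqs. 9 and 12, $\operatorname{Var}(\hat{\kappa}_w)$ is approximately:

$$\operatorname{Var}(\hat{\kappa}_w) \approx \sum_{i=1}^{k} \sum_{j=1}^{k} \left.\frac{\partial \kappa_w}{\partial p_{ij}}\right|_{p_{ij}=\hat{p}_{ij}} \sum_{r=1}^{k} \sum_{s=1}^{k} E[\varepsilon_{ij}\, \varepsilon_{rs}] \left.\frac{\partial \kappa_w}{\partial p_{rs}}\right|_{p_{rs}=\hat{p}_{rs}} \tag{13}$$

This corresponds to the approximation using the delta method (e.g., Mood et al. 1963, p. 181; Rao 1965, pp. 321-322). The partial derivatives needed for $\operatorname{Var}(\hat{\kappa}_w)$ in Eq. 13 are derived in the following section.

Partial Derivatives for Var($\hat{\kappa}_w$) Approximation

The partial derivative of $\kappa_w$ in Eq. 13 is derived by rewriting $\kappa_w$ as a function of $p_{ij}$. First, $p_o$ in Eq. 6 is expanded to isolate the $p_{ij}$ term using the definition of $p_o$ in Eq. 4:

$$p_o = w_{ij}\, p_{ij} + \sum_{r=1}^{k} \sum_{\substack{s=1 \\ (r,s) \ne (i,j)}}^{k} w_{rs}\, p_{rs} \tag{14}$$

The partial derivative of $p_o$ with respect to $p_{ij}$ is simply:

$$\frac{\partial p_o}{\partial p_{ij}} = w_{ij} \tag{15}$$

As the next step in deriving the partial derivative of $\kappa_w$ in Eq. 13, $p_c$ in Eq. 6 is expanded to isolate the $p_{ij}$ term using the definition of $p_c$ in Eq. 5: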

