Pieter G. de Vries

Sampling Theory for Forest Inventory
A Teach-Yourself Course

With 20 Figures

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo

PIETER G. DE VRIES
Dept. of Forest Management
Wageningen Agricultural University
"Hinkeloord"
Gen. Foulkesweg 64
P.O.B. 432
NL-6700 AH Wageningen

ISBN-13: 978-3-540-17066-2
e-ISBN-13: 978-3-642-71581-5
DOI: 10.1007/978-3-642-71581-5

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use a fee is payable to "Verwertungsgesellschaft Wort", Munich.

© Springer-Verlag Berlin Heidelberg 1986

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Product Liability: The publisher can give no guarantee for information about drug-dosage and application thereof contained in this book. In every individual case the respective user must check its accuracy by consulting other pharmaceutical literature.

PREFACE

Forest inventory may be defined as the technique of collecting, evaluating and presenting specified information on forest areas. Because of the generally large extent of forest areas, data are usually collected by sampling, i.e. by making observations on only part of the area of interest. As there are many different sampling methods (e.g. Appendix 1), a choice must first be made as to which method suits the given field and financial circumstances best. On completion of the sampling procedure, the numerous data collected have next to be condensed to manageable representative quantities.
Finally, from these quantities, inferences about the situation in the entire forest area are made, preferably accompanied by an indication of their reliability.

This book is intended for students who want to know the wherefore of the sampling techniques used in forest inventory. The danger of lack of knowledge is a blind following of instructions and copying statistical formulae, or, even worse, feeding data into a computer loaded with a program that is said to print out the required information. In serious persons, such approaches may leave a feeling of dissatisfaction or even of professional incompetence, because of inability to direct or evaluate the procedure critically.

If a student tries to improve his or her situation, he/she will find that the few existing forest inventory textbooks, though some with merit, either use confusing statistical symbols or do not adequately cover theoretical principles. As a result, complex formulae may drop out of a blue sky, and the student is discouraged from trying to work out their principle and origin.

On the other hand, there are a number of excellent general textbooks on sampling theory, though the lucidity of their symbolism may differ. The statistical sophistication required of the reader, however, is often too high for even a graduate forestry student, who experiences the gaps in his knowledge as unexplained "jumps" in the statistical text.

The present book is an effort to evade the above drawbacks, by going step by step, giving ample proofs in the text, and using a symbolism as clear as possible. Moreover, the appendixes review many elementary statistical concepts concisely and include some indispensable general statistical proofs. Numerical examples are worked out with restricted simple data for each sampling method, in order to illustrate the type of calculations involved.
All this, however, should not lead to the false impression that the student could manage without any previous mathematical and statistical knowledge. Apart from being conversant with ordinary algebra, he should know the principles of calculus (differentiation and integration), and he should have taken a course in elementary statistics. Those acquainted with the principles of vectors and vector spaces (linear algebra) will appreciate the theory of stochastic vectors (Appendix 4), by which otherwise tedious proofs can be expressed elegantly. Moreover, stochastic vector theory gives a clear insight into concepts such as "degrees of freedom" and "analysis of variance".

Though this text can be used in regular courses, its primary purpose is for self-teaching. The "normally intelligent" student should then realize that relevant statistical knowledge cannot be assimilated in a hurry. He should take his time (a year, say), during which the new matter can sink in. Further, he should know that no part of the text, even the appendixes, can be skipped, and that every phrase has its meaning and purpose. Just to complete or brush up his statistical knowledge, the budding student should first read Appendixes 2 and 3 before starting with Chapter 1. Then study Appendix 5, and (if you know a bit about vectors), Appendix 4.

If the reader tackles the subject matter with this advice in mind, I guarantee that he will experience the satisfaction of mastering some sampling methods widely used in forest inventory, that he will be able to read critically more professional literature than before, and that he will possess a sound basis on which to extend his knowledge of sampling.

Utmost care has been taken to avoid typographic errors and errors of calculation. But to be human is to be imperfect, so errors will remain. I will be grateful for any suggestion for improvement.

ACKNOWLEDGEMENT

I am greatly indebted to Dr A.C.
van Eijnsbergen, Department of Mathematics, Agricultural University, Wageningen, for expert advice. Without his constructive criticism and encouragement, this book would never have been completed. For the opinions expressed, I assume full responsibility.

Wageningen, The Netherlands
February 1986

Pieter G. de Vries

CONTENTS

CHAPTER

1 SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT
  1.1 Introduction
  1.2 Expected Value. Estimators for Population Mean and Total
  1.3 Population and Sample Variance
  1.4 Variances of Estimated Population Mean and Total
  1.5 Confidence Interval and Confidence Statement
  1.6 Estimation of Proportions
  1.7 Required Sample Size
  1.8 Some General Remarks on Sample Plots
  1.9 Numerical Examples

2 STRATIFIED RANDOM SAMPLING
  2.1 Introduction
  2.2 Unbiased Estimators for Population Mean and Total. Variances
  2.3 Some Special Cases
  2.4 Optimization of the Sampling Scheme
  2.5 Confidence Intervals. Behrens-Fisher Problem
  2.6 Gain in Precision Relative to Simple Random Sampling
  2.7 Numerical Examples

3 RATIO ESTIMATORS IN SIMPLE RANDOM SAMPLING
  3.1 Introduction. Population Ratio. Ratio Estimators for Total and Mean
  3.2 Variances
  3.3 Confidence Interval. Precision versus SRS. Required Sample Size
  3.4 Bias of the Ratio Estimator
  3.5 Ratio Estimator per Species Group in Mixed Forest
  3.6 Numerical Example
  3.7 Combining Results of Different Samples to Obtain New Information

4 RATIO ESTIMATORS IN STRATIFIED RANDOM SAMPLING
  4.1 Introduction
  4.2 The Separate Ratio Estimator
  4.3 The Combined Ratio Estimator
  4.4 Illustrations
  4.5 Numerical Example

5 REGRESSION ESTIMATOR
  5.1 Introduction
  5.2 Unbiased Estimator of Population Regression Line from Sample Data
  5.3 Linear Regression Estimator and its Variance
  5.4 Regression Estimator in Stratified Random Sampling
  5.5 Numerical Example

6 TWO-PHASE SAMPLING or DOUBLE SAMPLING
  6.1 Introduction
  6.2 The Ratio Estimator in Double Sampling
    6.2.1 Ratio Estimator in Double Sampling - Dependent Phases
    6.2.2 Ratio Estimator in Double Sampling - Independent Phases
  6.3 The Regression Estimator in Double Sampling
    6.3.1 Regression Estimator in Double Sampling - Independent Phases
    6.3.2 Regression Estimator in Double Sampling - Dependent Phases
    6.3.3 Numerical Example - Dependent Phases
  6.4 Optimization in Double Sampling with Ratio and Regression Estimators
  6.5 Double Sampling for Stratification
    6.5.1 Introduction
    6.5.2 Unbiased Estimator for Population Mean. Variance Expression
    6.5.3 Variance Estimator
    6.5.4 Optimization of the Sampling Scheme
    6.5.5 Numerical Example
  6.6 Correction for Misinterpretation in Estimating Stratum Proportions from Aerial Photographs
    6.6.1 Derivation of Formulas
    6.6.2 Numerical Example
  6.7 Volume Estimation with Correction for Misinterpretation
    6.7.1 Derivation of Formulas
    6.7.2 Numerical Example

7 CONTINUOUS FOREST INVENTORY WITH PARTIAL REPLACEMENT OF SAMPLE PLOTS
  7.1 Introduction
  7.2 Definition of Symbols
  7.3 Most Precise Unbiased Linear Estimator for Population Mean on the Second Occasion
  7.4 Optimization of Sampling for Current Estimate
  7.5 Estimation of Change (Growth or Drain)
  7.6 A Compromise Sampling Scheme
  7.7 Numerical Example

8 SINGLE- AND MORE-STAGE CLUSTER SAMPLING
  8.1 Introduction
  8.2 Estimators in Two-Stage Sampling
    8.2.1 Definition of Symbols
    8.2.2 Unbiased Estimators for Population Total and Mean per SU
    8.2.3 Unbiased Estimators in Special Cases
      8.2.3.1 Single-Stage Cluster Sampling
      8.2.3.2 Primary Units of Equal Size
      8.2.3.3 Equal Within-Cluster Variances
      8.2.3.4 Relation to Stratified Random Sampling
    8.2.4 Ratio Estimator for Population Total and Mean per SU
  8.3 Optimization of the Two-Stage Sampling Scheme
  8.4 Three- and More-Stage Sampling
  8.5 Numerical Example of Two-Stage Sampling

9 SINGLE-STAGE CLUSTER SAMPLING AS A RESEARCH TOOL
  9.1. Introduction
  9.2. Intracluster Correlation Coefficient
  9.3. Variance and Intracluster Correlation
  9.4. Measures of Heterogeneity
    9.4.1. The Intracluster Correlation Coefficient
    9.4.2. The C-Index
    9.4.3. The Index of Dispersion
    9.4.4. Numerical Example
  9.5. Intracluster Correlation Coefficient in Terms of Anova Quantities
  9.6. About the Optimum Sample Plot Size

10 AREA ESTIMATION WITH SYSTEMATIC DOT GRIDS
  10.1. Random Sampling with n Points
  10.2. Systematic Sampling with n Points
  10.3. Numerical Example

11 SAMPLING WITH CIRCULAR PLOTS
  11.1. Sampling from a Fixed Grid of Squares
  11.2. Sampling from a Population of Fixed Circles
  11.3. Sampling with Floating Circular Plots
  11.4. Comparison of Variances

12 POINT SAMPLING
  12.1. General Estimator
  12.2. Specific Estimators
  12.3. Variances
  12.4. Sampling Near the Stand Margin
  12.5. Required Sample Size. Choice of K. Questionable Trees
  12.6. Numerical Example
  12.7. A More General View at PPS-Sampling, wtr

13 LINE INTERSECT SAMPLING
  13.1. Introduction
  13.2. BUFFON's Needle Problem and Related Cases
  13.3. Total-Estimator Based on One-Line Data
  13.4. Variance in Case of One-Line Data
  13.5. Sampling with More Than One Line
  13.6. Required Number and Length of Transects
  13.7. Estimating Properties of Residual Logs in Exploited Areas
  13.8. Estimators Based on Circular Elements
    13.8.1. Generalization of STRAND's Estimator
    13.8.2. Density Estimation of Mobile Animal Populations
    13.8.3. Biomass Estimation in Arid Regions
  13.9. Bias in Oriented Needle Populations
  13.10. Generalization of LIS Theory
    13.10.1. KENDALL Projection and Expected Number of Intersections
    13.10.2. General LIS Estimator and its Variance
    13.10.3. Applications
  13.11. Line Intersect Subsampling

14 LIST SAMPLING
  14.1. Introduction
  14.2. Estimation of Population Total. Variance
  14.3. Optimum Measure of Size. Comparison with Simple Random Sampling
  14.4. Numerical Example
  14.5. Two-Stage List Sampling

15 3-P SAMPLING
  15.1. Introduction
  15.2. The Principle of 3-P Sampling
  15.3. Variance and Expected Value of Sample Size and its Inverse
  15.4. Considerations about the Sample Size
  15.5. GROSENBAUGH's 3-P Estimators
  15.6. Summary and Conclusions
  15.7. Numerical Example
  15.8. List of Equivalent Symbols

APPENDIX
  1. A Family of Sampling Schemes
  2. Permutations, Variations, Combinations
  3. Stochastic Variables
    3.1. Stochastic Variables in General. Normal and Standard Norm. Variable
    3.2. The Chi-Square Distribution
    3.3. STUDENT's t-Distribution
    3.4. FISHER's F-Distribution
  4. Stochastic Vectors and Some of their Applications
    Introduction
    Appl.1. Distribution of the Sample Variance s² in Simple Random Sampling
    Appl.2. Distribution of the Pooled Variance in Stratified R.S.
    Appl.3. Analysis of Variance in Stratified Random Sampling
    Appl.4. Analysis of Variance in 2-Stage Sampling
    Appl.5. Proof of STEIN's Method for Estimating Required Sample Size
  5. Covariance, Correlation, Regression
  6. The LAGRANGE Multiplier Method of Optimization
  7. Expected Value and Variance in Multivariate Distributions
  8. Hypergeometric, Multinomial and Binomial Distributions
  9. The Most Precise Unbiased Linear Estimator of a Parameter X, based on a Number of Independent Unbiased Estimates of Different Precision
  10. Variance Formulas for Sums, Differences, Products and Ratios
  11. The Random Forest (POISSON FOREST)
  12. Derivation of the Identity used in List Sampling
  13. Expanding a Function in a TAYLOR Series
  14. About Double Sums
  15. Exercises

REFERENCES
INDEX

CHAPTER 1

SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT

1.1. Introduction

We will consider a population consisting of N elements (named population elements or sampling elements), numbered i = 1, ..., N, from which a sample of size n is drawn. The sample is a subset of size n from N, i.e. the order in which the elements occur in the sample is irrelevant, and "doubles" are not allowed to occur. Examples: a sample of n trees from a stand containing N trees; a sample of n squares from a stand area divided in N equally-large squares; a sample of n subareas from a stand area divided in N subareas of different sizes; a sample of n stands from a forest area consisting of N stands; a sample of n days from a period of N days.

In practice the elements for the sample generally are selected successively from the population. This implies that an ordered sample is obtained; subsequently however, this order is ignored. Once drawn, an element is not returned to the population, i.e.
the drawing is without replacement (wtr), in order to avoid "doubles".

The elements for the sample (sample elements) seldom are drawn directly from the population, as in doing so the selection may be influenced by their location in the population, the sampler's personal preference, and so on. Generally, in drawing a sample either use is made of a table of random numbers, or the N sampling elements are substituted e.g. by N numbered marbles or tickets.

To execute the sampling procedure, we will use N marbles, numbered i = 1, ..., N, which are thoroughly mixed in an urn. From the urn, n marbles are selected in succession and without replacement. Then the sample consists of the n population elements that bear the same number as the marbles selected.

The selection is random. That is: each of the N marbles has equal probability (viz. 1/N) of being the first one to be selected. Once selected, the first sample element is not returned to the urn. So each of the remaining N-1 marbles has equal probability (viz. 1/(N-1)) to be selected as the second sample element, and so on. Finally, when we come to selecting the n-th sample element, each of the remaining N-(n-1) marbles in the urn has equal probability of being selected, viz. 1/(N-n+1). So the probability of a sample that consists of a specific set of numbers (population elements) arranged in the order in which they were selected, is: