STATISTICAL INFERENCE AND PREDICTION IN CLIMATOLOGY: A BAYESIAN APPROACH

METEOROLOGICAL MONOGRAPHS

Volume 1
  No. 1 Wartime Developments in Applied Climatology, 1947 (Out of Print)
  No. 2 The Observations and Photochemistry of Atmospheric Ozone, 1950 (Out of Print)
  No. 3 On the Rainfall of Hawaii, 1951 (Out of Print)
  No. 4 On Atmospheric Pollution, 1951. ISBN 0-933876-00-9
  No. 5 Forecasting in Middle Latitudes, 1952 (Out of Print)
Volume 2
  No. 6 Thirty-Day Forecasting, 1953. ISBN 0-933876-01-7
  No. 7 The Jet Stream, 1954. ISBN 0-933876-02-5
  No. 8 Recent Studies in Bioclimatology, 1954. ISBN 0-933876-03-3
  No. 9 Industrial Operations under Extremes of Weather, 1957. ISBN 0-933876-04-1
  No. 10 Interaction of Sea and Atmosphere, 1957. ISBN 0-933876-05-X
  No. 11 Cloud and Weather Modification, 1957. ISBN 0-933876-06-8
Volume 3
  Nos. 12-20 Meteorological Research Reviews, 1957. Review of Climatology. Meteorological Instruments. Radiometeorology. Weather Observations, Analysis and Forecasting. Applied Meteorology. Physics of the Upper Atmosphere. Physics of Clouds. Physics of Precipitation. Atmospheric Electricity. Bound in One Volume. ISBN 0-933876-07-6
Volume 4
  No. 21 Studies of Thermal Convection, 1959. ISBN 0-933876-09-2
  No. 22 Topics in Engineering Meteorology, 1960. ISBN 0-933876-10-6
  No. 23 Atmospheric Radiation Tables, 1960. ISBN 0-933876-11-4
  No. 24 Fluctuations in the Atmospheric Inertia, 1961. ISBN 0-933876-12-2
  No. 25 Statistical Prediction by Discriminant Analysis, 1962. ISBN 0-933876-13-0
  No. 26 The Dynamical Prediction of Wind Tides of Lake Erie, 1963. ISBN 0-933876-15-7
Volume 5
  No. 27 Severe Local Storms, 1963. Paperbound, ISBN 0-933876-17-3
Volume 6
  No. 28 Agricultural Meteorology, 1965. Paperbound, ISBN 0-933876-19-X; Clothbound, ISBN 0-933876-18-1
Volume 7
  No. 29 Scattered Radiation in the Ozone Absorption Bands at Selected Levels of a Terrestrial, Rayleigh Atmosphere, 1966. Paperbound, ISBN 0-933876-22-X; Clothbound, ISBN 0-933876-21-1
Volume 8
  No. 30 The Causes of Climatic Change, 1968. ISBN 0-933876-28-9
Volume 9
  No. 31 Meteorological Investigations of the Upper Atmosphere, 1968. ISBN 0-933876-29-7
Volume 10
  No. 32 On the Distribution and Continuity of Water Substance in Atmospheric Circulations, 1969. ISBN 0-933876-30-0
Volume 11
  No. 33 Meteorological Observations and Instrumentation, 1970. ISBN 0-933876-31-9
Volume 12
  No. 34 Long-Period Global Variations of Incoming Solar Radiation, 1972. ISBN 0-933876-37-8
Volume 13
  No. 35 Meteorology of the Southern Hemisphere, 1972. ISBN 0-933876-38-6
Volume 14
  No. 36 Alberta Hailstorms, 1973. ISBN 0-933876-39-4
Volume 15
  No. 37 The Dynamic Meteorology of the Stratosphere and Mesosphere, 1975. ISBN 0-933876-41-6
Volume 16
  No. 38 Hail: Review of Hail Science and Hail Suppression, 1977. ISBN 0-933876-46-7
Volume 17
  No. 39 Solar Radiation and Clouds, 1980. ISBN 0-933876-49-1
Volume 18
  No. 40 METROMEX: A Review and Summary, 1981. ISBN 0-933876-52-1
Volume 19
  No. 41 Tropical Cyclones: Their Evolution, Structure and Effects, 1982. ISBN 0-933876-54-8
Volume 20
  No. 42 Statistical Inference and Prediction in Climatology: A Bayesian Approach, 1985. ISBN 0-933876-62-9

Orders for the above publications should be sent to:
THE AMERICAN METEOROLOGICAL SOCIETY
45 Beacon St., Boston, Mass. 02108

METEOROLOGICAL MONOGRAPHS
Volume 20    September 1985    Number 42

STATISTICAL INFERENCE AND PREDICTION IN CLIMATOLOGY: A BAYESIAN APPROACH

Edward S. Epstein
CLIMATE ANALYSIS CENTER
NATIONAL METEOROLOGICAL CENTER
NWS/NOAA
WASHINGTON, D.C.

American Meteorological Society
45 Beacon Street, Boston, Massachusetts
ISSN 0065-9401
ISBN 978-1-935704-27-0 (eBook)
DOI 10.1007/978-1-935704-27-0

Table of Contents

1. INTRODUCTION
  1.1 Objective
  1.2 Probability, the Language of Uncertainty
  1.3 Stochastic Processes and Climate Prediction
  1.4 Final Comments

2. SOME FUNDAMENTALS OF PROBABILITY
  2.1 Introduction
  2.2 Random Variables
  2.3 Probability Distributions
  2.4 Expectations and Moments
  2.5 Joint, Marginal and Conditional Probabilities
  2.6 Bayes' Theorem
  2.7 Sufficient Statistics
  2.8 Conjugate Distributions; Prior and Posterior Parameters

3. BERNOULLI PROCESSES
  3.1 Definition of a Bernoulli Process
  3.2 Distributions of Sufficient Statistics
  3.3 Prior and Posterior Probabilities
  3.4 Conjugate Analysis
  3.5 Selecting Prior Parameters
  3.6 Predictions of Future Results
  3.7 An Example

4. POISSON PROCESSES
  4.1 Definition
  4.2 Distributions of Sufficient Statistics
  4.3 Conjugate Distributions
  4.4 Selection of Prior Parameters
  4.5 Predictive Distributions and Probabilities
  4.6 An Example

5. NORMAL DATA-GENERATING PROCESSES
  5.1 Normal Distributions and the Central Limit Theorem
  5.2 Sufficient Statistics
  5.3 Bivariate Prior and Posterior Densities: Prior and Posterior Independence
  5.4 Conjugate Density, Precision Known
  5.5 Predictive Distribution, Precision Known
  5.6 An Example: Normal Data-Generating Process with Precision Known
  5.7 Conjugate Distribution, Precision Unknown
  5.8 The Normal-Gamma Distribution: Marginal and Conditional Densities
  5.9 Predictive Distributions
  5.10 An Example of Inference and Prediction: Normal Data-Generating Process

6. NORMAL LINEAR REGRESSION
  6.1 Introduction
  6.2 Sufficient Statistics for Simple Linear Regression
  6.3 Diffuse Prior: Simple Linear Regression
  6.4 Simple Linear Regression with a Nondiffuse Conjugate Prior
  6.5 Predictive Distributions
  6.6 An Example: Normal Simple Linear Regression

7. FIRST-ORDER AUTOREGRESSION
  7.1 Introduction
  7.2 First-Order Normal Autoregression
  7.3 Inferences and Predictions
  7.4 A Numerical Example: Annual Streamflow
  7.5 Comments on Computational Methods
  7.6 Results When the Prior Is Relatively Uninformative
  7.7 Results When the Prior Is Informative

Appendix A: SUMMARY OF BASIC INFORMATION ON PROBABILITY DISTRIBUTIONS ENCOUNTERED
Appendix B: SELECTED TABLES OF PROBABILITY DISTRIBUTIONS
Appendix C: FORTRAN PROGRAM TO IMPLEMENT EXAMPLE GIVEN IN CHAPTER 7
REFERENCES
INDEX

Chapter 1

Introduction

1.1 OBJECTIVE

The objective of this monograph is to introduce to the climatological and meteorological community a set of statistical techniques for making predictions about events that can at best be described as being the output of a stochastic process. These techniques are especially useful when one's knowledge of the system is incomplete and there is only limited empirical evidence; these are situations where most of the more widely known approaches are of little help. The techniques themselves are very general, but they will be presented here in the context of climatological and meteorological applications, especially the former.

Most applications of climatological information involve, in one way or another, predictions. Climatological predictions are not based on detailed projections of the evolution of weather events, but rather on knowledge and empirical evidence of the collective behavior of weather (or climate) at time scales that extend beyond the limit within which weather events are predictable in any detail. The ultimate objective of (short-range) meteorological predictions, however unattainable, is totally accurate forecasts of weather at specific times and locations. In the usual climatological context, the ultimate is a description of what may be expected, and how likely or unlikely various alternatives are.

Whether generated by statistical or physical methods, climatological predictions are inherently uncertain. Even if we discovered a "perfect" climate prediction technique, our predictions would still necessarily be imperfect, because inherently unpredictable weather events will give rise to a background of "noise" that cannot be avoided. This concept of noise due to weather has been utilized, especially by Madden (1976), to estimate the limits of predictability of monthly and seasonal climate.

Thus the language of climatological predictions must be probability, even in the best of circumstances when the "climate" is very well-known. When the "climate" is not so well-known, either because of our lack of understanding or because of insufficient empirical information, then there is even more reason to turn to probabilities to express that uncertainty.

If the empirical evidence is very substantial, and directly relevant to the situation for which a prediction is desired, then the problem is an easy one. The necessary frequencies are simply extracted from the data and interpreted as predictions. For example, if one is interested in the maximum temperatures to be expected next July on an experimental farm in a remote rural location for which a long and homogeneous climatological record is available, the relative frequencies of maximum temperatures in past Julys are examined and accepted as the probabilities of what will happen in the next year.

But now consider how to deal with the same situation when the farm is being moved to a new location, and for purposes of experimental design one wants to know how the July maximum temperatures at the new location will differ from those at the old location. If the move has been well planned, measurements at the new location for the last year or two may be available. How is this information used to make a credible prediction?
How does one at the same time use more qualitative information about location, drainage, land use, etc., that tells the trained climatologist a good deal about the likely differences between the two locations?

This is clearly a more difficult problem than the former one, and it cannot be solved uniquely. But there do exist methods that optimize the combination of these partial sources of information: the climatologist's useful although incomplete knowledge on the one hand, and the too-short empirical record on the other. We will describe a series of such methods; while they are not an exhaustive set, they cover a wide sampling of situations with which the climatologist must deal. These are not methods that can be mechanically applied. They do involve rigorous procedures and manipulations, but they also rely particularly strongly on the judgement and expertise of the practitioner. Although there will be some very useful applications to the drawing of inferences when large quantities of relevant data are available, the emphasis will always be on situations where data are relatively scarce.

The limit of "relatively scarce" data is no data at all. In that limiting condition, all that is left to the climatologist who is required to make a prediction is his or her judgement and expertise. The methods we will describe allow for a smooth transition from no data to ample data. They indeed are applicable when no data are available. This is the case when the need for the climatologist to be able to quantify his or her judgement is most critical. The climatologist can learn from the formal developments described later how to express his or her judgements, we believe better and more systematically, when data are absent or limited. The predictions generated with no or meager data will not warrant as much confidence as those based on substantial empirical evidence; that will be quite clear.
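
As a rough sketch of this transition from no data to ample data, consider a yes/no event such as a July day on which the maximum temperature exceeds some threshold. The Python fragment below assumes, purely for illustration, that the climatologist's judgement about the event probability is encoded as a beta distribution and is revised by simple counting. The class name BetaBelief, the prior parameters and the observation counts are all hypothetical and are not taken from the text.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) description of an uncertain event probability (illustrative)."""
    a: float  # weight given to "event occurs"
    b: float  # weight given to "event does not occur"

    def update(self, occurrences: int, trials: int) -> "BetaBelief":
        """Revise the belief after seeing `occurrences` events in `trials` days."""
        if not 0 <= occurrences <= trials:
            raise ValueError("need 0 <= occurrences <= trials")
        return BetaBelief(self.a + occurrences, self.b + trials - occurrences)

    @property
    def mean(self) -> float:
        """Expected value of the event probability."""
        return self.a / (self.a + self.b)

    @property
    def std(self) -> float:
        """Standard deviation of the event probability."""
        total = self.a + self.b
        return sqrt(self.a * self.b / (total * total * (total + 1.0)))


# Hypothetical judgement: about 30% of July days exceed the threshold,
# held with roughly the weight of ten days of observation.
prior = BetaBelief(a=3.0, b=7.0)

# Hypothetical records: no data, one July, twenty Julys.
for occurrences, trials in [(0, 0), (12, 31), (248, 620)]:
    belief = prior.update(occurrences, trials)
    print(f"days={trials:4d}  mean={belief.mean:.3f}  std={belief.std:.3f}")
```

Under these assumptions, the no-data case simply returns the stated judgement. As the record lengthens, the mean is drawn toward the observed relative frequency and the spread shrinks.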
But it will also be clear how much can be gained from additional observations. We will not extend our analysis in this monograph into the important question of the value of additional observations. Instead we will limit the scope of our treatment. The goal is to produce useful predictions that are consistent with one's best judgements, and to allow consistent revisions of such judgements as data do become available.

1.2 PROBABILITY, THE LANGUAGE OF UNCERTAINTY

We use probability to express and to quantify our uncertainty. For the most part the concepts we employ correspond quite closely to our intuitive notions of what we mean when we use the term in our everyday language. We insist on adhering to certain formal rules for assigning and manipulating probabilities, but these are in general necessary to ensure that we will be consistent in applying the basic concepts of probability under circumstances that are occasionally quite complicated.

Probabilities are real numbers in the interval between 0 and 1 (inclusive) that are associated with "events" or "occurrences". If S represents the set of all possible events (the sample space), then the probability we assign to S, P{S}, is one. If two events are mutually exclusive (their intersection cannot occur; it has probability zero), then the probability of at least one of the two events occurring (the probability of their union) is the sum of the probabilities of the two individual events. There are numerous texts that discuss these basic axioms for dealing with probabilities and the consequent rules for manipulating probabilities of compound and conditional events. We will not try to repeat such a development here, but in order to understand the developments and discussions that follow, the reader should be quite familiar with the basic rules for manipulating probabilities. In Chapter 2 we will review some of the formal concepts and