Estimation, Evaluation, and Selection of Actuarial Models¹

Stuart A. Klugman

4th Printing, May 15, 2004²

¹ Copyright 2002-2004 by Stuart A. Klugman. Reprinted with permission of the author.

² Changes for 2nd printing - On pages 15-17 the procedure for extrapolating beyond the last death age was corrected. Exercise 19 and its solution have been replaced. The exercise in the first printing was incorrect. For the 3rd printing - In Theorem 3.33 on page 59 the phrase “covariance matrix” has been changed to “variance” and on page 77 $n_j\hat{p}_j$ has been changed to $n\hat{p}_j$. For the 4th printing - Definition 2.4 is changed to note that the kernel could use either a location or a scale change. Exercise 61 has been modified to make it clear that conditional variances are to be calculated. In the solution to Exercise 52, consistency of the sample mean requires only that the first moment exist.

Contents

1 Introduction
2 Model estimation
  2.1 Introduction
  2.2 Estimation using data-dependent distributions
    2.2.1 Introduction
    2.2.2 The empirical distribution for complete, individual data
    2.2.3 Empirical distributions for grouped data
    2.2.4 Empirical distributions for censored and truncated data
    2.2.5 Kernel smoothing
    2.2.6 Estimation of related quantities
  2.3 Estimation for parametric models
    2.3.1 Introduction
    2.3.2 Method of moments and percentile matching
    2.3.3 Maximum likelihood
3 Sampling properties of estimators
  3.1 Introduction
  3.2 Measures of quality
    3.2.1 Introduction
    3.2.2 Unbiasedness
    3.2.3 Consistency
    3.2.4 Mean squared error
  3.3 Variance and confidence intervals
    3.3.1 Introduction
    3.3.2 Methods for empirical distributions
    3.3.3 Information matrix and the delta method
4 Model evaluation and selection
  4.1 Introduction
  4.2 A review of hypothesis tests
  4.3 Representations of the data and model
  4.4 Graphical comparison of the density and distribution functions
  4.5 Hypothesis tests
    4.5.1 Kolmogorov-Smirnov test
    4.5.2 Anderson-Darling test
    4.5.3 Chi-square goodness-of-fit test
  4.6 Selecting a model
    4.6.1 Introduction
    4.6.2 Judgment-based approaches
    4.6.3 Score-based approaches
5 Models with covariates
  5.1 Introduction
  5.2 Proportional hazards models
  5.3 The generalized linear model
A Solutions to Exercises
B Using Microsoft Excel™

Chapter 1 Introduction

For some time, the subjects of “Loss Models” and “Survival Models” have been part of the actuarial syllabus. The former was most often associated with property and liability insurances while the latter was most often associated with life and disability insurances. However, the goal of these two subjects is the same: the determination of models for random phenomena. The purpose of this Study Note is to present the two subjects in a unified context that indicates that either approach may be appropriate for a given actuarial problem.

The difference between Parts 3 and 4¹ is reflected in the course names. Part 3 is “actuarial models” and the objective is to be able to work with the most common actuarial models. Most such models are random variables, such as the time to death or the amount of an automobile physical damage claim. After mastering the content of Part 3, the student is left wondering where the models come from. That is the purpose of Part 4, “actuarial modeling.”

One option is to simply announce the model. Your boss needs a model for basic dental payments. You announce that it is the lognormal distribution with µ = 5.1239 and σ = 1.0345 (the many decimal places are designed to give your announcement an aura of precision).
When your boss, or a regulator, or an attorney who has put you on the witness stand, asks you how you know that to be so, it will likely not be sufficient to answer that “I just know these things.” It may not even be sufficient to announce that your actuarial friend at Gamma Dental uses that model.

An alternative is to collect some data and use it to formulate a model. Most distributional models have two components. The first is a name, such as “Pareto.” The second is the set of values of parameters that complete the specification. Matters would be simpler if modeling could be done in that order. Most of the time we need to fix the parameters that go with a named model before we can decide if we want to use that type of model.

Thus, this Note begins with parameter estimation, the subject of Chapter 2. By the end of that Chapter, we will have a variety of estimation methods at our disposal. In Chapter 3 we learn how to evaluate the various methods and in particular learn how to measure the accuracy of our estimator. Chapter 4 is devoted to model selection. Two basic questions are answered. The first one is “Is our model acceptable for use?” while the second one is “From our set of acceptable models, which one should we use?” Chapter 5 introduces some new models that were not covered in Part 3.

¹ The Casualty Actuarial Society refers to its education units as “exams” while the Society of Actuaries refers to them as “courses.” To avoid making a decision, this Note will refer to units as “parts,” a term in favor when the author was going through this process.

Exercises are scattered throughout this Note. Those marked with (*) have appeared on either a past Part 4 Examination (from those given in May and November 2000 and May 2001) or Part 160 Examination of the Society of Actuaries. Solutions to these exercises can be found in Appendix A.

When named parametric distributions are used, the parameterizations used are those from Appendices A and B of Loss Models.
They will be referred to as LMA and LMB in this Note.

In order to solve many of the problems in this Note, it will be necessary to either solve equations (either one equation with one unknown, or two equations with two unknowns) or maximize functions of several variables. At the examination, you can count on being given problems that can be solved on the calculator. In real life, that is not always the case. Some of the examples and exercises cannot be done in any reasonable amount of time by hand. For those, solutions were obtained by using features of Microsoft Excel™. Appendix B provides information about using Excel for solving equations and for maximization.

Because maximization programs have different degrees of accuracy, you may obtain answers to the examples and exercises that differ slightly from those produced by the author. Do not be concerned. Also, when the author was calculating answers, he often rounded intermediate results and then used the rounded values to get the final answer. Your answers may differ slightly if you carry more digits in your intermediate work.

The author is grateful to the many people who reviewed earlier drafts of this Note. Special thanks to Clive Keatinge who, in addition to providing general and specific guidance, checked every calculation and problem solution. I remain responsible for what errors may still exist. Comments, suggestions, or corrections can be e-mailed to me at [email protected]

Chapter 2 Model estimation

2.1 Introduction

In this Chapter it is assumed that the type of model is known, but not the full description of the model. In the Part 3 Note, models were divided into two types: data-dependent and parametric. The definitions are repeated below.

Definition 2.1 A data-dependent distribution is at least as complex as the data or knowledge that produced it and the number of “parameters” increases as the number of data points or amount of knowledge increases.
Definition 2.2 A parametric distribution is a set of distribution functions, each member of which is determined by specifying one or more values called parameters. The number of parameters is fixed and finite.

For this Note, only two data-dependent distributions will be considered. They depend on the data in similar ways. The simplest definitions for the two types considered appear below.

Definition 2.3 The empirical distribution is obtained by assigning probability 1/n to each data point.

Definition 2.4 A kernel smoothed distribution is obtained by replacing each data point with a continuous random variable and then assigning probability 1/n to each such random variable. The random variables used must be identical except for a location or scale change that is related to its associated data point.

Note that the empirical distribution is a special type of kernel smoothed distribution in which the random variable assigns probability one to the data point. The following Sections will introduce an alternative to the empirical distribution that is similar in spirit, but produces different numbers. They will also show how the definition can be modified to account for data that have been altered through censoring and truncation (to be defined later). With regard to kernel smoothing, there are several distributions that could be used, a few of which are introduced in this Chapter.

A large number of parametric distributions have been encountered in previous readings. The issue for this Chapter is to learn a variety of techniques for using data to estimate the values of the distribution’s parameters.

Throughout this Note, four examples will be used repeatedly. Because they are simply data sets, they will be referred to as Data Sets A, B, C and D.

Data Set A. This data set is well-known in the casualty actuarial literature. It was first analyzed in the paper “Some Considerations on Automobile Rating Systems Using Individual Driving Records,” by L.
Dropkin in the 1959 Proceedings of the Casualty Actuarial Society (pages 165-176). He collected data from 1956-1958 on the number of accidents by one driver in one year. The results for 94,935 drivers are in the following table.

Number of accidents   Number of drivers
0                     81,714
1                     11,306
2                     1,618
3                     250
4                     40
5 or more             7

Data Set B. These numbers are artificial. They represent the amounts paid on workers compensation medical benefits, but are not related to any particular policy or set of policyholders. These payments are the full amount of the loss. A random sample of 20 payments was taken.

   27     82    115    126    155    161    243    294    340     384
  457    680    855    877    974  1,193  1,340  1,884  2,558  15,743

Data Set C. These observations are similar to Table 2.18 on Page 127 of Loss Models. They represent payments on 227 claims from a general liability insurance policy.

Payment range      Number of payments
0-7,500            99
7,500-17,500       42
17,500-32,500      29
32,500-67,500      28
67,500-125,000     17
125,000-300,000    9
over 300,000       3

Data Set D. These numbers are artificial. They represent the time at which a five-year term insurance policy terminates. For some policyholders, termination is by death, for some it is by surrender (the cancellation of the insurance contract), and for the remainder it is expiration of the five-year period. Two separate versions are presented.

For Data Set D1, there were 30 policies observed from issue. For each, both the time of death and time of surrender are presented, provided they were before the expiration of the five-year period. Of course, normally we do not know when policyholders who surrendered would have died had they not surrendered and we do not know when policyholders who died would have surrendered had they not died. Note that the final 12 policyholders survived both death and surrender to the end of the five-year period.

Policyholder   Time of death   Time of surrender
1              -               0.1
2              4.8             0.5
3              -               0.8
4              0.8             3.9
5              3.1             1.8
6              -               1.8
7              -               2.1
8              -               2.5
9              -               2.8
10             2.9             4.6
11             2.9             4.6
12             -               3.9
13             4.0             -
14             -               4.0
15             -               4.1
16             4.8             -
17             -               4.8
18             -               4.8
19-30          -               -

For Data Set D2, only the time of the first event is observed. In addition, there are 10 more policyholders who were first observed at some time after the policy was issued. The following table presents the results for all 40 policies. The column headed “First obs.” gives the duration at which the policy was first observed; the column headed “Last obs.” gives the duration at which the policy was last observed; and the column headed “Event” is coded “s” for surrender, “d” for death, and “e” for expiration of the five-year period.

Policy   First obs.   Last obs.   Event      Policy   First obs.   Last obs.   Event
1        0            0.1         s          16       0            4.8         d
2        0            0.5         s          17       0            4.8         s
3        0            0.8         s          18       0            4.8         s
4        0            0.8         d          19-30    0            5.0         e
5        0            1.8         s          31       0.3          5.0         e
6        0            1.8         s          32       0.7          5.0         e
7        0            2.1         s          33       1.0          4.1         d
8        0            2.5         s          34       1.8          3.1         d
9        0            2.8         s          35       2.1          3.9         s
10       0            2.9         d          36       2.9          5.0         e
11       0            2.9         d          37       2.9          4.8         s
12       0            3.9         s          38       3.2          4.0         d
13       0            4.0         d          39       3.4          5.0         e
14       0            4.0         s          40       3.9          5.0         e
15       0            4.1         s

2.2 Estimation using data-dependent distributions

2.2.1 Introduction

When observations are collected from a probability distribution, the ideal situation is to have the (essentially) exact¹ value of each observation. This case is referred to as “complete, individual data.” This is the case in Data Sets B and D1. There are two reasons why exact data may not be available. One is grouping, in which all that is recorded is the range of values in which the observation belongs. This is the case for Data Set C and for Data Set A for those with five or more accidents.

A second reason that exact values may not be available is the presence of censoring or truncation. When data are censored from below, observations below a given value are known to be below that value, but the exact value is unknown.
When data are censored from above, observations above a given value are known to be above that value, but the exact value is unknown. Note that censoring effectively creates grouped data. When the data are grouped in the first place, censoring has no effect. For example, the data in Data Set C may have been censored from above at 300,000, but we cannot know for sure from the data set and that knowledge has no effect on how we treat the data. On the other hand, were Data Set B to be censored at 1,000, we would have fifteen individual observations and then five grouped observations in the interval from 1,000 to infinity.

In insurance settings, censoring from above is fairly common. For example, if a policy pays no more than 100,000 for an accident, any time the loss is above 100,000 the actual amount will be unknown, but we will know that it happened. In Data Set D2 we have random censoring. Consider the fifth policy in the table. When the “other information” is not available, all that is known about the time of death is that it will be after 1.8 years. All of the policies are censored at 5 years by the nature of the policy itself. Also, note that Data Set A has been censored from above at 5. This is more common language than to say that Data Set A has some individual data and some grouped data.

When data are truncated from below, observations below a given value are not recorded. Truncation from above implies that observations above a given value are not recorded. In insurance settings, truncation from below is fairly common. If an automobile physical damage policy has a per claim deductible of 250, any losses below 250 will not come to the attention of the insurance company and so will not appear in any data sets. Data Set D2 has observations 31-40 truncated from below at varying values. The other data sets may have truncation forced on them.
For example, if Data Set B were to be truncated from below at 250, the first seven observations would disappear and the remaining thirteen would be unchanged.

¹ Some measurements are never exact. Ages may be rounded to the nearest whole number, monetary amounts to the nearest dollar, car mileage to the nearest tenth of a mile, and so on. This Note is not concerned with such rounding errors. Rounded values will be treated as if they are exact.

2.2.2 The empirical distribution for complete, individual data

As noted in Definition 2.3, the empirical distribution assigns probability 1/n to each data point. That works well when the value of each data point is recorded. An alternative definition is

Definition 2.5 The empirical distribution function is

$$F_n(x) = \frac{\text{number of observations} \le x}{n}.$$
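The empirical distribution function, and the truncation and censoring counts quoted for Data Set B in Section 2.2.1, can be checked with a short Python sketch. This is only an illustration and is not the author's method (the Note itself relies on a calculator or Excel). The function names `empirical_cdf`, `truncate_from_below` and `censor_from_above` are hypothetical, and the handling of an observation exactly equal to the truncation or censoring point is an assumption that the text does not settle.

```python
from bisect import bisect_right

# Data Set B: the 20 workers compensation medical payments from the text.
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]


def empirical_cdf(data):
    """Return F_n, where F_n(x) = (number of observations <= x) / n."""
    ordered = sorted(data)
    n = len(ordered)

    def F_n(x):
        # bisect_right gives the count of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return F_n


def truncate_from_below(data, d):
    """Drop observations below d, as if they had never been recorded.

    Assumption: an observation exactly equal to d is kept.
    """
    return [x for x in data if x >= d]


def censor_from_above(data, u):
    """Split data into exact values and a count known only to exceed u.

    Assumption: an observation exactly equal to u is treated as exact.
    """
    exact = [x for x in data if x <= u]
    return exact, len(data) - len(exact)


F_n = empirical_cdf(DATA_SET_B)
print(F_n(250))    # share of the 20 payments at or below 250
print(F_n(1000))   # share of the 20 payments at or below 1,000

kept = truncate_from_below(DATA_SET_B, 250)
exact, n_censored = censor_from_above(DATA_SET_B, 1000)
print(len(kept), len(exact), n_censored)
```

With Data Set B the counts agree with the text: thirteen observations survive truncation from below at 250, and censoring from above at 1,000 leaves fifteen individual observations and five grouped ones. Sorting once and using a binary search keeps each evaluation of `F_n` cheap, which is convenient when the function is evaluated at many points.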