
Robust and Non-Robust Models in Statistics

317 pages · 2010 · English


Robust and Non-Robust Models in Statistics, by Lev B. Klebanov, Svetlozar T. Rachev, and Frank J. Fabozzi. © 2009 Nova Science Publishers, Inc., Hauppauge, NY, USA.

In this book the authors consider so-called ill-posed problems and stability in statistics. Ill-posed problems are certain results where arbitrarily small changes in the assumptions lead to unpredictably large changes in the conclusions. In a companion volume published by Nova, the authors explain that ill-posed problems are not a mere curiosity in the field of contemporary probability. The same situation holds in statistics. The objectives of the authors of this book are to (1) identify statistical problems of this type, (2) find their stable variants, and (3) propose alternative versions of numerous theorems in mathematical statistics.

The layout of the book is as follows. The authors begin by reviewing the central pre-limit theorem, providing a careful definition and characterization of the limiting distributions. Then they consider the pre-limiting behavior of extreme order statistics and the connection of this theory to survival analysis. A study of statistical applications of the pre-limit theorems follows. Based on these theorems, the authors develop a correct version of the theory of statistical estimation and show its connection with the problem of choosing an appropriate loss function. As it turns out, a loss function should not be chosen arbitrarily. As the authors explain, the availability of certain mathematical conveniences (including the correctness of the formulation of the estimation problem) leads to rigid restrictions on the choice of the loss function. Questions about the correctness or incorrectness of certain statistical problems may be resolved through an appropriate choice of the loss function and/or the metric on the space of random variables and their characteristics (including distribution functions, characteristic functions, and densities).
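A short simulation can make this kind of instability concrete (this example is ours, not the book's): contaminating a standard normal model with an arbitrarily small Cauchy component leaves the sample median essentially unaffected, while the sample mean loses its stability, since its variance becomes infinite.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, eps):
    """Draw n points from the contaminated model (1 - eps)*N(0,1) + eps*Cauchy(0,1)."""
    x = rng.standard_normal(n)
    bad = rng.random(n) < eps
    x[bad] = rng.standard_cauchy(bad.sum())
    return x

# Spread of the two estimators over repeated samples of size 500.
spread = {}
for eps in (0.0, 0.01):
    means = [np.mean(sample(500, eps)) for _ in range(2000)]
    medians = [np.median(sample(500, eps)) for _ in range(2000)]
    spread[eps] = (np.std(means), np.std(medians))
    print(f"eps={eps}: sd(mean)={spread[eps][0]:.3f}, sd(median)={spread[eps][1]:.3f}")
```

A 1% contamination barely moves the median's spread, but blows up the mean's: a small change in the model assumption produces a large change in the conclusion, which is the defining feature of an ill-posed problem.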
Some auxiliary results from the theory of generalized functions are provided in an appendix.

ISBN 978-1-60741-768-2

Contents

Preface ix

I Models in Statistical Estimation Theory 1

1 Ill-posed Problems 3
  1.1 Introduction and Motivating Examples 3
    1.1.1 Two motivating examples 3
    1.1.2 Principal idea 6
  1.2 Central Pre-Limit Theorem 8
  1.3 Sums of a Random Number of Random Variables 11
  1.4 Local Pre-Limit Theorems and Their Applications to Finance 12
  1.5 Pre-Limit Theorem for Extremums 13
  1.6 Relations with Robustness of Statistical Estimators 15
  1.7 Statistical Estimation for Non-Smooth Densities 18

2 Loss Functions and the Restrictions Imposed on the Model 27
  2.1 Introduction 27
  2.2 Reducible Families of Functions 28
    2.2.1 Weakly reducible families 29
    2.2.2 Reducible families 32
    2.2.3 Strongly reducible families 34
    2.2.4 Reducible and non-reducible families 35
  2.3 The Classification of Classes of Estimators by Their Completeness Types 36
    2.3.1 Completeness properties 36
    2.3.2 Loss functions satisfying the four conditions 39
  2.4 An Example of a Loss Function 47
    2.4.1 A class of loss functions 48
  2.5 Concluding Remarks 51

3 Loss Functions and the Theory of Unbiased Estimation 53
  3.1 Introduction 53
  3.2 Unbiasedness, Lehmann's Unbiasedness, and W₁-Unbiasedness 53
  3.3 Characterizations of Convex and Strictly Convex Loss Functions 56
    3.3.1 Regular unbiasedness 57
  3.4 Unbiased Estimation, Universal Loss Functions, and Optimal Subalgebras 67
    3.4.1 Unbiased estimators 68
    3.4.2 w₁-unbiased estimators 71
  3.5 Matrix-Valued Loss Functions 85
  3.6 Concluding Remarks 87

4 Sufficient Statistics 89
  4.1 Introduction 89
  4.2 Completeness and Sufficiency 89
  4.3 Sufficiency When Nuisance Parameters are Present 94
  4.4 Bayes Estimators Independent of the Loss Function 101

5 Parametric Inference 109
  5.1 Introduction 109
  5.2 Parametric Density Estimation versus Parameter Estimation 109
    5.2.1 Some definitions 110
  5.3 Unbiased Parametric Inference 111
    5.3.1 Unbiased estimators of parametric functions and of the density 113
    5.3.2 Estimating the characteristic and distribution functions 117
  5.4 Bayesian Parametric Inference 118
  5.5 Parametric Density Estimation for Location Families 121
    5.5.1 The problem of the complexity of estimators 123

6 Trimmed, Bayes, and Admissible Estimators 127
  6.1 Introduction 127
  6.2 A Trimmed Estimator Cannot Be Bayesian 127
  6.3 Linear Regression Model: Trimmed Estimators and Admissibility 129

7 Characterization of Distributions and Intensively Monotone Operators 133
  7.1 Introduction 133
  7.2 The Uniqueness of Solutions of Operator Equations 134
  7.3 Examples of Intensively Monotone Operators 139
  7.4 Examples of Strongly E-Positive Families 142
  7.5 A Generalization of Cramér's and Pólya's Theorems 147
  7.6 Random Linear Forms 150
  7.7 Some Problems Related to Reliability Theory 155
    7.7.1 Relations of reliabilities of two systems 155
    7.7.2 Characterization by relevation-type equality 159
    7.7.3 Recovering a distribution of failures by the reliabilities of systems 162

II Robustness for a Fixed Number of Observations 165

8 Robustness of Statistical Models 167
  8.1 Introduction 167
  8.2 Preliminaries 167
  8.3 Robustness in Statistical Estimation and the Loss Function 169
  8.4 A Linear Method of Statistical Estimation 179
  8.5 Polynomial and Modified Polynomial Pitman Estimators 188
  8.6 Non-Admissibility of Polynomial Estimators of Location 194
  8.7 The Asymptotic ε-Admissibility of the Polynomial Pitman Estimators of the Location Parameter 202
    8.7.1 Asymptotic ε-admissibility of a linear estimator 202
    8.7.2 Asymptotic ε-admissibility of a polynomial Pitman estimator 207

9 Entire Functions of Finite Exponential Type and Estimation of Density Functions 211
  9.1 Introduction 211
  9.2 Main Definitions 211
  9.3 Fourier Transform of the Functions from M_{ν,p} 215
  9.4 Interpolation Formula 217
  9.5 Inequality of Different Metrics 218
  9.6 Vallée Poussin Kernels 218

III Metric Methods in Statistics 227

10 N-Metrics in the Set of Probability Measures 229
  10.1 Introduction 229
  10.2 A Class of Positive Definite Kernels in the Set of Probabilities and N-Distances 229
  10.3 m-Negative Definite Kernels and Metrics 233
  10.4 Statistical Estimates Obtained by the Minimal Distances Method 237
    10.4.1 Estimating a location parameter, I 237
    10.4.2 Estimating a location parameter, II 240
    10.4.3 Estimating a general parameter 241
    10.4.4 Estimating a location parameter, III 243
    10.4.5 Semiparametric estimation 244

11 Some Statistical Tests Based on N-Distances 245
  11.1 Introduction 245
  11.2 Multivariate Two-Sample Test 245
  11.3 Test for Two Distributions to Belong to the Same Additive Type 248
  11.4 Some Tests for Observations to be Gaussian 250
  11.5 A Test for Closeness of Probability Distributions 252

A Generalized Functions 255
  A.1 Main definitions 255
  A.2 Definition of Fourier Transform for Generalized Functions 260
  A.3 Functions ϕ_ε and ψ_ε 264

B Positive and Negative Definite Kernels and Their Properties 267
  B.1 Definitions of Positive and Negative Definite Kernels 267
  B.2 Examples of Positive Definite Kernels 271
  B.3 Positive Definite Functions 274
  B.4 Negative Definite Kernels 275
  B.5 Coarse Embeddings of Metric Spaces into Hilbert Space 279
  B.6 Strictly and Strongly Positive and Negative Definite Kernels 280

Bibliography 285
Authors' Index 303
Index 305

Preface

Wikipedia, the free online encyclopedia, defines robustness as "the quality of being able to withstand stresses, pressures, or changes in procedure or circumstance. A system, organism or design may be said to be 'robust' if it is capable of coping well with variations (sometimes unpredictable variations) in its operating environment with minimal damage, alteration or loss of functionality."

With respect to the field of statistics, Wikipedia defines robustness as follows: "A robust statistical technique is one that performs well even if its assumptions are somewhat violated by the true model from which the data were generated."

Of course, this definition uses some undefined terms; namely, it is not clear what is meant by "somewhat violated". What kinds of violations are considered minor, and what kinds are considered major? To apply the notion of robustness, we need a way to measure such violations. Moreover, in the second definition above, what is meant by "performs well"?
Again, we have to define a measure of good (or bad) behavior for a statistical procedure. Of course, in statistics there are different ways to measure both the violation of the true distribution and the quality of the behavior of a statistical procedure. Therefore, one can use different definitions of robustness, depending on how one decides to measure violations and the quality of a statistical procedure.

A class of the most popular robust models in statistical estimation theory was introduced by Peter Huber (1981). His models allow one to "defend" statistical inference against contamination (that is, the violations are defined as small contaminations of a theoretical distribution by an unknown distribution), while the quality of a statistical estimator is measured by its asymptotic variance. This means that the mathematical definition of this property is applicable only in the case of large samples. Classical statistical models are non-robust in the asymptotic sense. Consequently, although the presence of some contamination may dramatically affect the asymptotic characteristics of the corresponding statistical inferences in classical models, it is not clear how robust they are for a fixed number of observations, or for other classes of violations of the theoretical model. Statistical procedures based on Huber robustness usually ignore observations that seem to be too large or too small. But such observations may come from the true model and may well carry essential information about the phenomenon being investigated. In this situation, the use of Huber-robust procedures may lead to wrong conclusions about the phenomenon.

Our goal in this book is to study how to modify classical models so that statistical procedures based on a not-too-large number of observations retain robust properties, while the violations are small in a weak metric on the space of probabilities. For a fixed number of observations, one cannot use the asymptotic variance as a characteristic of the quality of a statistical procedure; it is therefore interesting to describe the characteristics one can use instead, and the properties the corresponding statistical procedures will have. It is quite clear that questions regarding the robustness or non-robustness of certain statistical problems may be resolved through appropriate choices of the loss function and/or the metric on the space of random variables and their characteristics, including distribution functions, characteristic functions, and densities.

We will describe the loss functions leading to some natural properties of classes of statistical estimators, such as the completeness of the class of all non-randomized estimators, or the completeness of the class of all symmetric estimators in the case of independent and identically distributed (i.i.d.) observations. We then choose loss functions connected to robust models. Sufficient statistics allow one to reduce the data without any loss of information. We study the notion of sufficient statistics for models with nuisance parameters.

The book is organized as follows. In Chapter 1, we consider so-called ill-posed problems in statistics and probability theory. Ill-posed problems are usually understood as certain results where small changes in the assumptions lead to arbitrarily large changes in the conclusions. The notion of an ill-posed problem is the opposite of that of a well-posed problem. In his famous paper, Jacques Hadamard (1902) argued that problems that are physically important are both solvable and uniquely solvable. Today, a well-posed problem can be described as one that is uniquely solvable and whose solution depends on the data in a continuous way (i.e., is a continuous function of the data). This makes the notion of well-posed problems close to that of robust models.
In contrast, an ill-posed problem is one in which the solution depends on the data in a discontinuous way, so that small errors in the data generate large differences in the solution; ill-posed problems are therefore connected to non-robust models. These errors can be caused by measurement errors, perturbations resulting from noise in the data, or even computational rounding errors. In other words, an ill-posed problem is one for which there is no solution, or for which the solution is unstable when the data contain small errors. In Hadamard's view, ill-posed problems were artificial because such problems were incapable of describing physical systems. Nowadays we see that ill-posed problems arise in the form of inverse problems in mathematical physics and mathematical analysis, as well as in such fields as geophysics, acoustics, electrodynamics, tomography, medicine, ecology, and financial mathematics. Often, the ill-posedness of a practical model is due to the lack of a precise mathematical formulation. For example, it can be connected to an improper choice of the topology, under which the dependence of the solution on the data is not continuous; for another choice of the topology, this dependence will be continuous. Such a situation is encountered, for example, in tomography (see Klebanov, Kozubowski, and Rachev (2006)).

In Chapter 1, we consider some ill-posed problems in probability and give their well-posed versions. Among the results provided in the chapter are the central pre-limit theorem for sums of i.i.d. random variables and the pre-limit theorem for extremums. The objective of pre-limit theorems is to replace considerations involving a large number of i.i.d. random variables by considerations involving some fixed number of them.

The problem of how to measure the quality of a statistical procedure is covered in Chapter 2. For that purpose, one can define a loss function and then use the mean loss as the risk of a statistical procedure.
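As a small illustration of this definition (our sketch, not the authors'): for squared-error loss L(t, θ) = (t − θ)², the risk R(θ, T) = E_θ L(T(X), θ) of an estimator T can be approximated by Monte Carlo. For a normal location family with unit variance, the sample mean has risk 1/n, while the sample median's risk is close to π/(2n).

```python
import numpy as np

rng = np.random.default_rng(1)

def risk(estimator, theta=0.0, n=100, reps=20000):
    """Monte Carlo approximation of the risk E_theta[(T(X) - theta)^2]."""
    samples = rng.normal(theta, 1.0, size=(reps, n))
    estimates = estimator(samples, axis=1)
    return np.mean((estimates - theta) ** 2)

r_mean = risk(np.mean)      # theoretical risk: 1/n = 0.0100
r_median = risk(np.median)  # asymptotic risk: pi/(2n), about 0.0157
print(f"risk(mean) = {r_mean:.4f}, risk(median) = {r_median:.4f}")
```

Under this loss the mean beats the median at the normal model; the point of the chapter is that the loss function itself should be constrained by structural requirements rather than chosen ad hoc.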
Usually, the choice of a loss function in statistical estimation theory seems to be a subjective matter. But in the chapter we attempt to demonstrate that the choice of a loss function is by no means subjective. The loss function is determined by such desirable properties as the completeness of the class of all symmetric statistics as estimators of parameters in the case of i.i.d. observations, complete use of the information contained in the observations, and other natural properties. As demonstrated in the chapter, many classical loss functions lead to non-robust statistical models, while some small modifications lead to robustness with respect to different classes of violations.

In Chapter 3, we study problems similar to those studied in Chapter 2, but for some classes of unbiased estimators. Both the classical and the Lehmann definitions of unbiasedness are considered. It appears that the unbiasedness property is rather restrictive, and the class of loss functions leading to stable models is small. We propose employing two loss functions instead of one. The first loss function is used to measure the risk of a statistical procedure, while the second is used to define the corresponding (generalized) unbiasedness property. We describe all such pairs of loss functions possessing the property of completeness of some classes of natural statistical procedures.

The definitions and properties of sufficient statistics and their modifications for models with nuisance parameters are given in Chapter 4. In that chapter, we describe a family of distributions which possess a "universal" Bayes estimator, that is, a Bayes estimator that is independent of the choice of the loss function.

Chapter 5 discusses the theory of parametric estimation of density functions, characteristic functions, and distribution functions. Here we see that it is sufficient to find a good estimator for the density function only; for other characteristics, including the parameters of the distribution, we may obtain estimators as the corresponding functionals of the density estimator.

In Chapter 6, we consider some connections between the optimality properties of statistical estimators and their robustness in the Huber sense.

The description of all distributions possessing some desirable property is the main problem of statistical characterization theory (see Kagan, Linnik, and Rao (1972)). In Chapter 7, we describe one method of characterizing probability distributions. The method uses so-called intensively monotone operators, which allow one to prove easily the uniqueness of the solution of a wide class of functional equations.

Some connections between different definitions of the robustness of statistical models, robustness in the Huber sense, and the properties of the loss function are the topics covered in Chapter 8. In that chapter we proffer methods of robust (in different senses) estimation.

Chapter 9 gives some analytical tools for working with a wide class of heavy-tailed distributions. Here we provide some approximations based on the class of entire functions of finite exponential type. Such approximations are especially well suited to nonparametric density estimation.

Finally, in Chapters 10 and 11, we study metric methods in statistics. The metric approach is especially convenient when the metric used for the construction of estimators is also used to define the measure of violations of the true model. Such methods provide a large class of robust estimators. Metric methods also lead to a family of statistical tests, such as a two-sample test, a test of whether two distributions have the same additive type, a test of the stability of a distribution, and a multidimensional two-sample test.

There are two appendices. Some auxiliary results from the theory of generalized functions are provided in Appendix A.
Appendix B contains some elementary definitions and properties of positive and negative definite kernels that are used in Chapter 11.

Lev B. Klebanov
Svetlozar T. Rachev
Frank J. Fabozzi
March 2009
