
Statistical Theory and Computational Aspects of Smoothing: Proceedings of the COMPSTAT ’94 Satellite Meeting held in Semmering, Austria, 27–28 August 1994

272 pages · 1996 · English


Contributions to Statistics

V. Fedorov/W.G. Müller/I.N. Vuchkov (Eds.), Model-Oriented Data Analysis, XII/248 pages, 1992
J. Antoch (Ed.), Computational Aspects of Model Choice, VII/285 pages, 1993
W.G. Müller/H.P. Wynn/A.A. Zhigljavsky (Eds.), Model-Oriented Data Analysis, XIII/287 pages, 1993
P. Mandl/M. Hušková (Eds.), Asymptotic Statistics, X/474 pages, 1994
P. Dirschedl/R. Ostermann (Eds.), Computational Statistics, VII/553 pages, 1994
C.P. Kitsos/W.G. Müller (Eds.), MODA4 - Advances in Model-Oriented Data Analysis, XIV/297 pages, 1995
H. Schmidli, Reduced Rank Regression, X/179 pages, 1995

Wolfgang Härdle
Michael G. Schimek (Eds.)

Statistical Theory and Computational Aspects of Smoothing
Proceedings of the COMPSTAT '94 Satellite Meeting held in Semmering, Austria, 27-28 August 1994

With 63 Figures

Physica-Verlag
A Springer-Verlag Company

Series Editors
Werner A. Müller
Peter Schuster

Editors
Professor Dr. Wolfgang Härdle
Center for Computational Statistics
Institut für Statistik und Ökonometrie
Wirtschaftswissenschaftliche Fakultät
Humboldt-Universität zu Berlin
Spandauer Straße 1
D-10178 Berlin, Germany

Professor Dr. Dr. Michael G. Schimek
Medical Biometrics Group
University of Graz Medical Schools
Auenbruggerplatz 30/IV
A-8036 Graz, Austria

Co-sponsored by the International Association of Statistical Computing

ISBN-13: 978-3-7908-0930-5
e-ISBN-13: 978-3-642-48425-4
DOI: 10.1007/978-3-642-48425-4

Die Deutsche Bibliothek - CIP cataloguing record
Statistical theory and computational aspects of smoothing: proceedings of the COMPSTAT '94 satellite meeting held in Semmering, Austria, 27-28 August 1994 / Wolfgang Härdle; Michael G. Schimek (ed.). With contributions by J.S. Marron ... With discussions by P. Hall ... [Co-sponsored by the International Association of Statistical Computing]. - Heidelberg: Physica-Verl., 1996 (Contributions to statistics) ISBN-13: 978-3-7908-0930-5. Added entries: Härdle, Wolfgang [ed.]; Marron, James S.; COMPSTAT <11, 1994, Vienna>

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Physica-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Physica-Verlag Heidelberg 1996

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

SPIN: 10534328 88/2202-5 4 3 2 1 0 - Printed on acid-free paper

Preface

The COMPSTAT '94 Satellite Meeting on Smoothing - Statistical Theory and Computational Aspects was held on August 27-28, 1994 at the hotel "Erzherzog Johann" in Semmering, Austria. It was the first meeting of this kind and was scheduled immediately after the COMPSTAT '94 Symposium in Vienna, Austria. The meeting was hosted by the Karl-Franzens-Universität Graz in Graz, Austria, and organized by the Medical Biometrics Group of its Medical Schools. The European Section of the International Association of Statistical Computing (IASC), part of the International Statistical Institute (ISI), co-sponsored it. Members of the Scientific Programme Committee were R. L. Eubank, L. Györfi, W. Härdle and M. G. Schimek (chairman). Session chairs included A. de Falguerolles, R. L. Eubank, T. Gasser, L. Györfi, W. Härdle, R. Kohn, M. G. Schimek and A. van der Linde.
The emphasis of the scientific programme was on invited introductory papers, as well as on invited and/or discussed papers presenting the state of the art of statistical theory and computational aspects of smoothing. There were also a number of contributed papers and presentations of relevant software (S-Plus and XploRe). Parallel sessions were avoided, and the number of participants was limited to fifty to provide an atmosphere for informal discussion of new ideas. The main topic of discussion was recent developments in local regression. The meeting attracted almost fifty scientists from all over the world.

Here we present a selection of papers read at the COMPSTAT '94 Satellite Meeting on Smoothing. Only manuscripts not published elsewhere were considered for this volume. All contributions had to undergo a regular review process as usually carried out in scientific journals. Ten of the twenty-six papers given at the meeting are published here. In addition, we have an expository discussed paper by J. S. Marron, who was not able to attend. The two contributions on local regression, by W. S. Cleveland and C. Loader and by B. Seifert and T. Gasser, were also open to written discussion, which is printed in this volume together with the authors' rejoinders. The other papers deal with Bayesian nonparametric regression, invariance problems with smoothing splines, variance estimation, cross-validation, extreme percentile regression, additive models and nonlinear principal components analysis.

We are grateful to the authors, discussants, referees and the editorial assistant, who worked together to make the outcome of this successful meeting accessible to a wider audience interested in topics related to smoothing.

Wolfgang Härdle and Michael G. Schimek, Editors
Berlin and Graz, December 1995

Contents

1 A Personal View of Smoothing and Statistics (J.S. Marron) ... 1
2 Smoothing by Local Regression: Principles and Methods (W.S. Cleveland and C. Loader) ... 10
3 Variance Properties of Local Polynomials and Ensuing Modifications (B. Seifert and Th. Gasser) ... 50
4 Comments (P. Hall and B.A. Turlach) ... 80
5 Comments (M.C. Jones) ... 85
6 Comments (C. Thomas-Agnan) ... 88
7 Comments (S.J. Sheather, M.P. Wand, M.S. Smith and R. Kohn) ... 93
8 Rejoinder (J.S. Marron) ... 103
9 Rejoinder (W.S. Cleveland and C. Loader) ... 113
10 Rejoinder (B. Seifert and Th. Gasser) ... 121
11 Robust Bayesian Nonparametric Regression (C.K. Carter and R. Kohn) ... 128
12 The Invariance of Statistical Analyses with Smoothing Splines with Respect to the Inner Product in the Reproducing Kernel Hilbert Space (A. van der Linde) ... 149
13 A Note on Cross Validation for Smoothing Splines (G.P. Neubauer and M.G. Schimek) ... 165
14 Some Comments on Cross-Validation (B. Droge) ... 178
15 Extreme Percentile Regression (O. Rosen and A. Cohen) ... 200
16 Mean and Dispersion Additive Models (R.A. Rigby and M.D. Stasinopoulos) ... 215
17 Interaction in Nonlinear Principal Components Analysis (G.D. Costanzo and J.L.A. van Rijckevorsel) ... 231
18 Nonparametric Estimation of Additive Separable Regression Models (R. Chen, W. Härdle, O.B. Linton and E. Severance-Lossin) ... 247

A Personal View of Smoothing and Statistics
J.S. Marron
Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA

Summary
Personal views are offered on the foundations of smoothing methods in statistics. Key points are that smoothing is a useful tool in data analysis, and that the field of data-based smoothing parameter selection has matured to the point that effective methods are ready for use as defaults in software packages. A broader lesson available from the discussion is that a combination of computational methods and mathematical statistics is a powerful research tool.

Keywords: bandwidth selection, kernel estimation, nonparametric curve estimation, smoothing

1 Introduction
This paper contains some personal views on smoothing and its role in statistics.
These views have been shaped by interaction with many people holding very diverse opinions. These people include both active methodological statisticians and interested non-statisticians. The statisticians include "mainstream smoothers", but many others as well. An important component of my views comes from investigating the fact that "smoothing insiders" hold many very divergent, yet strongly held, opinions on issues such as what the "right" choice of smoothing method is.

Section 2 contains what I have learned about the choice of smoothing method. Different people have made, and will continue to make, different choices. The main point is that the great majority of choices made are sensible, because there are many noncomparable factors involved in this choice. Diverse personal weightings of these factors result in choices that are different, but equally valid when viewed from an overall perspective.

Section 3 discusses modern bandwidth selection. The main lesson is that recent research has resulted in methods that are quite effective in finding "the right amount of smoothing". They are far better than the current defaults in most software packages, and in the case of kernel density estimation they are ready for use in that role.

The discussion of research in bandwidth selection illustrates some important ideas on methodological statistics in general. In Section 4 it is suggested that a combination of mathematical and computational tools provides a particularly powerful approach.

A final point is: "smoothing is a useful method of analyzing data". Reasons for not losing sight of this point are given in Section 5.

2 Choice of smoothing method
A frequent occurrence is that a scientist consults a statistician (or a non-smoothing statistician goes to a "smoother") and says "I have heard about this smoothing stuff and would like to try it; how do I do it?" A number of different answers could be given, depending largely upon who is asked.
The recommended method could be any of:
M1 a kernel - local polynomial method.
M2 a smoothing spline.
M3 a regression - B spline (much different from M2).
M4 a Fourier smoother.
M5 a wavelet method.
M6 LO(W)ESS.

All of these methods have their firm adherents, who enjoy espousing the strengths of their personal favorite. It is interesting that the answers can be so different. Much discussion with the parties involved has led me to the personal conclusion that everybody is right in their different choices. This seemingly contradictory statement is possible because several noncomparable factors are involved. In particular, factors that go into the choice of smoother include:
F1 availability (is it "right there" in your software package?).
F2 interpretability (what does the smooth tell us about the data?).
F3 statistical efficiency (how close is the smooth to "the true curve"?).
F4 quick computability.
F5 integrability into general frameworks (e.g. S/S+).
F6 ease of mathematical analysis.

All of the methods M1 - M6 listed above have differing strengths and weaknesses in these divergent senses. None of these methods dominates any other in all of the senses F1 - F6. Since these factors are so different, almost any method can be "best", simply by an appropriate personal weighting of the various factors involved. Claims of "wretched performance" are also valid, for any of the above methods, when one adopts a suitably narrow viewpoint.

This is not the place to give a complete listing of the relative strengths and weaknesses of M1 - M6, but I shall discuss one case in more detail. The simplest possible naive fixed bandwidth local constant kernel methods have met with substantial criticism. This is justified, because they are relatively weak with respect to F3. I suspect that a contributing factor to this criticism is the popularity of such methods in the theoretical literature, much of which is due to their strength in F6.
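A fixed bandwidth local constant kernel method is commonly written as the Nadaraya-Watson estimator: each fitted value is a kernel-weighted local average of the responses. The Python sketch below is only illustrative. The Gaussian kernel, the function names and the bandwidth argument `h` are assumptions of this example and are not taken from the text.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_constant_smooth(x_grid, xs, ys, h):
    """Fixed-bandwidth local constant (Nadaraya-Watson) kernel smoother.

    For each x in x_grid, returns
        sum_i K((x - xs[i]) / h) * ys[i] / sum_i K((x - xs[i]) / h),
    i.e. a weighted local average of ys whose weights shrink with distance
    from x. The bandwidth h (> 0) controls how local the average is.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    fitted = []
    for x in x_grid:
        weights = [gaussian_kernel((x - xi) / h) for xi in xs]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: no data close enough to x for this h.
            fitted.append(float("nan"))
        else:
            fitted.append(sum(w * yi for w, yi in zip(weights, ys)) / total)
    return fitted


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.2, 1.1, 1.9, 3.2, 3.8]
    # Small h tracks the data closely; large h flattens towards the mean of ys.
    for h in (0.3, 1.0, 10.0):
        print(h, [round(v, 3) for v in local_constant_smooth(xs, xs, ys, h)])
```

Running the loop over several values of `h` shows why the bandwidth matters so much. A very small `h` nearly reproduces the individual responses. A very large `h` pulls every fitted value towards the overall mean of `ys`.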
But from a broader viewpoint, this criticism does not provide sufficient reason to rule out their use. In particular, these methods are (in my opinion) the best at F2. Fixed bandwidth local constant kernel methods put the interested analyst in the closest possible intuitive contact with the data, because they are simple, understandable local averages. Note that I am not advocating this estimator as the solution to all problems (in fact it has its drawbacks, as does every method, and there are situations where other methods will be much more desirable), but am merely pointing out that it cannot be dismissed out of hand. While I appreciate and respect researchers' enthusiasm for their more recently developed methods, I suggest that a healthy skepticism be employed with respect to claims of solving all problems. Every method has its shortcomings, and it is important to be aware of these before the method is used.

3 Data based bandwidth selection
The performance of every smoothing method depends crucially on the specification of a smoothing parameter. This choice is a deep issue. While many things are known about "optimal bandwidth selection", they should be kept in proper perspective. There are two important issues to keep in mind. One is that in some situations there is no single choice that gives the data analyst all the information available in the data. The other is that the classical mathematical definitions of "best" can be quite different from "useful for data analysis". See Marron and Tsybakov (1995) for the reasons behind this, and for some approaches to addressing this problem.

For these reasons the most useful method of choosing the amount of smoothing is interactive trial and error by an experienced analyst. But this has three important weaknesses.
• It is time consuming. Many statisticians work under time pressure, and are unable to carefully choose a smoothing parameter for every data set
