Lecture Notes Editorial Policies

Lecture Notes in Statistics provides a format for the informal and quick publication of monographs, case studies, and workshops of theoretical or applied importance. Thus, in some instances, proofs may be merely outlined and results presented which will later be published in a different form.

Publication of the Lecture Notes is intended as a service to the international statistical community, in that a commercial publisher, Springer-Verlag, can provide efficient distribution of documents that would otherwise have a restricted readership. Once published and copyrighted, they can be documented and discussed in the scientific literature.

Lecture Notes are reprinted photographically from the copy delivered in camera-ready form by the author or editor. Springer-Verlag provides technical instructions for the preparation of manuscripts. Volumes should be no less than 100 pages and preferably no more than 400 pages. A subject index is expected for authored but not edited volumes. Proposals for volumes should be sent to one of the series editors or addressed to "Statistics Editor" at Springer-Verlag in New York.

Authors of monographs receive 50 free copies of their book. Editors receive 50 free copies and are responsible for distributing them to contributors. Authors, editors, and contributors may purchase additional copies at the publisher's discount. No reprints of individual contributions will be supplied and no royalties are paid on Lecture Notes volumes. Springer-Verlag secures the copyright for each volume.

Series Editors:

Professor P. Bickel, Department of Statistics, University of California, Berkeley, California 94720, USA
Professor P. Diggle, Department of Mathematics, Lancaster University, Lancaster LA1 4YL, England
Professor S. Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
Professor K. Krickeberg, 3 Rue de L'Estrapade, 75005 Paris, France
Professor I. Olkin, Department of Statistics, Stanford University, Stanford, California 94305, USA
Professor N. Wermuth, Department of Psychology, Johannes Gutenberg University, Postfach 3980, D-6500 Mainz, Germany
Professor S. Zeger, Department of Biostatistics, The Johns Hopkins University, 615 N. Wolfe Street, Baltimore, Maryland 21205-2103, USA

Lecture Notes in Statistics 171
Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, and S. Zeger

Springer Science+Business Media, LLC

David D. Denison, Mark H. Hansen, Christopher C. Holmes, Bani Mallick, Bin Yu (Editors)

Nonlinear Estimation and Classification

Springer

David D. Denison, Department of Mathematics, Imperial College, 180 Queen's Gate, London, SW7 2BZ, UK, [email protected]
Mark H. Hansen, Room 2C283, Bell Laboratories, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA, [email protected]
Christopher C. Holmes, Department of Mathematics, Imperial College, 180 Queen's Gate, London, SW7 2BZ, UK, [email protected]
Bani Mallick, Statistical Department, Texas A&M University, College Station, TX 77843-3143, USA, [email protected]
Bin Yu, Department of Statistics, University of California, Berkeley, Berkeley, CA 94720-3860, USA, [email protected]

Library of Congress Cataloging-in-Publication Data
Nonlinear estimation and classification / editors, D.D. Denison ... [et al.]
p. cm. -- (Lecture notes in statistics ; 171)
Includes bibliographical references and index.
ISBN 978-0-387-95471-4
1. Estimation theory. 2. Nonlinear theories. I. Denison, D.D. (David D.) II. Lecture notes in statistics (Springer-Verlag) ; v. 171.
QA276.8.N64 2002
519.5'44--dc21 2002030566

ISBN 978-0-387-95471-4
ISBN 978-0-387-21579-2 (eBook)
DOI 10.1007/978-0-387-21579-2

Printed on acid-free paper.

© 2003 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 2003

All rights reserved.
This work may not be translated or copied in whole or in part without the written permission of the publisher, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1
SPIN 10874011

Typesetting: Pages created by the authors using a Springer TeX macro package.

Contents

Introduction
David D. Denison, Mark H. Hansen, Christopher C. Holmes, Bani Mallick, and Bin Yu

I Longer Papers

1 Wavelet Statistical Models and Besov Spaces
Hyeokho Choi and Richard G. Baraniuk

2 Coarse-to-Fine Classification and Scene Labeling
Donald Geman

3 Environmental Monitoring Using a Time Series of Satellite Images and Other Spatial Data Sets
Harri Kiiveri, Peter Caccetta, Norm Campbell, Fiona Evans, Suzanne Furby, and Jeremy Wallace

4 Traffic Flow on a Freeway Network
Peter Bickel, Chao Chen, Jaimyoung Kwon, John Rice, Pravin Varaiya, and Erik van Zwet

5 Internet Traffic Tends Toward Poisson and Independent as the Load Increases
Jin Cao, William S. Cleveland, Dong Lin, and Don X. Sun

6 Regression and Classification with Regularization
Sayan Mukherjee, Ryan Rifkin, and Tomaso Poggio

7 Optimal Properties and Adaptive Tuning of Standard and Nonstandard Support Vector Machines
Grace Wahba, Yi Lin, Yoonkyung Lee, and Hao Zhang

8 The Boosting Approach to Machine Learning: An Overview
Robert E. Schapire

9 Improved Class Probability Estimates from Decision Tree Models
Dragos D. Margineantu and Thomas G. Dietterich

10 Gauss Mixture Quantization: Clustering Gauss Mixtures
Robert M. Gray

11 Extended Linear Modeling with Splines
Jianhua Z. Huang and Charles J. Stone

II Shorter Papers

12 Adaptive Sparse Regression
Mario A. T. Figueiredo

13 Multiscale Statistical Models
Eric D. Kolaczyk and Robert D. Nowak

14 Wavelet Thresholding on Non-Equispaced Data
Maarten Jansen

15 Multi-Resolution Properties of Semi-Parametric Volatility Models
Enrico Capobianco

16 Confidence Intervals for Logspline Density Estimation
Charles Kooperberg and Charles J. Stone

17 Mixed-Effects Multivariate Adaptive Splines Models
Heping Zhang

18 Statistical Inference for Simultaneous Clustering of Gene Expression Data
Katherine S. Pollard and Mark J. van der Laan

19 Statistical Inference for Clustering Microarrays
Jörg Rahnenführer

20 Logic Regression - Methods and Software
Ingo Ruczinski, Charles Kooperberg, and Michael LeBlanc

21 Adaptive Kernels for Support Vector Classification
Robert Burbidge

22 Generalization Error Bounds for Aggregate Classifiers
Gilles Blanchard

23 Risk Bounds for CART Regression Trees
Servane Gey and Elodie Nedelec

24 On Adaptive Estimation by Neural Net Type Estimators
Sebastian Döhler and Ludger Rüschendorf

25 Nonlinear Function Learning and Classification Using RBF Networks with Optimal Kernels
Adam Krzyzak

26 Instability in Nonlinear Estimation and Classification: Examples of a General Pattern
Steven P. Ellis

27 Model Complexity and Model Priors
Angelika van der Linde

28 A Strategy for Compression and Analysis of Very Large Remote Sensing Data Sets
Amy Braverman

29 Targeted Clustering of Nonlinearly Transformed Gaussians
Juan K. Lin

30 Unsupervised Learning of Curved Manifolds
Vin de Silva and Joshua B. Tenenbaum

31 ANOVA DDP Models: A Review
Maria De Iorio, Peter Müller, Gary L. Rosner, and Steven N. MacEachern

Introduction

David D. Denison, Mark H. Hansen, Christopher C.
Holmes, Bani Mallick, and Bin Yu

Background

Researchers in many disciplines now face the formidable task of processing massive amounts of high-dimensional and highly-structured data. Advances in data collection and information technologies have coupled with innovations in computing to make commonplace the task of learning from complex data. As a result, fundamental statistical research is being undertaken in a variety of different fields. Driven by the difficulty of these new problems, and fueled by the explosion of available computer power, highly adaptive, nonlinear procedures are now essential components of modern "data analysis," a term that we liberally interpret to include speech and pattern recognition, classification, data compression and image processing. The development of new, flexible methods combines advances from many sources, including approximation theory, numerical analysis, machine learning, signal processing and statistics. This volume collects papers from a unique workshop designed to promote communication between these different disciplines.

History

In 1999, Hansen and Yu were both Members of the Technical Staff at Bell Laboratories in Murray Hill, New Jersey. They were exploring the connections between information theory and statistics. At that time, Denison and Mallick were faculty members at Imperial College, London, researching