
Recursive Partitioning and Applications PDF

267 Pages·2010·3.274 MB·English


Springer Series in Statistics

Advisors: P. Bickel, P. Diggle, S. Fienberg, U. Gather, I. Olkin, S. Zeger

For other titles published in this series, go to www.springer.com/series/692

Heping Zhang • Burton H. Singer

Recursive Partitioning and Applications

Second Edition

Heping Zhang
Department of Epidemiology and Public Health
Yale University School of Medicine
60 College Street
New Haven, Connecticut 06520-8034
USA

Burton H. Singer
Emerging Pathogens Institute
University of Florida
PO Box 100009
Gainesville, FL 32610
USA

[email protected]

ISSN 0172-7397
ISBN 978-1-4419-6823-4
e-ISBN 978-1-4419-6824-1
DOI 10.1007/978-1-4419-6824-1

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010930849

© Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Dedicated to Julan, Jeffrey, and Leon (HZ) and to Eugenia, Gregory, Maureen, and Sheila (BS)

Preface

Multiple complex pathways, characterized by interrelated events and conditions, represent routes to many illnesses, diseases, and ultimately death.
Although there are substantial data and plausibility arguments supporting many conditions as contributory components of pathways to illness and disease end points, we have, historically, lacked an effective methodology for identifying the structure of the full pathways. Regression methods, with strong linearity assumptions and data-based constraints on the extent and order of interaction terms, have traditionally been the strategies of choice for relating outcomes to potentially complex explanatory pathways. However, nonlinear relationships among candidate explanatory variables are a generic feature that must be dealt with in any characterization of how health outcomes come about. It is noteworthy that similar challenges arise from data analyses in Economics, Finance, Engineering, etc. Thus, the purpose of this book is to demonstrate the effectiveness of a relatively recently developed methodology, recursive partitioning, as a response to this challenge. We also compare and contrast what is learned via recursive partitioning with results obtained on the same data sets using more traditional methods. This serves to highlight exactly where, and for what kinds of questions, recursive partitioning-based strategies have a decisive advantage over classical regression techniques.

This book is a revised edition of our first one entitled Recursive Partitioning in the Health Sciences. A decade has passed since we published the first edition. This new edition reflects recent developments that are either new or have increased in importance. It also covers areas that we neglected before, particularly the topic of forests. The first edition focused on two aspects. First, we presented the tree-based methods entirely within the framework of Breiman et al. (1984). Second, the examples were from health sciences. Although it is difficult to do justice to all alternative methods to Breiman et al. (1984), we feel they deserve emphasis here.
We also realize that the methods presented herein have applications beyond health sciences, and an outreach to other fields of science and societal significance is overdue. This is the reason that we have changed the title. Lastly, we have experienced the rapid advancement of genomics. Recursive partitioning has become one of the most appealing analytic methods for understanding or mining genomic data. In this revision, we demonstrate the application of tree- and forest-based methods to understanding genomic data.

Having expanded the scope of our book, we are aiming at three broad groups: (1) biomedical researchers, clinicians, bioinformaticists, geneticists, psychologists, sociologists, epidemiologists, health services researchers, and environmental policy advisers; (2) consulting statisticians who can use the recursive partitioning technique as a guide in providing effective and insightful solutions to clients' problems; and (3) statisticians interested in methodological and theoretical issues. The book provides an up-to-date summary of the methodological and theoretical underpinnings of recursive partitioning. More interestingly, it presents a host of unsolved problems whose solutions would advance the rigorous underpinnings of statistics in general.

From the perspective of the first two groups of readers, we demonstrate with real applications the sequential interplay between automated production of multiple well-fitting trees and scientific judgment leading to respecification of variables, more refined trees subject to context-specific constraints (on splitting and pruning, for example), and ultimately selection of the most interpretable and useful tree(s). In this revision we include new and substantively important examples, some of which are related to bioinformatics and genomics and others are outside the realm of health sciences. The sections marked with asterisks can be skipped for application-oriented readers.
We show a more conventional regression analysis, having the same objective as the recursive partitioning analysis, side by side with the newer methodology. In each example, we highlight the scientific insight derived from the recursive partitioning strategy that is not readily revealed by more conventional methods. The interfacing of automated output and scientific judgment is illustrated with both conventional and recursive partitioning analysis.

Theoretically oriented statisticians will find a substantial listing of challenging theoretical problems whose solutions would provide much deeper insight than heretofore about the scope and limits of recursive partitioning as such and multivariate adaptive splines and forests in particular.

We emphasize the development of narratives to summarize the formal Boolean statements that define routes down the trees to terminal nodes. Particularly with complex (by scientific necessity) trees, narrative output facilitates understanding and interpretation of what has been provided by automated techniques.

We illustrate the sensitivity of trees to variation in choosing misclassification cost, where the variation is a consequence of divergent views by clinicians of the costs associated with differing mistakes in prognosis.

The book by Breiman et al. (1984) is a classical work on the subject of recursive partitioning. In Chapter 4, we reiterate the key ideas expressed in that book and expand our discussions in different directions on the issues that arise from applications. Other chapters on survival trees, adaptive splines, forests, and classification trees for multiple discrete outcomes are new developments since the work of Breiman et al. (1984).
Heping Zhang wishes to thank his colleagues and students, Joan Buenconsejo, Theodore Holford, James Leckman, Ju Li, Robert Makuch, Kathleen Merikangas, Bradley Peterson, Norman Silliker, Daniel Zelterman, and Hongyu Zhao among others, for their help with reading and commenting on the first edition of this book. He is also grateful to many colleagues including Drs. Michael Bracken, Dorit Carmelli, and Brian Leaderer for making their data sets available to the first version of this book. This revision was supported in part by NIH grants K02DA017713 and R01DA016750 to Heping Zhang. Burton Singer thanks Tara Gruenewald (UCLA) and Jason Ku (Princeton) for assistance in developing some of the new examples. In addition, Drs. Xiang Chen, Kelly Cho, Yunxiao He, Yuan Jiang, and Minghui Wang, and Ms. Donna DelBasso assisted Heping Zhang in computation and proofreading of this revised edition.
