Computer Science

Chapman & Hall/CRC
Machine Learning & Pattern Recognition Series

Ensemble Methods: Foundations and Algorithms

"Professor Zhou's book is a comprehensive introduction to ensemble methods in machine learning. It reviews the latest research in this exciting area. I learned a lot reading it!"
—Thomas G. Dietterich, Oregon State University, ACM Fellow, and founding president of the International Machine Learning Society

"This is a timely book. Right time and right book … with an authoritative but inclusive style that will allow many readers to gain knowledge on the topic."
—Fabio Roli, University of Cagliari

An up-to-date, self-contained introduction to a state-of-the-art machine learning approach, Ensemble Methods: Foundations and Algorithms shows how these accurate methods are used in real-world tasks. It gives you the necessary groundwork to carry out further research in this evolving field.

Features
• Supplies the basics for readers unfamiliar with machine learning and pattern recognition
• Covers nearly all aspects of ensemble techniques such as combination methods and diversity generation methods
• Presents the theoretical foundations and extensions of many ensemble methods, including Boosting, Bagging, Random Trees, and Stacking
• Introduces the use of ensemble methods in computer vision, computer security, medical imaging, and famous data mining competitions
• Highlights future research directions
• Provides additional reading sections in each chapter and references at the back of the book

Zhi-Hua Zhou

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

SERIES EDITORS
Ralf Herbrich and Thore Graepel
Microsoft Research Ltd., Cambridge, UK

AIMS AND SCOPE
This series reflects the latest advances and applications in machine learning and pattern recognition through the publication of a broad range of reference works, textbooks, and handbooks. The inclusion of concrete examples, applications, and methods is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of machine learning, pattern recognition, computational intelligence, robotics, computational/statistical learning theory, natural language processing, computer vision, game AI, game theory, neural networks, computational neuroscience, and other relevant topics, such as machine learning applied to bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES
MACHINE LEARNING: An Algorithmic Perspective
Stephen Marsland
HANDBOOK OF NATURAL LANGUAGE PROCESSING, Second Edition
Nitin Indurkhya and Fred J. Damerau
UTILITY-BASED LEARNING FROM DATA
Craig Friedman and Sven Sandow
A FIRST COURSE IN MACHINE LEARNING
Simon Rogers and Mark Girolami
COST-SENSITIVE MACHINE LEARNING
Balaji Krishnapuram, Shipeng Yu, and Bharat Rao
ENSEMBLE METHODS: FOUNDATIONS AND ALGORITHMS
Zhi-Hua Zhou

Ensemble Methods: Foundations and Algorithms
Zhi-Hua Zhou

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20120501
International Standard Book Number-13: 978-1-4398-3005-5 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To my parents, wife and son.
Z.-H. Zhou

Preface

Ensemble methods that train multiple learners and then combine them for use, with Boosting and Bagging as representatives, are a kind of state-of-the-art learning approach. It is well known that an ensemble is usually significantly more accurate than a single learner, and ensemble methods have already achieved great success in many real-world tasks.

It is difficult to trace the starting point of the history of ensemble methods since the basic idea of deploying multiple models has been in use in human society for a long time; however, it is clear that ensemble methods have become a hot topic since the 1990s, and researchers from various fields such as machine learning, pattern recognition, data mining, neural networks and statistics have explored ensemble methods from different aspects.

This book provides researchers, students and practitioners with an introduction to ensemble methods. The book consists of eight chapters which naturally constitute three parts.

Part I is composed of Chapter 1. Though this book is mainly written for readers with a basic knowledge of machine learning and pattern recognition, to enable readers who are unfamiliar with these fields to access the main contents, Chapter 1 presents some "background knowledge" of ensemble methods. It is impossible to provide a detailed introduction to all backgrounds in one chapter, and therefore this chapter serves mainly as a guide to further study. This chapter also serves to explain the terminology used in this book, to avoid confusion caused by other terminologies used in different but relevant fields.

Part II is composed of Chapters 2 to 5 and presents "core knowledge" of ensemble methods.
Chapters 2 and 3 introduce Boosting and Bagging, respectively. In addition to algorithms and theories, Chapter 2 introduces multi-class extension and noise tolerance, since classic Boosting algorithms are designed for binary classification, and are usually hurt seriously by noise. Bagging is naturally a multi-class method and less sensitive to noise, and therefore, Chapter 3 does not discuss these issues; instead, Chapter 3 devotes a section to Random Forest and some other random tree ensembles that can be viewed as variants of Bagging. Chapter 4 introduces combination methods. In addition to various averaging and voting schemes, the Stacking method and some other combination methods as well as relevant methods such as mixture of experts are introduced. Chapter 5 focuses on ensemble diversity. After introducing the error-ambiguity and bias-variance decompositions, many diversity measures are presented, followed by recent advances in information theoretic diversity and diversity generation methods.
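To make the basic recipe behind these chapters concrete, here is a minimal Python sketch: train several base learners on bootstrap samples of the training set (the heart of Bagging, Chapter 3) and combine their predictions by majority voting (one of the combination schemes of Chapter 4). The synthetic dataset, the use of scikit-learn decision trees as base learners, and the ensemble size are illustrative assumptions, not prescriptions from the book.

```python
# A minimal Bagging-style ensemble: T decision trees, each trained on a
# bootstrap sample of the training set, combined by majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)  # binary toy data

T = 11  # an odd ensemble size avoids voting ties on binary labels
learners = []
for _ in range(T):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([h.predict(X) for h in learners])      # shape (T, n_samples)
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority of 0/1 votes

print("ensemble training accuracy:", (ensemble_pred == y).mean())
```

Even this naive combination tends to beat a single tree on held-out data when the base learners are accurate yet diverse, which is exactly the interplay that Chapter 5 studies.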
Part III is composed of Chapters 6 to 8, and presents "advanced knowledge" of ensemble methods. Chapter 6 introduces ensemble pruning, which tries to prune a trained ensemble to get a better performance. Chapter 7 introduces clustering ensembles, which try to generate better clustering results by combining multiple clusterings. Chapter 8 presents some developments of ensemble methods in semi-supervised learning, active learning, cost-sensitive learning and class-imbalance learning, as well as comprehensibility enhancement.

It is not the goal of the book to cover all relevant knowledge of ensemble methods. Ambitious readers may be interested in the Further Reading sections for further information.

Two other books [Kuncheva, 2004, Rokach, 2010] on ensemble methods have been published before this one. To reflect the fast development of this field, I have attempted to present an updated and in-depth overview. However, when writing this book, I found this task more challenging than expected. Despite abundant research on ensemble methods, a thorough understanding of many essentials is still needed, and there is a lack of thorough empirical comparisons of many technical developments. As a consequence, several chapters of the book simply introduce a number of algorithms, while even for chapters with discussions on theoretical issues, there are still important yet unclear problems. On one hand, this reflects the still developing situation of the ensemble methods field; on the other hand, such a situation provides a good opportunity for further research.

The book could not have been written, at least not in its current form, without the help of many people. I am grateful to Tom Dietterich who has carefully read the whole book and given very detailed and insightful comments and suggestions. I want to thank Songcan Chen, Nan Li, Xu-Ying Liu, Fabio Roli, Jianxin Wu, Yang Yu and Min-Ling Zhang for helpful comments. I also want to thank Randi Cohen and her colleagues at Chapman & Hall/CRC Press for cooperation.

Last, but definitely not least, I am indebted to my family, friends and students for their patience, support and encouragement.

Zhi-Hua Zhou
Nanjing, China

Notations

x                 variable
x (boldface)      vector
A                 matrix
I                 identity matrix
X, Y              input and output spaces
D (calligraphic)  probability distribution
D                 data sample (dataset)
N                 normal distribution
U                 uniform distribution
H (calligraphic)  hypothesis space
H                 set of hypotheses
h(·)              hypothesis (learner)
L                 learning algorithm
p(·)              probability density function
p(·|·)            conditional probability density function
P(·)              probability mass function
P(·|·)            conditional probability mass function
E·∼D[f(·)]        mathematical expectation of function f(·) with respect to · under distribution D; D and/or · is omitted when the meaning is clear
var·∼D[f(·)]      variance of function f(·) with respect to · under distribution D
I(·)              indicator function, which takes 1 if · is true and 0 otherwise
sign(·)           sign function, which takes −1, 1 and 0 when · < 0, · > 0 and · = 0, respectively
err(·)            error function
{...}             set
(...)             row vector
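As a small worked illustration of how these symbols combine (with f denoting the ground-truth function, a symbol assumed here rather than listed above), the generalization error of a hypothesis h under distribution D can be written as:

```latex
% err(h): the probability, over instances x drawn from D, that the
% hypothesis h disagrees with the ground-truth function f
err(h) = \mathbb{E}_{x \sim \mathcal{D}}\left[ \mathbb{I}\big( h(x) \neq f(x) \big) \right]
```

Here I(·) converts the disagreement event into a 0/1 value, so its expectation under D is exactly the probability of error.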