Evaluating Learning Algorithms

The field of machine learning has matured to the point where many sophisticated learning approaches can be applied to practical applications. Thus it is of critical importance that researchers have the proper tools to evaluate learning approaches and understand the underlying issues. This book examines various aspects of the evaluation process with an emphasis on classification algorithms. The authors describe several techniques for classifier performance assessment, error estimation and resampling, and obtaining statistical significance, as well as selecting appropriate domains for evaluation. They also present a unified evaluation framework and highlight how different components of evaluation are both significantly interrelated and interdependent. The techniques presented in the book are illustrated using R and WEKA, facilitating better practical insight as well as implementation.

Aimed at researchers in the theory and applications of machine learning, this book offers a solid basis for conducting performance evaluations of algorithms in practical settings.

Nathalie Japkowicz is a Professor of Computer Science at the School of Information Technology and Engineering of the University of Ottawa. She also taught machine learning and artificial intelligence at Dalhousie University and Ohio State University. Along with machine learning evaluation, her research interests include one-class learning, the class imbalance problem, and learning in the presence of concept drifts.

Mohak Shah is a Postdoctoral Fellow at McGill University. He earned a PhD in Computer Science from the University of Ottawa in 2006 and was a Postdoctoral Fellow at CHUL Genomics Research Center in Quebec prior to joining McGill. His research interests span machine learning and statistical learning theory as well as their application to various domains.
Evaluating Learning Algorithms
A Classification Perspective

NATHALIE JAPKOWICZ
University of Ottawa

MOHAK SHAH
McGill University

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo, Mexico City

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA

www.cambridge.org
Information on this title: www.cambridge.org/9780521196000

© Cambridge University Press 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2011

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication data

Japkowicz, Nathalie.
Evaluating Learning Algorithms: A Classification Perspective / Nathalie Japkowicz, Mohak Shah.
p. cm.
Includes bibliographical references.
ISBN 978-0-521-19600-0
1. Machine learning. 2. Computer algorithms – Evaluation. I. Shah, Mohak. II. Title.
Q325.5.J37 2011
006.3′1–dc22 2010048733

ISBN 978-0-521-19600-0 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet Web sites referred to in this publication and does not guarantee that any content on such Web sites is, or will remain, accurate or appropriate.

This book is dedicated to the memory of my father, Michel Japkowicz (1935–2008), who was my greatest supporter all throughout my studies and career, taking a great interest in any project of mine. He was aware of the fact that this book was being written, encouraged me to write it, and would be the proudest father on earth to see it in print today.
Nathalie

This book is dedicated to the loving memory of my father, Upendra Shah (1948–2006), who was my mentor in life. He taught me the importance of not falling for means but looking for meaning in life. He was also my greatest support through all times, good and bad. His memories are a constant source of inspiration and motivation. Here’s to you Dad!
Mohak

Contents

Preface
Acronyms

1 Introduction
    1.1 The De Facto Culture
    1.2 Motivations for This Book
    1.3 The De Facto Approach
    1.4 Broader Issues with Evaluation Approaches
    1.5 What Can We Do?
    1.6 Is Evaluation an End in Itself?
    1.7 Purpose of the Book
    1.8 Other Takes on Evaluation
    1.9 Moving Beyond Classification
    1.10 Thematic Organization

2 Machine Learning and Statistics Overview
    2.1 Machine Learning Overview
    2.2 Statistics Overview
    2.3 Summary
    2.4 Bibliographic Remarks

3 Performance Measures I
    3.1 Overview of the Problem
    3.2 An Ontology of Performance Measures
    3.3 Illustrative Example
    3.4 Performance Metrics with a Multiclass Focus
    3.5 Performance Metrics with a Single-Class Focus
    3.6 Illustration of the Confusion-Matrix-Only-Based Metrics Using WEKA
    3.7 Summary
    3.8 Bibliographic Remarks

4 Performance Measures II
    4.1 Graphical Performance Measures
    4.2 Receiver Operating Characteristic (ROC) Analysis
    4.3 Other Visual Analysis Methods
    4.4 Continuous and Probabilistic Classifiers
    4.5 Specialized Metrics
    4.6 Illustration of the Ranking and Probabilistic Approaches Using R, ROCR, and WEKA
    4.7 Summary
    4.8 Bibliographic Remarks

5 Error Estimation
    5.1 Introduction
    5.2 Holdout Approach
    5.3 What Implicitly Guides Resampling?
    5.4 Simple Resampling
    5.5 A Note on Model Selection
    5.6 Multiple Resampling
    5.7 Discussion
    5.8 Illustrations Using R
    5.9 Summary
    5.10 Bibliographic Remarks
    Appendix: Proof of Equation (5.5)

6 Statistical Significance Testing
    6.1 The Purpose of Statistical Significance Testing
    6.2 The Limitations of Statistical Significance Testing
    6.3 An Overview of Relevant Statistical Tests
    6.4 A Note on Terminology
    6.5 Comparing Two Classifiers on a Single Domain
    6.6 Comparing Two Classifiers on Multiple Domains
    6.7 Comparing Multiple Classifiers on Multiple Domains
    6.8 Statistical Tests for Two Classifiers on a Single Domain Based on Resampling Techniques
    6.9 Illustration of the Statistical Tests Application Using R
    6.10 Summary
    6.11 Bibliographic Remarks

7 Datasets and Experimental Framework
    7.1 Repository-Based Approach
    7.2 Making Sense of Our Repositories: Metalearning
Description: