Obtaining Accurate Probabilities using Classifier Calibration


OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION

by

Mahdi Pakdaman Naeini

B.Sc. in Software Engineering, Iran University of Science and Technology, 2001
M.Sc. in Artificial Intelligence and Robotics, University of Tehran, 2004
M.Sc. in Intelligent Systems Program, University of Pittsburgh, 2013

Submitted to the Graduate Faculty of the Kenneth P. Dietrich School of Arts and Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy, University of Pittsburgh, 2016.

UNIVERSITY OF PITTSBURGH
KENNETH P. DIETRICH SCHOOL OF ARTS AND SCIENCES

This dissertation was presented by Mahdi Pakdaman Naeini. It was defended on August 5th, 2016.

Dr. Gregory F. Cooper, Department of Biomedical Informatics, University of Pittsburgh
Dr. Milos Hauskrecht, Department of Computer Science, University of Pittsburgh
Dr. Shyam Visweswaran, Department of Biomedical Informatics, University of Pittsburgh
Dr. Jeff Schneider, The Robotics Institute, Carnegie Mellon University

Dissertation Director: Dr. Gregory F. Cooper, Department of Biomedical Informatics, University of Pittsburgh

Copyright © by Mahdi Pakdaman Naeini, 2016

OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION
Mahdi Pakdaman Naeini, PhD
University of Pittsburgh, 2016

Learning probabilistic classification and prediction models that generate accurate probabilities is essential in many prediction and decision-making tasks in machine learning and data mining. One way to achieve this goal is to post-process the output of classification models to obtain more accurate probabilities. These post-processing methods are often referred to as calibration methods in the machine learning literature.

This thesis describes a suite of parametric and non-parametric methods for calibrating the output of classification and prediction models. In order to evaluate the calibration performance of a classifier, we introduce two new calibration measures that are intuitive statistics of the calibration curves. We present extensive experimental results on both simulated and real datasets to evaluate the performance of the proposed methods compared with commonly used calibration methods in the literature. In particular, in terms of binary classifier calibration, our experimental results show that the proposed methods are able to improve the calibration power of classifiers while retaining their discrimination performance. Our theoretical findings show that by using a simple non-parametric calibration method, it is possible to improve the calibration performance of a classifier without sacrificing discrimination capability. The methods are also computationally tractable for large-scale datasets, as they run in O(N log N) time, where N is the number of samples.

In this thesis we also introduce a novel framework to derive calibrated probabilities of causal relationships from observational data. The framework consists of three main components: (1) an approximate method for generating initial probability estimates of the edge types for each pair of variables, (2) the availability of a relatively small number of the causal relationships in the network for which the truth status is known, which we call a calibration training set, and (3) a calibration method for using the approximate probability estimates and the calibration training set to generate calibrated probabilities for the many remaining pairs of variables. Our experiments on a range of simulated data support that the proposed approach improves the calibration of edge predictions. The results also support that the approach often improves the precision and recall of those predictions.
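The chapter headings below indicate that the two calibration measures studied in the thesis are the maximum calibration error (MCE) and the expected calibration error (ECE), both statistics of the empirical calibration curve. The following is a minimal sketch of how such statistics are commonly computed; the equal-width binning scheme, the default of 10 bins, and the helper name calibration_errors are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def calibration_errors(y_true, y_prob, n_bins=10):
    """Compute binned calibration-curve statistics (ECE and MCE).

    ECE is the sample-weighted average, over bins, of the absolute gap
    between the observed frequency of positives and the mean predicted
    probability; MCE is the largest such gap over the non-empty bins.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # interior edges of n_bins equal-width bins over [0, 1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(y_prob, edges)  # bin index in 0 .. n_bins-1

    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # skip empty bins
        observed = y_true[mask].mean()   # empirical frequency of positives
        predicted = y_prob[mask].mean()  # mean predicted probability
        gap = abs(observed - predicted)
        ece += mask.mean() * gap         # weight by fraction of samples in bin
        mce = max(mce, gap)
    return ece, mce
```

For a perfectly calibrated model both statistics approach zero as the sample size grows; ECE weights each bin's deviation by how often its predictions occur, while MCE reports the single worst bin.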
TABLE OF CONTENTS

ACKNOWLEDGEMENT
1.0 INTRODUCTION
  1.1 Hypothesis Statement
2.0 BACKGROUND
  2.1 Existing Calibration Methods
    2.1.1 Parametric Calibration Methods
      2.1.1.1 Platt's Method
      2.1.1.2 Beta Distribution
      2.1.1.3 Asymmetric Laplace Method
      2.1.1.4 Piecewise Logistic Regression
    2.1.2 Non-Parametric Calibration Methods
      2.1.2.1 Histogram Binning
      2.1.2.2 Isotonic Regression
      2.1.2.3 Similarity Binning Averaging
      2.1.2.4 Adaptive Calibration of Predictions
  2.2 Other Related Methods
  2.3 Theoretical Works
  2.4 How to Evaluate Calibration Methods
    2.4.1 Proper Scoring Rules
    2.4.2 Calibration vs. Refinement
    2.4.3 Evaluation Measures
3.0 BINARY CLASSIFIER CALIBRATION METHODS
  3.1 Extending Histogram Binning using Non-Parametric Binary Classification
  3.2 Bayesian Extension of Histogram Binning
    3.2.1 Full Bayesian Approach
      3.2.1.1 Bayesian Calibration Score
      3.2.1.2 The SBB and ABB Models
      3.2.1.3 Dynamic Programming Search of SBB
      3.2.1.4 Dynamic Programming Search of ABB
    3.2.2 Selective Bayesian Approach
  3.3 Calibration using Near Isotonic Regression
    3.3.1 The Modified PAV Algorithm
  3.4 Calibration using Linear Trend Filtering
4.0 EMPIRICAL RESULTS ON BINARY CLASSIFIER CALIBRATION METHODS
  4.1 Experimental Results on KDE-DPM
  4.2 Experimental Results for Bayesian Binning Methods
    4.2.1 Experimental Results for ABB-SBB
      4.2.1.1 Discussion
    4.2.2 Experimental Results for BBQ
      4.2.2.1 Simulated Data
      4.2.2.2 Real Data
    4.2.3 ABB vs. BBQ
    4.2.4 K2 vs. BDeu
  4.3 Experimental Results on ENIR
    4.3.1 Simulated Data
    4.3.2 Real Data
  4.4 Experimental Results on ELiTE
  4.5 Empirical Evaluation of KDE, DPM, ENIR, and ELiTE
  4.6 Effect of Calibration Size
5.0 THEORETICAL FINDINGS ON BINARY CLASSIFIER CALIBRATION
  5.1 Notation and Assumptions
  5.2 Calibration Theorems
    5.2.1 Convergence Results on MCE
    5.2.2 Convergence Results on ECE
    5.2.3 Convergence Results on AUC Loss
      5.2.3.1 First Summation Part
      5.2.3.2 Second Summation Part
  5.3 Empirical Evaluation
6.0 MULTI-CLASS CLASSIFIER CALIBRATION
  6.1 Extending Platt's Method for Multi-Class Calibration
    6.1.1 Experimental Results on UCI Datasets
  6.2 Application in Causal Network Discovery
    6.2.1 Problem Definition and Motivation
    6.2.2 Overview of Greedy Equivalent Search
    6.2.3 Bootstrapped Greedy Equivalent Search
    6.2.4 Experimental Methods
    6.2.5 Experimental Results
    6.2.6 Discussion
7.0 CONCLUSION AND FUTURE WORK
8.0 Bibliography

LIST OF TABLES

1 German Credit Cost Table
2 Proper Scoring Rules for a Decision Problem with a Binary Outcome
3 Experimental Results on Simulated Dataset 5(b)
4 Experimental Results on the KDD98 dataset. The first column of each table shows the result of a model without post-processing calibration.
5 Experimental Results on Simulated and Real Datasets
6 Time complexity of calibration methods in training on N samples and application on one test sample
7 Experimental Results on the circular-configuration simulated dataset shown in Figure 5(b)
8 The results of experiments comparing ABB versus BBQ when we use LR as the base classifier. Bold face indicates the results that are significantly better based on the Wilcoxon signed rank test at the 5% significance level.
9 The results of experiments comparing ABB versus BBQ when we use SVM as the base classifier. Bold face indicates the results that are significantly better based on the Wilcoxon signed rank test at the 5% significance level.
10 The results of experiments comparing ABB versus BBQ when we use NB as the base classifier. Bold face indicates the results that are significantly better based on the Wilcoxon signed rank test at the 5% significance level.
11 The results of experiments comparing the performance of the K2 versus BDeu scoring functions when we use LR as the base classifier. Bold face indicates the results that are statistically significantly better based on the Wilcoxon signed rank test at the 5% significance level.
12 The results of experiments comparing the performance of the K2 versus BDeu scoring functions when we use SVM as the base classifier. Bold face indicates the results that are statistically significantly better based on the Wilcoxon signed rank test at the 5% significance level.
13 The results of experiments comparing the performance of the K2 versus BDeu scoring functions when we use NB as the base classifier. Bold face indicates the results that are statistically significantly better based on the Wilcoxon signed rank test at the 5% significance level.
14 Experimental Results on a simulated dataset: circular configuration
15 Experimental Results on a simulated dataset: XOR configuration
16 Summary statistics of the size of the real datasets and the percentage of the minority class. Q1 and Q3 denote the first and third quartiles, respectively.
17 Average rank of the calibration methods on the real datasets using LR as the base classifier. Markers indicate whether ENIR is statistically superior or inferior to the compared method (using the F_F test followed by Holm's step-down procedure at a 0.05 significance level).
18 Average rank of the calibration methods on the real datasets using SVM as the base classifier. Markers indicate whether ENIR is statistically superior or inferior to the compared method (using the F_F test followed by Holm's step-down procedure at a 0.05 significance level).
19 Average rank of the calibration methods on the real datasets using NB as the base classifier. Markers indicate whether ENIR is statistically superior or inferior to the compared method (using the F_F test followed by Holm's step-down procedure at a 0.05 significance level).
