
Minimum Divergence Methods in Statistical Machine Learning: From an Information Geometric Viewpoint PDF

224 pages · 2022 · 4.695 MB · English

Preview Minimum Divergence Methods in Statistical Machine Learning: From an Information Geometric Viewpoint

Shinto Eguchi · Osamu Komori

Minimum Divergence Methods in Statistical Machine Learning: From an Information Geometric Viewpoint

Shinto Eguchi, Institute of Statistical Mathematics, Tokyo, Japan
Osamu Komori, Seikei University, Tokyo, Japan

ISBN 978-4-431-56920-6    ISBN 978-4-431-56922-0 (eBook)
https://doi.org/10.1007/978-4-431-56922-0

© Springer Japan KK, part of Springer Nature 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Japan KK, part of Springer Nature.
The registered company address is: Shiroyama Trust Tower, 4-3-1 Toranomon, Minato-ku, Tokyo 105-6005, Japan

Preface

This book explores minimum divergence methods of statistical machine learning for estimation, regression, prediction, and so forth, in which we engage information geometry to elucidate the intrinsic properties of the corresponding loss functions, learning algorithms, and statistical models. One of the most elementary examples is Gauss's least squares estimator in a linear regression model, in which the estimator is given by minimizing the sum of squares between a response vector and a vector in the linear subspace spanned by the explanatory vectors. This is extended to Fisher's maximum likelihood estimator (MLE) for an exponential model, in which the estimator is obtained by minimizing, in an empirical analogue, the Kullback-Leibler (KL) divergence between a data distribution and a parametric distribution of the exponential model. Thus we arrive at a geometric interpretation of such minimization procedures, in which a right triangle satisfies a Pythagorean identity in the sense of the KL divergence. This understanding reveals a dualistic interplay between statistical estimation and the statistical model, which requires a pair of dual geodesic paths, called the m-geodesic and e-geodesic, in the framework of information geometry.
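The geometric picture behind Gauss's least squares estimator can be checked numerically: the fitted vector is the orthogonal projection of the response onto the subspace spanned by the explanatory vectors, so the residual is orthogonal to every explanatory vector. A minimal sketch (the data below are arbitrary, made up only for illustration):

```python
import numpy as np

# Explanatory vectors as columns of X; response vector y (arbitrary toy data).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.9, 2.1, 2.9, 4.2])

# Least squares estimate via the normal equations: X'X b = X'y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# The fitted vector X b is the orthogonal projection of y onto span(X):
# the residual y - X b is orthogonal to each column of X.
residual = y - X @ beta
print(np.allclose(X.T @ residual, 0.0))  # True
```

The right angle between the residual and the model subspace is the Euclidean special case of the Pythagorean identity that the KL divergence generalizes.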
We extend such a dualistic structure of the MLE and the exponential model to that of the minimum divergence estimator and the maximum entropy model, which is applied to robust statistics, maximum entropy, density estimation, principal component analysis, independent component analysis, regression analysis, manifold learning, boosting algorithms, clustering, dynamic treatment regimes, and so forth. We consider a variety of information divergence measures, typically including the KL divergence, to express the departure of one probability distribution from another. An information divergence decomposes into the cross-entropy and the (diagonal) entropy: the entropy is associated with a generative model as a family of maximum entropy distributions, while the cross-entropy is associated with a statistical estimation method via minimization of its empirical analogue based on given data. Thus any statistical divergence encodes an intrinsic pairing between a generative model and an estimation method. Typically, the KL divergence leads to the exponential model and maximum likelihood estimation. It is shown that any information divergence leads to a Riemannian metric and a pair of linear connections in the framework of information geometry.

We focus on a class of information divergences generated by an increasing and convex function U, called the U-divergence. It is shown that any generator function U generates the U-entropy and the U-divergence, in which there is a dualistic structure between the U-divergence method and the maximum U-entropy model.
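The decomposition of an information divergence into cross-entropy minus entropy can be verified directly for the KL divergence on a finite sample space. A small sketch (the two distributions are arbitrary toy examples):

```python
import math

def entropy(p):
    """Boltzmann-Shannon entropy H(p) = -sum_i p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy C(p, q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """KL divergence as the decomposition D(p, q) = C(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl(p, q) >= 0.0)        # True: the divergence is nonnegative
print(abs(kl(p, p)) < 1e-12)  # True: it vanishes on the diagonal
```

The diagonal term C(p, p) = H(p) is why the entropy is called "diagonal" in the decomposition above: the divergence measures the excess of the cross-entropy over its diagonal value.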
We observe that a specific choice of U leads to a robust statistical procedure via the minimum U-divergence method. If U is selected as an exponential function, then the corresponding U-entropy and U-divergence reduce to the Boltzmann-Shannon entropy and the KL divergence, and the minimum U-divergence estimator is equivalent to the MLE. For robust supervised learning to predict a class label, we observe that the U-boosting algorithm performs well under contamination by mislabeled examples if U is appropriately selected. We present such maximum U-entropy and minimum U-divergence methods, in particular selecting a power function as U, to provide flexible performance in statistical machine learning.

Tokyo, Japan
October 2021

Shinto Eguchi
Osamu Komori

Acknowledgments

We are grateful to Professors Shun-ichi Amari, John B. Copas, and all our colleagues for helpful introductions and discussions on information geometry, statistics, and machine learning. In particular, Hideitsu Hino and Katsuhiko Omae kindly gave us many suggestions to improve the draft of this book. This work is supported by JSPS KAKENHI Grant Number 18H03211.

Contents

Part I: Minimum Divergence Geometry

1 Information Geometry .......................................... 3
  1.1 Introductory Remarks ...................................... 3
  1.2 Normal Distribution Model ................................. 5
  1.3 Contingency Table Model ................................... 7
  1.4 Information Metric ........................................ 9
  1.5 Mixture and Exponential Connections ....................... 11
  References .................................................... 17

2 Information Divergence ........................................ 19
  2.1 Introduction .............................................. 19
  2.2 Two Classes of Information Divergence ..................... 21
  2.3 Minimum Information Divergence Geometry ................... 38
  2.4 Path Connectedness ........................................ 47
  2.5 Space of Positive-Definite Matrices ....................... 56
  2.6 Discussion ................................................ 65
  References .................................................... 66

3 Maximum Entropy Model ......................................... 71
  3.1 Maximum Entropy Distribution .............................. 71
  3.2 β-Maxent .................................................. 87
  References .................................................... 94

4 Minimum Divergence Method ..................................... 97
  4.1 Introduction .............................................. 97
  4.2 Minimum W-Divergence Method ............................... 99
  4.3 Minimum U-Divergence Method ............................... 101
  4.4 Minimum γ-Power Estimator ................................. 106
  4.5 Maximum Entropy and Minimum Divergence .................... 110
  4.6 Robustness ................................................ 113
  4.7 Discussion ................................................ 121
  References .................................................... 121

Part II: Statistical Machine Learning

5 Unsupervised Learning Algorithms .............................. 125
  5.1 Pareto Clustering ......................................... 125
  5.2 Spontaneous Clustering .................................... 135
  5.3 Robust PCA ................................................ 142
  5.4 Robust ICA ................................................ 146
  5.5 Discussion ................................................ 149
  References .................................................... 150

6 Regression Model .............................................. 153
  6.1 Introduction .............................................. 153
  6.2 Linear Regression Model ................................... 155
  6.3 Generalized Linear Model .................................. 160
  6.4 Quasi-Linear Model ........................................ 165
    6.4.1 Log-Exp Means ......................................... 169
    6.4.2 Statistical Model of Log-Exp Means .................... 170
  6.5 Regression Model onto the Positive-Definite Space ......... 172
  6.6 Discussion ................................................ 176
  References .................................................... 178

7 Classification ................................................ 179
  7.1 U-Boost ................................................... 179
  7.2 η-Boost ................................................... 182
  7.3 AUCBoost .................................................. 183
  7.4 U-AUCBoost ................................................ 187
  7.5 Generalized t-Statistic ................................... 188
  References .................................................... 193

8 Outcome Weighted Learning in Dynamic Treatment Regimes ........ 197
  8.1 Introduction .............................................. 197
  8.2 Divergence of Decision Functions .......................... 198
  8.3 Ψ-Loss Function ........................................... 201
  8.4 Boosting Algorithm ........................................ 210
  8.5 Multiple-Stage ............................................ 214
  References .................................................... 216

Glossary ........................................................ 217
Index ........................................................... 219

Part I
Minimum Divergence Geometry


