Density Ratio Estimation in Machine Learning PDF

329 Pages·2012·3.597 MB·English

Preview Density Ratio Estimation in Machine Learning

Density Ratio Estimation in Machine Learning

MASASHI SUGIYAMA, Tokyo Institute of Technology
TAIJI SUZUKI, The University of Tokyo
TAKAFUMI KANAMORI, Nagoya University

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9780521190176
© Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori 2012
First published 2012
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication data is available
ISBN 978-0-521-19017-6 Hardback

Contents

Foreword
Preface

Part I Density-Ratio Approach to Machine Learning
1 Introduction
1.1 Machine Learning
1.2 Density-Ratio Approach to Machine Learning
1.3 Algorithms of Density-Ratio Estimation
1.4 Theoretical Aspects of Density-Ratio Estimation
1.5 Organization of this Book at a Glance

Part II Methods of Density-Ratio Estimation
2 Density Estimation
2.1 Basic Framework
2.2 Parametric Approach
2.3 Non-Parametric Approach
2.4 Numerical Examples
2.5 Remarks
3 Moment Matching
3.1 Basic Framework
3.2 Finite-Order Approach
3.3 Infinite-Order Approach: KMM
3.4 Numerical Examples
3.5 Remarks
4 Probabilistic Classification
4.1 Basic Framework
4.2 Logistic Regression
4.3 Least-Squares Probabilistic Classifier
4.4 Support Vector Machine
4.5 Model Selection by Cross-Validation
4.6 Numerical Examples
4.7 Remarks
5 Density Fitting
5.1 Basic Framework
5.2 Implementations of KLIEP
5.3 Model Selection by Cross-Validation
5.4 Numerical Examples
5.5 Remarks
6 Density-Ratio Fitting
6.1 Basic Framework
6.2 Implementation of LSIF
6.3 Model Selection by Cross-Validation
6.4 Numerical Examples
6.5 Remarks
7 Unified Framework
7.1 Basic Framework
7.2 Existing Methods as Density-Ratio Fitting
7.3 Interpretation of Density-Ratio Fitting
7.4 Power Divergence for Robust Density-Ratio Estimation
7.5 Remarks
8 Direct Density-Ratio Estimation with Dimensionality Reduction
8.1 Discriminant Analysis Approach
8.2 Divergence Maximization Approach
8.3 Numerical Examples
8.4 Remarks

Part III Applications of Density Ratios in Machine Learning
9 Importance Sampling
9.1 Covariate Shift Adaptation
9.2 Multi-Task Learning
10 Distribution Comparison
10.1 Inlier-Based Outlier Detection
10.2 Two-Sample Test
11 Mutual Information Estimation
11.1 Density-Ratio Methods of Mutual Information Estimation
11.2 Sufficient Dimension Reduction
11.3 Independent Component Analysis
12 Conditional Probability Estimation
12.1 Conditional Density Estimation
12.2 Probabilistic Classification

Part IV Theoretical Analysis of Density-Ratio Estimation
13 Parametric Convergence Analysis
13.1 Density-Ratio Fitting under Kullback–Leibler Divergence
13.2 Density-Ratio Fitting under Squared Distance
13.3 Optimality of Logistic Regression
13.4 Accuracy Comparison
13.5 Remarks
14 Non-Parametric Convergence Analysis
14.1 Mathematical Preliminaries
14.2 Non-Parametric Convergence Analysis of KLIEP
14.3 Convergence Analysis of KuLSIF
14.4 Remarks
15 Parametric Two-Sample Test
15.1 Introduction
15.2 Estimation of Density Ratios
15.3 Estimation of ASC Divergence
15.4 Optimal Estimator of ASC Divergence
15.5 Two-Sample Test Based on ASC Divergence Estimation
15.6 Numerical Studies
15.7 Remarks
16 Non-Parametric Numerical Stability Analysis
16.1 Preliminaries
16.2 Relation between KuLSIF and KMM
16.3 Condition Number Analysis
16.4 Optimality of KuLSIF
16.5 Numerical Examples
16.6 Remarks

Part V Conclusions
17 Conclusions and Future Directions

List of Symbols and Abbreviations
References
Index

Foreword

Estimating probability distributions is widely viewed as a central question in machine learning. The whole enterprise of probabilistic modeling using probabilistic graphical models is generally addressed by learning marginal and conditional probability distributions. Classification and regression – starting with Fisher’s fundamental contributions – are similarly viewed as problems of estimating conditional densities.

The present book introduces an exciting alternative perspective – namely, that virtually all problems in machine learning can be formulated and solved as problems of estimating density ratios – the ratios of two probability densities. This book provides a comprehensive review of the elegant line of research undertaken by the authors and their collaborators over the last decade. It reviews existing work on density-ratio estimation and derives a variety of algorithms for directly estimating density ratios. It then shows how these novel algorithms can address not only standard machine learning problems – such as classification, regression, and feature selection – but also a variety of other important problems such as learning under a covariate shift, multi-task learning, outlier detection, sufficient dimensionality reduction, and independent component analysis.

At each point this book carefully defines the problems at hand, reviews existing work, derives novel methods, and reports on numerical experiments that validate the effectiveness and superiority of the new methods. A particularly impressive aspect of the work is that implementations of most of the methods are available for download from the authors’ web pages.

The last part of the book is devoted to mathematical analyses of the methods. This includes not only an analysis for the case where the assumptions underlying the algorithms hold, but also situations in which the models are misspecified. Careful study of these results will not only provide fundamental insights into the problems and algorithms but will also provide the reader with an introduction to many valuable analytic tools.

In summary, this is a definitive treatment of the topic of density-ratio estimation.
It reflects the authors’ careful thinking and sustained research efforts. Researchers and students alike will find it an important source of ideas and techniques. There is no doubt that this book will change the way people think about machine learning and stimulate many new directions for research.

Thomas G. Dietterich
School of Electrical Engineering
Oregon State University, Corvallis, OR, USA

Preface

Machine learning is aimed at developing systems that learn. The mathematical foundation of machine learning and its real-world applications have been extensively explored in the last decades. Various tasks of machine learning, such as regression and classification, typically can be solved by estimating probability distributions behind data. However, estimating probability distributions is one of the most difficult problems in statistical data analysis, and thus solving machine learning tasks without going through distribution estimation is a key challenge in modern machine learning.

So far, various algorithms have been developed that do not involve distribution estimation but solve target machine learning tasks directly. The support vector machine is a successful example that follows this line – it does not estimate data-generating distributions but directly obtains the class-decision boundary that is sufficient for classification. However, developing such an excellent algorithm for each of the machine learning tasks could be highly costly and difficult.

To overcome these limitations of current machine learning research, we introduce and develop a novel paradigm called density-ratio estimation – instead of probability distributions, the ratio of probability densities is estimated for statistical data processing. The density-ratio approach covers various machine learning tasks, for example, non-stationarity adaptation, multi-task learning, outlier detection, two-sample tests, feature selection, dimensionality reduction, independent component analysis, causal inference, conditional density estimation, and probabilistic classification. Thus, density-ratio estimation is a versatile tool for machine learning. This book is aimed at introducing the mathematical foundation, practical algorithms, and applications of density-ratio estimation.

Most of the contents of this book are based on the journal and conference papers we have published in the last couple of years. We acknowledge our collaborators for their fruitful discussions: Hirotaka Hachiya, Shohei Hido, Yasuyuki Ihara, Hisashi Kashima, Motoaki Kawanabe, Manabu Kimura, Masakazu Matsugu, Shin-ichi Nakajima, Klaus-Robert Müller, Jun Sese, Jaak Simm, Ichiro Takeuchi, Masafumi Takimoto, Yuta Tsuboi, Kazuya Ueki, Paul von Bünau, Gordon Wichern, and Makoto Yamada.

[Photo: Picture taken in Nagano, Japan, in the summer of 2009. From left to right, Taiji Suzuki, Masashi Sugiyama, and Takafumi Kanamori.]

Finally, we thank the Ministry of Education, Culture, Sports, Science and Technology; the Alexander von Humboldt Foundation; the Okawa Foundation; Microsoft Institute for Japanese Academic Research Collaboration Collaborative Research Project; IBM Faculty Award; Mathematisches Forschungsinstitut Oberwolfach Research-in-Pairs Program; the Asian Office of Aerospace Research and Development; Support Center for Advanced Telecommunications Technology Research Foundation; and the Japan Science and Technology Agency for their financial support.

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori

1 Introduction

The goal of machine learning is to extract useful information hidden in data (Hastie et al., 2001; Schölkopf and Smola, 2002; Bishop, 2006). This chapter is devoted to describing a brief overview of the machine learning field and showing our focus in this book – density-ratio methods. In Section 1.1, fundamental machine learning frameworks of supervised learning, unsupervised learning, and reinforcement learning are briefly reviewed. Then we show examples of machine learning problems to which the density-ratio methods can be applied in Section 1.2 and briefly review methods of density-ratio estimation in Section 1.3. A brief overview of theoretical aspects of density-ratio estimation is given in Section 1.4. Finally, the organization of this book is described in Section 1.5.

1.1 Machine Learning

Depending on the type of data and the purpose of the analysis, machine learning tasks can be classified into three categories:

Supervised learning: An input–output relation is learned from input–output samples.
Unsupervised learning: Some interesting “structure” is found from input-only samples.
Reinforcement learning: A decision-making policy is learned from reward samples.

In this section we briefly review each of these tasks.

1.1.1 Supervised Learning

In the supervised learning scenario, data samples take the form of input–output pairs and the goal is to infer the input–output relation behind the data. Typical examples of supervised learning problems are regression and classification (Figure 1.1):
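
The regression case of supervised learning can be sketched in a few lines of Python. This is a minimal illustration and is not taken from the book: the toy data, the helper name fit_line, and the choice of a straight-line model fitted by ordinary least squares are all assumptions made here for the example.

```python
# Minimal sketch of regression as supervised learning.
# Illustrative only: the data and the helper fit_line are hypothetical,
# not something defined in the book.

def fit_line(xs, ys):
    """Fit y = a*x + b to input-output pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of the inputs, and input-output co-deviation.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

# Training samples: input-output pairs that happen to lie on y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0]
outputs = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(inputs, outputs)

def predict(x):
    """Apply the learned input-output relation to a new input."""
    return slope * x + intercept

print(predict(4.0))
```

The learned pair (slope, intercept) plays the role of the input–output relation inferred from the samples; classification would follow the same pattern with a discrete output instead of a real-valued one.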
