Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields

Pradeep Ravikumar

August 2007
CMU-ML-07-115

Machine Learning Department
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
John Lafferty (Chair)
Carlos Guestrin
Eric Xing
Martin Wainwright, UC Berkeley

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2007 Pradeep Ravikumar

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity.

Keywords: Markov Random Fields, Graphical Models, Approximate Inference, Structure Learning, Feature Estimation, Non-parametric Estimation, Sparsity, ℓ1 Regularization, Additive Models

Abstract

Markov random fields (MRFs), or undirected graphical models, are graphical representations of probability distributions. Each graph represents a family of distributions: the nodes of the graph represent random variables, the edges encode independence assumptions, and weights over the edges and cliques specify a particular member of the family.

There are three main classes of tasks within this framework: the first is to perform inference, given the graph structure, parameters, and (clique) feature functions; the second is to estimate the graph structure and parameters from data, given the feature functions; the third is to estimate the feature functions themselves from data.

Key inference subtasks include estimating the normalization constant (also called the partition function), event probability estimation, computing rigorous upper and lower bounds (interval guarantees), inference given only moment constraints, and computing the most probable configuration.

The thesis addresses all of the above tasks and subtasks.

Acknowledgements

I will start with a couple of historical notes. For the first, we go to the post-Renaissance period, to the German polities. And for the second, we go to 2000 B.C. in India. These are the roots of the research university system and of the Indian monasteries, respectively. What is common to both is an environment of learned, wise people who devote their ascetic lives to understanding the world and existence. I apologize for the bombast, but I'm not done with the history and culture lesson yet. India has many good things; among them is a multitude of holy men who devote their lives to introspection and religion. A non-Indian might perhaps not fully grasp this, but my thesis advisor John Lafferty has much of an intangible something in common with the best of such holy men: an inner peace, a wisdom. In the finest of Indian traditions dating back to 2000 B.C., I have learnt under a holy, wise man. My parents must be very proud. In his other professorial role, John has taught me, or at least as much as I could learn, a lot about how to think about research; I've benefited a lot from his clarity of thought and creative intellect.

I have much to thank other monks at Carnegie Mellon as well. William Cohen showed me how to zoom in when doing research; Steve Fienberg showed me how to zoom out for the big picture. I've had a lot of fun working with Larry Wasserman, who is as insightful with acronyms as he is with All of Statistics.
I've benefitted not only from his insights, but also, at a fundamental level, from his style of doing research, which has greatly influenced my own research style as well.

I must also thank my thesis committee. The creative energies of Carlos Guestrin and Eric Xing have always fascinated me, and I was lucky to be the TA of a course they jointly offered, "Probabilistic Graphical Models"; their insights were quite helpful. Martin Wainwright's research on graphical models, and his beautifully written papers, were principal motivators for this thesis and my research.

I'm thankful to Diane Stidle, who has been an ever-present source of help and advice. I'm also grateful for the kindness and help of Sharon Cavlovich and Monica Hopes.

High-fives to my squash team: Vineet Goyal, Mohit Kumar, and Matt Mason, for all the good times playing and winning the league.

Finally, I thank my parents, Usha and Pattabhiraman Ravikumar, for their love and their encouragement.

Contents

1 Introduction
  1.1 Representation theory
    1.1.1 Exponential Family Representation
  1.2 Pairwise MRFs
  1.3 Tasks in a graphical model
  1.4 What this thesis is about

I Approximate Inference

2 Log Partition Function
  2.1 Conjugate Dual of the log-partition function

3 Preconditioner Approximations
  3.1 Preconditioners in Linear Systems
  3.2 Graphical Model Preconditioners
    3.2.1 Main Idea
    3.2.2 Generalized Eigenvalue Bounds
    3.2.3 Main Procedure
  3.3 Generalized Support Theory for Graphical Models
  3.4 Experiments

4 Quadratic Programming Relaxations for MAP
  4.1 MAP Estimation
  4.2 Problem Formulation
  4.3 Linear Relaxations
  4.4 Quadratic Relaxation
  4.5 Convex Approximation
  4.6 Iterative Update Procedure
  4.7 Inner Polytope Relaxations
  4.8 Experiments

5 General Event Probabilities, Bounds

6 Variational Chernoff Bounds
  6.1 Classical and Generalized Chernoff Bounds
  6.2 Graphical Model Chernoff Bounds
  6.3 Examples of Classical and Graphical Model Chernoff Bounds
    6.3.1 Example: Classical Chernoff bounds
    6.3.2 Example: Chernoff bounds for Markov models
  6.4 Variational Chernoff Bounds
    6.4.1 Collapsing the Nested Optimization
  6.5 Tightness of Chernoff Bounds
  6.6 Experimental Results

7 Variational Chebyshev Bounds
  7.1 Graphical Model Chebyshev bounds
  7.2 Chebyshev-Chernoff Bounds

II Structure Learning

8 Structure From Data
  8.1 Parameterizing edge selection
9 ℓ1-regularized regression
  9.1 Problem Formulation and Notation
  9.2 Main Result and Outline of Analysis
    9.2.1 Statement of main result
    9.2.2 Outline of analysis
  9.3 Primal-Dual Relations for ℓ1-Regularized Logistic Regression
  9.4 Constructing a Primal-Dual Pair
  9.5 Experimental Results

III Feature Estimation

10 Features from data
  10.1 Smoothing and Additive Models