
The Minimum Description Length Principle PDF

715 pages · 2007 · 7.62 MB · English

Preview: The Minimum Description Length Principle

The Minimum Description Length Principle
Peter D. Grünwald

The MIT Press
Cambridge, Massachusetts
London, England

© 2007 Massachusetts Institute of Technology. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Typeset in Palatino by the author using LaTeX2e with C. Manning's fbook.cls and statnlpbook.sty macros. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Information
Grünwald, Peter D.
The minimum description length principle / Peter D. Grünwald.
p. cm.—(Adaptive Computation and Machine Learning)
Includes bibliographical references and index.
ISBN-13: 978-0-262-07281-6 (alk. paper)
1. Minimum description length (Information theory) I. Title
QA276.9 G78 2007
003'.54—dc22
2006046646

To my father

Brief Contents

I Introductory Material 1
1 Learning, Regularity, and Compression 3
2 Probabilistic and Statistical Preliminaries 41
3 Information-Theoretic Preliminaries 79
4 Information-Theoretic Properties of Statistical Models 109
5 Crude Two-Part Code MDL 131

II Universal Coding 165
6 Universal Coding with Countable Models 171
7 Parametric Models: Normalized Maximum Likelihood 207
8 Parametric Models: Bayes 231
9 Parametric Models: Prequential Plug-in 257
10 Parametric Models: Two-Part 271
11 NML with Infinite Complexity 295
12 Linear Regression 335
13 Beyond Parametrics 369

III Refined MDL 403
14 MDL Model Selection 409
15 MDL Prediction and Estimation 459
16 MDL Consistency and Convergence 501
17 MDL in Context 523

IV Additional Background 597
18 The Exponential or "Maximum Entropy" Families 599
19 Information-Theoretic Properties of Exponential Families 623

Contents

List of Figures xix
Series Foreword xxi
Foreword xxiii
Preface xxv

I Introductory Material 1

1 Learning, Regularity, and Compression 3
1.1 Regularity and Learning 4
1.2 Regularity and Compression 4
1.3 Solomonoff's Breakthrough – Kolmogorov Complexity 8
1.4 Making the Idea Applicable 10
1.5 Crude MDL, Refined MDL and Universal Coding 12
1.5.1 From Crude to Refined MDL 14
1.5.2 Universal Coding and Refined MDL 17
1.5.3 Refined MDL for Model Selection 18
1.5.4 Refined MDL for Prediction and Hypothesis Selection 20
1.6 Some Remarks on Model Selection 23
1.6.1 Model Selection among Non-Nested Models 23
1.6.2 Goals of Model vs. Point Hypothesis Selection 25
1.7 The MDL Philosophy 26
1.8 MDL, Occam's Razor, and the "True Model" 29
1.8.1 Answer to Criticism No. 1 30
1.8.2 Answer to Criticism No. 2 32
1.9 History and Forms of MDL 36
1.9.1 What Is MDL? 37
1.9.2 MDL Literature 38
1.10 Summary and Outlook 40

2 Probabilistic and Statistical Preliminaries 41
2.1 General Mathematical Preliminaries 41
2.2 Probabilistic Preliminaries 46
2.2.1 Definitions; Notational Conventions 46
2.2.2 Probabilistic Sources 53
2.2.3 Limit Theorems and Statements 55
2.2.4 Probabilistic Models 57
2.2.5 Probabilistic Model Classes 60
2.3 Kinds of Probabilistic Models* 62
2.4 Terminological Preliminaries 69
2.5 Modeling Preliminaries: Goals and Methods for Inductive Inference 71
2.5.1 Consistency 71
2.5.2 Basic Concepts of Bayesian Statistics 74
2.6 Summary and Outlook 78

3 Information-Theoretic Preliminaries 79
3.1 Coding Preliminaries 79
3.1.1 Restriction to Prefix Coding Systems; Descriptions as Messages 83
3.1.2 Different Kinds of Codes 86
3.1.3 Assessing the Efficiency of Description Methods 90
3.2 The Most Important Section of This Book: Probabilities and Code Lengths 90
3.2.1 The Kraft Inequality 91
3.2.2 Code Lengths "Are" Probabilities 95
3.2.3 Immediate Insights and Consequences 99
3.3 Probabilities and Code Lengths, Part II 101
3.3.1 (Relative) Entropy and the Information Inequality 103
3.3.2 Uniform Codes, Maximum Entropy, and Minimax Codelength 106
3.4 Summary, Outlook, Further Reading 106

4 Information-Theoretic Properties of Statistical Models 109
4.1 Introduction 109
4.2 Likelihood and Observed Fisher Information 111
4.3 KL Divergence and Expected Fisher Information 117
4.4 Maximum Likelihood: Data vs. Parameters 124
4.5 Summary and Outlook 130

5 Crude Two-Part Code MDL 131
5.1 Introduction: Making Two-Part MDL Precise 132
5.2 Two-Part Code MDL for Markov Chain Selection 133
5.2.1 The Code C2 135
5.2.2 The Code C1 137
5.2.3 Crude Two-Part Code MDL for Markov Chains 138
5.3 Simplistic Two-Part Code MDL Hypothesis Selection 139
5.4 Two-Part MDL for Tasks Other Than Hypothesis Selection 141
5.5 Behavior of Two-Part Code MDL 142
5.6 Two-Part Code MDL and Maximum Likelihood 144
5.6.1 The Maximum Likelihood Principle 144
5.6.2 MDL vs. ML 147
5.6.3 MDL as a Maximum Probability Principle 148
5.7 Computing and Approximating Two-Part MDL in Practice 150
5.8 Justifying Crude MDL: Consistency and Code Design 152
5.8.1 A General Consistency Result 153
5.8.2 Code Design for Two-Part Code MDL 157
5.9 Summary and Outlook 163
5.A Appendix: Proof of Theorem 5.1 163

II Universal Coding 165

6 Universal Coding with Countable Models 171
6.1 Universal Coding: The Basic Idea 172
6.1.1 Two-Part Codes as Simple Universal Codes 174
6.1.2 From Universal Codes to Universal Models 175
6.1.3 Formal Definition of Universality 177
6.2 The Finite Case 178
6.2.1 Minimax Regret and Normalized ML 179
6.2.2 NML vs. Two-Part vs. Bayes 182
6.3 The Countably Infinite Case 184
6.3.1 The Two-Part and Bayesian Codes 184
6.3.2 The NML Code 187
6.4 Prequential Universal Models 190
6.4.1 Distributions as Prediction Strategies 190
6.4.2 Bayes Is Prequential; NML and Two-Part Are Not 193
6.4.3 The Prequential Plug-In Model 197
6.5 Individual vs. Stochastic Universality* 199
6.5.1 Stochastic Redundancy 199
6.5.2 Uniformly Universal Models 201
6.6 Summary, Outlook and Further Reading 204

7 Parametric Models: Normalized Maximum Likelihood 207
7.1 Introduction 207
7.1.1 Preliminaries 208
7.2 Asymptotic Expansion of Parametric Complexity 211
7.3 The Meaning of $\int_\Theta \sqrt{\det I(\theta)}\,d\theta$ 216
7.3.1 Complexity and Functional Form 217
7.3.2 KL Divergence and Distinguishability 219
7.3.3 Complexity and Volume 222
7.3.4 Complexity and the Number of Distinguishable Distributions* 224
7.4 Explicit and Simplified Computations 226

8 Parametric Models: Bayes 231
8.1 The Bayesian Regret 231
8.1.1 Basic Interpretation of Theorem 8.1 233
8.2 Bayes Meets Minimax – Jeffreys' Prior 234
8.2.1 Jeffreys' Prior and the Boundary 237
8.3 How to Prove the Bayesian and NML Regret Theorems 239
8.3.1 Proof Sketch of Theorem 8.1 239
8.3.2 Beyond Exponential Families 241
8.3.3 Proof Sketch of Theorem 7.1 243
8.4 Stochastic Universality* 244
8.A Appendix: Proofs of Theorem 8.1 and Theorem 8.2 248

9 Parametric Models: Prequential Plug-in 257
9.1 Prequential Plug-in for Exponential Families 257
9.2 The Plug-in vs. the Bayes Universal Model 262
9.3 More Precise Asymptotics 265

