Advances in Minimum Description Length
Theory and Applications
edited by Peter D. Grünwald, In Jae Myung, and Mark A. Pitt

computer science/machine learning

The process of inductive inference (inferring general laws and principles from particular instances) is the basis of statistical modeling, pattern recognition, and machine learning. The minimum description length (MDL) principle, a powerful method of inductive inference, holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data: the more we are able to compress the data, the more we learn about the regularities underlying the data. Advances in Minimum Description Length is a sourcebook that will introduce the scientific community to the foundations of MDL, recent theoretical advances, and practical applications.

The book begins with an extensive tutorial on MDL, covering its theoretical underpinnings, practical implications, various interpretations, and underlying philosophy. The tutorial includes a brief history of MDL, from its roots in the notion of Kolmogorov complexity to the beginning of MDL proper. The book then presents recent theoretical advances, introducing modern MDL methods in a way that is accessible to readers from many different scientific fields. The book concludes with examples of how to apply MDL in research settings that range from bioinformatics and machine learning to psychology.

Peter D. Grünwald is a researcher at CWI, the National Research Institute for Mathematics and Computer Science, Amsterdam, the Netherlands. He is also affiliated with EURANDOM, the European Research Institute for the Study of Stochastic Phenomena, Eindhoven, the Netherlands. In Jae Myung and Mark A. Pitt are professors in the Department of Psychology and members of the Center for Cognitive Science at Ohio State University.

Neural Information Processing series
A Bradford Book

Of related interest

Probabilistic Models of the Brain: Perception and Neural Function
edited by Rajesh P. N. Rao, Bruno A. Olshausen, and Michael S. Lewicki
This book surveys some of the current probabilistic approaches to modeling and understanding brain function, presenting top-down computational models as well as bottom-up neurally motivated models. The topics covered include Bayesian and information-theoretic models of perception, probabilistic theories of neural coding and spike timing, computational models of lateral and cortico-cortical feedback connections, and the development of receptive field properties from natural signals.

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Bernhard Schölkopf and Alexander J. Smola
Learning with Kernels provides an introduction to Support Vector Machines (SVMs) and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed in recent years.

The MIT Press
Massachusetts Institute of Technology
Cambridge, Massachusetts 02142
http://mitpress.mit.edu
0-262-07262-9

Neural Information Processing Series
Michael I. Jordan, Sara A. Solla, and Terrence J. Sejnowski, Editors

Advances in Large Margin Classifiers, Alexander J. Smola, Peter L.
Bartlett, Bernhard Schölkopf, and Dale Schuurmans, eds., 2000
Advanced Mean Field Methods: Theory and Practice, Manfred Opper and David Saad, eds., 2001
Probabilistic Models of the Brain: Perception and Neural Function, Rajesh P. N. Rao, Bruno A. Olshausen, and Michael S. Lewicki, eds., 2002
Exploratory Analysis and Data Modeling in Functional Neuroimaging, Friedrich T. Sommer and Andrzej Wichert, eds., 2003
Advances in Minimum Description Length: Theory and Applications, Peter D. Grünwald, In Jae Myung, and Mark A. Pitt, eds., 2005

Advances in Minimum Description Length
Theory and Applications

edited by
Peter D. Grünwald
In Jae Myung
Mark A. Pitt

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

© 2005 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special

Contents

Series Foreword
Preface

I Introductory Chapters

1 Introducing the Minimum Description Length Principle
  Peter Grünwald
2 Minimum Description Length Tutorial
  Peter Grünwald
3 MDL, Bayesian Inference, and the Geometry of the Space of Probability Distributions
  Vijay Balasubramanian
4 Hypothesis Testing for Poisson vs. Geometric Distributions Using Stochastic Complexity
  Aaron D. Lanterman
5 Applications of MDL to Selected Families of Models
  Andrew J. Hanson and Philip Chi-Wing Fu
6 Algorithmic Statistics and Kolmogorov's Structure Functions
  Paul Vitányi

II Theoretical Advances

7 Exact Minimax Predictive Density Estimation and MDL
  Feng Liang and Andrew Barron
8 The Contribution of Parameters to Stochastic Complexity
  Dean P. Foster and Robert A. Stine
9 Extended Stochastic Complexity and Its Applications to Learning
  Kenji Yamanishi
10 Kolmogorov's Structure Function in MDL Theory and Lossy Data Compression
  Jorma Rissanen and Ioan Tabus

III Practical Applications

11 Minimum Message Length and Generalized Bayesian Nets with Asymmetric Languages
  Joshua W. Comley and David L. Dowe
12 Simultaneous Clustering and Subset Selection via MDL
  Rebecka Jörnsten and Bin Yu
13 An MDL Framework for Data Clustering
  Petri Kontkanen, Petri Myllymäki, Wray Buntine, Jorma Rissanen, and Henry Tirri
14 Minimum Description Length and Psychological Clustering Models
  Michael D. Lee and Daniel J. Navarro
15 A Minimum Description Length Principle for Perception
  Nick Chater
16 Minimum Description Length and Cognitive Modeling
  Yong Su, In Jae Myung, and Mark A. Pitt

Index

Series Foreword

The yearly Neural Information Processing Systems (NIPS) workshops bring together scientists with broadly varying backgrounds in statistics, mathematics, computer science, physics, electrical engineering, neuroscience, and cognitive science, unified by a common desire to develop novel computational and statistical strategies for information processing, and to understand the mechanisms for information processing in the brain. As opposed to conferences, these workshops maintain a flexible format that both allows and encourages the presentation and discussion of work in progress, and thus serve as an incubator for the development of important new ideas in this rapidly evolving field.

The series editors, in consultation with workshop organizers and members of the NIPS Foundation board, select specific workshop topics on the basis of scientific excellence, intellectual breadth, and technical impact.
Collections of papers chosen and edited by the organizers of specific workshops are built around pedagogical introductory chapters, while research monographs provide comprehensive descriptions of workshop-related topics to create a series of books that provides a timely, authoritative account of the latest developments in the exciting field of neural computation.

Michael I. Jordan, Sara A. Solla, and Terrence J. Sejnowski

Preface

To be able to forecast future events, science wants to infer general laws and principles from particular instances. This process of inductive inference is the central theme in statistical modeling, pattern recognition, and the branch of computer science called "machine learning." The minimum description length (MDL) principle is a powerful method of inductive inference. It states that the best explanation (i.e., model) given a limited set of observed data is the one that permits the greatest compression of the data. Put simply, the more we are able to compress the data, the more we learn about the regularities underlying the data.

The roots of MDL can be traced back to the notion of Kolmogorov complexity, introduced independently by R. J. Solomonoff, A. N. Kolmogorov, and G. J. Chaitin in the 1960s. These and other early developments are summarized at the end of chapter 1 of this book, where a brief history of MDL is presented. The development of MDL proper started in 1978 with the publication of Modeling by the Shortest Data Description by J. Rissanen. Since then, significant strides have been made in both the mathematics and applications of MDL. The purpose of this book is to bring these advances in MDL together under one cover and in a form that can be easily digested by students in many sciences. Our intent was to make this edited volume a source book that would inform readers about state-of-the-art MDL and provide examples of how to apply MDL in a range of research settings.
The book is based on a workshop we organized at the annual Neural Information Processing Systems (NIPS) conference held in Whistler, Canada, in December 2001. It consists of sixteen chapters organized into three parts. Part I includes six introductory chapters that present the theoretical foundations of the MDL principle, its various interpretations, and computational techniques. In particular, chapters 1 and 2 offer a self-contained tutorial on MDL in a technically rigorous yet readable manner. In Part II, recent theoretical advances in modern MDL are presented. Part III begins with a chapter by J. Comley and D. Dowe that describes minimum message length (MML), a "twin sister" of MDL, and highlights the similarities and differences between these two principles. This is followed by five chapters that showcase the application of MDL in diverse fields, from bioinformatics to machine learning and psychology.

We would like to thank our editor, Bob Prior, for the support and encouragement we received during the preparation of the book. We also thank Peter Bartlett, Alex Smola, Bernhard Schölkopf, and Dale Schuurmans for providing LaTeX macros to facilitate formatting and creation of the book. We also thank the authors for