ELEMENTS OF MACHINE LEARNING

The Morgan Kaufmann Series in Machine Learning
Edited by Pat Langley

Machine learning studies the mechanisms through which intelligent systems improve their performance over time. Research on this topic explores learning in many different domains, employs a variety of methods, and aims for quite different goals, but the field is held together by its concern with computational mechanisms for learning. The Morgan Kaufmann Series in Machine Learning includes monographs and edited volumes that report progress in this area from a wide variety of perspectives. The series is produced in cooperation with the Institute for the Study of Learning and Expertise, a nonprofit corporation devoted to research on machine learning.

Elements of Machine Learning
By Pat Langley

C4.5: Programs for Machine Learning
By J. Ross Quinlan

Machine Learning Methods for Planning
Edited by Steven Minton

Concept Formation: Knowledge and Experience in Unsupervised Learning
Edited by Douglas H. Fisher, Jr., Michael J. Pazzani, and Pat Langley

Computational Models of Scientific Discovery and Theory Formation
Edited by Jeff Shrager and Pat Langley

Readings in Machine Learning
Edited by Jude W. Shavlik and Thomas G. Dietterich

ELEMENTS OF MACHINE LEARNING

Pat Langley
Institute for the Study of Learning and Expertise
and Stanford University

Morgan Kaufmann Publishers, Inc.
San Francisco, California

Sponsoring Editor: Michael B. Morgan
Production Manager: Yonie Overton
Production Editor: Elisabeth Beller
Cover Designer: Ross Carron Design (based on series design by Jo Jackson)
Copyeditor: Jeff Van Bueren
Proofreader: Ken DellaPenta
Printer: Courier Corporation

This book has been author-typeset using LaTeX. Cover art is from The Celtic Art Source Book by Courtney Davis, © 1988, and is reproduced with permission from Cassell Publishers, London, England.

Morgan Kaufmann Publishers, Inc.
Editorial and Sales Office
340 Pine Street, Sixth Floor
San Francisco, CA 94104-3205 USA
Telephone: 415/392-2665
Facsimile: 415/982-2665
Internet: [email protected]
Web site: http://mkp.com

© 1996 by Morgan Kaufmann Publishers, Inc.
All rights reserved
Printed in the United States of America

00 99 98 97 96    5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher.

Library of Congress Cataloging-in-Publication Data is available for this book.

ISBN 1-55860-301-8

Table of Contents

Preface  ix

1. An overview of machine learning
   1.1 The science of machine learning
   1.2 Nature of the environment
   1.3 Nature of representation and performance  10
   1.4 Nature of the learning component  16
   1.5 Five paradigms for machine learning  20
   1.6 Summary of the chapter  23

2. The induction of logical conjunctions  27
   2.1 General issues in logical induction  27
   2.2 Nonincremental induction of logical conjunctions  32
   2.3 Heuristic induction of logical conjunctions  39
   2.4 Incremental induction of logical conjunctions  43
   2.5 Incremental hill climbing for logical conjunctions  49
   2.6 Genetic algorithms for logical concept induction  56
   2.7 Summary of the chapter  61

3. The induction of threshold concepts  67
   3.1 General issues for threshold concepts  67
   3.2 Induction of criteria tables  74
   3.3 Induction of linear threshold units  78
   3.4 Induction of spherical threshold units  89
   3.5 Summary of the chapter  91
4. The induction of competitive concepts  95
   4.1 Instance-based learning  96
   4.2 Learning probabilistic concept descriptions  104
   4.3 Summary of the chapter  112

5. The construction of decision lists  115
   5.1 General issues in disjunctive concept induction  115
   5.2 Nonincremental learning using separate and conquer  119
   5.3 Incremental induction using separate and conquer  125
   5.4 Induction of decision lists through exceptions  129
   5.5 Induction of competitive disjunctions  132
   5.6 Instance-storing algorithms  138
   5.7 Complementary beam search for disjunctive concepts  140
   5.8 Summary of the chapter  144

6. Revision and extension of inference networks  149
   6.1 General issues surrounding inference networks  150
   6.2 Extending an incomplete inference network  156
   6.3 Inducing specialized concepts with inference networks  162
   6.4 Revising an incorrect inference network  167
   6.5 Network construction and term generation  174
   6.6 Summary of the chapter  182

7. The formation of concept hierarchies  187
   7.1 General issues concerning concept hierarchies  187
   7.2 Nonincremental divisive formation of hierarchies  191
   7.3 Incremental formation of concept hierarchies  203
   7.4 Agglomerative formation of concept hierarchies  212
   7.5 Variations on hierarchy formation  217
   7.6 Transforming hierarchies into other structures  219
   7.7 Summary of the chapter  221

8. Other issues in concept induction  227
   8.1 Overfitting and pruning  227
   8.2 Selecting useful features  233
   8.3 Induction for numeric prediction  236
   8.4 Unsupervised concept induction  240
   8.5 Inducing relational concepts  242
   8.6 Handling missing features  248
   8.7 Summary of the chapter  251

9. The formation of transition networks  257
   9.1 General issues for state-transition networks  257
   9.2 Constructing finite-state transition networks  268
   9.3 Forming recursive transition networks  274
   9.4 Learning rules and networks for prediction  283
   9.5 Summary of the chapter  284

10. The acquisition of search-control knowledge  289
   10.1 General issues in search control  290
   10.2 Reinforcement learning  297
   10.3 Learning state-space heuristics from solution traces  304
   10.4 Learning control knowledge for problem reduction  314
   10.5 Learning control knowledge for means-ends analysis  320
   10.6 The utility of search-control knowledge  324
   10.7 Summary of the chapter  326

11. The formation of macro-operators  331
   11.1 General issues related to macro-operators  332
   11.2 The creation of simple macro-operators  337
   11.3 The formation of flexible macro-operators  348
   11.4 Problem solving by analogy  359
   11.5 The utility of macro-operators  370
   11.6 Summary of the chapter  371

12. Prospects for machine learning  377
   12.1 Additional areas of machine learning  379
   12.2 Methodological trends in machine learning  381
   12.3 The future of machine learning  385

References  389
Index  415

Preface

Machine learning is a science of the artificial. The field’s main objects of study are artifacts, specifically algorithms that improve their performance with experience. The goals of this book are to introduce readers to techniques designed to acquire knowledge in this manner and to provide a framework for understanding relationships among such methods.

There seemed little point in writing a text that simply reflected the main paradigms within the machine learning community. I might easily have written chapters on decision-tree induction, neural networks, case-based learning, genetic algorithms, and analytic methods.
However, surveys of these approaches are already available in the literature, and reiterating their content would ignore many underlying similarities among existing methods. Worse, such a text would reinforce existing divisions that are too often based on notational and rhetorical differences rather than on substantive ones.

Instead, I aimed for an organization that would cut across the standard paradigm boundaries, in an attempt to cast the field in a new, and hopefully more rational, light. My intent was to describe the space of learning algorithms, including not only those that appear regularly in the literature, but also those that have received less attention. Such a “periodic table” of learning methods, even if incomplete, could serve both to clarify previous results and to suggest new directions for learning research.

The resulting organization, as reflected in the following pages, builds on a central tenet of modern work on machine learning: that one cannot study learning in the absence of assumptions about how the acquired knowledge is described, how it is structured in memory, or how it is used. As a result, concerns about representation, organization, and performance occupy a primary role in the book’s composition.

In particular, after an overview that covers the basic issues in machine learning, the text describes some simple learning methods that incorporate restrictive assumptions about representation, specifically that one can represent knowledge as logical conjunctions (Chapter 2), threshold units (Chapter 3), or simple competitive concepts (Chapter 4). The book then turns to a variety of techniques that organize such descriptions into larger memory structures, including decision lists (Chapter 5), inference networks (Chapter 6), and concept hierarchies (Chapter 7). Because the organization of knowledge is orthogonal to the representation of its components, these chapters draw on each of the approaches described in Chapters 2 through 4. A number of extensions to these basic techniques appear in Chapter 8.

Most learning methods are designed to deal with static structures, but some work, especially that concerned with natural language and problem solving, focuses instead on structures that describe change over time. For this reason, the final three substantive chapters explore the acquisition of state-transition networks (Chapter 9), search-control knowledge (Chapter 10), and macro-operators (Chapter 11). These distinctions also cut across those made in earlier chapters, and although much of the work on this topic assumes a logical representation, other formalisms are also covered. The closing chapter discusses some methodological issues in machine learning and its relationship to other fields.

I have done my best to present the material clearly and without unnecessary formalism. Because some precision is necessary, I have used a common pseudocode language to state the various algorithms, but I have tempered this approach with illustrative examples of various methods’ behavior on a few simple domains that cross chapter boundaries. I have also aimed for consistency in style and organization across the chapters. In each case, I have attempted to give a clear specification of both the learning task and the performance task that have driven research on the algorithms under examination.
Finally, I typically consider issues of representation, organization, and performance before turning to the learning algorithms themselves, because (as argued above) one cannot understand the latter in the absence of the former.

Despite my carefully nontraditional organization, some readers may prefer to focus initially on the best-known learning algorithms and work outward from there. Rest assured, they will find descriptions of standard methods for learning logical decision lists (Section 5.2), inducing univariate decision trees (Section 7.2), altering weights in neural networks (Section 6.4), nearest neighbor methods (Section 5.6), genetic