Introduction to Machine Learning

Yves Kodratoff
Research Director, French National Scientific Research Council

MORGAN KAUFMANN PUBLISHERS, INC.
2929 Campus Drive, Suite 260, San Mateo, CA 94403
Order Fulfillment: PO Box 50490, Palo Alto, CA 94303

© Yves Kodratoff

First published in Great Britain in 1988 by Pitman Publishing, 128 Long Acre, London WC2E 9AN. First published in French as Leçons d'Apprentissage Symbolique Automatique by Cepadues-Editions, Toulouse, France (1986).

Library of Congress Catalog Card #: 88-046077
ISBN 1-55860-037-X

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the publishers.

Printed in Great Britain at The Bath Press, Avon
Cover design by Eitetsu Nozawa

Foreword and Acknowledgements

This book has developed from a set of postgraduate lectures delivered at the University of Paris-Sud during the years 1983-1988. All the members of my research group at the Laboratoire de Recherche en Informatique helped me during the preparation of this text. Without several European grants, particularly the ESPRIT programme, I would never have been able to create such a group. In my group, I particularly thank Norbert Benamou, Jean-Jacques Cannât, Marta Franova, Jean-Gabriel Ganascia, Nicholas Graner, Michel Manago, Jean-Francois Puget, Jose Siquiera and Christel Vrain. Outside my group, Toni Bollinger, Christian de Sainte Marie and Gheorghe Tecuci were also very helpful. Special thanks are due to Ryszard Michalski, who re-read Chapter 8, which concerns his own contribution to inductive machine learning. Special thanks are also due to my wife Marta. Besides the comfort she provides me as a wife, she is also a first-rate researcher and helps me a great deal in my scientific work in addition to doing her own.
She entirely re-read this English version and found many mistakes that had been left in the original French version. This English edition has been produced by Stephen Thorp, who read and understood most of it while translating it. He pointed out many of my ambiguous French ways of speaking, so this edition may be easier to understand than the French one.

Yves Kodratoff
LRI, Paris, 1988

1 Why Machine Learning and Artificial Intelligence?

The Contribution of Artificial Intelligence to Learning Techniques

The approach to learning developed by Artificial Intelligence, as it will be described here, is a very young scientific discipline whose birth can be placed in the mid-seventies and whose first manifesto is constituted by the documents of the "First Machine Learning Workshop", which took place in 1980 at Carnegie-Mellon University. From these documents a work was drawn which is the "Bible" of learning in Artificial Intelligence, entitled "Machine Learning: An Artificial Intelligence Approach". "Machine Learning" is written ML throughout this book.

1 HISTORICAL SKETCH

The first attempts at learning for computers go back about 25 years. They consisted principally of attempts to model self-organization, self-stabilization and the ability to recognize shapes. Their common characteristic is that they attempt to describe an "incremental" system in which knowledge is quasi-null at the start but grows progressively during the experiments "experienced" by the system. The most famous of these models is the perceptron due to F. Rosenblatt [Rosenblatt 1958], whose limitations were shown by Minsky and Papert [Minsky & Papert 1969]. Let us note that these limitations have recently been challenged by the new connectionist approach [Touretzky & Hinton 1985]. The most spectacular result obtained in this period was Samuel's (1959, 1963): a system which learns to play checkers, and which achieved mastery through learning.
A detailed study of this program enables us to understand why it disappointed the fantastic hopes which emerged after this success (of which the myth of the super-intelligent computer is only a version for the general public). In fact, Samuel had provided his program with a series of parameters, each of which was able to take numerical values. It was these numerical values which were adjusted by experience, and Samuel's genius had consisted in a particularly judicious choice of these parameters. Indeed, all the knowledge was contained in the definition of the parameters, rather than in the associated numerical values. For example, he had defined the concept of "move centrality", and the real learning was done by inventing and recognizing the importance of this parameter rather than its numerical value, so that in reality it was done by Samuel himself.

During the Sixties another approach emerged: that of symbolic learning, oriented toward the acquisition of concepts and structured knowledge. The most famous of the supporters of this approach is Winston (1975), and the most spectacular result was obtained by Buchanan's META-DENDRAL program [Buchanan 1978], which generates rules that explain the mass spectroscopy data used by the expert system DENDRAL [Buchanan 1971].

As written above, a new approach began about ten years ago; it does not reject the two previous ones but includes them. It consists in recognizing that the main successes of the past, those of Samuel or Buchanan for example, were due to the fact that an important mass of knowledge was used implicitly in their systems. How could it now be included explicitly? And above all, how could it be controlled, augmented and modified? These problems appear important to an increasingly high proportion of AI researchers. At this moment ML is in a period of rapid growth, principally due to the successes encountered by the initiators of the AI approach to learning.
2 VARIOUS SORTS OF LEARNING

Keep clearly in mind that many other approaches to automatic knowledge acquisition exist apart from AI: the Adaptive Systems of Automata Theory, Grammatical Inference stemming from Shape Recognition, Inductive Inference closely connected with Theoretical Computer Science, and the many numerical methods of which Connectionism is the latest incarnation. But it turns out that even within the AI approach there are numerous approaches to the automatic acquisition of knowledge: these are the ones that we shall devote ourselves to describing. In Appendix 2, we shall describe some problems of inductive inference and program synthesis which, although marginal, seem nevertheless to belong to our subject.

Before describing the main forms of learning, it must be emphasized that three kinds of problem can be set in each of them.

The first is that of clustering (which is called "classification" in Data Analysis): given a mass of known items, how can the features common to them be discovered in such a way that we can cluster them into sub-groups which are simpler and have a meaning? The immense majority of procedures for clustering are numerical in nature; this is why we shall recall them in chapter 10. The problem of conceptual classification is well illustrated by a classic example due to Michalski: the points A and C are very far apart. Must they belong to the same sub-group?

The second problem (of discrimination) is that of learning classification procedures. Given a set of examples of concepts, how is a method to be found which enables each concept to be recognized in the most efficient way? The great majority of existing methods rest on numerical evaluations bound up with the diminution of an entropy measure after the application of descriptors. This is described in chapter 10. We shall also present a symbolic approach to this problem.

The third problem is that of generalization.
Starting from concrete examples of a situation or a rule, how can a formula be deduced which will be general enough to describe this situation or this rule, and how can it be explained that the formula has this descriptive capacity? For example, it can be asked how, starting from a statement like "France buys video-recorders from Japan", the more general rule can be derived: "Countries which have not sufficiently developed their research in solid-state physics buy electronic equipment from countries which have." It is not yet reasonable to expect a learning system to be really capable of making such inferences without being led step by step. The rest of the book is going to show how we are at least beginning to glimpse the solution to this problem.

2.1 SBL versus EBL

It was during the 1985 "International Workshop in Machine Learning" that the distinction was defined between Similarity Based Learning (SBL) [Lebowitz 1986, Michalski 1984, Quinlan 1983] and Explanation Based Learning (EBL) [DeJong 1981, Silver 1983, Mitchell 1985].

In SBL, one learns by detecting firstly similarities in a set of positive examples, and secondly dissimilarities between positive and negative examples. Chapters 8 and 9 are devoted to methods which enable this to be achieved.

In EBL, the input to the learning consists of explanations derived from the analysis of a positive or negative example of the concept or rule which is being learned. Generally, this kind of learning is done with a problem-solving system. Each time the system arrives at a solution it is, of course, either a success or a failure (in the latter case one talks of negative examples). A module then analyzes the reasons for this success or failure. These reasons are called "explanations", and they are used to improve the system. A detailed study of several approaches of this type will be found in chapters 5, 6 and 7.

2.1.1 A simple example of SBL

Let us consider the positive examples: {B, D, E, F, H, K, L}.
The reader can detect that these are all capital letters which have in common the fact that their biggest left-hand vertical line touches two small horizontal lines to its left. Let us suppose we are given {C} as a negative example to the above series; then we detect that the similarity found above does indeed separate the positive examples from the negative ones. If we now add {M, N} as negative examples, then we have to look for a new similarity between the positive examples which is also a dissimilarity from the negative examples. A suggestion: they are capital letters whose biggest left-hand vertical line touches two small horizontal lines to its left, and if there is a big line toward the right beginning from the top of the vertical line, then this line is horizontal.

2.1.2 A simple example of EBL

An explanation is always, in practice, a proof. This proof points (in the sense of pointing with the finger) to the important piece of knowledge which is going to have to be preserved. Suppose we had a complete description of a hydrogen balloon with its dimensions, its color, the fact that it is being rained on, the political context in which it was inflated, etc. An SBL system would ascertain that a red balloon rises in air, that a blue balloon does too, that a green balloon does too, etc., to conclude that the color has nothing to do with whether the balloon rises in air. An EBL system, on the other hand, given a single example of a red balloon that flies off, will seek to prove that it must indeed rise. To cut a long argument short, it will ascertain in the end that if the weight of the volume of air displaced is bigger than the weight of the balloon, then it must rise. The arguments will be about the weight of the balloon's envelope, the density of hydrogen, the temperature and the degree of humidity of the air.
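To make the balloon argument concrete, the final test such a system might derive can be sketched as follows. The function name and the density figures are illustrative assumptions, not taken from the book.

```python
# Sketch of the buoyancy test an EBL system might end up with.
# The density figures are rough illustrative values, not from the book.

AIR_DENSITY = 1.2        # kg per cubic metre, approximate
HYDROGEN_DENSITY = 0.09  # kg per cubic metre, approximate

def balloon_rises(volume_m3, envelope_mass_kg):
    """The balloon rises iff the weight of the air it displaces exceeds
    the weight of its envelope plus the hydrogen filling it."""
    displaced_air_mass = AIR_DENSITY * volume_m3
    balloon_mass = envelope_mass_kg + HYDROGEN_DENSITY * volume_m3
    return displaced_air_mass > balloon_mass

# Color and political context never enter the proof, so a single example
# suffices to discard them as irrelevant descriptors.
print(balloon_rises(10.0, 2.0))  # a 10 m3 balloon with a 2 kg envelope: True
```

Only the descriptors that appear in the proof (volume, envelope mass, densities) survive as significant.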
It will conclude with certainty that color and politics have nothing to do with the matter, and that, on the other hand, the data contained in the arguments are the significant descriptors for this problem.

2.2 Numerical versus conceptual learning

These two forms of learning are opposite in their means and their goals. The numerical approach aims to optimize a global parameter such as entropy, in the case of Quinlan's ID3 program [Quinlan 1983], or distance between examples, in Data Analysis [Diday & al. 1982]. Its aim is to bring out a set of descriptors which are the "best" relative to this optimization. It also has as a consequence the generation of "clusters" of examples. It is well known that the numerical approach is efficient and resistant to noise, but that it yields rules or concepts which are in general incomprehensible to humans.

Conversely, the symbolic approach is well suited to interaction with human experts, but it is very sensitive to noise. It aims at optimizing a recognition function which is synthesized on the basis of examples. This function is usually required to be complete, which means that it must recognize all the positive examples, and to be discriminant, which means that it rejects all the negative examples. Its aim is to attempt to express a conceptual relationship between the examples. The examples of EBL and SBL given above are also examples of symbolic learning. Examples of numerical learning will be found in chapter 10.

2.3 Learning by reward/punishment

Weightings are associated with each concept or rule to indicate the importance of using it. In this kind of learning, the system behaves a bit like a blind man who gropes in all directions. Each time it obtains a positive outcome (where the notions of positive and negative are often very dependent on the problem set), the system will assign more weight to the rules which brought it to this positive outcome.
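As a sketch, the reward/punishment scheme can be reduced to a table of rule weights and an update applied after each outcome. The multiplicative update rule and the factor used below are illustrative assumptions, not the book's.

```python
# Minimal sketch of reward/punishment learning: each rule carries a weight
# that is raised after a success and lowered after a failure. The
# multiplicative update and the factor 1.5 are illustrative assumptions.

weights = {"rule_A": 1.0, "rule_B": 1.0}

def update(used_rules, success, factor=1.5):
    """Reward or punish every rule that contributed to the outcome."""
    for rule in used_rules:
        weights[rule] *= factor if success else 1.0 / factor

update(["rule_A"], success=True)   # rule_A led to a positive outcome
update(["rule_B"], success=False)  # rule_B led to a negative outcome
print(weights)  # rule_A's weight has grown, rule_B's has shrunk
```

Note how the notions of success and failure, and the rules themselves, are fixed in advance by the designer, which is exactly why such systems are hard to move outside their field of specialization.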
Each time it obtains a negative result, it reduces the weighting for the use of the rules it has just used. This kind of learning is very spectacular, since it makes it possible to obtain systems which are independent of their creator once they begin to work. On the other hand, you can well imagine that the definition of the concepts or rules, and the definitions of positive and negative, depend closely on the problem set. These systems are very hard to apply outside their field of specialization and are very difficult to modify.

2.4 Empirical versus rational learning

In empirical learning, the system acquires knowledge in a local manner. For example, if a new rule helps it with a problem it is solving, the rule is added to the knowledge base, provided it does not contradict the others already there. Learning is said to be rational, on the other hand, when the addition of the new rule is examined by a module which seeks to connect it with the other global knowledge about the universe in which the system is situated. So it is clear that rational learning will be able to introduce environment-dependent data naturally, whereas empirical learning is going to be frustrated by this type of question. In the case of learning by testing examples, a similar difference exists. Since the difference between the empirical and rational approaches is always illustrated by EBL, we, in contrast, are now going to give an example of the difference between these two approaches using SBL.

An example of rational versus empirical similarity detection

2.4.1 Studying the positive examples

Let us suppose that we wish to learn a concept given the two following positive examples:

E1 : DOG(PLUTO)
E2 : CAT(CRAZY) & WOLF(BIGBAD)

where PLUTO, CRAZY and BIGBAD are the names of specific animals. In both cases one still uses general pieces of knowledge of the universe in which the learning takes place.
Suppose that we know that dogs and cats are domestic animals, that dogs and wolves are canids, and that they are all mythical animals (referring to Walt Disney's 'Pluto', R. Crumb's 'Crazy Cat' and the 'Big Bad Wolf' of the fairy-tales). This knowledge is expressed by theorems like [WOLF(x) => CANID(x)].

Empirical learning will use one such piece of knowledge to find one of the possible generalizations. For example, it will detect the generalizations:

Eg1_empirical : CANID(x) & NUMBEROFOCCURRENCES(x) = 1
Eg2_empirical : DOMESTIC(x) & NUMBEROFOCCURRENCES(x) = 1
Eg3_empirical : MYTHICAL-ANIMAL(x) & NUMBEROFOCCURRENCES(x) = 1 OR 2

the first of which says that there is a canid in each example, etc. The negative examples will serve to choose the "right" generalization (or generalizations), as we shall see a little farther on.

Rational learning is going to try to find the generalization which preserves all the information which can possibly be drawn from the examples. The technique used for this has been called structural matching. Before even attempting to generalize, one tries to structurally match the examples using the known features. The examples are going to be re-written as follows.

E1' : DOG(PLUTO) & DOG(PLUTO) & DOMESTIC(PLUTO) & CANID(PLUTO) & MYTHICAL-ANIMAL(PLUTO) & MYTHICAL-ANIMAL(PLUTO)
E2' : CAT(CRAZY) & WOLF(BIGBAD) & DOMESTIC(CRAZY) & CANID(BIGBAD) & MYTHICAL-ANIMAL(CRAZY) & MYTHICAL-ANIMAL(BIGBAD)

In these expressions all the features of the domain have been used at once, duplicating them if necessary, to improve the matching of the two examples. Here we use the standard properties of the logical connectives (A <=> A & A, and if A => B then A <=> A & B) to be able to declare that

E1 <=> E1'
E2 <=> E2'

In the final generalization we only keep what is common to both examples, so it will be

Eg_rational : DOMESTIC(x) & CANID(y) & MYTHICAL-ANIMAL(x) & MYTHICAL-ANIMAL(y).
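The rewriting-and-intersection step above can be sketched in a few lines. The dictionary taxonomy and the tuple representation of literals are simplifications introduced here, not the book's notation.

```python
# Sketch of rational generalization by structural matching. The taxonomy
# encodes theorems such as WOLF(x) => CANID(x); the tuple representation
# of literals is a simplification introduced for this sketch.

TAXONOMY = {
    "DOG":  ["DOMESTIC", "CANID", "MYTHICAL-ANIMAL"],
    "CAT":  ["DOMESTIC", "MYTHICAL-ANIMAL"],
    "WOLF": ["CANID", "MYTHICAL-ANIMAL"],
}

def saturate(example):
    """Rewrite an example by adding every property its predicates imply
    (since A => B allows A to be rewritten as A & B)."""
    facts = set(example)
    for pred, obj in example:
        for implied in TAXONOMY.get(pred, []):
            facts.add((implied, obj))
    return facts

def generalize(e1, e2):
    """Keep the literals of the second saturated example whose predicate
    also occurs in the first; turning the remaining constants into
    variables yields the final generalization."""
    preds1 = {pred for pred, _ in saturate(e1)}
    return sorted({lit for lit in saturate(e2) if lit[0] in preds1})

E1 = [("DOG", "PLUTO")]
E2 = [("CAT", "CRAZY"), ("WOLF", "BIGBAD")]
print(generalize(E1, E2))
# The surviving predicates are DOMESTIC, CANID and MYTHICAL-ANIMAL,
# i.e. Eg_rational with CRAZY and BIGBAD generalized to variables x and y.
```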
