Con - CWI Amsterdam | Research in mathematics and computer science (PDF)

119 pages · 2007 · 0.79 MB · English
Philosophical Applications of Computational Learning Theory:
Chomskyan Innateness and Occam's Razor

Graduation thesis (Afstudeerscriptie), Philosophy of Informatics (15sp)
R. M. de Wolf, 114903
Erasmus Universiteit Rotterdam
Department: Theoretical Philosophy
Supervisor: Dr. G. J. C. Lokhorst
July 1997

Contents

Preface

1 Introduction to Computational Learning Theory
  1.1 Introduction
  1.2 Algorithms that Learn Concepts from Examples
  1.3 The PAC Model and Efficiency
  1.4 PAC Algorithms
  1.5 Sample Complexity
  1.6 Time Complexity
    1.6.1 Representations
    1.6.2 Polynomial-Time PAC Learnability
  1.7 Some Related Settings
    1.7.1 Polynomial-Time PAC Predictability
    1.7.2 Membership Queries
    1.7.3 Identification from Equivalence Queries
    1.7.4 Learning with Noise
  1.8 Summary

2 Application to Language Learning
  2.1 Introduction
  2.2 A Brief History of the Chomskyan Revolutions
    2.2.1 Against Behaviorism
    2.2.2 Transformational-Generative Grammar
    2.2.3 Principles-and-Parameters
    2.2.4 The Innateness of Universal Grammar
  2.3 Formalizing Languages and Grammars
    2.3.1 Formal Languages
    2.3.2 Formal Grammars
    2.3.3 The Chomsky Hierarchy
  2.4 Simplifying Assumptions for the Formal Analysis
    2.4.1 Language Learning Is Algorithmic Grammar Learning
    2.4.2 All Children Have the Same Learning Algorithm
    2.4.3 No Noise in the Input
    2.4.4 PAC Learnability Is the Right Analysis
  2.5 Formal Analysis of Language Learnability
    2.5.1 The PAC Setting for Language Learning
    2.5.2 A Conjecture
  2.6 The Learnability of Type 4 Languages
  2.7 The Learnability of Type 3 Languages
  2.8 The Learnability of Type 2 Languages
    2.8.1 Of What Type Are Natural Languages?
    2.8.2 A Negative Result
    2.8.3 k-Bounded Context-Free Languages Are Learnable
    2.8.4 Simple Deterministic Languages Are Learnable
  2.9 The Learnability of Type 1 Languages
  2.10 Finite Classes Are Learnable
  2.11 What Does All This Mean?
  2.12 Wider Learning Issues
  2.13 Summary

3 Kolmogorov Complexity and Simplicity
  3.1 Introduction
  3.2 Definition and Properties
    3.2.1 Turing Machines and Computability
    3.2.2 Definition
    3.2.3 Objectivity up to a Constant
    3.2.4 Non-Computability
    3.2.5 Relation to Shannon's Information Theory
  3.3 Simplicity
  3.4 Randomness
    3.4.1 Finite Random Strings
    3.4.2 Infinite Random Strings
    3.4.3 An Interesting Random String
  3.5 Gödel's Theorem Without Self-Reference
    3.5.1 The Standard Proof
    3.5.2 A Proof Without Self-Reference
    3.5.3 Randomness in Mathematics?
  3.6 Summary

4 Occam's Razor
  4.1 Introduction
  4.2 Occam's Razor
    4.2.1 History
    4.2.2 Applications in Science
    4.2.3 Applications in Philosophy
  4.3 Occam and PAC Learning
  4.4 Occam and Minimum Description Length
  4.5 Occam and Universal Prediction
  4.6 On the Very Possibility of Science
  4.7 Summary
  4.A Proof of Occam's Razor (PAC Version)

5 Summary and Conclusion
  5.1 Computational Learning Theory
  5.2 Language Learning
  5.3 Kolmogorov Complexity
  5.4 Occam's Razor
  5.5 Conclusion

List of Symbols
Bibliography
Index of Names
Index of Subjects

Preface

What is learning? Learning is what makes us adapt to changes and threats, and what allows us to cope with a world in flux. In short, learning is what keeps us alive. Learning has strong links to almost any other topic in philosophy: scientific inference, knowledge, truth, reasoning (logic), language, anthropology, behaviour (ethics), good taste (aesthetics), and so on. Accordingly, it can be seen as one of the quintessential philosophical topics, and hence an appropriate topic for a graduation thesis!
Much can be said about learning, too much to fit in a single thesis. This thesis is therefore restricted in scope, dealing only with computational learning theory (often abbreviated to COLT). Learning seems so simple: we do it every day, often without noticing it. Nevertheless, it is obvious that some fairly complex mechanisms must be at work when we learn. COLT is the branch of Artificial Intelligence that deals with the computational properties and limitations of such mechanisms. The field can be seen as the intersection of Machine Learning and complexity theory. COLT is a very young field (the publication of Valiant's seminal paper in 1984 may be seen as its birth), and it appears that virtually none of its many interesting results are known to philosophers. Some philosophical work has referred to complexity theory (for instance [Che86]) and some has referred to Machine Learning (for instance [Tha90]), but as far as I know, thus far no philosophical use has been made of the results that have sprung from COLT. For instance, at the time of writing of this thesis, the Philosopher's Index, a database containing most major publications in philosophical journals or books, contains no entries whatsoever that refer to COLT or to its main model, the model of PAC learning; there is hardly any reference to Kolmogorov complexity, either. Stuart Russell devotes two pages to PAC learning [Rus91, pp. 43-44] and James McAllister devotes one page to Kolmogorov complexity as a quantitative measure of simplicity [McA96, pp. 119-120], but neither provides more than a sketchy and superficial explanation.

As its title already indicates, the present thesis tries to make contact between COLT and philosophy. The aim of the thesis is threefold. The first and most shallow goal is to obtain a degree in philosophy for its author. The second goal is to take a number of recent results from computational learning theory, insert them in their appropriate philosophical context, and see how they bear on various ongoing philosophical discussions.
The third and most ambitious goal is to draw the attention of philosophers to computational learning theory in general. Unfortunately, the traditional reluctance of philosophers (in particular those of a non-analytical strand) to use formal methods, as well as their inaptness with formal methods once they have outgrown that reluctance, makes this goal rather hard to attain. Nevertheless, it is my opinion that computational learning theory (or, for that matter, complexity theory as a whole) has much to offer to philosophy. Accordingly, this thesis may be seen as a plea for the philosophical relevance of computational learning theory as a whole.

The thesis is organized as follows. In the first chapter, we will give an overview of the main learning setting used in COLT. We will stick to the rigorous formal definitions as used in COLT, but supplement them with a lot of informal and intuitive comment in order to make them accessible and readable for philosophers. After that introductory chapter, the second chapter applies results from COLT to Noam Chomsky's ideas about language learning and the innateness of linguistic biases. The third chapter gives an introduction to the theory of Kolmogorov complexity, which provides us with a fundamental measure of simplicity. Kolmogorov complexity does not belong to computational learning theory proper (its invention in fact pre-dates COLT), but its main application for us lies in Occam's Razor. This fundamental maxim tells us to go for the simplest theories consistent with the data, and is highly relevant in the context of learning in general, and scientific theory construction in particular. The fourth chapter provides different formal settings in which some form of the razor can be mathematically justified, using Kolmogorov complexity to quantitatively measure simplicity. Finally, we end with a brief chapter that summarizes the thesis in non-technical terms.
Let me end this preface by expressing my gratitude to a number of people who contributed considerably to the contents of this thesis. First of all, I would of course like to thank my thesis advisor Gert-Jan Lokhorst (one of those philosophers who, like myself, do not hesitate to invoke formal definitions and results whenever these might be useful) for his many comments and suggestions. Secondly, many thanks should go to Shan-Hwei Nienhuys-Cheng, who put me on the track of inductive learning in the first place, by inviting me to join her in writing a book on inductive logic programming [NW97], a book which eventually took us more than two and a half years to finish. Chapter 18 of that book actually provided the basis for a large part of the first chapter of the present thesis. Finally, I would like to thank Jeroen van Rijen for his many helpful comments, Peter Sas and Peter Grünwald for some references on linguistics, and Paul Vitányi for his course on learning and Kolmogorov complexity.

Chapter 1
Introduction to Computational Learning Theory

1.1 Introduction

This thesis is about philosophical applications of "computational learning theory", and the present chapter provides an introduction to this field. Why should we, as philosophers, be interested in something like a theory of learning? The importance of learning can be illustrated on the basis of the following quotation from Homer's Iliad:

  Him she found sweating with toil as he moved to and fro about his bellows in eager haste; for he was fashioning tripods, twenty in all, to stand around the wall of his well-builded hall, and golden wheels had he set beneath the base of each that of themselves they might enter the gathering of the gods at his wish and again return to his house, a wonder to behold.

  Iliad, XVIII, 372-377 (pp. 315-317 of [Hom24], second volume).
This quotation might well be the first ever reference to something like Artificial Intelligence: man-made (or in this case, god-made) artifacts displaying intelligent behaviour. As Thetis, Achilles' mother, enters Hephaestus' house in order to fetch her son a new armour, she finds Hephaestus constructing something we today would call robots. His twenty tripods are of themselves to serve the gathering of the gods (bring them food, etc.), whenever Hephaestus so desires.

Let us consider for a moment the kind of behaviour such a tripod should display. Obviously, it should be able to recognise Hephaestus' voice, and to extract his wishes from his words. But furthermore, when serving the gods, the tripod should "know" and act upon many requirements, such as the following:

1. If there is roasted owl for dinner, don't give any to Pallas Athena.
2. Don't come too close to Hera if Zeus has committed adultery again.
3. Stop fetching wine for Dionysus when he is too drunk.
...

It is clear that this list can be continued without end. Again and again, one can think of new situations that the tripod should be able to adapt to properly. It seems impossible to take all these requirements into account explicitly in the construction of the intelligent tripod. The task of "coding" each of the infinite number of requirements into the tripods may be considered too much, even for Hephaestus, certainly one of the most industrious among the Greek gods. One solution to this problem would be to initially endow the tripod with a modest amount of general knowledge about what it should do, and to give it the ability to learn from the way the environment reacts to its behaviour. That is, if the tripod does something wrong, it can adjust its knowledge and its behaviour accordingly, thus avoiding making the same mistake in the future.[1] In that way, the tripod need not know everything beforehand. Instead, it can build up most of the required knowledge along the way.
Thus the tripod's ability to learn would save Hephaestus a lot of trouble.

[1] Of course, for this scheme to work, we have to assume that the tripod "survives" its initial failures. If Zeus immediately smashes the tripod into pieces for bringing him white instead of red wine, the tripod won't be able to learn from its experience.

The importance of learning is not restricted to artifacts built to serve divine wishes. Human beings, from birth till death, engage in an ongoing learning process. After all, children are not born with their native language, polite manners, the abilities to read and write, earn their living, make jokes, or to do philosophy or science (or even philosophy of science). These things have to be learned. In fact, if we take 'learning' in a sufficiently broad sense, any kind of adaptive behaviour will fall under the term. Since learning is of crucial importance to us, a theory of learning is of crucial importance to philosophy.

1.2 Algorithms that Learn Concepts from Examples

After the previous section, the importance of a theory of learning should be clear. But why computational learning theory? Well, a theory of learning should be about the way we human beings learn, or, more generally, about the way any kind of learning can be achieved. A description of "a way of learning" will usually boil down to something like a "recipe": learning amounts to doing such-and-such things, taking such-and-such steps. Now, it seems that the only precise way we have to specify these "such-and-such steps" (and of course we should aim for precision) is by means of an algorithm. In general, an algorithm, or a mechanical procedure, precisely prescribes the sequence of steps that have to be taken in order to solve some problem. Hence it is obvious that learning theory can (or perhaps even should) involve the study of algorithms that learn.[2] Since algorithms perform computation, an algorithmic theory of learning is usually called computational learning theory.

[2] The notion of an algorithm includes neural networks, at least those that can be simulated on an ordinary computer. In fact, the learnability of neural networks has been one of the most prominent research areas in computational learning theory.

In this thesis, we will mainly be concerned with learning a concept from examples. If we take the term 'example' in a sufficiently broad sense, almost any kind of learning will be based on examples, so restricting attention to learning from examples is not really a restriction. On the other hand, restricting attention to learning a concept appears to be quite restrictive, since it excludes learning "know-how" knowledge, for instance learning how to run a marathon. However, many cases of "know-how" learning can actually quite well be modeled or redescribed as cases of concept learning. For instance, learning a language may at first sight appear to be a case of learning know-how (i.e., knowing how to use words), but it can also be modeled as the learning of a grammar for a language.[3] Accordingly, concept learning can be used to model any kind of learning in which the learned "thing" can feasibly be represented by some mathematical construct: a grammar, a logical formula, a neural network, and what not. This includes a very wide range of topics, from language learning to large parts of empirical science.

[3] This is what the Chomskyan revolution in linguistics is all about; see the next chapter.

Induction, which is the usual name for learning from examples, has been a topic of inquiry for centuries. The study of induction can be approached from many angles. Like most other scientific disciplines, it started out as a part of philosophy. Philosophers particularly focused on the role induction plays in the empirical sciences. For instance, Aristotle characterized science roughly as deduction from first principles, which were to be obtained by means of induction from experience [Ari60]. (Though it should be noted that Aristotle's notion of induction was rather different from the modern one, involving the "seeing" of the "essential forms" of examples.)

After the Middle Ages, Francis Bacon [Bac94] revived the importance of induction from experience (in the modern sense) as the main scientific activity. In later centuries, induction was taken up by many philosophers. David Hume [Hum56, Hum61] formulated what is nowadays called the 'problem of induction', or 'Hume's problem': how can induction from a finite number of cases result in knowledge about the infinity of cases to which an induced general rule applies? What justifies inferring a general rule, or "law of nature", from a finite number of cases? Surprisingly, Hume's answer was that there is no such justification. In his view, it is simply a psychological fact about human beings that when we observe some particular pattern recur in different cases (without observing counterexamples to the pattern), we tend to expect this pattern to appear in all similar cases. In Hume's view, this inductive expectation is a habit, analogous to the habit of a dog who runs to the door after hearing his master call, expecting to be let out. Later philosophers such as John Stuart Mill [Mil58] tried to answer Hume's problem by stating conditions under which an inductive inference is justified. Other philosophers who made important comments on induction were Stanley Jevons [Jev74] and Charles Sanders Peirce [Pei58].

In our century, induction was mainly discussed by philosophers and mathematicians who were also involved in the development and application of formal logic. Their treatment of induction was often in terms of the probability or the "degree of confirmation" that a particular theory or hypothesis receives from available empirical data. Some of the main contributors are Bertrand
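To give a concrete flavour of what "an algorithm that learns a concept from examples" looks like, here is a small illustrative sketch (not taken from the thesis itself) of a textbook COLT example: learning a monotone conjunction of boolean variables by elimination. The learner starts with the conjunction of all variables and deletes every variable that is 0 in some positive example. The function names and the toy target concept below are invented for illustration.

```python
import random

def learn_monotone_conjunction(examples, n):
    """Elimination learner: begin with the conjunction of all n variables,
    then delete every variable that is 0 in some positive example.
    The variables that survive form the hypothesis."""
    hypothesis = set(range(n))  # indices of variables still in the conjunction
    for x, label in examples:
        if label:  # positive example: prune the variables it contradicts
            hypothesis = {i for i in hypothesis if x[i]}
    return hypothesis

def classify(hypothesis, x):
    """The hypothesis labels x positive iff all its variables are 1 in x."""
    return all(x[i] for i in hypothesis)

# Unknown target concept: x0 AND x2, over n = 4 boolean variables.
n = 4
target = {0, 2}
random.seed(0)
sample = [(x, all(x[i] for i in target))
          for x in (tuple(random.randint(0, 1) for _ in range(n))
                    for _ in range(50))]

h = learn_monotone_conjunction(sample, n)
# h always contains the target's variables, and with enough random
# examples it converges to exactly the target conjunction.
print(sorted(h))
```

The elimination learner never removes a variable that belongs to the target (such a variable is 1 in every positive example), so its hypothesis can only err by being too strict; in the PAC model introduced in Chapter 1 one then asks how many examples suffice to make that error small with high probability, and monotone conjunctions are indeed a standard example of a PAC-learnable class.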
