ebook img

Strength or Accuracy: Credit Assignment in Learning Classifier Systems PDF

314 Pages·2004·9.47 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Strength or Accuracy: Credit Assignment in Learning Classifier Systems

Distinguished Dissertations Springer-Verlag London Ltd. OthertitlespublishedinthisSeries: User-DeveloperCoooperation inSoftwareDevelopment EamonnO'Neill ACombinationofGeometryTheoremProvingandNonstandardAnalysis,withApplication toNewton'sPrincipia JacquesFleuriot AccurateVisualMetrologyfromSingleandMultiple UncalibratedImages Antonio Criminisi InheritanceRelationshipsforDisciplinedSoftwareConstruction TracyGardner AsynchronousSystem-on-ChipInterconnect John Bainbridge StochasticAlgorithmsforVisualTracking JohnMacCormick AutomatedTheoryFormation inPureMathematics SimonColton DynamicFlexibleConstraintSatisfactionanditsApplicationtoAIPlanning IanMiguel ImageMosaicingandSuper-resolution DavidCapel Tim Kovacs .Strength or Accuracy: Credit Assignment in Learning Classifier Systems Springer Tim Kovacs, BA, MSc, PhD Series Editor Professor C.J. van Rijsbergen Department of Computing Science, University of Glasgow, G12 8RZ, UK British Ubrary Cataloguing in Publication Data A catalogue record for this book is available from the British Ubrary Ubrary of Congress Cataloging-in-Publication Data Kovacs, Tim, 1971- Strength or accuracy : credit assignment in learning c1assifier systems / Tim Kovacs. p. cm.-(Distinguished dissertations, ISSN 1439-9768) Inc1udes index. ISBN 978-1-4471-1058-3 ISBN 978-0-85729-416-6 (eBook) DOI 10.1007/978-0-85729-416-6 1. Machine learning. I. Title. II. Distinguished dissertation series. QA325.5.K68 2003 006.3'I-dc22 2003061884 Apart from any fair dealing for the purposes of research or private study, or critieism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographie reproduction in accordance with the terms of licences issued by the Copyright Lieensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. Distinguished Dissertations ISSN 1439-9768 ISBN 978-1-4471-1058-3 springeronline.co.uk © Springer-Verlag London 2004 Originally published by Springer-Verlag London Berlin Heidelberg in 2004 Softcover reprint ofthe hardcover 1st edition 2004 The use of registered names, trademarks etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or Iiability for any errors or omissions that may be made. Typesetting: Electronie text files prepared by author 34/3830-543210 Printed on acid-free paper SPIN 10945254 Preface Classifier systems are an intriguing approach to a broad range of machine learning problems, based on automated generation and evaluation of condi tion/action rules. Inreinforcementlearningtasks they simultaneously address the two major problems oflearning a policyand generalisingoverit (and re lated objects, such as value functions). Despite over 20 years of research, however,classifier systems have met with mixed success, for reasons which were often unclear. Finally, in 1995Stewart Wilson claimed a long-awaited breakthrough with his XCS system, which differsfrom earlier classifiersys tems in a number of respects, the most significant of which is the way in which it calculates the value of rules for use by the rule generation system. Specifically, XCS (likemost classifiersystems) employs a genetic algorithm forrule generation, and the wayin whichit calculates rule fitness differsfrom earlier systems. Wilson described XCSas an accuracy-based classifiersystem and earliersystems asstrength-based.The twodifferinthat instrength-based systems the fitness of a rule is proportional to the return (reward/payoff) it receives,whereas in XCS it is a function of the accuracy with which return is predicted. The differenceis thus one of credit assignment, that is, of how a rule's contribution to the system's performance is estimated. XCS is a Q learning system; in fact, it is a proper generalisation of tabular Q-learning, in which rules aggregate states and actions. In XCS, as in other Q-learners, Q-valuesare usedto weightaction selection.Inrulegeneration, arule's weight is an inversefunction ofthe error ofits Q-valuejthus XCSfinds rules which minimise errors in the Q-function, and is able to accurately approximate it. Wilson argued XCS's superiority overearlier systems, and demonstrated im proved performance on two tasks. However, the theoretical basis of XCS's results - and indeed those of earlier systems - needed development. Conse quently, this thesis has been devoted to the development oftheory to explain howand whyclassifiersystems behaveas they do.Inparticular, a theory un derlying the differencebetween strength and accuracy-based fitness has been sought. In order to compare XCS with more traditional classifier systems, SB-XCS, XCS's strength-based twin, is introduced. SB-XCS is designed to VI Preface resemble xes as much as possible, except that it uses the Q-value of a rule as its weight in both action selection and rule generation. It is shown that this minor algorithmic change has profound and subtle implications on the way in which SB-XeS operates, and on its capacity to adapt. Two types of non-sequential learning task are identified, distinguished by whether the reward function is biased or unbiased. (An unbiased reward function is con stant over the action(s) in each state which maximise reward.) It is argued that although SB-XeS can be expected to adapt to unbiased reward func tions, its ability to adapt to biased reward functions is limited. Further, it is claimed that SB-XeS willonlyadapt wellto trivial sequential decisiontasks. SB-XeS's difficultiesare attributed to twoformsofpathological relationships between rules, termed strong and fit overgeneral rules. The analysis of xes and SB-XeS presented herein supports the study of accuracy-based fitness, and fitness sharing in strength-based systems as directions forfuture work. Bristol, April 2003 Tim Kovacs Acknowledgements My supervisor, Manfred Kerber, has alwaysbeen particularly generous with his time and ideas. Hisintuition, patience, and thoroughness continue to sur prise me. Histalent as a proofreader and his expertise with Jb.1EX raised the standard ofmy work, but his example raised it further. Thank you Manfred. lowe a great deal to Stewart Wilson, whohas been a consistently invalu able source of information and encouragement since I was an MSc student. Through hisownworkhehas, morethan anyone,sustained the study ofclas sifier systems in difficult times. But, in addition to'this, his encouragement and assistance to others has meant a great deal, as many willattest. Alwyn Barry has been a cheerfulsourceofassistance and encouragement for many years, has shown an interest in both my research and my career, and has been good company in distant places. For all that I am indebted to him. I wouldalso liketo thank Larry Bull for his interest and efforts to help me, and for pointing out the importance offitness sharing. I am fortunate to have benefited from contact with a number of other classifiersystems researchers who have made my studies much more stimu lating, both through email andfacetoface.Inparticular.LarryBull.Luis Hercog, Pier Luca Lanzi, Sonia Schulenburg and WolfgangStolzmann have helped make every conferencea memorable one. I wouldliketo thank Riccardo Poli myinternal examiner and thesis com mittee member for his kind words and advice over the years, for his careful consideration ofthis workand for his encouragement. Thanks alsoto my ex ternal examiner, Robert Smith, for hisconsideration and for making my viva lessdifficultthan it might have been. I am indebted to Aaron Sloman,myMScco-supervisorand member ofmy thesis committee, as anyone who reads his workand mine might guess. I am also indebted to Ian Wright, my other MScco-supervisor, who first steered xes. me towards It wasmy pleasure to share an officewith Stuart Reynolds forfour years, and such is his good nature, humour, and, it must be said, tolerance, that VIII I would welcomeanother four. He has also been a great source of technical assistance in various forms and on innumerable occasions. On the subject ofproviding assistance, technical and otherwise, there can be few who excel as Axel Gro,Bmann does. Most of his contemporaries at Birmingham have benefited from his expertise with computers and the culi nary arts, just two ofhis skills. I am grateful to Jeremy Wyatt for his patient efforts to teach me a little about reinforcement learning, and also for a little poker, a few parties, and many pints. Many others at Birmingham are due thanks; Peter Hancox and Achim Jung for making my PhD and teaching assistantship possible, Jonny Page for his surreal sanity checks, Adrian Hartley for getting me to the airport on time and for our excellent adventures, Marcos Quintana Hernandez for being excellent himself, John Woodward for his humour, Gavin Brown for being Gavin, PaolaManeggiaforpancakes, MathiasKegelmann forlate discussions, Pawel Waszkiewiczfor guitar lessons, Xin Yaofor making sure I was not the only one working late at night, and Richard Pannell and Sammy Snow for many little things. I would like to thank my parents for their support in this and earlier endeavours, without which this work would not have been possible. Finally, I would liketo thank Qulsom, who has shared with me more ofthe ups and downs ofthis work than anyone, and for whosesuffering I am sorry. xes The descriptionof presented in this workisdrawn fromStewart Wil son's published descriptions ofit in [298,304, 54],from his "NetQ" questions xes and answers on to be found on his home page [314], and from extensive correspondence with him on the details of its implementation. Our valuable discussions with Alwyn Barry in 1998helped bring out more details ofStew xes xes art's implementation. This and the growing interest in suggested the needfora new,morecomprehensive description, whichfinallyappearedin 2000[54] thanks to theenergyofMartinButz.Iam alsogratefulto Martinfor hisproofreadingthe examplecycleAppendix.Needlessto say,the mechanisms described herein are due entirely to Stewart Wilson, with the fewexceptions duly noted. Contents 1 Introduction ';........... 1 1.1 Two Example MachineLearning Tasks ..................... 2 1.2 Types ofTask.... ............................... ........ 3 1.2.1 Supervised and Reinforcement Learning ...... ........ 3 1.2.2 Sequential and Non-sequentialDecisionTasks......... 4 1.3 Two Challengesfor ClassifierSystems...................... 4 1.3.1 Problem 1:Learning a Policyfrom Reinforcement ..... 5 1.3.2 Problem 2:Generalisation .......................... 5 1.4 Solution Methods........................................ 6 1.4.1 Method 1:ReinforcementLearning Algorithms. ... .. .. 6 1.4.2 Method 2:Evolutionary Algorithms. ................. 6 1.5 Learning ClassifierSystems ............................... 6 1.5.1 The Tripartite LCSStructure ..... .... ...... .... .... 7 1.5.2 LCS = PolicyLearning + Generalisation ............. 7 1.5.3 CreditAssignment in ClassifierSystems.............. 8 1.5.4 Strength and Accuracy-based ClassifierSystems. ...... 8 1.6 About the Book. ........................................ 9 1.6.1 Why Compare Strength and Accuracy? 10 1.6.2 Are LCSEC- or RL-based?......................... 11 1.6.3 Movingin DesignSpace............................ 14 1.7 Structure ofthe Book 16 2 Learning Classifier Systems ................................ 19 2.1 Types of ClassifierSystems ............................... 21 2.1.1 Michiganand Pittsburgh LCS 21 2.1.2 XCSand Traditional LCS? 21 2.2 Representing Rules ...................................... 22 2.2.1 The Standard Ternary Language 22 2.2.2 Other Representations ....................... 24 2.2.3 Summary ofRule Representation 25 2.2.4 Notation forRules. ................................ 25 X Contents 2.3 XCS 25 2.3.1 Wilson's Motivation forXCS........................ 26 2.3.2 OverviewofXCS.................................. 27 2.3.3 Wilson's Explore/Exploit Framework 30 2.3.4 The PerformanceSystem ........................... 32 2.3.4.1 The XCSPerformance System Algorithm .. .. . 32 2.3.4.2 The Match Set and Prediction Array. ........ 32 2.3.4.3 Action Selection........................... 34 2.3.4.4 Experience-weighting ofSystem Prediction .... 34 2.3.5 The Credit AssignmentSystem 35 2.3.5.1 The MAMTechnique " ., 35 2.3.5.2 The Credit AssignmentAlgorithm 36 2.3.5.3 Sequential and Non-sequentialUpdates 36 2.3.5.4 Parameter Update Order 37 2.3.5.5 XCSParameter Updates .................... 38 2.3.6 The Rule Discovery System......................... 41 2.3.6.1 Random Initial Populations ................. 41 2.3.6.2 Covering...... ....... ........ ....... ...... 41 2.3.6.3 The NicheGenetic Algorithm 43 2.3.6.4 Alternative Mutation Schemes 44 2.3.6.5 Triggeringthe NicheGA 45 2.3.6.6 DeletionofRules. ......................... 45 2.3.6.7 ClassifierParameter Initialisation. ........... 46 2.3.6.8 Subsumption Deletion 47 2.4 SB-XCS 47 2.4.1 SpecificationofSB-XCS 48 2.4.2 Comparison ofSB-XCS and Other Strength LCS 51 2.5 Initial Tests ofXCSand SB-XCS... .. ..... ................ 52 2.5.1 The 6 Multiplexer ................................. 52 2.5.2 Woods2. .... ....... .. .. ... ........ .... ... .. ...... 55 2.6 Summary............... .... ... ............... .. .. .. .... 60 3 How Strength and Accuracy Differ 63 3.1 Thinking about ComplexSystems 63 3.2 Holland's Rationale for CS-l and his Later LCS 65 3.2.1 SchemaTheory 65 3.2.2 The Bucket Brigade 65 3.2.3 SchemaTheory + Bucket Brigade = Adaptation 66 3.3 Wilson's Rationale forXCS 66 3.3.1 A Bias towards Accurate Rules , 67 3.3.2 A Bias towards General Rules. ...................... 67 3.3.3 Complete Maps ........................... 70 3.3.4 Summary. ... ........... ..... ... ......... .... ..... 71 3.4 A Rationale forSB-XCS 71 3.5 AnalysisofPopulations EvolvedbyXCSand SB-XCS 71

Description:
Classifier systems are an intriguing approach to a broad range of machine learning problems, based on automated generation and evaluation of condi­ tion/action rules. Inreinforcement learning tasks they simultaneously address the two major problems of learning a policy and generalising over it (and
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.