MACHINE LEARNING
The Art and Science of Algorithms that Make Sense of Data

As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. He covers a wide range of logical, geometric and statistical models, and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features.

Machine Learning will set a new standard as an introductory textbook:
- The Prologue and Chapter 1 are freely available on-line, providing an accessible first step into machine learning.
- The use of established terminology is balanced with the introduction of new and useful concepts.
- Well-chosen examples and illustrations form an integral part of the text.
- Boxes summarise relevant background material and provide pointers for revision.
- Each chapter concludes with a summary and suggestions for further reading.
- A list of 'Important points to remember' is included at the back of the book, together with an extensive index to help readers navigate through the material.

MACHINE LEARNING
The Art and Science of Algorithms that Make Sense of Data
PETER FLACH

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107096394

© Peter Flach 2012

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2012
Printed and bound in the United Kingdom by the MPG Books Group

A catalogue record for this publication is available from the British Library

ISBN 978-1-107-09639-4 Hardback
ISBN 978-1-107-42222-3 Paperback

Additional resources for this publication at www.cs.bris.ac.uk/home/flach/mlbook

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Hessel Flach (1923–2006)

Brief Contents

Preface
Prologue: A machine learning sampler
1 The ingredients of machine learning
2 Binary classification and related tasks
3 Beyond binary classification
4 Concept learning
5 Tree models
6 Rule models
7 Linear models
8 Distance-based models
9 Probabilistic models
10 Features
11 Model ensembles
12 Machine learning experiments
Epilogue: Where to go from here
Important points to remember
References
Index

Contents

Preface
Prologue: A machine learning sampler
1 The ingredients of machine learning
  1.1 Tasks: the problems that can be solved with machine learning
    Looking for structure
    Evaluating performance on a task
  1.2 Models: the output of machine learning
    Geometric models
    Probabilistic models
    Logical models
    Grouping and grading
  1.3 Features: the workhorses of machine learning
    Two uses of features
    Feature construction and transformation
    Interaction between features
  1.4 Summary and outlook
    What you'll find in the rest of the book
2 Binary classification and related tasks
  2.1 Classification
    Assessing classification performance
    Visualising classification performance
  2.2 Scoring and ranking
    Assessing and visualising ranking performance
    Turning rankers into classifiers
  2.3 Class probability estimation
    Assessing class probability estimates
    Turning rankers into class probability estimators
  2.4 Binary classification and related tasks: Summary and further reading
3 Beyond binary classification
  3.1 Handling more than two classes
    Multi-class classification
    Multi-class scores and probabilities
  3.2 Regression
  3.3 Unsupervised and descriptive learning
    Predictive and descriptive clustering
    Other descriptive models
  3.4 Beyond binary classification: Summary and further reading
4 Concept learning
  4.1 The hypothesis space
    Least general generalisation
    Internal disjunction
  4.2 Paths through the hypothesis space
    Most general consistent hypotheses
    Closed concepts
  4.3 Beyond conjunctive concepts
    Using first-order logic
  4.4 Learnability
  4.5 Concept learning: Summary and further reading
5 Tree models
  5.1 Decision trees
  5.2 Ranking and probability estimation trees
    Sensitivity to skewed class distributions
  5.3 Tree learning as variance reduction
    Regression trees
    Clustering trees
  5.4 Tree models: Summary and further reading
6 Rule models
  6.1 Learning ordered rule lists
    Rule lists for ranking and probability estimation
  6.2 Learning unordered rule sets
    Rule sets for ranking and probability estimation
    A closer look at rule overlap
  6.3 Descriptive rule learning
    Rule learning for subgroup discovery
    Association rule mining
  6.4 First-order rule learning
  6.5 Rule models: Summary and further reading
7 Linear models
  7.1 The least-squares method
    Multivariate linear regression
    Regularised regression
    Using least-squares regression for classification
  7.2 The perceptron
  7.3 Support vector machines
    Soft margin SVM
  7.4 Obtaining probabilities from linear classifiers
  7.5 Going beyond linearity with kernel methods
  7.6 Linear models: Summary and further reading
8 Distance-based models
  8.1 So many roads
  8.2 Neighbours and exemplars
  8.3 Nearest-neighbour classification
  8.4 Distance-based clustering
    K-means algorithm
    Clustering around medoids
    Silhouettes
  8.5 Hierarchical clustering
  8.6 From kernels to distances
  8.7 Distance-based models: Summary and further reading
9 Probabilistic models
  9.1 The normal distribution and its geometric interpretations
  9.2 Probabilistic models for categorical data
    Using a naive Bayes model for classification
    Training a naive Bayes model
  9.3 Discriminative learning by optimising conditional likelihood
  9.4 Probabilistic models with hidden variables
    Expectation-Maximisation
    Gaussian mixture models
  9.5 Compression-based models
  9.6 Probabilistic models: Summary and further reading
10 Features
  10.1 Kinds of feature
    Calculations on features
    Categorical, ordinal and quantitative features
    Structured features
  10.2 Feature transformations
    Thresholding and discretisation
    Normalisation and calibration
    Incomplete features
  10.3 Feature construction and selection
    Matrix transformations and decompositions
  10.4 Features: Summary and further reading
11 Model ensembles
  11.1 Bagging and random forests
  11.2 Boosting
    Boosted rule learning
  11.3 Mapping the ensemble landscape
    Bias, variance and margins
    Other ensemble methods
    Meta-learning
  11.4 Model ensembles: Summary and further reading
12 Machine learning experiments
  12.1 What to measure
  12.2 How to measure it
  12.3 How to interpret it
    Interpretation of results over multiple data sets
  12.4 Machine learning experiments: Summary and further reading