
A First Course in Machine Learning (Chapman & Hall/CRC Machine Learning & Pattern Recognition) PDF

428 pages · 2016 · 164.524 MB · English

Preview: A First Course in Machine Learning (Chapman & Hall/CRC Machine Learning & Pattern Recognition)

A FIRST COURSE IN MACHINE LEARNING
Second Edition

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

SERIES EDITORS

Ralf Herbrich, Amazon Development Center, Berlin, Germany
Thore Graepel, Microsoft Research Ltd., Cambridge, UK

AIMS AND SCOPE

This series reflects the latest advances and applications in machine learning and pattern recognition through the publication of a broad range of reference works, textbooks, and handbooks. The inclusion of concrete examples, applications, and methods is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of machine learning, pattern recognition, computational intelligence, robotics, computational/statistical learning theory, natural language processing, computer vision, game AI, game theory, neural networks, computational neuroscience, and other relevant topics, such as machine learning applied to bioinformatics or cognitive science, which might be proposed by potential contributors.

PUBLISHED TITLES

BAYESIAN PROGRAMMING
Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha

UTILITY-BASED LEARNING FROM DATA
Craig Friedman and Sven Sandow

HANDBOOK OF NATURAL LANGUAGE PROCESSING, SECOND EDITION
Nitin Indurkhya and Fred J. Damerau

COST-SENSITIVE MACHINE LEARNING
Balaji Krishnapuram, Shipeng Yu, and Bharat Rao

COMPUTATIONAL TRUST MODELS AND MACHINE LEARNING
Xin Liu, Anwitaman Datta, and Ee-Peng Lim

MULTILINEAR SUBSPACE LEARNING: DIMENSIONALITY REDUCTION OF MULTIDIMENSIONAL DATA
Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos

MACHINE LEARNING: An Algorithmic Perspective, Second Edition
Stephen Marsland

SPARSE MODELING: THEORY, ALGORITHMS, AND APPLICATIONS
Irina Rish and Genady Ya. Grabarnik

A FIRST COURSE IN MACHINE LEARNING, SECOND EDITION
Simon Rogers and Mark Girolami

STATISTICAL REINFORCEMENT LEARNING: MODERN MACHINE LEARNING APPROACHES
Masashi Sugiyama

MULTI-LABEL DIMENSIONALITY REDUCTION
Liang Sun, Shuiwang Ji, and Jieping Ye

REGULARIZATION, OPTIMIZATION, KERNELS, AND SUPPORT VECTOR MACHINES
Johan A. K. Suykens, Marco Signoretto, and Andreas Argyriou

ENSEMBLE METHODS: FOUNDATIONS AND ALGORITHMS
Zhi-Hua Zhou

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series

A FIRST COURSE IN MACHINE LEARNING
Second Edition

Simon Rogers
University of Glasgow, United Kingdom

Mark Girolami
University of Warwick, United Kingdom

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press, Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20160524
International Standard Book Number-13: 978-1-4987-3856-9 (eBook - VitalBook)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

List of Tables  xv
List of Figures  xvii
Preface to the First Edition  xxvii
Preface to the Second Edition  xxix

Section I  Basic Topics

Chapter 1  Linear Modelling: A Least Squares Approach  3
  1.1 LINEAR MODELLING  3
    1.1.1 Defining the model  4
    1.1.2 Modelling assumptions  5
    1.1.3 Defining a good model  6
    1.1.4 The least squares solution – a worked example  8
    1.1.5 Worked example  12
    1.1.6 Least squares fit to the Olympic data  13
    1.1.7 Summary  14
  1.2 MAKING PREDICTIONS  15
    1.2.1 A second Olympic dataset  15
    1.2.2 Summary  17
  1.3 VECTOR/MATRIX NOTATION  17
    1.3.1 Example  25
    1.3.2 Numerical example  26
    1.3.3 Making predictions  27
    1.3.4 Summary  27
  1.4 NON-LINEAR RESPONSE FROM A LINEAR MODEL  28
  1.5 GENERALISATION AND OVER-FITTING  31
    1.5.1 Validation data  31
    1.5.2 Cross-validation  32
    1.5.3 Computational scaling of K-fold cross-validation  34
  1.6 REGULARISED LEAST SQUARES  34
  1.7 EXERCISES  37
  1.8 FURTHER READING  39

Chapter 2  Linear Modelling: A Maximum Likelihood Approach  41
  2.1 ERRORS AS NOISE  41
    2.1.1 Thinking generatively  42
  2.2 RANDOM VARIABLES AND PROBABILITY  43
    2.2.1 Random variables  43
    2.2.2 Probability and distributions  44
    2.2.3 Adding probabilities  46
    2.2.4 Conditional probabilities  46
    2.2.5 Joint probabilities  47
    2.2.6 Marginalisation  49
    2.2.7 Aside – Bayes’ rule  51
    2.2.8 Expectations  52
  2.3 POPULAR DISCRETE DISTRIBUTIONS  55
    2.3.1 Bernoulli distribution  55
    2.3.2 Binomial distribution  55
    2.3.3 Multinomial distribution  56
  2.4 CONTINUOUS RANDOM VARIABLES – DENSITY FUNCTIONS  57
  2.5 POPULAR CONTINUOUS DENSITY FUNCTIONS  60
    2.5.1 The uniform density function  60
    2.5.2 The beta density function  62
    2.5.3 The Gaussian density function  63
    2.5.4 Multivariate Gaussian  64
  2.6 SUMMARY  66
  2.7 THINKING GENERATIVELY... CONTINUED  67
  2.8 LIKELIHOOD  68
    2.8.1 Dataset likelihood  69
    2.8.2 Maximum likelihood  70
    2.8.3 Characteristics of the maximum likelihood solution  73
    2.8.4 Maximum likelihood favours complex models  75
  2.9 THE BIAS-VARIANCE TRADE-OFF  75
    2.9.1 Summary  76
  2.10 EFFECT OF NOISE ON PARAMETER ESTIMATES  77
    2.10.1 Uncertainty in estimates  78
    2.10.2 Comparison with empirical values  83
    2.10.3 Variability in model parameters – Olympic data  84
  2.11 VARIABILITY IN PREDICTIONS  84
    2.11.1 Predictive variability – an example  86
    2.11.2 Expected values of the estimators  86
  2.12 CHAPTER SUMMARY  91
  2.13 EXERCISES  92
  2.14 FURTHER READING  93

Chapter 3  The Bayesian Approach to Machine Learning  95
  3.1 A COIN GAME  95
    3.1.1 Counting heads  97
    3.1.2 The Bayesian way  98
  3.2 THE EXACT POSTERIOR  103
  3.3 THE THREE SCENARIOS  104
    3.3.1 No prior knowledge  104
    3.3.2 The fair coin scenario  112
    3.3.3 A biased coin  114
    3.3.4 The three scenarios – a summary  116
    3.3.5 Adding more data  117
  3.4 MARGINAL LIKELIHOODS  117
    3.4.1 Model comparison with the marginal likelihood  119
  3.5 HYPERPARAMETERS  119
  3.6 GRAPHICAL MODELS  120
  3.7 SUMMARY  122
  3.8 A BAYESIAN TREATMENT OF THE OLYMPIC 100m DATA  122
    3.8.1 The model  122
    3.8.2 The likelihood  124
    3.8.3 The prior  124
    3.8.4 The posterior  124
    3.8.5 A first-order polynomial  126
    3.8.6 Making predictions  129
  3.9 MARGINAL LIKELIHOOD FOR POLYNOMIAL MODEL ORDER SELECTION  130
  3.10 CHAPTER SUMMARY  133
  3.11 EXERCISES  133
  3.12 FURTHER READING  135

Chapter 4  Bayesian Inference  137
  4.1 NON-CONJUGATE MODELS  137
  4.2 BINARY RESPONSES  138
    4.2.1 A model for binary responses  138
  4.3 A POINT ESTIMATE – THE MAP SOLUTION  141
  4.4 THE LAPLACE APPROXIMATION  147
    4.4.1 Laplace approximation example: Approximating a gamma density  148
    4.4.2 Laplace approximation for the binary response model  150
  4.5 SAMPLING TECHNIQUES  152
    4.5.1 Playing darts  152
    4.5.2 The Metropolis–Hastings algorithm  154
    4.5.3 The art of sampling  162
  4.6 CHAPTER SUMMARY  163
  4.7 EXERCISES  163
  4.8 FURTHER READING  164

Chapter 5  Classification  167
  5.1 THE GENERAL PROBLEM  167
  5.2 PROBABILISTIC CLASSIFIERS  168
    5.2.1 The Bayes classifier  168
      5.2.1.1 Likelihood – class-conditional distributions  169
      5.2.1.2 Prior class distribution  169
      5.2.1.3 Example – Gaussian class-conditionals  170
      5.2.1.4 Making predictions  171
      5.2.1.5 The naive-Bayes assumption  172
      5.2.1.6 Example – classifying text  174
      5.2.1.7 Smoothing  176
    5.2.2 Logistic regression  178
      5.2.2.1 Motivation  178
      5.2.2.2 Non-linear decision functions  179
      5.2.2.3 Non-parametric models – the Gaussian process  180
  5.3 NON-PROBABILISTIC CLASSIFIERS  181
    5.3.1 K-nearest neighbours  181
      5.3.1.1 Choosing K  182
    5.3.2 Support vector machines and other kernel methods  185
      5.3.2.1 The margin  185
      5.3.2.2 Maximising the margin  186
      5.3.2.3 Making predictions  189
      5.3.2.4 Support vectors  189
      5.3.2.5 Soft margins  191
      5.3.2.6 Kernels  193
    5.3.3 Summary  196
  5.4 ASSESSING CLASSIFICATION PERFORMANCE  196
    5.4.1 Accuracy – 0/1 loss  196
    5.4.2 Sensitivity and specificity  197
    5.4.3 The area under the ROC curve  198
    5.4.4 Confusion matrices  200
  5.5 DISCRIMINATIVE AND GENERATIVE CLASSIFIERS  202
  5.6 CHAPTER SUMMARY  202
  5.7 EXERCISES  202
  5.8 FURTHER READING  203

Chapter 6  Clustering  205
  6.1 THE GENERAL PROBLEM  205
  6.2 K-MEANS CLUSTERING  206
    6.2.1 Choosing the number of clusters  208
    6.2.2 Where K-means fails  210
    6.2.3 Kernelised K-means  210
    6.2.4 Summary  212
  6.3 MIXTURE MODELS  213
