A Brief Introduction to Machine Learning for Engineers

Osvaldo Simeone (2017), “A Brief Introduction to Machine Learning for Engineers”, : Vol. XX, No. XX, pp 1–201. DOI: XXX.

Osvaldo Simeone
Department of Informatics
King’s College London
osvaldo.simeone@kcl.ac.uk
Contents
1 Introduction 5
1.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Goals and Outline . . . . . . . . . . . . . . . . . . . . . . 7
2 A Gentle Introduction through Linear Regression 11
2.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . 11
2.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Frequentist Approach . . . . . . . . . . . . . . . . . . . . 15
2.4 Bayesian Approach . . . . . . . . . . . . . . . . . . . . . 31
2.5 Minimum Description Length (MDL) . . . . . . . . . . . . 37
2.6 Interpretation and Causality . . . . . . . . . . . . . . . . . 39
2.7 Information-Theoretic Metrics . . . . . . . . . . . . . . . . 41
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 Probabilistic Models for Learning 45
3.1 The Exponential Family . . . . . . . . . . . . . . . . . . . 46
3.2 Maximum Entropy Property . . . . . . . . . . . . . . . . . 51
3.3 Frequentist Learning . . . . . . . . . . . . . . . . . . . . . 52
3.4 Bayesian Learning . . . . . . . . . . . . . . . . . . . . . . 56
3.5 Energy-based Models . . . . . . . . . . . . . . . . . . . . 62
3.6 Supervised Learning via Generalized Linear Models (GLM) 64
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4 Classification 66
4.1 Classification as a Supervised Learning Problem . . . . . . 67
4.2 Stochastic Gradient Descent . . . . . . . . . . . . . . . . 69
4.3 Discriminative Deterministic Models . . . . . . . . . . . . 71
4.4 Discriminative Probabilistic Models . . . . . . . . . . . . . 83
4.5 Generative Probabilistic Models . . . . . . . . . . . . . . . 86
4.6 Multi-Class Classification . . . . . . . . . . . . . . . . . . 88
4.7 Non-linear Discriminative Models: Deep Neural Networks . 90
4.8 Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 Statistical Learning Theory 96
5.1 A Formal Framework for Supervised Learning . . . . . . . 96
5.2 PAC Learnability and Sample Complexity . . . . . . . . . . 101
5.3 PAC Learnability for Finite Hypothesis Classes . . . . . . . 103
5.4 VC Dimension and Fundamental Theorem of PAC Learning 106
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6 Unsupervised Learning 110
6.1 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . 111
6.2 K-Means Clustering . . . . . . . . . . . . . . . . . . . . . 114
6.3 ML, ELBO and EM . . . . . . . . . . . . . . . . . . . . . 116
6.4 Directed Generative Models . . . . . . . . . . . . . . . . 127
6.5 Undirected Generative Models . . . . . . . . . . . . . . . 134
6.6 Discriminative Models . . . . . . . . . . . . . . . . . . . . 137
6.7 Autoencoders . . . . . . . . . . . . . . . . . . . . . . . . 138
6.8 Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7 Probabilistic Graphical Models 142
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.2 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . 146
7.3 Markov Random Fields . . . . . . . . . . . . . . . . . . . 155
7.4 Bayesian Inference in Probabilistic Graphical Models . . . 158
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8 Approximate Inference and Learning 162
8.1 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . 163
8.2 Variational Inference . . . . . . . . . . . . . . . . . . . . . 165
8.3 Monte Carlo-Based Variational Inference . . . . . . . . . . 172
8.4 Approximate Learning . . . . . . . . . . . . . . . . . . . 174
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9 Concluding Remarks 177
Appendices 180
A Appendix A: Information Measures 181
A.1 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
A.2 Conditional Entropy and Mutual Information . . . . . . . . 184
A.3 Divergence Measures . . . . . . . . . . . . . . . . . . . . 186
B Appendix B: KL Divergence and Exponential Family 189
Acknowledgements 191
References 192
A Brief Introduction to Machine
Learning for Engineers
Osvaldo Simeone1
1Department of Informatics, King’s College London;
osvaldo.simeone@kcl.ac.uk
ABSTRACT
This monograph aims at providing an introduction to key concepts, algorithms, and theoretical frameworks in machine learning, including supervised and unsupervised learning, statistical learning theory, probabilistic graphical models and approximate inference. The intended readership consists of electrical engineers with a background in probability and linear algebra. The treatment builds on first principles, and organizes the main ideas according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, directed and undirected models, and convex and non-convex optimization. The mathematical framework uses information-theoretic measures as a unifying tool. The text offers simple and reproducible numerical examples providing insights into key motivations and conclusions. Rather than providing exhaustive details on the existing myriad solutions in each specific category, for which the reader is referred to textbooks and papers, this monograph is meant as an entry point for an engineer into the literature on machine learning.
ISSN; DOI XXXXXXXX
© 2017 XXXXXXXX
Notation
• Random variables or random vectors – both abbreviated as rvs – are represented using roman typeface, while their values and realizations are indicated by the corresponding standard font. For instance, the equality $\mathrm{x} = x$ indicates that rv $\mathrm{x}$ takes value $x$.
• Matrices are indicated using uppercase fonts, with roman typeface used for random matrices.
• Vectors will be taken to be in column form.
• $X^T$ and $X^\dagger$ are the transpose and the pseudoinverse of matrix $X$, respectively.
• The distribution of a rv $\mathrm{x}$, either probability mass function (pmf) for a discrete rv or probability density function (pdf) for a continuous rv, is denoted as $p_{\mathrm{x}}$, $p_{\mathrm{x}}(x)$, or $p(x)$.
• The notation $\mathrm{x} \sim p_{\mathrm{x}}$ indicates that rv $\mathrm{x}$ is distributed according to $p_{\mathrm{x}}$.
• For jointly distributed rvs $(\mathrm{x},\mathrm{y}) \sim p_{\mathrm{xy}}$, the conditional distribution of $\mathrm{x}$ given the observation $\mathrm{y} = y$ is indicated as $p_{\mathrm{x}|\mathrm{y}=y}$, $p_{\mathrm{x}|\mathrm{y}}(x|y)$, or $p(x|y)$.
• The notation $\mathrm{x}|\mathrm{y} = y \sim p_{\mathrm{x}|\mathrm{y}=y}$ indicates that rv $\mathrm{x}$ is drawn according to the conditional distribution $p_{\mathrm{x}|\mathrm{y}=y}$.
• The notation $\mathrm{E}_{\mathrm{x}\sim p_{\mathrm{x}}}[\cdot]$ indicates the expectation of the argument with respect to the distribution of the rv $\mathrm{x} \sim p_{\mathrm{x}}$. Accordingly, we will also write $\mathrm{E}_{\mathrm{x}\sim p_{\mathrm{x}|\mathrm{y}}}[\cdot|y]$ for the conditional expectation with respect to the distribution $p_{\mathrm{x}|\mathrm{y}=y}$. When clear from the context, the distribution over which the expectation is computed may be omitted.
• The notation $\Pr_{\mathrm{x}\sim p_{\mathrm{x}}}[\cdot]$ indicates the probability of the argument event with respect to the distribution of the rv $\mathrm{x} \sim p_{\mathrm{x}}$. When clear from the context, the subscript is dropped.
• The notation $\log$ represents the logarithm in base two, while $\ln$ represents the natural logarithm.
• $\mathrm{x} \sim \mathcal{N}(\mu,\Sigma)$ indicates that random vector $\mathrm{x}$ is distributed according to a multivariate Gaussian pdf with mean vector $\mu$ and covariance matrix $\Sigma$. The multivariate Gaussian pdf is denoted as $\mathcal{N}(x|\mu,\Sigma)$ as a function of $x$.
• $\mathrm{x} \sim \mathcal{U}(a,b)$ indicates that rv $\mathrm{x}$ is distributed according to a uniform distribution in the interval $[a,b]$. The corresponding uniform pdf is denoted as $\mathcal{U}(x|a,b)$.
• $\delta(x)$ denotes the Dirac delta function or the Kronecker delta function, as clear from the context.
• $||a||^2 = \sum_{i=1}^N a_i^2$ is the quadratic, or $l_2$, norm of a vector $a = [a_1,\ldots,a_N]^T$. We similarly define the $l_1$ norm as $||a||_1 = \sum_{i=1}^N |a_i|$, and the $l_0$ pseudo-norm $||a||_0$ as the number of non-zero entries of vector $a$ (a brief worked example follows this list).
• $I$ denotes the identity matrix, whose dimensions will be clear from the context. Similarly, $1$ represents a vector of all ones.
• $\mathbb{R}$ is the set of real numbers; and $\mathbb{R}^+$ the set of non-negative real numbers.
• $1(\cdot)$ is the indicator function: $1(x) = 1$ if $x$ is true, and $1(x) = 0$ otherwise.
• $|\mathcal{S}|$ represents the cardinality of a set $\mathcal{S}$.
• $\mathrm{x}_{\mathcal{S}}$ represents a set of rvs $\mathrm{x}_k$ indexed by the integers $k \in \mathcal{S}$.
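As a quick worked illustration of the norm notation above – the vector is an arbitrary choice added here for concreteness, not taken from the original text – let $a = [3, 0, -4]^T$. Then

% illustrative example only: the three (pseudo-)norms defined above, evaluated for a = [3, 0, -4]^T
\begin{align*}
||a||^2 &= 3^2 + 0^2 + (-4)^2 = 25, \quad ||a|| = 5 && \text{($l_2$, or quadratic, norm)}\\
||a||_1 &= |3| + |0| + |-4| = 7 && \text{($l_1$ norm)}\\
||a||_0 &= 2 && \text{(number of non-zero entries).}
\end{align*}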
Acronyms
AI: Artificial Intelligence
AMP: Approximate Message Passing
BN: Bayesian Network
DAG: Directed Acyclic Graph
ELBO: Evidence Lower BOund
EM: Expectation Maximization
ERM: Empirical Risk Minimization
GAN: Generative Adversarial Network
GLM: Generalized Linear Model
HMM: Hidden Markov Model
i.i.d.: independent identically distributed
KL: Kullback-Leibler
LBP: Loopy Belief Propagation
LL: Log-Likelihood
LLR: Log-Likelihood Ratio
LS: Least Squares
MC: Monte Carlo
MCMC: Markov Chain Monte Carlo
MDL: Minimum Description Length
MFVI: Mean Field Variational Inference
ML: Maximum Likelihood
MRF: Markov Random Field
NLL: Negative Log-Likelihood
PAC: Probably Approximately Correct
pdf: probability density function
pmf: probability mass function
PCA: Principal Component Analysis
PPCA: Probabilistic Principal Component Analysis
QDA: Quadratic Discriminant Analysis
RBM: Restricted Boltzmann Machine
SGD: Stochastic Gradient Descent
SVM: Support Vector Machine
rv: random variable or random vector (depending on the context)
s.t.: subject to
VAE: Variational AutoEncoder
VC: Vapnik–Chervonenkis
VI: Variational Inference
1 Introduction
Having taught courses on machine learning, I am often asked by col-
leagues and students with a background in engineering to suggest “the
best place to start” to get into this subject. I typically respond with a
list of books – for a general, but slightly outdated introduction, read
this book; for a detailed survey of methods based on probabilistic mod-
els, check this other reference; to learn about statistical learning, I
found this text useful; and so on. This answer strikes me, and most
likely also my interlocutors, as quite unsatisfactory. This is especially
so since the size of many of these books may be discouraging for busy
professionals and students working on other projects. This monograph
is my first attempt to offer a basic and compact reference that de-
scribes key ideas and principles in simple terms and within a unified
treatment, encompassing also more recent developments and pointers
to the literature for further study.
1.1 Machine Learning
In engineering, pattern recognition refers to the automatic discovery
of regularities in data for decision-making, prediction or data mining.