
Introduction to Machine Learning with Applications in Information Security

347 pages · 2018 · 3.888 MB · English

Preview: Introduction to Machine Learning with Applications in Information Security

INTRODUCTION TO MACHINE LEARNING with APPLICATIONS in INFORMATION SECURITY

Mark Stamp
San Jose State University, California

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
International Standard Book Number-13: 978-1-138-62678-2 (Hardback)

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface

1  Introduction
   1.1  What Is Machine Learning?
   1.2  About This Book
   1.3  Necessary Background
   1.4  A Few Too Many Notes

I  Tools of the Trade

2  A Revealing Introduction to Hidden Markov Models
   2.1  Introduction and Background
   2.2  A Simple Example
   2.3  Notation
   2.4  The Three Problems
        2.4.1  HMM Problem 1
        2.4.2  HMM Problem 2
        2.4.3  HMM Problem 3
        2.4.4  Discussion
   2.5  The Three Solutions
        2.5.1  Solution to HMM Problem 1
        2.5.2  Solution to HMM Problem 2
        2.5.3  Solution to HMM Problem 3
   2.6  Dynamic Programming
   2.7  Scaling
   2.8  All Together Now
   2.9  The Bottom Line
   2.10 Problems

3  A Full Frontal View of Profile Hidden Markov Models
   3.1  Introduction
   3.2  Overview and Notation
   3.3  Pairwise Alignment
   3.4  Multiple Sequence Alignment
   3.5  PHMM from MSA
   3.6  Scoring
   3.7  The Bottom Line
   3.8  Problems

4  Principal Components of Principal Component Analysis
   4.1  Introduction
   4.2  Background
        4.2.1  A Brief Review of Linear Algebra
        4.2.2  Geometric View of Eigenvectors
        4.2.3  Covariance Matrix
   4.3  Principal Component Analysis
   4.4  SVD Basics
   4.5  All Together Now
        4.5.1  Training Phase
        4.5.2  Scoring Phase
   4.6  A Numerical Example
   4.7  The Bottom Line
   4.8  Problems

5  A Reassuring Introduction to Support Vector Machines
   5.1  Introduction
   5.2  Constrained Optimization
        5.2.1  Lagrange Multipliers
        5.2.2  Lagrangian Duality
   5.3  A Closer Look at SVM
        5.3.1  Training and Scoring
        5.3.2  Scoring Revisited
        5.3.3  Support Vectors
        5.3.4  Training and Scoring Re-revisited
        5.3.5  The Kernel Trick
   5.4  All Together Now
   5.5  A Note on Quadratic Programming
   5.6  The Bottom Line
   5.7  Problems

6  A Comprehensible Collection of Clustering Concepts
   6.1  Introduction
   6.2  Overview and Background
   6.3  K-Means
   6.4  Measuring Cluster Quality
        6.4.1  Internal Validation
        6.4.2  External Validation
        6.4.3  Visualizing Clusters
   6.5  EM Clustering
        6.5.1  Maximum Likelihood Estimator
        6.5.2  An Easy EM Example
        6.5.3  EM Algorithm
        6.5.4  Gaussian Mixture Example
   6.6  The Bottom Line
   6.7  Problems

7  Many Mini Topics
   7.1  Introduction
   7.2  K-Nearest Neighbors
   7.3  Neural Networks
   7.4  Boosting
        7.4.1  Football Analogy
        7.4.2  AdaBoost
   7.5  Random Forest
   7.6  Linear Discriminant Analysis
   7.7  Vector Quantization
   7.8  Naïve Bayes
   7.9  Regression Analysis
   7.10 Conditional Random Fields
        7.10.1  Linear Chain CRF
        7.10.2  Generative vs. Discriminative Models
        7.10.3  The Bottom Line on CRFs
   7.11 Problems

8  Data Analysis
   8.1  Introduction
   8.2  Experimental Design
   8.3  Accuracy
   8.4  ROC Curves
   8.5  Imbalance Problem
   8.6  PR Curves
   8.7  The Bottom Line
   8.8  Problems

II  Applications

9  HMM Applications
   9.1  Introduction
   9.2  English Text Analysis
   9.3  Detecting Undetectable Malware
        9.3.1  Background
        9.3.2  Signature-Proof Metamorphic Generator
        9.3.3  Results
   9.4  Classic Cryptanalysis
        9.4.1  Jakobsen's Algorithm
        9.4.2  HMM with Random Restarts

10 PHMM Applications
   10.1 Introduction
   10.2 Masquerade Detection
        10.2.1  Experiments with Schonlau Dataset
        10.2.2  Simulated Data with Positional Information
   10.3 Malware Detection
        10.3.1  Background
        10.3.2  Datasets and Results

11 PCA Applications
   11.1 Introduction
   11.2 Eigenfaces
   11.3 Eigenviruses
        11.3.1  Malware Detection Results
        11.3.2  Compiler Experiments
   11.4 Eigenspam
        11.4.1  PCA for Image Spam Detection
        11.4.2  Detection Results

12 SVM Applications
   12.1 Introduction
   12.2 Malware Detection
        12.2.1  Background
        12.2.2  Experimental Results
   12.3 Image Spam Revisited
        12.3.1  SVM for Image Spam Detection
        12.3.2  SVM Experiments
        12.3.3  Improved Dataset

13 Clustering Applications
   13.1 Introduction
   13.2 K-Means for Malware Classification
        13.2.1  Background
        13.2.2  Experiments and Results
        13.2.3  Discussion
   13.3 EM vs. K-Means for Malware Analysis
        13.3.1  Experiments and Results
        13.3.2  Discussion

Annotated Bibliography
Index

Preface

"Perhaps it hasn't one," Alice ventured to remark.
"Tut, tut, child!" said the Duchess. "Everything's got a moral, if only you can find it."
— Lewis Carroll, Alice in Wonderland

For the past several years, I've been teaching a class on "Topics in Information Security." Each time I taught this course, I'd sneak in a few more machine learning topics. For the past couple of years, the class has been turned on its head, with machine learning being the focus, and information security only making its appearance in the applications. Unable to find a suitable textbook, I wrote a manuscript, which slowly evolved into this book.

In my machine learning class, we spend about two weeks on each of the major topics in this book (HMM, PHMM, PCA, SVM, and clustering). For each of these topics, about one week is devoted to the technical details in Part I, and another lecture or two is spent on the corresponding applications in Part II. The material in Part I is not easy; by including relevant applications, the material is reinforced and the pace is more reasonable. I also spend a week covering the data analysis topics in Chapter 8, and several of the mini topics in Chapter 7 are covered, based on time constraints and student interest.¹

Machine learning is an ideal subject for substantive projects. In topics classes, I always require projects, which are usually completed by pairs of students, although individual projects are allowed. At least one week is allocated to student presentations of their project results.

A suggested syllabus is given in Table 1. This syllabus should leave time for tests, project presentations, and selected special topics. Note that the applications material in Part II is intermixed with the material in Part I. Also note that the data analysis chapter is covered early, since it's relevant to all of the applications in Part II.

Table 1: Suggested syllabus

    Chapter                            Hours  Coverage
    1. Introduction                      1    All
    2. Hidden Markov Models              3    All
    9. HMM Applications                  2    All
    8. Data Analysis                     3    All
    3. Profile Hidden Markov Models      3    All
    10. PHMM Applications                2    All
    4. Principal Component Analysis      3    All
    11. PCA Applications                 2    All
    5. Support Vector Machines           3    All
    12. SVM Applications                 3    All
    6. Clustering                        3    All
    13. Clustering Applications          2    All
    7. Mini-topics                       6    LDA and selected topics
    Total                               36

My machine learning class is taught at the beginning graduate level. For an undergraduate class, it might be advisable to slow the pace slightly. Regardless of the level, labs would likely be helpful. However, it's important to treat labs as supplemental to, as opposed to a substitute for, lectures.

Learning challenging technical material requires studying it multiple times in multiple different ways, and I'd say that the magic number is three. It's no accident that students who read the book, attend the lectures, and conscientiously work on homework problems learn this material well. If you are trying to learn this subject on your own, the author has posted his lecture videos online, and these might serve as a (very poor) substitute for live lectures.²

I'm also a big believer in learning by programming: the more code that you write, the better you will learn machine learning.

Mark Stamp
Los Gatos, California
April, 2017

¹ Who am I kidding? Topics are selected based on my interests, not student interest.
² In my experience, in-person lectures are infinitely more valuable than any recorded or online format.
Something happens in live classes that will never be fully duplicated in any dead (or even semi-dead) format.

Chapter 1
Introduction

I took a speed reading course and read War and Peace in twenty minutes. It involves Russia.
— Woody Allen

1.1 What Is Machine Learning?

For our purposes, we'll view machine learning as a form of statistical discrimination, where the "machine" does the heavy lifting. That is, the computer "learns" important information, saving us humans from the hard work of trying to extract useful information from seemingly inscrutable data.

For the applications considered in this book, we typically train a model, then use the resulting model to score samples. If the score is sufficiently high, we classify the sample as being of the same type as was used to train the model. And thanks to the miracle of machine learning, we don't have to work too hard to perform such classification. Since the model parameters are (more or less) automatically extracted from training data, machine learning algorithms are sometimes said to be data driven.

Machine learning techniques can be successfully applied to a wide range of important problems, including speech recognition, natural language processing, bioinformatics, stock market analysis, information security, and the homework problems in this book. Additional useful applications of machine learning seem to be found on a daily basis; the set of potential applications is virtually unlimited.

It's possible to treat any machine learning algorithm as a black box and, in fact, this is a major selling point of the field. Many successful machine learners simply feed data into their favorite machine learning black box, which, surprisingly often, spits out useful results. While such an approach can work,
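To make the train-then-score workflow described in Section 1.1 concrete, here is a minimal sketch in Python. This is not the book's code: the `train` and `score` functions, the toy feature vectors, the simple per-feature Gaussian scoring model, and the threshold are all hypothetical stand-ins for whatever model (an HMM, PCA, SVM, and so on) would actually be trained on real data.

```python
# Minimal sketch of the train-then-score pattern: fit a model to samples
# of one class, score new samples, and classify by thresholding the score.
# The Gaussian model, data, and threshold here are illustrative only.
import numpy as np

def train(samples):
    """'Train' a toy model: per-feature mean and variance estimated
    from samples of a single class (e.g., one malware family)."""
    X = np.asarray(samples, dtype=float)
    return X.mean(axis=0), X.var(axis=0) + 1e-9  # variance floor avoids /0

def score(model, x):
    """Score a sample by its Gaussian log-likelihood under the model;
    higher scores mean 'more like the training data'."""
    mean, var = model
    x = np.asarray(x, dtype=float)
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

# Hypothetical feature vectors from the class we want to detect.
train_set = [[2.0, 1.1], [1.8, 0.9], [2.2, 1.0]]
model = train(train_set)

threshold = -10.0  # in practice, chosen from scores on held-out data
for x in [[1.9, 1.0], [8.0, 4.0]]:
    verdict = "same type" if score(model, x) > threshold else "different type"
    print(x, "->", verdict)
```

The same pattern recurs throughout Part II of the book: parameters are fit to training data from one class, new samples are scored, and the score is compared to a threshold; only the scoring model changes from one technique to the next.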
