Behavior of Machine Learning Algorithms in Adversarial Environments

Blaine Nelson

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2010-140
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-140.html

November 23, 2010

Copyright © 2010, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Behavior of Machine Learning Algorithms in Adversarial Environments

by

Blaine Alan Nelson

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Anthony D. Joseph, Chair
Professor J. D. Tygar
Professor Peter L. Bartlett
Professor Terry Speed

Fall 2010

Behavior of Machine Learning Algorithms in Adversarial Environments
Copyright © 2010 by Blaine Alan Nelson

Abstract

Behavior of Machine Learning Algorithms in Adversarial Environments

by

Blaine Alan Nelson

Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor Anthony D. Joseph, Chair

Machine learning has become a prevalent tool in many computing applications, and modern enterprise systems stand to benefit greatly from learning algorithms. However, one concern with learning algorithms is that they may introduce a security fault into the system. The key strengths of learning approaches are their adaptability and their ability to infer patterns that can be used for prediction or decision making. However, these assets of learning can potentially be subverted by adversarial manipulation of the learner's environment, which exposes applications that use machine learning techniques to a new class of security vulnerabilities.

I analyze the behavior of learning systems in adversarial environments. My thesis is that learning algorithms are vulnerable to attacks that can transform the learner into a liability for the system they are intended to aid, but that, by critically analyzing potential security threats, the extent of these threats can be assessed, appropriate learning techniques can be selected to minimize the adversary's impact, and system failures can be averted.

I present a systematic approach for identifying and analyzing threats against a machine learning system. I examine real-world learning systems, assess their vulnerabilities, demonstrate real-world attacks against their learning mechanisms, and propose defenses that can successfully mitigate the effectiveness of such attacks. In doing so, I provide machine learning practitioners with a systematic methodology for assessing a learner's vulnerability and developing defenses to strengthen their systems against such threats. Additionally, I examine and answer theoretical questions about the limits of adversarial contamination and classifier evasion.

Contents

Contents
List of Figures
List of Tables
Acknowledgments

1 Introduction
  1.1 Motivation and Methodology
  1.2 Guidelines from Computer Security
  1.3 Historical Roadmap
  1.4 Dissertation Organization

2 Background and Notation
  2.1 Notation and Terminology
  2.2 Statistical Machine Learning

3 A Framework for Secure Learning
  3.1 Analyzing Phases of Learning
  3.2 Security Analysis
  3.3 Framework
  3.4 Exploratory Attacks
  3.5 Causative Attacks
  3.6 Repeated Learning Games
  3.7 Dissertation Organization

I Protecting against False Positives and False Negatives in Causative Attacks: Two Case Studies of Availability and Integrity Attacks

4 Availability Attack Case Study: SpamBayes
  4.1 The SpamBayes Spam Filter
  4.2 Threat Model for SpamBayes
  4.3 Causative Attacks against SpamBayes' Learner
  4.4 The Reject On Negative Impact (RONI) defense
  4.5 Experiments with SpamBayes
  4.6 Summary

5 Integrity Attack Case Study: PCA Detector
  5.1 PCA Method for Detecting Traffic Anomalies
  5.2 Corrupting the PCA subspace
  5.3 Corruption-Resilient Detectors
  5.4 Empirical Evaluation
  5.5 Summary

II Partial Reverse-Engineering of Classifiers through Near-Optimal Evasion

6 Near-Optimal Evasion of Classifiers
  6.1 Characterizing Near-Optimal Evasion
  6.2 Evasion of Convex Classes for ℓ_1 Costs
  6.3 Evasion for General ℓ_p Costs
  6.4 Summary and Future Work

7 Conclusion
  7.1 Discussion and Open Problems
  7.2 Review of Open Problems
  7.3 Concluding Remarks

List of Symbols
Glossary
Bibliography

III Appendices

A Background
  A.1 Covering Hyperspheres
  A.2 Covering Hypercubes

B Analysis of SpamBayes
  B.1 SpamBayes' I(·) Message Score
  B.2 Constructing Optimal Attacks on SpamBayes

C Proofs for Near-Optimal Evasion
  C.1 Proof of K-step MultiLineSearch Theorem
  C.2 Proof of Lower Bounds
  C.3 Proof of Theorem 6.9
  C.4 Proof of Theorem 6.10

List of Figures

1.1 Diagrams of the virus detection system architecture described in Martin [2005], Sewani [2005], Nelson [2005]. (a) The system was designed as an extrusion detector. Messages sent from local hosts are routed to our detector by the mail server for analysis; benign messages are subsequently sent, whereas those identified as viral are quarantined for review by an administrator. (b) Within the detector, messages pass through a classification pipeline. After the message is vectorized, it is first analyzed by a one-class SVM novelty detector. Messages flagged as 'suspicious' are then re-classified by a per-user naive Bayes classifier. Finally, if the message is labeled as 'viral', a throttling module is used to determine when a host should be quarantined.

1.2 Depictions of the concept of hypersphere outlier detection and the vulnerability of naive approaches. (a) A bounding hypersphere of fixed radius R centered at the mean x̄ is used to encapsulate the empirical support of a distribution by excluding outliers beyond its boundary. Samples from the 'normal' distribution are indicated by ∗'s, with three outliers on the exterior of the hypersphere. (b) How an attacker with knowledge about the state of the outlier detector can shift the outlier detector toward the goal x^A. It will take several iterations of attacks to sufficiently shift the hypersphere before it encompasses x^A and classifies it as benign.

2.1 Diagrams depicting the flow of information through different phases of learning. (a) All major phases of the learning algorithm except for model selection. Here objects drawn from P_Z are parsed into measurements, which are then used in the feature selector FS. It selects a feature mapping φ which is used to create training and evaluation datasets, D(train) and D(eval). The learning algorithm H(N) selects a hypothesis f based on the training data, and its predictions are assessed on D(eval) according to the loss function L. (b) The training and prediction phases of learning with implicit data collection phases. These learning phases are the focus of this dissertation.

3.1 Diagram of an Exploratory attack against a learning system (see Figure 2.1).

3.2 Diagram of a Causative attack against a learning system (see Figure 2.1).

4.1 Probabilistic graphical models for spam detection. (a) A probabilistic model that depicts the dependency structure between random variables in SpamBayes for a single token (SpamBayes models each token as a separate indicator of ham/spam and then combines them together assuming each is an independent test). In this model, the label y_i for the ith email depends on the token score q_j for the jth token if it occurs in the message; i.e., X_ij = 1. The parameters s and x parameterize a beta prior on q_j. (b) A more traditional generative model for spam. The parameters π(s), α, and β parameterize the prior distributions for y_i and q_j. Each label y_i for the ith email is drawn independently from a Bernoulli distribution with π(s) as the probability of spam. Each token score q_j for the jth token is drawn independently from a beta distribution with parameters α and β. Finally, given the label for a message and the token scores, X_ij is drawn independently from a Bernoulli.
Based on the likelihood function for this model, the token scores q_j computed by SpamBayes can be viewed simply as the maximum likelihood estimators for the corresponding parameter in the model.

4.2 Effect of three dictionary attacks on SpamBayes in two settings. Figures (a) and (b) have an initial training set of 10,000 messages (50% spam), while Figures (c) and (d) have an initial training set of 2,000 messages (75% spam). Figures (b) and (d) also depict the standard errors in the experiments for both of the settings. I plot the percent of ham classified as spam (dashed lines) and as spam or unsure (solid lines) against the attack as a percent of the training set. I show the optimal attack (△), the Usenet-90k dictionary attack (♦), the Usenet-25k dictionary attack (□), and the Aspell dictionary attack (○). Each attack renders the filter unusable with adversarial control over as little as 1% of the messages (101 messages).

4.3 Effect of the focused attack as a function of the percentage of target tokens known by the attacker. Each bar depicts the fraction of target emails classified as spam, ham, and unsure after the attack. The initial inbox contains 10,000 emails (50% spam).

4.4 Effect of the focused attack as a function of the number of attack emails with a fixed fraction (F = 0.5) of tokens known by the attacker. The dashed line shows the percentage of target ham messages classified as spam after the attack, and the solid line the percentage of targets that are spam or unsure after the attack. The initial inbox contains 10,000 emails (50% spam).

4.5 Effect of the focused attack on three representative emails, with one graph for each target. Each point is a token in the email. The x-axis is the token's spam score in Equation (4.2) before the attack (0 indicates ham and 1 indicates spam). The y-axis is the token's spam score after the attack. The ×'s are tokens that were included in the attack and the ○'s are tokens that were not in the attack. The histograms show the distribution of spam scores before the attack (at bottom) and after the attack (at right).

4.6 Effect of the pseudospam attack when trained as ham as a function of the number of attack emails. The dashed line shows the percentage of the adversary's messages classified as ham after the attack, and the solid line the percentage that are ham or unsure after the attack. The initial inbox contains 10,000 emails (50% spam).