On Concept Drift, Deployability, and Adversarial Selection in Machine Learning-Based Malware Detection

A Dissertation Presented to the Graduate Faculty of the University of Louisiana at Lafayette in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

Anshuman Singh

Summer 2012

© Anshuman Singh 2012
All Rights Reserved

On Concept Drift, Deployability, and Adversarial Selection in Machine Learning-Based Malware Detection

Anshuman Singh

APPROVED:

Arun Lakhotia, Chair
Professor of Computer Science
The Center for Advanced Computer Studies

C. Henry Chu
Professor of Computer Engineering
The Center for Advanced Computer Studies

Vijay V. Raghavan
Professor of Computer Science
The Center for Advanced Computer Studies

Andrew Walenstein
Assistant Professor of Computer Science
School of Computing and Informatics

David Breaux
Dean of the Graduate School

ACKNOWLEDGMENTS

I thank my advisor, my dissertation committee members, my family, and my friends for their help, support, and guidance throughout the course of this research. I would also like to thank the University of Louisiana at Lafayette for awarding me a doctoral fellowship and for providing me the opportunity to undertake doctoral research.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
  1.1. Motivation
  1.2. Problem Statement
  1.3. Contributions
  1.4. Organization
CHAPTER 2: BACKGROUND AND RELATED WORK
  2.1. Malware detection
  2.2. Machine learning based classification
  2.3. Machine learning based malware detection
  2.4. Performance of classifiers
    2.4.1. Measures of performance
    2.4.2. Generalization performance
  2.5. Concept drift
  2.6. Classifier selection
    2.6.1. Classifier selection in intrusion detection, actionability, and applicability
    2.6.2. Dynamic classifier selection and classifier evasion
  2.7. Game theory
    2.7.1. Solution of games
CHAPTER 3: CONCEPT DRIFT IN MALWARE
  3.1. Introduction
  3.2. Malware evolution and impact on concept drift
    3.2.1. Natural evolution
    3.2.2. Environmental evolution
    3.2.3. Polymorphic evolution
  3.3. Measures for tracking concept drift in malware
    3.3.1. Relative temporal similarity
    3.3.2. Metafeatures
    3.3.3. Retraining performance
  3.4. Empirical study of drift in malware families
    3.4.1. Dataset
    3.4.2. Features
    3.4.3. Relative temporal similarity
    3.4.4. Tracking metafeatures
    3.4.5. Retraining
  3.5. Discussion
CHAPTER 4: DEPLOYABLE CLASSIFIERS FOR MALWARE DETECTION
  4.1. Introduction
  4.2. Deployable classifiers
    4.2.1. The model
    4.2.2. Strong and weak deployability
    4.2.3. Condition for existence of non-extremal deployable performance
    4.2.4. Method of deployable classifier selection
  4.3. Experimental illustration
    4.3.1. Methodology
    4.3.2. Experimental setup
    4.3.3. Results
  4.4. Discussion
CHAPTER 5: ADVERSARIAL DYNAMIC CLASSIFIER SELECTION
  5.1. Introduction
  5.2. Modeling heterogeneous AV systems
    5.2.1. Classifier selection vs. classifier fusion
    5.2.2. Game-theoretic model of a DCS-based AV system
  5.3. Game-theoretic analysis of optimal strategies
  5.4. Discussion
CHAPTER 6: CONCLUSIONS AND FUTURE WORK
  6.1. Conclusions
    6.1.1. Concept drift in malware
    6.1.2. Deployability
    6.1.3. Adversarial configuration of DCS
  6.2. Future work
    6.2.1. Concept drift in malware
    6.2.2. Deployability
    6.2.3. Adversarial configuration of DCS
BIBLIOGRAPHY
APPENDIX
ABSTRACT
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table 2.1: Papers that use machine learning for malware detection
Table 2.2: Confusion matrix
Table 2.3: Classification of games
Table 3.1: Agobot features
Table 3.2: Yearwise number of samples from different malware families based on PE header timestamp
Table 3.3: Number of samples with the same PE header timestamp
Table 3.4: Number of samples in two treatments for each family
Table 3.5: Number of mnemonic 2-grams before and after feature selection
Table 3.6: Accuracy of different classifiers trained on original and recent datasets
Table 4.1: Performance evaluation results for opcode 2-gram based classifiers for malware detection
Table 4.2: Deployable classifiers and their performance for a confidence level of α = 0.1
Table 4.3: Comparison of classifiers selected by the minimum risk criterion and the deployability criterion

LIST OF FIGURES

Figure 1.1: Growth in malware signatures in recent years (Source: Symantec)
Figure 1.2: Malware detection process
Figure 1.3: Machine learning process
Figure 2.1: Stages of classifier design
Figure 2.2: Extraction of mnemonic n-grams
Figure 2.3: A ROC graph
Figure 2.4: An example ROC curve
Figure 3.1: Relative temporal similarity of Agobot samples
Figure 3.2: Relative temporal similarity of version-controlled benign samples
Figure 3.3: Relative similarity with byte 2-grams and mnemonic 2-grams for three malware families
Figure 3.4: Relative similarity with respect to PE header date for mnemonic 2-grams for three malware families
Figure 3.5: Agent: band for cos(1,i) in [0.8-0.85]
Figure 3.6: Agent: band for cos(1,i) in [0.17-0.2]
Figure 3.7: Pcclient: band for cos(1,i) in [0.9-1]
Figure 3.8: Pcclient: band for cos(1,i) in [0.75-0.8]
Figure 3.9: Relative similarity with mnemonic 2-grams with TF features for three malware families
Figure 3.10: Metafeature drift for three malware families
Figure 4.1: Classification tree with costs and probabilities
Figure 4.2: Deployability of different opcode 2-gram based classifiers for malware detection
Figure 5.1: Classifier fusion and selection
Figure 5.2: Game tree for the DCS architecture
Figure 5.3: Plot of $p_D^{AC} - p_D^{BC}$ vs. $t_{AC}$ when condition 5.3.6 holds
Figure 5.4: Plot of $p_D^{AC} - p_D^{BC}$ vs. $t_{AC}$ when condition 5.3.9 holds
Figure 5.5: Plot of $\Delta$ vs. $t_{AC}$