Università degli Studi del Sannio
Department of Engineering
Master's Degree Course in Computer Engineering

Master Thesis in Network and Software Systems Security

Detecting Android Malware Variants using Opcodes Frequency Distribution and Call Graphs Isomorphism Analysis

Supervisor: Prof. Corrado Aaron Visaggio
Co-Supervisor: Ing. Francesco Mercaldo
Author: Antonio Pirozzi, mat. 399/37

Academic Year 2013/2014

Dedicated to my Family.
Dedicated to my parents Angelo and Orsola, for all that I have received from them, for their daily sacrifices, for my entire life.
Dedicated to my aunt Loredana, always present in my life, a guardian angel for the whole family; thank you for your support.
Dedicated to my uncle Vincenzo, my real older brother, undisputed Master of Jazz.
Dedicated to Luciana, a special aunt.
Dedicated to Joshua, the joy of the family.
Dedicated to my grandparents Antonio, Maddalena, Italo and Maria; unfortunately they are no longer with us, but they would be very proud of me.
Dedicated to my grandmother Maria, my other mom; I will carry her in my heart forever, and I know that you are always beside me.
Dedicated to my love Alessia, the best girl I could ever find; thank you for making my life better.
Dedicated to Steve Jobs and Nikola Tesla, the greatest revolutionaries, who made this world a better place.

Acknowledgements

First and foremost, a special thanks to Prof. Corrado Aaron Visaggio for his help and for the trust he has always placed in me.
Thanks also to Ing. Francesco Mercaldo for his continued motivation and collaboration.
A special thanks to a special person, Tito.
A special thanks to my best friend Marco, who is always there.
A special thanks to my friend Angelo: we started this adventure together, all those days spent in “Labis”, believing in our ideas ...
A special thanks to Nicola, who gave me a place in his company, has believed in me from day one, and taught me day by day the qualities that a good engineer must have.
A special thanks also to my colleagues Sonia Laudanna, Giovanni Izzo, Fabio Giovine, Angelo Ciampa and Piero Uniti, special people with whom I shared this long University journey.
A special thanks to all my University Professors, each of whom has taught me a great lesson for life.

“Here’s to the crazy ones. The misfits. The rebels. The troublemakers. The round pegs in the square holes. The ones who see things differently. They’re not fond of rules. And they have no respect for the status quo. You can praise them, disagree with them, quote them, disbelieve them, glorify or vilify them. About the only thing you can’t do is ignore them. Because they change things. They invent. They imagine. They heal. They explore. They create. They inspire. They push the human race forward. Maybe they have to be crazy... While some see them as the crazy ones, we see genius. Because the people who are crazy enough to think they can change the world, are the ones who do.”
Steve Jobs

Abstract

As the Android platform becomes the universal front-end of the IoE and IoT, mobile attacks will continue to grow rapidly as new technologies expand the attack surface. Vendors, manufacturers and providers should extend vulnerability-shielding and exploit-prevention technologies, and anti-malware vendors have to enhance their current solutions, because new types of malware carry a fileless payload that runs only in memory and adopt more complex obfuscation techniques to circumvent detection.
The idea behind this work arises from the awareness that a more effective and holistic anti-malware approach has to first outline the phylogenesis of malware and understand its evolution, its sophistication and its semantics. This methodology moves in this direction, implementing a clone-detection heuristic that outlines common payload components in order to identify malware variants.
Our heuristic is a contribution to the malware analysis phase, not to the detection phase: it helps to understand Android malware and its evolution, and to trace back a possible malware descent.
To achieve these goals, we start from the analysis of the Opcodes Frequency Distribution, obtaining by similarity the 10 nearest vectors from the data-set (built from the Android Drebin Project); then an n-grams heuristic on the adjacency lists detects isomorphism features in the call graphs, in order to identify payload components as common sub-graphs. We are then able to outline a possible genome for each malware family and to define a possible descent for each malware variant, including multiple descents, demonstrating the effectiveness of this methodology.
This work aims to lay the foundation for a new type of methodology based on the study of payload phylogenesis.

Contents

Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
List of Listings
Abbreviations

1 Introduction
1.1 Motivation and Problem
1.2 Goals
1.3 Thesis Overview
2 Android: A Security Overview
2.1 The Android Security Model
2.1.1 Android Security Program
2.1.2 Android Platform Security Architecture
2.1.2.1 Kernel-level Security
2.1.2.2 The Application Sandbox
2.1.2.3 The System Partitions
2.1.2.4 The Secure BootLoader and QFuses
2.1.2.5 Cryptography
2.1.2.6 Security-Enhanced Android and Samsung Knox
2.1.2.7 Secure IPC: INTENTS and BINDER
2.1.3 Android Application Security
2.1.3.1 The Android Permission Model and Protected APIs
2.1.3.2 Application Signing Permission Enforcement
2.2 Malicious Android apps
2.2.1 An Overview: Botnet, Data collectors and Madware
2.2.2 Android Malware Characterization
2.2.2.1 Malware Installation
2.2.2.2 Activation
2.2.2.3 Malicious Payload
2.2.2.4 Permission Used
2.2.3 Evolutions and Challenges
3 Malware Detection Methodologies
3.1 Malware Detection Techniques
3.1.1 Signature-based Detection Techniques
3.1.2 Anomaly-based Detection Techniques
3.1.3 Application Permission Analysis
3.1.4 Cloud-based Detection Analysis
3.2 Malware Transformation
3.3 Evaluating Android anti-malware Products
3.4 Malware Analysis using Call Graphs
3.4.1 Similarity Detection using Call Graphs
3.4.2 Software Clones Taxonomy
3.4.3 The Isomorphism Problem
4 Detecting Android Malware Variants using Opcodes Frequency Distribution and Call Graphs Isomorphism Analysis
4.1 The Data-Set
4.1.1 The Malicious Data-Set, Malware Variants
4.1.1.1 Malware families
4.1.2 The Trusted Data-set
4.2 The multi-staged Architecture
4.3 The Relative Opcodes Frequency Distribution
4.4 Isomorphism Analysis
4.4.1 The Control Flow Graph
4.4.2 The Function Call Graph
4.4.3 Vector-Space Mapping
4.4.4 N-grams Analysis stage
4.4.4.1 N-grams Analysis for each of the 10 returned vectors
4.4.5 The Method-level similarity analysis stage. Type I and II clones detection at method-level
4.5 The Classifier
4.6 Real System implementation
4.6.0.1 DescentDroid.pm
5 Experimental Phase
5.1 Focus of the Experiments
5.2 Focus I: Performance evaluation: A MultiClass Classification Problem
5.2.1 Characterization
5.2.2 Evaluations Detail
5.2.2.1 Classification Accuracy, Precision & Recall for each Malware Class
5.2.3 Conclusions and Considerations
5.3 Focus II: Detection Evaluation of Android Malware
5.3.1 Characterization
5.3.2 Evaluations Detail
5.3.2.1 I classifier: 0.99800 CosSim
5.3.2.2 II classifier: 0.99500 CosSim
5.3.2.3 III classifier: 0.99000 CosSim
5.3.3 Conclusions and Considerations
6 Applications and future works
A Partial code of Descent tool. Opcodes Frequency Distribution Computation
Bibliography

List of Figures

2.1 Android Software Stack [1]
2.2 Android Sandbox Mechanism [2] [3]
2.3 Applications sharing the same UID [2] [3]
2.4 Motorola OMAP Secure Boot Chain [4]
2.5 Samsung Knox efuse warranty bit
2.6 HTC S-OFF NAND
2.7 SEAndroid partial coverage [5]
2.8 Samsung KNOX System Security Overview [6]
2.9 Samsung KNOX Secure Boot [7]
2.10 Samsung KNOX Application Container [6]
2.11 Samsung KNOX Support CAC [6]
2.12 Samsung KNOX Client Certificate Management (CCM) [7]
2.13 Android simple form of IPC [8]
2.14 Android Process IPC [9]
2.15 Binder Communication [10]
2.16 Binder Framework [10]
2.17 Binder Permission Enforcement [10]
2.18 Binder Token Object Reference [10]
2.19 Display of permissions for applications [11]
2.20 Securing Activities with Custom permission [12]
2.21 Securing Services with Custom permission [12]
2.22 Securing BroadcastReceiver with Custom permission [12]
2.23 Securing ContentProvider with Custom permission [12]
2.24 GMail AndroidManifest prior to 2.3.5 without signature-level enforcing
2.25 GMail AndroidManifest after 2.3.5 with signature-level enforcing
2.26 Securing with URI permissions [12]
2.27 Apps collect your Information [13]
2.28 Apps collect your Information [13]
2.29 Which apps are abusing [13]
2.30 AdLibraries’ privacy scores [13]
2.31 AdLibraries brought to you by malware [13]
2.32 An Update Attack from BaseBridge [14]
2.33 Comparison of the top 20 permissions requested by malicious and benign Android apps [14]
2.34 An Overview of existing Android Malware (PART I: INSTALLATION AND ACTIVATION) 1 of 2 [14]
2.35 An Overview of existing Android Malware (PART I: INSTALLATION AND ACTIVATION) 2 of 2 [14]
2.36 An Overview of existing Android Malware (PART II: MALICIOUS PAYLOADS) 1 of 2 [14]
2.37 An Overview of existing Android Malware (PART II: MALICIOUS PAYLOADS) 2 of 2 [14]
3.1 Malware detection Technologies Taxonomy [15]
3.2 Dynamic Signature Extraction [15]
3.3 Top 20 requested permissions which have the most different requested rate in different datasets. The ordinate is the difference between the requested rate in the malware dataset and the requested rate in the benign dataset [16]
3.4 Difference in the frequencies of 18 selected permissions in malware and benign .apk files [17]
3.5 COMPARATIVE ANALYSIS OF BI-NORMAL SEPARATION AND MUTUAL INFORMATION FEATURE SELECTION METHOD [17]
3.6 TOP 5 PERMISSION COMBINATIONS WHEN K = 5 [18]
3.7 TOP 5 PERMISSION COMBINATIONS WHEN K = 6 [18]
3.8 Cloud-based malware protection techniques: (a) Paranoid Android and (b) Crowdroid [19]
3.9 Before ProGuard
3.10 After ProGuard
3.11 Before ProGuard
3.12 String Encoding
3.13 Before ProGuard
3.14 Code Reordering
3.15 An example of a junk code fragment [20]
3.16 AV-TEST carried out in 2014: Detection rates in the endurance test [21]
3.17 ANTI-MALWARE PRODUCTS EVALUATED [20]
3.18 MALWARE SAMPLES USED FOR TESTING ANTI-MALWARE TOOLS [20]
3.19 Transformation Keys [20]
3.20 DROIDDREAM TRANSFORMATIONS AND ANTI-MALWARE FAILURE [20]
3.21 FAKEPLAYER TRANSFORMATIONS AND ANTI-MALWARE FAILURE [20]
3.22 EVALUATION SUMMARY [20]
3.23 “Android.Trojan.FakeInst.AS” from the FakeInstaller Malware Family [22]
3.24 Complete function call graph of “Android:RuFraud-C” from the malware family FakeInstaller. Dark shading of nodes indicates malicious structures identified by the SVM [22]
3.25 Two corresponding methods in two app clones are from different markets. The first method has one more function call to initialize several ads [23]
4.1 Android Drebin logo
4.2 The Whole System Multi-staged Architecture
4.3 66d4fb0ba082a53eaedf8909f65f4f9d60f0b038e6d5695dbe6d5798853904aa sample, Opcode Frequency Distribution
4.4 Opcodes Distribution extraction and Computation
Description: