Finding Data Leaks in Xamarin Apps by Performing Taint Analysis on CIL Code

DEIS1027F18

MIKKEL CHRISTIAN LYBECK CHRISTENSEN
SIMON ELLEGAARD LARSEN
SØREN AKSEL HELBO BJERGMARK
THOMAS PILGAARD NIELSEN

Software Engineering
Aalborg University
SPRING 2018

Cassiopeia
Department of Computer Science
Selma Lagerlöfs Vej 300
9220 Aalborg Ø
Tel. 9940 9940
http://www.cs.aau.dk

Title: Finding Data Leaks in Xamarin Apps by Performing Taint Analysis on CIL Code
Theme: Master Thesis
Project period: Spring Semester 2018
Project group: deis1027f18
Members: Mikkel Christian Lybeck Christensen, Simon Ellegaard Larsen, Søren Aksel Helbo Bjergmark, Thomas Pilgaard Nielsen
Supervisor: René Rydhof Hansen, Magnus Madsen
No. of printed copies: 0
No. of pages: 94
Appendices: A-D
Completed: 2018-06-15

Abstract:

Today, smartphones handle a lot of personal and private data of their users. With the implementation of GDPR, more focus is put on the handling of private data, and the importance of preventing data leaks.

In this report, we present a tool to perform taint analysis on Common Intermediate Language (CIL) code with the intent of analyzing Android apps made using Xamarin.

An intermediate language called Simple CIL (SCIL) is created to simplify the analysis on CIL code. SCIL is formally defined, and a flow analysis in regards to a control flow analysis (CFA) is defined as well. To accompany SCIL, Simple CIL Analyzer (SCIL/A) and Flix Analyzer (Flix/A) are presented to perform the analysis.

Together, SCIL/A and Flix/A are able to scan a Xamarin app and find potentially insecure data flow. This involves using a control flow graph (CFG), transforming to static single assignment (SSA) form, and resolving dynamic methods, in order to analyze the flow through the app. In an evaluation of SCIL/A and Flix/A where 2,866 Xamarin apps were scanned, 20% were flagged with potential problems.

The contents of this report are freely accessible; however, publication (with source references) is only allowed upon agreement with the authors.

Summary

In this report, a taint analysis of Common Intermediate Language (CIL) is presented with the intention of scanning Android apps made with Xamarin. The intermediate language Simple CIL (SCIL) is presented along with Simple CIL Analyzer (SCIL/A) and Flix Analyzer (Flix/A), which are tools to perform the analysis on the SCIL code. SCIL/A transforms CIL code to SCIL and then to Flix facts, and Flix/A performs the taint analysis on the Flix facts.

The project starts with the initial problem statement in Section 1.1:

    Apps have access to private data and the user does not know what happens to the data. How can a tool be created, which can track data through a smartphone app made using Xamarin?

From this initial problem statement, the problem area is analyzed, first with a description of the General Data Protection Regulation (GDPR) and then with an examination of existing tools. The tool FlowDroid is used to perform a taint analysis on native Android apps, where 11.7% of 66,969 apps are found to potentially have data leakage. It is expected that a similar share of Xamarin apps have potential problems. This leads to the problem statement in Section 2.4, which is the basis for the rest of the report:

    How can a static taint analysis tool be constructed to examine the flow of data in apps developed using Xamarin?

In the theory chapter, various subjects are investigated for the purpose of generating requirements and obtaining the necessary knowledge for the implementation of a tool to analyze Xamarin apps, which leads to the requirements for the tool in Section 3.7.

In Chapter 4, the definition of SCIL and the implementation of SCIL/A and Flix/A are described. First, the structure, components and operational semantics for SCIL are defined and explained. The definition of SCIL is then used to define abstract domains and flow logic rules to perform a control flow analysis (CFA) of SCIL. This CFA is the foundation for the taint analysis.

With SCIL defined, SCIL/A is implemented with focus on the following:

• Android Package Kit (APK) parsing
• The creation of a control flow graph (CFG) and a call graph
• Conversion to static single assignment (SSA)
• Handling branching
• Handling asynchronous tasks

The output of SCIL/A is Flix facts for further analysis.

With the implementation of SCIL/A complete, Flix/A is implemented. Flix/A is executed with the Flix facts from SCIL/A, where Flix facts are used to determine taint propagation through the code. In Flix/A, extra effort was placed on implementing branching, method calls, and asynchronous tasks. In addition to the taint analysis of Flix/A, a simple string analysis based on character inclusion is performed.

To evaluate SCIL/A and Flix/A, automated tests in the form of unit tests are created, which test different aspects of the analysis. Furthermore, a general evaluation is performed by scanning apps with the tool, where statistics are collected. In addition, three detected apps are investigated to check whether they were correctly flagged by the analysis.

To round off the project, selected shortcomings of the project are discussed. In the conclusion, it is investigated whether we succeeded in what was planned for the project in regards to the problem statements and the requirements. Finally, future work for the project is discussed, outlining what the next step for SCIL, SCIL/A, and Flix/A is.

Preface

This report and the associated product were developed as a project on the master’s program in Software Engineering at Aalborg University. The project is based on the problem-oriented model from AAU.

Basic knowledge about the structure of Android apps is expected of the reader. Throughout the report, any terms that the reader is not expected to know will be explained.

In order to ensure reproducibility of the results in this project, all source code can be found at: https://github.com/sahb1239/SCIL. Apps used in this report can be requested by contacting the authors of this report.

Reading guide

Source reference

References to source material use the Vancouver style.
The number in square brackets at the end of a given statement refers to an entry in the bibliography at the end of the report. The following is an example of a source reference on a simple statement.

    Aalborg University offers a master’s degree in Software Engineering [0].

Bibliography

The Vancouver system also specifies the way individual entries in the bibliography are structured. The information is listed as follows: author(s), title of article/section, relevant pages, book title, editor, publisher, year of publication and ISBN. Any information that is unavailable or does not apply to a given entry may be excluded.

Figure and table references

All figures and tables in the report are assigned a unique number that can be referenced repeatedly throughout the report. The first number in the reference refers to the chapter of the figure, while the second number indicates the position in the sequence of figures/tables in the chapter. Immediately below, a short description is found. All figures and tables with no indicated source have been produced by the project group. An example is seen in Figure 1.

Figure 1: Android logo [0]

Listings

All listings adhere to the same rules as figures and tables, but are numbered separately. The code shown may have some parts removed that are irrelevant to the example, which will be marked with comments in the code. All listings are followed by the name of the programming language used in the example. An example of a code listing is seen in Listing 1.

    public static void main(String[] args) {
        System.out.println("Hello, World");
    }

Listing 1: Example of a listing (Java)

Table of Contents

1 Introduction
  1.1 Initial Problem Statement
2 Problem Area Analysis
  2.1 General Data Protection Regulation
  2.2 Existing Tools
  2.3 Target Audience
  2.4 Problem Statement
3 Theory
  3.1 Android
  3.2 Xamarin
  3.3 Common Intermediate Language
  3.4 Control Flow Analysis
  3.5 Static Single Assignment
  3.6 Taint Analysis
  3.7 Requirements Elicitation
4 Implementation
  4.1 Program Structure
  4.2 Operational Semantics
  4.3 Flow Logic
  4.4 Analyzer Overview
  4.5 Simple CIL Analyzer
  4.6 Flix Analysis
5 Test & Evaluation
6 Discussion
7 Conclusion
8 Future Work
A Sources and Sinks Configuration
B ZIP File
C Flow Logic Rules
D All SCIL Instructions
Bibliography

Chapter 1 Introduction

Smartphones are growing more popular every year, and the amount of personal data they have access to is growing. Apps that make use of this data have a responsibility to their users to keep it secure, but this is not always the case. Regardless of whether the apps are malicious or benign, some apps do leak personal information.
To strengthen and unify data protection for citizens of the European Union (EU), the European Parliament has approved the General Data Protection Regulation (GDPR) [1]. This regulation enforces a Privacy by Design (PbD) approach, which means that companies are obligated to integrate privacy concerns into their design. Personal data is only to be processed when needed by the system.

In modern Android app development, developers have multiple choices on how to develop apps, e.g. native, web or cross-platform apps. To develop cross-platform apps, there exist different frameworks, e.g. Xamarin¹, React Native² and Unity³, that make the development as smooth and easy as possible. Xamarin, acquired by Microsoft in 2016 [2], makes it possible to develop Android apps by using C#. In this report, we will continue the work from our project last semester [3] and focus on Xamarin apps. Android apps are most commonly distributed through Google Play Store⁴, which contains around 3.6 million Android apps as of March 2018 [4], but there are no official numbers of how many of these apps are made with Xamarin.

Ferrara and Spoto [5] published a paper called “Static Analysis for GDPR Compliance” at ITASEC18, an Italian conference on cyber security. This paper raises awareness of how static program analysis can be used to check whether apps violate the GDPR. They point out that taint analysis can be used for checking if privacy leaks can occur. Many taint analysis tools for native Android apps already exist. However, to the best of our knowledge, such a tool does not exist for apps developed with .NET-based cross-platform tools, such as Xamarin.

1.1 Initial Problem Statement

With the complexity of modern smartphone apps, it is nearly impossible for a regular user to figure out what happens to the personal data given to an app, often on the premise that it is necessary for the app or service to function normally. Furthermore, the complexity also makes it difficult for third parties, for instance researchers, to analyze app binaries and verify that a particular app does not leak personal information. This leads to the following initial problem statement:

    Apps have access to private data and the users do not know what happens to the data.
    How can a tool be created, to track data through a smartphone app made using Xamarin?

With the initial problem statement defined, the problem area can be analyzed, with the purpose of gaining knowledge about the problem. This knowledge will then lead to a final problem statement, which will be the foundation for the project.

¹ https://www.xamarin.com/
² https://facebook.github.io/react-native/
³ https://unity3d.com/
⁴ https://play.google.com/store

Chapter 2 Problem Area Analysis

In this chapter, we analyze the problem area, including a description of the GDPR (Section 2.1), existing taint analysis tools for Android (Section 2.2) and a discussion of the potential target audiences of a taint analysis tool for Xamarin (Section 2.3). This analysis leads up to a final problem statement in Section 2.4, which forms the basis for the rest of the report.

2.1 General Data Protection Regulation

The General Data Protection Regulation (GDPR) is a regulation originally proposed by the European Commission in 2012 in order to strengthen and unify data protection for citizens of the EU. The regulation was adopted in 2016 and became enforceable on 25 May 2018 after a two-year post-adoption grace period. [1]

2.1.1 Background

GDPR originates from the EU’s digital single market strategy, which is intended to simplify the rules for companies inside the EU. The goal of the regulation is to strengthen citizens’ rights, facilitate business and reduce administrative burdens. Specifically, the GDPR protects the personal data of EU citizens and the free movement of such data. Personal data is only allowed to be collected at the time when it is needed, and the data is required to be protected and only used for legal purposes. [1]

The GDPR replaces the existing data protection directive from 1995, implementing several important changes [6]. Among these changes is an increased territorial scope, as the GDPR applies to all companies processing personal data of citizens of the EU, regardless of the location of the company.
Other changes include an increase in the potential penalty for non-compliance and strengthened conditions for requesting consent from users. Additionally, the regulation offers more legislative power compared to the existing directive, since regulations are legally binding in their entirety, while directives simply set a goal that individual countries can decide how to achieve. [1]

One important goal of the regulation is to secure a number of obligations for companies that process personal data, and rights for the subjects of this data processing. It will become mandatory to notify customers of a data breach within 72 hours if the breach poses a risk to them. Customers will have the right to know what personal data concerning them is being processed, where and for what purpose. Data controllers are also obligated to provide a copy of the data for free in an electronic format. Data subjects reserve the right to be forgotten, i.e. to have the data controller erase personal data, stop any further processing of the data and stop any potential third parties’ processing of the data. This right is invoked when the data is no longer relevant to its original purpose or if the data subject withdraws consent. Data subjects also have the right to receive any personal data in a commonly used machine-readable format, and to transmit this data to another data controller. Lastly, the GDPR requires systems to be designed with data protection in mind, known as Privacy by Design (PbD). [1]

2.1.2 Privacy by Design

Privacy by Design (PbD) is not a new concept, but with the GDPR it becomes a legal requirement that systems are developed with data protection included by default, not as an addition. Data controllers will have to implement technical and organizational measures in order to sufficiently protect the rights of data subjects. This approach is characterized by proactive rather than reactive measures: privacy-compromising events are anticipated and prevented before they happen. [7]

2.2 Existing Tools

In this section, two of the existing tools for performing static analysis on programs are examined.
These tools are examined to find out if they contain features that could serve as inspiration for a new tool. The tools we examine are FlowDroid and Gendarme.

2.2.1 FlowDroid

FlowDroid is a “context-, flow-, field-, object-sensitive and lifecycle-aware static taint analysis tool for Android apps”, as described by one of the creators, Bodden [8].

What FlowDroid does differently from other taint analysis tools is that it precisely models the Android lifecycle. The activity lifecycle in Android creates various entry points, e.g. with the use of asynchronously executing components and callbacks, which have to be taken into consideration when analyzing Android apps. From the app lifecycle information, FlowDroid creates a dummy main method, from which an inter-procedural control flow graph is generated and traversed to follow taint propagation. [9]

An example of this traversal can be seen in Figure 2.1. The figure depicts the combination of forward and on-demand backward analysis: every time a heap object is tainted, a backward analysis is performed to handle aliasing.

Figure 2.1: Example of FlowDroid taint analysis in regard to aliasing. [10]

The purpose of the analysis is to check if there is a connection between a source and a sink. Sources and sinks are usually defined in one of two different ways. The first is where the source indicates some kind of private data (e.g. a user’s location) and the sink publishes this information (e.g. to a web server). This is the definition used when checking for privacy leaks. However, taint analysis is also used for identifying vulnerabilities coming from unsanitized user input. In this case, the sink would be a vulnerable function, which could be a function for making SQL