ebook img

String Analysis for Software Verification and Security PDF

175 Pages·2017·5.03 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview String Analysis for Software Verification and Security

Tevfik Bultan • Fang Yu • Muath Alkhalaf Abdulbaki Aydin String Analysis for Software Verification and Security 123 TevfikBultan FangYu DepartmentofComputerScience DepartmentofManagement UniversityofCalifornia,SantaBarbara InformationSystems SantaBarbara,CA,USA NationalChenchiUniversity Taipei,Taiwan MuathAlkhalaf ComputerScienceDepartment AbdulbakiAydin KingSaudUniversity Microsoft(UnitedStates) Riyadh,SaudiArabia Redmond,WA,USA ISBN978-3-319-68668-4 ISBN978-3-319-68670-7 (eBook) https://doi.org/10.1007/978-3-319-68670-7 LibraryofCongressControlNumber:2017956344 ©SpringerInternationalPublishingAG2017 Preface This monograph is mainly based on the research that has been conducted in the Verification Laboratory at the University of California, Santa Barbara, in the last decade.Stringanalysishasbeenaninterestingandfruitfulareatoworkon,leading to many research results some of which are discussed here. We observe that the researchinanalysisofstringmanipulatingcodeisexpandingduetotheimportance ofthecorrectnessofthestringmanipulationcodefordependabilityandsecurityof modern software systems. We hope that this monograph can inspire and motivate moreresearchinthisareaandacceleratethetransitionofstringanalysisresearchto practice. We would like to thank all current and past members of the Verification Laboratory for their help and support. We would also like to acknowledge the support provided by the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA) for the string analysis research at theVerificationLaboratory.1 SantaBarbara,CA,USA TevfikBultan Taipei,Taiwan FangYu Riyadh,SaudiArabia MuathAlkhalaf Redmond,WA,USA AbdulbakiAydin 1NSFundergrantsCCF0916112,CNS-1116967,CCF-1423623,andCCF-1548848,andDARPA under agreement number FA8750-15-2-0087. The U.S. Government is authorized to reproduce anddistributereprintsforGovernmentalpurposesnotwithstandinganycopyrightnotationthereon. Theviewsandconclusionscontainedhereinarethoseoftheauthorsandshouldnotbeinterpreted asnecessarilyrepresentingtheofficialpoliciesorendorsements,eitherexpressedorimplied,of DARPAortheU.S.Government. Contents 1 Introduction................................................................. 1 1.1 CommonVulnerabilitiesDuetoStringManipulationErrors ...... 4 1.2 ExamplesofStringManipulatingCodeandErrors................. 6 1.3 Overview.............................................................. 12 2 StringManipulatingProgramsandDifficultyofTheirAnalysis ..... 15 2.1 ASimpleStringManipulationLanguage........................... 15 2.2 AutomatedandPreciseVerificationofStringPrograms IsNotPossible........................................................ 16 2.3 ARicherStringManipulationLanguage............................ 18 2.4 Summary.............................................................. 22 3 StateSpaceExploration ................................................... 23 3.1 SemanticsofStringManipulationLanguages ...................... 23 3.2 ExplicitStateSpaceExploration .................................... 26 3.2.1 ForwardReachability....................................... 27 3.2.2 BackwardReachability..................................... 28 3.3 SymbolicExploration................................................ 29 3.3.1 SymbolicReachability ..................................... 30 3.3.2 Fixpoints .................................................... 31 3.4 Summary.............................................................. 35 4 AutomataBasedStringAnalysis.......................................... 37 4.1 ALatticeforSetsofStrings ......................................... 37 4.2 SymbolicReachabilityAnalysiswithAutomata ................... 38 4.3 SymbolicAutomataRepresentation................................. 41 4.3.1 MTBDDRepresentation ................................... 43 4.3.2 ModelingNon-DeterminismUsingSymbolicDFA ...... 44 4.3.3 Symbolicvs.ExplicitDFA................................. 45 4.4 Post-ConditionComputation ........................................ 45 4.4.1 ConcatenationandReplaceOperations.................... 46 4.4.2 Post-ConditionofConcatenation .......................... 47 4.4.3 Post-ConditionofReplacement............................ 48 4.5 Pre-ConditionComputation ......................................... 51 4.5.1 Pre-ConditionofConcatenation ........................... 51 4.5.2 Pre-ConditionofReplacement............................. 53 4.6 Summary.............................................................. 55 5 RelationalStringAnalysis................................................. 57 5.1 RelationsAmongStringVariables .................................. 57 5.2 Multi-TrackDFAs.................................................... 59 5.3 RelationalStringAnalysis........................................... 59 5.3.1 WordEquations............................................. 60 5.4 ConstructionofMulti-TrackDFAsforBasicWordEquations .... 61 5.5 SymbolicReachabilityAnalysis..................................... 65 5.5.1 ForwardFixpointComputation............................ 66 5.5.2 Summarization.............................................. 68 5.6 Summary.............................................................. 68 6 AbstractionandApproximation.......................................... 69 6.1 RegularAbstraction ................................................. 69 6.2 AlphabetAbstraction ................................................ 70 6.3 RelationAbstraction ................................................. 74 6.4 ComposingAbstractions ............................................ 76 6.5 AutomataWideningOperation...................................... 77 6.6 Summary.............................................................. 80 7 Constraint-BasedStringAnalysis ........................................ 83 7.1 SymbolicExecutionwithStringConstraints ....................... 83 7.2 Automata-BasedStringConstraintSolving......................... 87 7.2.1 MappingConstraintstoAutomata......................... 87 7.3 RelationalConstraintSolvingwithMulti-TrackDFA.............. 93 7.3.1 RelationalStringConstraintSolving ...................... 94 7.3.2 RelationalIntegerConstraintSolving ..................... 95 7.3.3 MixedStringandIntegerConstraintSolving ............. 95 7.4 ModelCounting...................................................... 96 7.4.1 Automata-BasedModelCounting......................... 99 7.4.2 CountingAllSolutionswithinaGivenBound............ 101 7.5 Summary.............................................................. 102 8 VulnerabilityDetectionandSanitizationSynthesis..................... 103 8.1 VulnerabilityDetectionandRepair.................................. 104 8.2 PatchingAlgorithm .................................................. 111 8.3 VulnerabilityAnalysis ............................................... 112 8.4 VulnerabilitySignatures ............................................. 114 8.5 RelationalSignatures................................................. 116 8.6 SanitizationGeneration.............................................. 119 8.7 Summary.............................................................. 122 9 DifferentialStringAnalysisandRepair.................................. 123 9.1 FormalModelingofValidationandSanitizationFunctions........ 126 9.1.1 Post-andPre-ImageofaSanitizer......................... 130 9.2 DiscoveringClient-andServer-SideInputValidationand SanitizationInconsistencies.......................................... 133 9.2.1 Extracting and Mapping Input Validation and SanitizationFunctionsattheClient-andthe Server-Side.................................................. 133 9.2.2 InconsistencyIdentification................................ 134 9.3 SemanticDifferentialRepairforInputValidation andSanitization ...................................................... 136 9.3.1 DifferentialRepairProblem................................ 140 9.3.2 DifferentialRepairAlgorithm ............................. 141 9.4 Summary.............................................................. 147 10 Tools.......................................................................... 149 10.1 LIBSTRANGER ....................................................... 149 10.2 STRANGER ........................................................... 150 10.3 SEMREP .............................................................. 151 10.4 ABC................................................................... 153 11 ABriefSurveyofRelatedWork.......................................... 155 11.1 StringAnalysis ....................................................... 155 11.1.1 GrammarBasedStringAnalysis........................... 155 11.1.2 Automata-BasedStringAnalysis .......................... 157 11.1.3 HybridStringAnalysis..................................... 158 11.2 StringConstraintSolvers ............................................ 158 11.2.1 Automata-BasedSolvers................................... 159 11.2.2 BoundedSolvers............................................ 160 11.2.3 CombinationofTheories................................... 161 11.2.4 ModelCountingStringConstraintSolvers................ 161 11.3 BugandVulnerabilityDetectioninWebApplications............. 161 11.3.1 Client-SideAnalysis........................................ 162 11.3.2 Server-SideAnalysis ....................................... 162 11.4 DifferentialAnalysisandRepair .................................... 163 11.4.1 DifferentialAnalysis ....................................... 163 11.4.2 DifferentialRepair.......................................... 164 12 Conclusions.................................................................. 165 References......................................................................... 167 Chapter 1 Introduction In computing, a sequence of characters is called a string. For example, “Hello, World!” is a string (we use double quotes to indicate the beginning and end of the string). This particular string is very familiar to programmers since the first programming assignment in many programming textbooks is to write a program thatoutputsthestring“Hello,World!”Althoughprogrammersbecomefamiliarwith the creation and manipulation of strings very early on in their training, errors in string manipulating code is a major cause of software faults and vulnerabilities. This indicates that stringmanipulation is a challenging task for programmers, and automatedtechniquesforanalyzingstringmanipulatingcode,whichisthetopicof thismonograph,areverydesirable. Handling of strings in programming languages vary widely. In the C program- ming language, characters are a basic data type, but strings are not. Strings are represented as arrays of characters, which is a natural way to store strings. Since strings do not have a type dedicated to them, basic operations on strings (such as concatenating two strings) are not part of C, and are instead provided as library functions(suchasstrcat,strcpy,strcmp,strlen). In the Java programming language, strings are objects corresponding to the instances of String class. String operations are provided as the methods of theStringclass(suchasthelength,charatandconcatmethods)andthe String class is implemented by storing strings as arrays of characters. Java also providesastringconcatenationoperator(“+”). In the more recent JavaScript programming language, strings are supported as oneoftheprimitivedatatypes,andstringscanbeconstructedusingoperatorssuch asstringconcatenationoperator“+”.JavaScriptalsoprovidesasetofmethodsfor stringmanipulation(suchassubstring,indexOf,replace). We see that different languages treat strings differently. In general, support for string manipulation in programming languages has been increasing. This is due to increasing use of string manipulation in implementation of modern software 2 1 Introduction applications. Here are some common uses of string manipulation in modern softwaredevelopment: • Inputsanitizationandvalidation:Manymodernsoftwareapplicationsareweb- basedandtheuserinputstowebapplications typically comeinthestringform (for example a text field entered by the user). Since any user with an Internet connectioncanaccesswebapplicationsandinteractwiththem,webapplication developershavetoassumethattheuserinputcouldbemalicious.Therearewell- known cyber-attack techniques that involve a malicious user submitting hidden commandsviauserinputfields.Executionoftheseharmfulcommandscanlead tounauthorizedaccesstoorlossofdata.Topreventsuchscenarios,application developershavetoeitherrejecttheuserinputthatdoesnotfitexpectedpatterns (which is called input validation), or clean up the user input by removing unwanted characters (which is called input sanitization). Both input validation and sanitization involve string manipulation since user inputs are typically in stringform. • Query generation for back-end databases: Many modern software applications useaback-enddatabasetostoredata.Whenusersinteractwithmodernsoftware applications,userrequestsresultingenerationofaquerythatissenttoaback- end database. The user request triggers a piece of code that constructs the database query as a string. So, string manipulation is an essential part of query generation. • Generation of data formats such as XML, JSON, and HTML: Modern software applications typically use well-known data formats to store, exchange, or describedata.XMLandJSONaretwoofthewidelyuseddataformats.HTML is the format used to describe web documents and to display user input forms in web applications. Many modern applications dynamically create documents inXML,JSON,orHTMLformatduringprogramexecution.CreationofXML, JSON,orHTMLdocumentsinvolvemanipulationofstrings. • Dynamic code generation: Software applications are becoming increasingly dynamic. For example, many modern web applications dynamically generate code based on user requests. Web applications are multi-tiered systems where partofthecodeisexecutedonwebservers(thataretypicallyhostedincompute- clouds),andpartofthecodeisexecutedontheclientmachine.Itiscommonfor theserver-sidecodetogenerateclient-sidecodedynamicallyatruntime.Client- codeisgeneratedusingstringmanipulation. • Dynamic class loading and method invocation: Programming languages are alsobecomingincreasinglydynamic.ModernlanguagessuchasJavaScriptand PHP allow functions to be specified using string variables, where the invoked methods or loaded classes depend on the values of the string variables at runtime. Reflection in Java allows programs to load classes dynamically at runtime. Similarly, in Objective-C, one can load classes from string variables. Thisprovidesdeveloperspowerfulmeanstoadjustprogramexecutionsaccording to their runtime environment and status. However, since the loaded classes or invokedmethodsdependonthevaluesofstringvariablesatruntime,malicious 1 Introduction 3 developerscouldmanipulatestringvaluestoobfuscatetheprogrambehavior,and preventstaticdetectionofmaliciousbehaviorssuchasaccessingsensitive/private APIsinAndroid/iOSmobileapplications. Due to extended use of strings in modern software development, errors in string manipulating code or maliciously written string manipulating code can have disastrous effects. It would be very helpful for developers to be able to automatically check if the string manipulation code works correctly with respect totheirexpectations.Thisisthecoreproblemthatstringanalysisaddresses.String analysis is a static program analysis technique that determines the values that a stringexpressioncantakeduringprogramexecutionatagivenprogrampoint.String analysiscanbeusedtosolvemanyproblemsinmodernsoftwaresystemsthatrelate tostringmanipulation,suchas: • String analysis can be used to identify security vulnerabilities by checking if a securitysensitivefunctioncanreceiveaninputstringthatcontainsanexploit[78, 119,130,132].Thistypeofvulnerabilitiesarecommoninwebapplicationswhen userinputisnotadequatelyvalidatedorsanitized. • Moderndynamiclanguagesenableexecutionofdynamicallygeneratedcode.For example, in JavaScript, the eval function can be used to execute dynamically generate code. The eval function takes a string value as an argument and executestheJavaScriptexpression,variable,statement,orsequenceofstatements thatisgivenastheargument.So,inordertoanalyzethebehaviorofaJavaScript programthatusestheevalfunction,wefirstneedtounderstandthesetofstring values thatcan bepassedtothe evalfunction asanargument. Stringanalysis canbeusedforthispurpose[57]. • For applications that dynamically generate data in XML, JSON or HTML format,stringanalysiscanbeusedtoidentifyformattingerrorsinthegenerated documents, which would then identify bugs in the XML, JSON or HTML generatingcode[78]. • Forapplicationsthatdynamicallygeneratequeriesforaback-enddatabase,string analysis can be used to identify the set of queries that are sent to the back- end database by analyzing the code that generates the SQL queries. This can be used to identify vulnerabilities, or to generate test queries for the back-end database[118,119]. • For web applications where server-side code dynamically generates client-side code, string analysis can be used to determine the client-side code that can be generated[82,83],andthiscanbeusedtoanalyzeandfindpotentialproblemsin thegeneratedclient-sidecode. • Stringanalysiscanalsobeusedforautomaticallyrepairingfaultystringmanip- ulation code. For example, input validation and sanitization functions can be repaired automatically by identifying the set of input values that can cause a vulnerability[128]. • Forprogramsthathaveclassesloadedfromstrings,stringanalysiscanbeusedin advance to find all potential values of loaded classes or invoked methods, for example, for taming reflection in Java programs [22, 72]. Particularly, string 4 1 Introduction analysis can be used to analyze complex string manipulation to improve the precisionofstatictaint/flowanalysis. Likemanyotherprogram-analysisproblems,itisnotpossibletosolvethestring analysis problem precisely (i.e., it is not possible to precisely determine the set of string values that can reach a program point). However, one can compute over- or under-approximations of possible string values. If the approximations are precise enough, they can enable us to demonstrate existence or absence of bugs and vulnerabilities in string manipulating code. String analysis has been an active research area in the last decade, resulting in a wide variety of string- analysistechniquessuchas,grammar-basedstringanalysis[63,78],automata-based symbolic string analysis [26, 34, 55, 113, 114], string constraint solving [3, 12, 15, 71, 93, 112], string abstractions [21, 131], relational string analysis [133], vulnerability detection using string analysis [100, 119, 129], differential string analysis[5,7],andautomatedrepairusingstringanalysis[5,128]. 1.1 CommonVulnerabilitiesDuetoStringManipulation Errors Let us discuss the security vulnerabilities related to string manipulation errors further. Security vulnerabilities in web applications are common. Ubiquity and global accessibility of web applications make this a critical problem. Malicious users all around the world can exploit vulnerable web applications resulting in unauthorized access to private data or loss of critical information. Since web applications are continuously in use without any available downtime for manual analysisorrepair,vulnerabilitiesinwebapplicationsshouldnotonlybediscovered fast, but should also be repaired fast. String analysis techniques can be used to addressthisproblem. TheCommonVulnerabilitiesandExposures(CVE)repositorykeepstrackofall reported computer security vulnerabilities [30]. According to the CVE repository, web application vulnerabilities form a significant portion of all reported computer securityvulnerabilities.Twoofthemostcommonvulnerabilitiesinwebapplications are:Cross-siteScripting(XSS)andSQLInjection(SQLI). A XSS vulnerability results from the application inserting part of a malicious user’s input in an HTML page that it renders. For example, a malicious user can attack a web forum application by posting a comment that contains a link that contains an executable script (such as JavaScript code) within the text of the comment.Theattackoccurs,if,whileviewingthecomment,anotheruserclickson the link that contains the malicious code. The victim’s browser will then execute themaliciousJavaScriptcodeandthiscanresultinstealingofbrowsercookiesand othersensitivedata. AnSQLInjectionvulnerability,ontheotherhand,resultsfromtheapplication’s use of user input in constructing database queries. The attacker can invoke the applicationwithamaliciousinputthatbecomespartofanSQLcommandthatthe

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.