ebook img

String Analysis for Software Verification and Security PDF

177 Pages·2017·8.23 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview String Analysis for Software Verification and Security

Tevfi k Bultan · Fang Yu Muath Alkhalaf · Abdulbaki Aydin String Analysis for Software Verifi cation and Security String Analysis for Software Verification and Security Tevfik Bultan • Fang Yu • Muath Alkhalaf Abdulbaki Aydin String Analysis for Software Verification and Security 123 TevfikBultan FangYu DepartmentofComputerScience DepartmentofManagement UniversityofCalifornia,SantaBarbara InformationSystems SantaBarbara,CA,USA NationalChenchiUniversity Taipei,Taiwan MuathAlkhalaf ComputerScienceDepartment AbdulbakiAydin KingSaudUniversity Microsoft(UnitedStates) Riyadh,SaudiArabia Redmond,WA,USA ISBN978-3-319-68668-4 ISBN978-3-319-68670-7 (eBook) https://doi.org/10.1007/978-3-319-68670-7 LibraryofCongressControlNumber:2017956344 ©SpringerInternationalPublishingAG2017 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. Printedonacid-freepaper ThisSpringerimprintispublishedbySpringerNature TheregisteredcompanyisSpringerInternationalPublishingAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface This monograph is mainly based on the research that has been conducted in the Verification Laboratory at the University of California, Santa Barbara, in the last decade.Stringanalysishasbeenaninterestingandfruitfulareatoworkon,leading to many research results some of which are discussed here. We observe that the researchinanalysisofstringmanipulatingcodeisexpandingduetotheimportance ofthecorrectnessofthestringmanipulationcodefordependabilityandsecurityof modern software systems. We hope that this monograph can inspire and motivate moreresearchinthisareaandacceleratethetransitionofstringanalysisresearchto practice. We would like to thank all current and past members of the Verification Laboratory for their help and support. We would also like to acknowledge the support provided by the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA) for the string analysis research at theVerificationLaboratory.1 SantaBarbara,CA,USA TevfikBultan Taipei,Taiwan FangYu Riyadh,SaudiArabia MuathAlkhalaf Redmond,WA,USA AbdulbakiAydin 1NSFundergrantsCCF0916112,CNS-1116967,CCF-1423623,andCCF-1548848,andDARPA under agreement number FA8750-15-2-0087. The U.S. Government is authorized to reproduce anddistributereprintsforGovernmentalpurposesnotwithstandinganycopyrightnotationthereon. Theviewsandconclusionscontainedhereinarethoseoftheauthorsandshouldnotbeinterpreted asnecessarilyrepresentingtheofficialpoliciesorendorsements,eitherexpressedorimplied,of DARPAortheU.S.Government. v Contents 1 Introduction................................................................. 1 1.1 CommonVulnerabilitiesDuetoStringManipulationErrors ...... 4 1.2 ExamplesofStringManipulatingCodeandErrors................. 6 1.3 Overview.............................................................. 12 2 StringManipulatingProgramsandDifficultyofTheirAnalysis ..... 15 2.1 ASimpleStringManipulationLanguage........................... 15 2.2 AutomatedandPreciseVerificationofStringPrograms IsNotPossible........................................................ 16 2.3 ARicherStringManipulationLanguage............................ 18 2.4 Summary.............................................................. 22 3 StateSpaceExploration ................................................... 23 3.1 SemanticsofStringManipulationLanguages ...................... 23 3.2 ExplicitStateSpaceExploration .................................... 26 3.2.1 ForwardReachability....................................... 27 3.2.2 BackwardReachability..................................... 28 3.3 SymbolicExploration................................................ 29 3.3.1 SymbolicReachability ..................................... 30 3.3.2 Fixpoints .................................................... 31 3.4 Summary.............................................................. 35 4 AutomataBasedStringAnalysis.......................................... 37 4.1 ALatticeforSetsofStrings ......................................... 37 4.2 SymbolicReachabilityAnalysiswithAutomata ................... 38 4.3 SymbolicAutomataRepresentation................................. 41 4.3.1 MTBDDRepresentation ................................... 43 4.3.2 ModelingNon-DeterminismUsingSymbolicDFA ...... 44 4.3.3 Symbolicvs.ExplicitDFA................................. 45 4.4 Post-ConditionComputation ........................................ 45 4.4.1 ConcatenationandReplaceOperations.................... 46 vii viii Contents 4.4.2 Post-ConditionofConcatenation .......................... 47 4.4.3 Post-ConditionofReplacement............................ 48 4.5 Pre-ConditionComputation ......................................... 51 4.5.1 Pre-ConditionofConcatenation ........................... 51 4.5.2 Pre-ConditionofReplacement............................. 53 4.6 Summary.............................................................. 55 5 RelationalStringAnalysis................................................. 57 5.1 RelationsAmongStringVariables .................................. 57 5.2 Multi-TrackDFAs.................................................... 59 5.3 RelationalStringAnalysis........................................... 59 5.3.1 WordEquations............................................. 60 5.4 ConstructionofMulti-TrackDFAsforBasicWordEquations .... 61 5.5 SymbolicReachabilityAnalysis..................................... 65 5.5.1 ForwardFixpointComputation............................ 66 5.5.2 Summarization.............................................. 68 5.6 Summary.............................................................. 68 6 AbstractionandApproximation.......................................... 69 6.1 RegularAbstraction ................................................. 69 6.2 AlphabetAbstraction ................................................ 70 6.3 RelationAbstraction ................................................. 74 6.4 ComposingAbstractions ............................................ 76 6.5 AutomataWideningOperation...................................... 77 6.6 Summary.............................................................. 80 7 Constraint-BasedStringAnalysis ........................................ 83 7.1 SymbolicExecutionwithStringConstraints ....................... 83 7.2 Automata-BasedStringConstraintSolving......................... 87 7.2.1 MappingConstraintstoAutomata......................... 87 7.3 RelationalConstraintSolvingwithMulti-TrackDFA.............. 93 7.3.1 RelationalStringConstraintSolving ...................... 94 7.3.2 RelationalIntegerConstraintSolving ..................... 95 7.3.3 MixedStringandIntegerConstraintSolving ............. 95 7.4 ModelCounting...................................................... 96 7.4.1 Automata-BasedModelCounting......................... 99 7.4.2 CountingAllSolutionswithinaGivenBound............ 101 7.5 Summary.............................................................. 102 8 VulnerabilityDetectionandSanitizationSynthesis..................... 103 8.1 VulnerabilityDetectionandRepair.................................. 104 8.2 PatchingAlgorithm .................................................. 111 8.3 VulnerabilityAnalysis ............................................... 112 8.4 VulnerabilitySignatures ............................................. 114 8.5 RelationalSignatures................................................. 116 8.6 SanitizationGeneration.............................................. 119 8.7 Summary.............................................................. 122 Contents ix 9 DifferentialStringAnalysisandRepair.................................. 123 9.1 FormalModelingofValidationandSanitizationFunctions........ 126 9.1.1 Post-andPre-ImageofaSanitizer......................... 130 9.2 DiscoveringClient-andServer-SideInputValidationand SanitizationInconsistencies.......................................... 133 9.2.1 Extracting and Mapping Input Validation and SanitizationFunctionsattheClient-andthe Server-Side.................................................. 133 9.2.2 InconsistencyIdentification................................ 134 9.3 SemanticDifferentialRepairforInputValidation andSanitization ...................................................... 136 9.3.1 DifferentialRepairProblem................................ 140 9.3.2 DifferentialRepairAlgorithm ............................. 141 9.4 Summary.............................................................. 147 10 Tools.......................................................................... 149 10.1 LIBSTRANGER ....................................................... 149 10.2 STRANGER ........................................................... 150 10.3 SEMREP .............................................................. 151 10.4 ABC................................................................... 153 11 ABriefSurveyofRelatedWork.......................................... 155 11.1 StringAnalysis ....................................................... 155 11.1.1 GrammarBasedStringAnalysis........................... 155 11.1.2 Automata-BasedStringAnalysis .......................... 157 11.1.3 HybridStringAnalysis..................................... 158 11.2 StringConstraintSolvers ............................................ 158 11.2.1 Automata-BasedSolvers................................... 159 11.2.2 BoundedSolvers............................................ 160 11.2.3 CombinationofTheories................................... 161 11.2.4 ModelCountingStringConstraintSolvers................ 161 11.3 BugandVulnerabilityDetectioninWebApplications............. 161 11.3.1 Client-SideAnalysis........................................ 162 11.3.2 Server-SideAnalysis ....................................... 162 11.4 DifferentialAnalysisandRepair .................................... 163 11.4.1 DifferentialAnalysis ....................................... 163 11.4.2 DifferentialRepair.......................................... 164 12 Conclusions.................................................................. 165 References......................................................................... 167 Chapter 1 Introduction In computing, a sequence of characters is called a string. For example, “Hello, World!” is a string (we use double quotes to indicate the beginning and end of the string). This particular string is very familiar to programmers since the first programming assignment in many programming textbooks is to write a program thatoutputsthestring“Hello,World!”Althoughprogrammersbecomefamiliarwith the creation and manipulation of strings very early on in their training, errors in string manipulating code is a major cause of software faults and vulnerabilities. This indicates that stringmanipulation is a challenging task for programmers, and automatedtechniquesforanalyzingstringmanipulatingcode,whichisthetopicof thismonograph,areverydesirable. Handling of strings in programming languages vary widely. In the C program- ming language, characters are a basic data type, but strings are not. Strings are represented as arrays of characters, which is a natural way to store strings. Since strings do not have a type dedicated to them, basic operations on strings (such as concatenating two strings) are not part of C, and are instead provided as library functions(suchasstrcat,strcpy,strcmp,strlen). In the Java programming language, strings are objects corresponding to the instances of String class. String operations are provided as the methods of theStringclass(suchasthelength,charatandconcatmethods)andthe String class is implemented by storing strings as arrays of characters. Java also providesastringconcatenationoperator(“+”). In the more recent JavaScript programming language, strings are supported as oneoftheprimitivedatatypes,andstringscanbeconstructedusingoperatorssuch asstringconcatenationoperator“+”.JavaScriptalsoprovidesasetofmethodsfor stringmanipulation(suchassubstring,indexOf,replace). We see that different languages treat strings differently. In general, support for string manipulation in programming languages has been increasing. This is due to increasing use of string manipulation in implementation of modern software ©SpringerInternationalPublishingAG2017 1 T.Bultanetal.,StringAnalysisforSoftwareVerificationandSecurity, https://doi.org/10.1007/978-3-319-68670-7_1 2 1 Introduction applications. Here are some common uses of string manipulation in modern softwaredevelopment: • Inputsanitizationandvalidation:Manymodernsoftwareapplicationsareweb- basedandtheuserinputstowebapplications typically comeinthestringform (for example a text field entered by the user). Since any user with an Internet connectioncanaccesswebapplicationsandinteractwiththem,webapplication developershavetoassumethattheuserinputcouldbemalicious.Therearewell- known cyber-attack techniques that involve a malicious user submitting hidden commandsviauserinputfields.Executionoftheseharmfulcommandscanlead tounauthorizedaccesstoorlossofdata.Topreventsuchscenarios,application developershavetoeitherrejecttheuserinputthatdoesnotfitexpectedpatterns (which is called input validation), or clean up the user input by removing unwanted characters (which is called input sanitization). Both input validation and sanitization involve string manipulation since user inputs are typically in stringform. • Query generation for back-end databases: Many modern software applications useaback-enddatabasetostoredata.Whenusersinteractwithmodernsoftware applications,userrequestsresultingenerationofaquerythatissenttoaback- end database. The user request triggers a piece of code that constructs the database query as a string. So, string manipulation is an essential part of query generation. • Generation of data formats such as XML, JSON, and HTML: Modern software applications typically use well-known data formats to store, exchange, or describedata.XMLandJSONaretwoofthewidelyuseddataformats.HTML is the format used to describe web documents and to display user input forms in web applications. Many modern applications dynamically create documents inXML,JSON,orHTMLformatduringprogramexecution.CreationofXML, JSON,orHTMLdocumentsinvolvemanipulationofstrings. • Dynamic code generation: Software applications are becoming increasingly dynamic. For example, many modern web applications dynamically generate code based on user requests. Web applications are multi-tiered systems where partofthecodeisexecutedonwebservers(thataretypicallyhostedincompute- clouds),andpartofthecodeisexecutedontheclientmachine.Itiscommonfor theserver-sidecodetogenerateclient-sidecodedynamicallyatruntime.Client- codeisgeneratedusingstringmanipulation. • Dynamic class loading and method invocation: Programming languages are alsobecomingincreasinglydynamic.ModernlanguagessuchasJavaScriptand PHP allow functions to be specified using string variables, where the invoked methods or loaded classes depend on the values of the string variables at runtime. Reflection in Java allows programs to load classes dynamically at runtime. Similarly, in Objective-C, one can load classes from string variables. Thisprovidesdeveloperspowerfulmeanstoadjustprogramexecutionsaccording to their runtime environment and status. However, since the loaded classes or invokedmethodsdependonthevaluesofstringvariablesatruntime,malicious

Description:
This book discusses automated string-analysis techniques, focusing particularly on automata-based static string analysis. It covers the following topics: automata-bases string analysis, computing pre and post-conditions of basic string operations using automata, symbolic representation of automata,
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.