Static Program Analysis PDF

82 Pages·2017·0.463 MB·English
Static Program Analysis

Anders Møller and Michael I. Schwartzbach

January 23, 2017

Copyright © 2008–2017 Anders Møller and Michael I. Schwartzbach
Department of Computer Science
Aarhus University, Denmark
{amoeller,mis}@cs.au.dk

Contents

Preface
1 Introduction
  1.1 Undecidability of Static Program Analysis
2 A Tiny Programming Language
  2.1 The Syntax of TIP
  2.2 Example Programs
  2.3 Control Flow Graphs
3 Type Analysis
  3.1 Types
  3.2 Type Constraints
  3.3 Solving Constraints
  3.4 Slack and Limitations
4 Lattice Theory
  4.1 Example: Sign Analysis
  4.2 Lattices
  4.3 Constructing Lattices
  4.4 Equations and Fixed-Points
5 Dataflow Analysis with the Monotone Framework
  5.1 Fixed-Point Algorithms
  5.2 Example: Sign Analysis, Revisited
  5.3 Example: Liveness
  5.4 Example: Available Expressions
  5.5 Example: Very Busy Expressions
  5.6 Example: Reaching Definitions
  5.7 Forwards, Backwards, May, and Must
  5.8 Example: Initialized Variables
  5.9 Example: Constant Propagation
  5.10 Example: Interval Analysis
  5.11 Widening
  5.12 Narrowing
6 Path Sensitivity
  6.1 Assertions
  6.2 Branch Correlations
7 Interprocedural Analysis
  7.1 Interprocedural Control Flow Graphs
  7.2 Example: Interprocedural Sign Analysis
8 Control Flow Analysis
  8.1 Closure Analysis for the λ-calculus
  8.2 The Cubic Algorithm
  8.3 Control Flow Graphs for Function Pointers
  8.4 Control Flow in Object Oriented Languages
9 Pointer Analysis
  9.1 Points-To Analysis
  9.2 Andersen’s Algorithm
  9.3 Steensgaard’s Algorithm
  9.4 Interprocedural Points-To Analysis
  9.5 Example: Null Pointer Analysis
  9.6 Example: Shape Analysis
  9.7 Example: Escape Analysis

Preface

Static program analysis is the art of reasoning about the behavior of computer programs without actually running them. This is useful not only in optimizing compilers for producing efficient code but also for automatic error detection and other tools that can help programmers. A static program analyzer is a program that reasons about the behavior of other programs. For anyone interested in programming, what can be more fun than writing programs that analyze programs?

As known from Turing and Rice, all interesting properties of the behavior of programs written in common programming languages are mathematically undecidable. This means that automated reasoning of software generally must involve approximation. It is also well known that testing may reveal errors but generally cannot show their absence. In contrast, static program analysis can – with the right kind of approximations – check all possible executions of the programs and provide guarantees about their properties. One of the key challenges when developing such analyses is how to ensure high precision and efficiency to be practically useful.

These notes present principles and applications of static analysis of programs. We cover basic type analysis, lattice theory, control flow graphs, dataflow analysis, fixed-point algorithms, narrowing and widening, path-sensitivity, interprocedural analysis and context-sensitivity, control flow analysis, and pointer analysis. A tiny imperative programming language with heap pointers and function pointers is subjected to numerous different static analyses illustrating the techniques that are presented.

We emphasize a constraint-based approach to static analysis where suitable constraint systems conceptually divide the analysis task into a front-end that generates constraints from program code and a back-end that solves the constraints to produce the analysis results. This approach enables separating the analysis specification, which determines its precision, from the algorithmic aspects that are important for its performance. In practice when implementing analyses, we often solve the constraints on-the-fly, as they are generated, without representing them explicitly.

We focus on analyses that are fully automatic (i.e., not involving programmer guidance, for example in the form of loop invariants) and conservative (usually meaning sound but incomplete), and we only consider Turing complete languages (like most programming languages used in ordinary software development).

The analyses that we cover are expressed using different kinds of constraint systems, each with their own constraint solvers:
• term unification constraints, with an almost-linear union-find algorithm,
• conditional subset constraints, with a cubic algorithm, and
• monotone constraints over lattices, with variations of fixpoint solvers.

The style of presentation is intended to be precise but not overly formal. The readers are assumed to be familiar with advanced programming language concepts and the basics of compiler construction.

We will see the basic tools that are required to perform static analysis of programs. Real-life applications invariably gravitate back to the techniques that we will cover, though many variations and extensions are usually required.

Two major areas will not be covered at all. The quality of an analysis can only be measured relative to a suite of intended applications. It is rare that competing analyses can be formally compared, so much work in this area is concerned with performing experiments to establish the precision and efficiency of proposed analyses. The correctness of an analysis requires a formal semantics of the underlying programming language. Completely formal proofs of correctness of analyses are exceedingly laborious and remain mostly academic exercises. Even so, it is often possible to provide convincing informal correctness arguments.

The notes are accompanied by a web site that provides lecture slides, an implementation (in Scala) of most of the algorithms we cover, and additional exercises:
http://cs.au.dk/~amoeller/spa/

Chapter 1
Introduction

There are many interesting questions that can be asked about a given program, for example:
• does the program terminate on every input?
• how large can the heap become during execution?
• does there exist an input that leads to a null pointer dereference, division-by-zero, or arithmetic overflow?
• are all variables initialized before they are read?
• are arrays always accessed within their bounds?
• are all assertions guaranteed to succeed?
• does the program contain dead code, or more specifically, is function f reachable from main?
• does the value of variable x depend on the program input?
• is it possible that the value of x will be read in the future?
• do p and q point to disjoint structures in the heap?
• can there be dangling references, e.g. pointers to memory that has been freed?
• are all resources properly released before the program terminates?

Such questions arise when reasoning about correctness of programs and when optimizing programs for improving their performance. Regarding correctness, programmers routinely use testing to gain confidence that their programs work as intended, but as famously stated by Dijkstra: “Program testing can be used to show the presence of bugs, but never to show their absence.” Ideally we want guarantees about what our programs may do for all possible inputs, and we want these guarantees to be provided automatically, that is, by programs. A program analyzer is such a program that takes other programs as input and decides whether or not they have a given property.

Rice’s theorem is a general result from 1953 that informally states that all interesting questions about the behavior of programs (written in Turing-complete programming languages¹) are undecidable. This is easily seen for any special case. Assume for example the existence of an analyzer that decides if a variable in a program has a constant value. In other words, the analyzer is a program A that takes as input a program T and one of T’s variables x, and decides whether or not x has a constant value whenever T is executed.

[Figure: the analyzer A takes (T, x) as input, asks “Is the value of variable x in T always a constant when T is executed?”, and answers “yes” or “no”.]

We could then exploit this analyzer to also decide the halting problem by using as input the following program where TM(j) simulates the j’th Turing machine on empty input:

    x = 17; if (TM(j)) x = 18;

Here x has a constant value if and only if the j’th Turing machine does not halt on empty input. If the hypothetical constant-value analyzer A exists, then we have a decision procedure for the halting problem, which is known to be impossible.

This seems like a discouraging result.
However, our real goal is not to decide such properties but rather to solve practical problems like making the program run faster or use less space, or finding bugs in the program. The solution is to settle for approximative answers that are still precise enough to fuel our applications. Most often, such approximations are conservative (or safe), meaning that all errors lean to the same side, which is determined by our intended application.

Consider again the problem of determining if a variable has a constant value. If our intended application is to perform constant propagation optimization, then the analysis may only answer yes if the variable really is a constant and must answer maybe if the variable may or may not be a constant. The trivial solution is of course to answer maybe all the time, so we are facing the engineering challenge of answering yes as often as possible while obtaining a reasonable analysis performance.

[Figure: the analyzer A takes (T, x) as input, asks “Is the value of variable x in T always a constant when T is executed?”, and answers “yes, definitely!” or “maybe, don’t know”.]

¹ From this point on, we only consider Turing complete languages.

A different example is the question: to which variables may the pointer p point? If our intended application is to replace *p with x in order to save a dereference operation, then the analysis may only answer “&x” if p certainly must point to x and must answer “?” if this is false or the answer cannot be determined. If our intended application is instead to determine the maximal size of *p, then the analysis must reply with a possibly too large set {&x,&y,&z,...} that is guaranteed to contain all targets.

In general, all optimization applications need conservative approximations. If we are given false information, then the optimization is unsound and changes the semantics of the program. Conversely, if we are given trivial information, then the optimization fails to do anything.

Approximative answers may also be useful for finding bugs in programs, which may be viewed as a weak form of program verification. As a case in point, consider programming with pointers in the C language.
This is fraught with dangers such as null dereferences, dangling pointers, leaking memory, and unintended aliases. Ordinary compiler technology offers little protection from pointer errors. Consider the following small program which performs every kind of error if executed with precisely 42 arguments:

    int main(int argc, char *argv[]) {
      if (argc == 42) {
        char *p,*q;
        p = NULL;
        printf("%s",p);
        q = (char *)malloc(100);
        p = q;
        free(q);
        *p = 'x';
        free(p);
        p = (char *)malloc(100);
        p = (char *)malloc(100);
        q = p;
        strcat(p,q);
      }
    }

The standard tools such as gcc -Wall and lint detect no errors. Testing might also miss the errors, unless we happen to have a test case that runs the program with exactly 42 arguments. However, if we had even approximative answers to questions about null values and pointer targets, then many of the above errors could be caught statically, without actually running the program.

Exercise 1.1: Describe all the errors in the above program.

1.1 Undecidability of Static Program Analysis

(This section requires familiarity with the concept of universal Turing machines; it is not a prerequisite for the following chapters.)

The reduction from the halting problem presented above shows that some static analysis problems are undecidable, in particular the problem of deciding whether a given variable in a program has a constant value. However, termination is often the least of the concerns programmers have about whether their programs work correctly. Is the problem of deciding whether a variable has a constant value still undecidable if we are allowed to assume that our programs always terminate? And what about other desired program properties?

Using a diagonalization argument in combination with the classic proof of Rice’s theorem, we can once and for all show a very strong result: Every nontrivial² property of the behavior³ of programs is undecidable, even if we only consider programs that terminate on every input.
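
For reference, the halting-problem reduction recalled at the start of this section can be written out as a short Python sketch. Everything named here (ConstantDecider, reduction_program, halts_on_empty_input, make_stub_analyzer) is a hypothetical name introduced for illustration and does not come from the notes. The stub analyzer is only a stand-in driven by a hand-written table, since an exact analyzer cannot exist.

```python
from typing import Callable, Dict

# Hypothetical analyzer interface (an assumption for this sketch):
# given a program text and a variable name, return True iff the variable
# has a constant value whenever the program is executed.
ConstantDecider = Callable[[str, str], bool]


def reduction_program(j: int) -> str:
    """Build the program used in the reduction for the j'th Turing machine."""
    return f"x = 17; if (TM({j})) x = 18;"


def halts_on_empty_input(j: int, is_constant: ConstantDecider) -> bool:
    """Decide halting for machine j, given an exact constant-value analyzer.

    x is constant exactly when TM(j) never returns, so the machine halts
    if and only if the analyzer reports that x is NOT constant.
    """
    return not is_constant(reduction_program(j), "x")


def make_stub_analyzer(halts: Dict[int, bool]) -> ConstantDecider:
    """Stand-in for the impossible analyzer A, driven by a lookup table.

    It only 'works' because the halting behaviour is supplied by hand;
    no real implementation can exist for all machines.
    """
    def is_constant(program: str, variable: str) -> bool:
        # Recover j from the generated program text.
        j = int(program.split("TM(")[1].split(")")[0])
        return not halts[j]
    return is_constant


stub = make_stub_analyzer({1: True, 2: False})
print(halts_on_empty_input(1, stub))  # machine 1 is listed as halting
print(halts_on_empty_input(2, stub))  # machine 2 is listed as diverging
```

The point of the sketch is only the wiring: any exact implementation of the analyzer interface would immediately yield halts_on_empty_input as a halting decider, which is the contradiction.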

In other words, every attempt to build a fully automatic program analyzer, that is, a program that can check nontrivial correctness properties of the input-output behavior of other programs, is doomed to fail, even if the program analyzer is allowed to assume that the given program always terminates. Approximation is inevitable.

If we model programs as deterministic Turing machines, it seems reasonable to define “correctness” as unreachability of a special fail state.⁴ That is, on a given input, a Turing machine will eventually terminate in its accept state (intuitively returning “yes”), in its reject state (intuitively returning “no”), in its fail state (meaning that the correctness condition has been violated), or the machine diverges (i.e., never terminates).

We can show the undecidability result using an elegant proof by contradiction. Assume P is a program that can decide whether or not the fail state is reachable in a given total Turing machine. (If the input to P is not a total Turing machine, P’s output is unspecified – we only require it to correctly analyze Turing machines that always terminate.) Let us say that P halts in its accept state if and only if the fail state of the given Turing machine is unreachable, and it halts in the reject state otherwise. That is, intuitively P says “yes” if the given Turing machine works correctly, and “no” otherwise. Our goal is to show that P cannot exist.

If P exists, then we can also build another Turing machine, let us call it M, that takes as input the encoding e(T) of a Turing machine T and then builds the encoding e(S_T) of yet another Turing machine S_T, which behaves as follows: S_T is essentially a universal Turing machine that is specialized to simulate T on

² A property is nontrivial if there exists at least one program that satisfies the property and at least one program that does not. Conversely, trivial properties can be decided even without looking at the input program, and all the properties we care about in practice are nontrivial.
³ By “behavior” of programs, we mean their input-output semantics, in contrast to syntactic properties.
⁴ Technically, we here restrict ourselves to safety properties; liveness properties can be addressed similarly using other models of computability.
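
The “yes, definitely!” versus “maybe, don’t know” answers described in the introduction can be illustrated on a toy language. The following Python sketch rests on assumptions that are not in the notes: programs are nested tuples, the names MAYBE, eval_expr, join, analyze and is_constant are hypothetical, and “constant” is simplified to mean a single known value at the end of the program. Branch conditions are ignored, so both branches are always considered possible, which keeps the answer conservative.

```python
MAYBE = "maybe"  # abstract value: may or may not be a constant


def eval_expr(expr, env):
    """Evaluate an expression to a known int, or MAYBE if unknown."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        # A variable: unknown or uninitialized variables are not constants.
        return env.get(expr, MAYBE)
    op, left, right = expr
    a, b = eval_expr(left, env), eval_expr(right, env)
    if a == MAYBE or b == MAYBE:
        return MAYBE
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unsupported operator: {op}")


def join(a, b):
    """Combine the values from two branches: keep only agreeing constants."""
    return a if a == b else MAYBE


def analyze(stmts, env):
    """Return the abstract environment after executing a list of statements."""
    env = dict(env)
    for stmt in stmts:
        if stmt[0] == "assign":
            _, var, expr = stmt
            env[var] = eval_expr(expr, env)
        elif stmt[0] == "if":
            # The condition is ignored: either branch may be taken.
            _, _cond, then_block, else_block = stmt
            env_then = analyze(then_block, env)
            env_else = analyze(else_block, env)
            env = {v: join(env_then.get(v, MAYBE), env_else.get(v, MAYBE))
                   for v in set(env_then) | set(env_else)}
        else:
            raise ValueError(f"unsupported statement: {stmt[0]}")
    return env


def is_constant(stmts, var):
    """Answer "yes" only when var certainly holds one known value."""
    return "yes" if analyze(stmts, {}).get(var, MAYBE) != MAYBE else "maybe"


# The program from the halting-problem reduction: x = 17; if (TM(j)) x = 18;
program = [("assign", "x", 17), ("if", "TM(j)", [("assign", "x", 18)], [])]
print(is_constant(program, "x"))
```

On the reduction program the sketch answers “maybe” without ever simulating TM(j). That answer is safe for constant propagation, since “yes” is only given when the value is certain, but it is imprecise whenever the branch is in fact never taken.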
