Structural Analysis: Shape Information via Points-To Computation

Mark Marron
IMDEA Software Institute, [email protected]

arXiv:1201.1277v1 [cs.PL] 5 Jan 2012

Abstract. This paper introduces a new hybrid memory analysis, Structural Analysis, which combines an expressive shape analysis style abstract domain with efficient and simple points-to style transfer functions. Using data from empirical studies on the runtime heap structures and the programmatic idioms used in modern object-oriented languages we construct a heap analysis with the following characteristics: (1) it can express a rich set of structural, shape, and sharing properties which are not provided by a classic points-to analysis and that are useful for optimization and error detection applications, (2) it uses efficient, weakly-updating, set-based transfer functions which enable the analysis to be more robust and scalable than a shape analysis, and (3) it can be used as the basis for a scalable interprocedural analysis that produces precise results in practice.

The analysis has been implemented for .Net bytecode and using this implementation we evaluate both the runtime cost and the precision of the results on a number of well known benchmarks and real world programs. Our experimental evaluations show that the domain defined in this paper is capable of precisely expressing the majority of the connectivity, shape, and sharing properties that occur in practice and, despite the use of weak updates, the static analysis is able to precisely approximate the ideal results. The analysis is capable of analyzing large real-world programs (over 30K bytecodes) in less than 65 seconds and using less than 130MB of memory. In summary, this work presents a new type of memory analysis that advances the state of the art with respect to expressive power, precision, and scalability and represents a new area of study on the relationships between and combination of concepts from shape and points-to analyses.

1 Introduction

Techniques for analyzing the memory structures created and operated on by a program have generally fallen into two families: Points-To (Alias) Analysis and Shape Analysis.
These approaches lie at the far ends of the spectrum of analysis cost and precision. In particular, points-to analyses track very simple properties, often little more than points-to set information, and the abstract transfer functions, which simulate the effects of various program statements, use simple and efficient set operations. At the other end of the spectrum, shape analyses track a range of rich heap properties and generally utilize computationally complex transfer functions, involving materialization operations, case splitting, and strong updates [36,12,37,43,11]. While individually each of these areas has seen intensive research, the construction of analysis techniques that combine the expressiveness of shape style heap domains with the simplicity and efficiency of points-to style transfer functions is an open problem. A major challenge in constructing such a hybrid analysis is the question: are strong updates a fundamental component of a shape style analysis, or is it possible to compute precise shape, sharing, etc. information with an analysis that uses simpler and more efficient points-to style transfer functions?
Recent empirical work on the structure and behavior of the heap in modern object-oriented programs has shed light on how heap structures are constructed [41,3], the configuration of the pointers and objects in them [5], and their invariant structural properties [31,2,1]. These results affirm several common assumptions about how object-oriented programs are designed and how the heap structures in them behave. In particular, [41,3,5] demonstrate that object-oriented programs exhibit extensive mostly-functional behaviors: making extensive use of final (or quiescing) fields, stationary fields, copy construction, and when fields are updated the new target is frequently a newer (often freshly allocated) object. The results in [31,2,1] provide insight into what heuristics can be used to effectively group sections of the heap based on how they are used in the program, how prevalent the use of library containers is, and what sorts of structures are built. The results show that in practice object-oriented programs tend to organize objects on the heap into well defined groups based on their roles in the program, they avoid the use of linked pointer structures in favor of library provided containers, and that connectivity and sharing properties between groups of objects are relatively simple and stable throughout the execution of the program.

The information in these empirical studies provides the central design principles that guide the construction of the heap analysis in this paper. The prevalence of mostly functional behavior implies that the domain and transfer functions can, generally, handle writes as weak updates without large precision losses. However, to precisely handle object initialization and the frequent case of updating a field to point to a newly (or very recently) allocated object, the domain should model such objects with extra care.
The extensive use of standard collections and libraries implies that by specializing the analysis to handle these collections precisely, as in [10,32], a large portion of the potentially complex pointer and indexing operations that would otherwise depend on the analysis performing strong updates can be eliminated. Finally, given that object-oriented programs are not completely functional, there will be cases where the simplified abstract transfer functions introduce imprecision. Thus, the abstract heap domain should provide strong disjointness and isolation properties between the various parts of the heap. These properties serve both to minimize the impact of any imprecision that is introduced and to prevent cascading of this imprecision.

The Structural Analysis abstract domain (section 2) is based on the classic storage shape graph approach and is able to express a rich set of commonly occurring and generally useful properties including structure identification, connectivity, sharing, and shape. Additionally, due to the implicit disjointness information in the graph structure, the resulting abstract heap model possesses strong separability and isolation characteristics that limit the propagation of imprecision. The normal form (section 3) is defined in terms of an efficient congruence closure computation, $O((N+E) \cdot \log(N))$ where $N$ is the number of nodes in the shape graph and $E$ is the number of edges. This congruence relation is based on the structures identified in the empirical studies and enables the analysis to rapidly converge to a fixpoint without either a large loss of information on the domain properties of interest or the generation of large amounts of irrelevant detail. The points-to style transfer functions (section 5) are based on set operations and weak updates. In practice they precisely model the heap properties of interest and are efficiently computable: $O(N+E)$ in the worst case, but near constant time in practice.
In order to quantify the performance and precision of this analysis we present an extensive experimental evaluation (section 6) of several well known benchmarks including programs from SPEC JVM98 and DaCapo. This evaluation includes both the timing and memory use characteristics of the analysis as well as a rigorous evaluation of the precision of the results.

Practical Contribution. The practical contribution of this paper is the construction of a novel static heap analysis, Structural Analysis, that combines a rich shape analysis style abstract heap model with efficiently computable points-to analysis style abstract transfer functions. Our experimental evaluations show that the domain defined in this paper is capable of precisely expressing the majority of the connectivity, shape, and sharing properties that occur in practice. Despite the use of weak updates and the absence of case splitting/materialization the static analysis is able to precisely (with a rate of 80-90%) approximate the ideal results. The memory analysis is, in conjunction with the interprocedural analysis in [29], capable of analyzing real world programs of up to 30K bytecodes, which are beyond the capabilities of existing shape analyses, and never requires more than 65 seconds or 130MB of memory.

Theoretical Contribution. The theoretical contribution of the paper is an answer to the question of the necessity of strong updates vs. the sufficiency of weak updates in computing shape and sharing information. The results in this paper show that, despite previous experience suggesting otherwise [12,7], strong updates and the associated machinery are not critical in practice, and that weak updates are sufficient for computing large amounts of useful shape and sharing information in real world object-oriented programs. This conclusion is reached via experimental evaluation with the heap analysis constructed in this paper, Structural Analysis, and an analysis of other recent empirical research [2,1,41,3,5]. Thus, this work opens new possibilities for exploring the relationships between shape and points-to analyses and represents a new approach to building scalable and precise memory analysis tools.
2 Abstract Heap Domain

We begin by formalizing concrete program heaps and the relevant properties that will be captured by the abstraction. Later, we define the abstract heap and formally relate the abstraction to the concrete heaps using a concretization ($\gamma$) function from the framework of abstract interpretation [8,34]. These definitions are designed to support the expression of a range of generally useful properties (e.g., shape, sharing, reachability) that are common in shape analysis [12,7,30] and that are useful for a wide range of client optimization and error detection applications.

2.1 Concrete Heaps

The state of a concrete program is modeled in a standard way where there is an environment, mapping variables to addresses, and a store, mapping addresses to objects. We refer to an instance of an environment together with a store and a set of objects as a concrete heap. Given a program that defines a set of concrete types, Types, and a set of fields (and array indices), Labels, a concrete heap is a tuple $(\mathrm{Env}, \sigma, \mathrm{Ob})$ where:

$$
\begin{aligned}
\mathrm{Env} &\in \mathrm{Environment} = \mathrm{Vars} \rightharpoonup \mathrm{Addresses} \\
\sigma &\in \mathrm{Store} = \mathrm{Addresses} \to \mathrm{Objects} \cup \{\mathrm{null}\} \\
\mathrm{Ob} &\in 2^{\mathrm{Objects}} \\
\mathrm{Objects} &= \mathrm{OID} \times \mathrm{Types} \times (\mathrm{Labels} \rightharpoonup \mathrm{Addresses}) \\
&\quad \text{where the object identifier set } \mathrm{OID} = \mathbb{N}
\end{aligned}
$$

Each object $o$ in the set Ob is a tuple consisting of a unique identifier for the object, the type of the object, and a map from field labels to concrete addresses for the fields defined in the object. We assume that the objects in Ob and the variables in the environment Env, as well as the values stored in them, are well typed according to the store ($\sigma$) and the sets Types and Labels.

In the following definitions we use the notation $\mathrm{Ty}(o)$ to refer to the type of a given object, and the usual notation $o.l$ to refer to the value of the field (or array index) $l$ in the object. It is also useful to be able to refer to a non-null pointer as a specific structure in a number of definitions. Therefore we define a non-null pointer $p$ associated with an object $o$ and a label $l$ in a specific concrete heap, $(\mathrm{Env}, \sigma, \mathrm{Ob})$, as $p = (o, l, \sigma(o.l))$ where $\sigma(o.l) \neq \mathrm{null}$. We define a helper function $\mathrm{Fld} : \mathrm{Types} \to 2^{\mathrm{Labels}}$ to get the set of all fields that are defined for a given type (or array indices for an array type).
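To make the tuple $(\mathrm{Env}, \sigma, \mathrm{Ob})$ concrete, the following is a minimal Python sketch of the model, not the paper's implementation. The names `Obj`, `ConcreteHeap`, and `pointer` are hypothetical, addresses are modeled as plain integers, and the object set Ob is derived from the store rather than kept as a separate component; all of these are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = int  # assumption: concrete addresses are modeled as integers


@dataclass(frozen=True)
class Obj:
    """An element of Objects = OID x Types x (Labels -> Addresses)."""
    oid: int                                  # unique identifier from OID
    type_name: str                            # element of Types
    fields: Tuple[Tuple[str, Address], ...]   # partial map Labels -> Addresses

    def get(self, label: str) -> Address:
        """Return the address stored in field `label` (o.l)."""
        return dict(self.fields)[label]


@dataclass
class ConcreteHeap:
    env: Dict[str, Address]               # Env: Vars -> Addresses
    store: Dict[Address, Optional[Obj]]   # sigma: Addresses -> Objects or null

    @property
    def objects(self):
        """Ob, derived here as every non-null object in the store."""
        return {o for o in self.store.values() if o is not None}

    def pointer(self, o: Obj, label: str):
        """Return the non-null pointer (o, l, sigma(o.l)), or None if null."""
        target = self.store[o.get(label)]
        return None if target is None else (o, label, target)


# Address 0 plays the role of null in this small example.
add = Obj(1, "Add", (("l", 20), ("r", 0)))
const = Obj(2, "Const", ())
heap = ConcreteHeap(env={"exp": 10}, store={0: None, 10: add, 20: const})
```

Here `heap.pointer(add, "l")` yields the triple for the `l` field, while `heap.pointer(add, "r")` yields `None` because that field holds the null address.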
In the context of a specific concrete heap, $(\mathrm{Env}, \sigma, \mathrm{Ob})$, a region of memory is a subset of concrete heap objects $C \subseteq \mathrm{Ob}$. It is useful to define the set $P(C_1, C_2, \sigma)$ of all non-null pointers crossing from region $C_1$ to region $C_2$ as:

$$P(C_1, C_2, \sigma) = \{(o_s, l, \sigma(o_s.l)) \mid \exists o_s \in C_1,\ l \in \mathrm{Fld}(\mathrm{Ty}(o_s)).\ \sigma(o_s.l) \in C_2\}$$

Injectivity. Given two disjoint regions $C_1$ and $C_2$ in the heap, $(\mathrm{Env}, \sigma, \mathrm{Ob})$, the non-null pointers with the label $l$ from $C_1$ to $C_2$ are injective, written $\mathrm{inj}(C_1, C_2, l, \sigma)$, if for all pairs of non-null pointers $(o_s, l, o_t)$ and $(o_s', l, o_t')$ drawn from $P(C_1, C_2, \sigma)$, $o_s \neq o_s' \Rightarrow o_t \neq o_t'$. As a special case when we have an array object, we say the non-null pointer set $P(C_1, C_2, \sigma)$ is array injective, written $\mathrm{inj}_{[]}(C_1, C_2, \sigma)$, if for all pairs of non-null pointers $(o_s, i, o_t)$ and $(o_s', j, o_t')$ drawn from $P(C_1, C_2, \sigma)$ and $i$, $j$ valid array indices, $i \neq j \Rightarrow o_t \neq o_t'$.

These definitions capture the general case of an injective relation being defined from a set of objects and fields to target objects. They also capture the special, but important, case of arrays where each index in an array contains a pointer to a distinct object.

Shape. We characterize the shape of regions of memory using standard graph theoretic notions of trees and general graphs, treating the objects as vertices in a graph and the non-null pointers as defining the (labeled) edge set. We note that in this style of definition the set of graphs that are trees is a subset of the set of general graphs. Given a region $C$ in the concrete heap $(\mathrm{Env}, \sigma, \mathrm{Ob})$:

– The predicate $\mathrm{any}(C)$ is true for any graph. We use it as the most general shape that doesn't satisfy a more restrictive property.
– The predicate $\mathrm{tree}(C)$ holds if the subgraph $(C, P(C, C, \sigma))$ is acyclic and does not contain any pointers that create cross edges.
– The predicate $\mathrm{none}(C)$ holds if the edge set in the subgraph is empty, $P(C, C, \sigma) = \emptyset$.
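The pointer set, injectivity, and shape predicates above can be sketched directly over a small heap encoding. This is an illustrative sketch under stated assumptions, not the paper's implementation: the heap is flattened to a dictionary from object ids to field maps (the environment and addresses are elided), integer field labels stand for array indices, and the function names are hypothetical. The `shape_tree` check reads "no cross edges" as "no object has two incoming internal pointers".

```python
def pointers(heap, c1, c2):
    """P(C1, C2, sigma): non-null pointers (source, label, target) from c1 into c2."""
    return {(s, l, t) for s in c1 for l, t in heap[s].items()
            if t is not None and t in c2}


def inj(heap, c1, c2, label):
    """inj(C1, C2, l, sigma): distinct sources never share a target through `label`."""
    source_of = {}
    for s, l, t in pointers(heap, c1, c2):
        if l != label:
            continue
        if t in source_of and source_of[t] != s:
            return False
        source_of[t] = s
    return True


def inj_array(heap, c1, c2):
    """inj_[](C1, C2, sigma): pointers stored at distinct indices have distinct targets."""
    ptrs = [p for p in pointers(heap, c1, c2) if isinstance(p[1], int)]
    return all(t1 != t2 for (_, i, t1) in ptrs for (_, j, t2) in ptrs if i != j)


def shape_none(heap, c):
    """none(C): the region has no internal pointers."""
    return not pointers(heap, c, c)


def shape_tree(heap, c):
    """tree(C): internal pointers are acyclic and no object has two incoming ones."""
    parent = {}
    for s, _, t in pointers(heap, c, c):
        if t in parent:          # second incoming pointer: sharing
            return False
        parent[t] = s
    # With in-degree <= 1, a cycle exists iff walking parents returns to the start.
    for start in c:
        node, steps = start, 0
        while node in parent and steps <= len(c):
            node, steps = parent[node], steps + 1
            if node == start:
                return False
    return True


# Object ids 1 and 2 are interior expression objects; 3, 4, 5 are leaves.
expr = {1: {"l": 2, "r": 3}, 2: {"l": 4, "r": 5}, 3: {}, 4: {}, 5: {}}
shared = {1: {"r": 3}, 2: {"r": 3}, 3: {}}
arr = {10: {0: 7, 1: 8}, 7: {}, 8: {}}
```

In this encoding the `expr` heap is a tree over all five objects, while `shared` violates injectivity on `r` because two sources point to object 3.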
2.2 Abstract Heap

An abstract heap is an instance of a storage shape graph [7]. More precisely, an abstract heap graph is a tuple $(\widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}})$ where:

$$
\begin{aligned}
\widehat{\mathrm{Env}} &\in \mathrm{Environments} = \mathrm{Vars} \rightharpoonup \widehat{\mathrm{Addresses}} \\
\hat{\sigma} &\in \mathrm{Stores} = \widehat{\mathrm{Addresses}} \to \mathrm{Inj} \times 2^{\mathrm{Nodes}} \\
&\quad \text{where the injectivity values } \mathrm{Inj} = \{\mathrm{true}, \mathrm{false}\} \\
\widehat{\mathrm{Ob}} &\in \mathrm{Heaps} = 2^{\mathrm{Nodes}} \\
\mathrm{Nodes} &= \mathrm{NID} \times 2^{\mathrm{Types}} \times \mathrm{Sh} \times (\widehat{\mathrm{Labels}} \rightharpoonup \widehat{\mathrm{Addresses}}) \\
&\quad \text{where the shape values } \mathrm{Sh} = \{\mathrm{none}, \mathrm{tree}, \mathrm{any}\} \\
&\quad \text{and the node identifier set } \mathrm{NID} = \mathbb{N}
\end{aligned}
$$

The abstract store ($\hat{\sigma}$) maps from abstract addresses to tuples consisting of the injectivity associated with the abstract address and a set of target nodes. Each node $n$ in the set $\widehat{\mathrm{Ob}}$ is a tuple consisting of a unique identifier for the node, a set of types, a shape tag, and a map from abstract labels to abstract addresses. The use of an infinite set of node identity tags, NID, allows for an unbounded number of nodes associated with a given type/allocation context, allowing the local analysis to precisely represent freshly allocated objects for as long as they appear to be of special interest in the program (as defined via the normal form, section 3, and used in the transfer functions, section 5). The abstract labels ($\widehat{\mathrm{Labels}}$) are the field labels and the special label $[]$. The special label $[]$ abstracts the indices of all array elements (i.e., array smashing). Otherwise an abstract label $\hat{l}$ represents the object field with the given name.

As with the objects we introduce the notation $\widehat{\mathrm{Ty}}(n)$ to refer to the type set associated with a node. The notation $\widehat{\mathrm{Sh}}(n)$ is used to refer to the shape property, and the usual $n.\hat{l}$ notation to refer to the abstract value associated with the label $\hat{l}$. Since the abstract store ($\hat{\sigma}$) now maps to tuples of injectivity and node target information we use the notation $\widehat{\mathrm{Inj}}(\hat{\sigma}(\hat{a}))$ to refer to the injectivity and $\widehat{\mathrm{Trgts}}(\hat{\sigma}(\hat{a}))$ to refer to the set of possible abstract node targets associated with the abstract address. We define the helper function $\widehat{\mathrm{Fld}} : 2^{\mathrm{Types}} \to 2^{\widehat{\mathrm{Labels}}}$ to refer to the set of all abstract labels that are defined for the types in a given set (including $[]$ if the set contains an array type).
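The abstract heap tuple maps naturally onto a few record types. The sketch below is one possible encoding, offered as an illustration rather than the paper's implementation: `Node`, `AbstractHeap`, `Shape`, and `ARRAY` are hypothetical names, abstract addresses are integers, and the numeric ordering none < tree < any on `Shape` is an assumption made so that a join can be computed with `max`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, FrozenSet, Set, Tuple


class Shape(IntEnum):
    """Shape values Sh; the numeric order is an assumed lattice order."""
    NONE = 0
    TREE = 1
    ANY = 2


ARRAY = "[]"  # the special abstract label that smashes all array indices


@dataclass
class Node:
    nid: int                 # identifier from NID
    types: FrozenSet[str]    # Ty^(n): set of concrete types
    shape: Shape             # Sh^(n)
    fields: Dict[str, int]   # abstract label -> abstract address


@dataclass
class AbstractHeap:
    env: Dict[str, int]                       # variable -> abstract address
    store: Dict[int, Tuple[bool, Set[int]]]   # address -> (injective?, target node ids)
    nodes: Dict[int, Node]                    # node id -> node

    def inj(self, addr: int) -> bool:
        """Inj^(sigma^(a)): injectivity flag of an abstract address."""
        return self.store[addr][0]

    def trgts(self, addr: int) -> Set[int]:
        """Trgts^(sigma^(a)): possible target nodes of an abstract address."""
        return self.store[addr][1]

    def successors(self, nid: int, label: str) -> Set[int]:
        """Target nodes of n.l^ for the node with identifier `nid`."""
        return self.trgts(self.nodes[nid].fields[label])


# A variable `env` referring to an array node whose elements are Var objects.
example = AbstractHeap(
    env={"env": 1},
    store={1: (True, {4}), 100: (True, {2})},
    nodes={
        4: Node(4, frozenset({"Var[]"}), Shape.NONE, {ARRAY: 100}),
        2: Node(2, frozenset({"Var"}), Shape.NONE, {}),
    },
)
```

With this encoding, `example.successors(4, ARRAY)` returns the node set reached through the smashed array label, and `example.inj(100)` reports that those array pointers are injective.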
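2.3 Abstraction Relation

We are now ready to formally relate the abstract heap graph to its concrete counterparts by specifying which heaps are in the concretization ($\gamma$) of an abstract heap:

$$
\begin{aligned}
(\mathrm{Env}, \sigma, \mathrm{Ob}) \in \gamma((\widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}})) \Leftrightarrow\ & \exists \text{ an embedding } \mu \text{ where } \mathrm{Typing}(\mu, \mathrm{Ob}, \widehat{\mathrm{Ob}}) \\
& \wedge\ \mathrm{Injective}(\mu, \mathrm{Env}, \sigma, \mathrm{Ob}, \widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}}) \\
& \wedge\ \mathrm{Shape}(\mu, \mathrm{Env}, \sigma, \mathrm{Ob}, \widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}})
\end{aligned}
$$

A concrete heap is an instance of an abstract heap if there exists an embedding function $\mu : \mathrm{Ob} \to \widehat{\mathrm{Ob}}$ satisfying the graph embedding, typing, injectivity, and shape relations between the structures. The auxiliary predicates are defined as follows.

$$
\begin{aligned}
\mathrm{Embed}(\mu, \mathrm{Env}, \sigma, \mathrm{Ob}, \widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}}) =\ & \forall v \in \mathrm{Vars}.\ \mu(\sigma(\mathrm{Env}(v))) \in \widehat{\mathrm{Trgts}}(\hat{\sigma}(\widehat{\mathrm{Env}}(v))) \\
& \wedge\ \forall o_s \in \mathrm{Ob} \text{ and non-null pointers } p = (o_s, l, o_t) \\
& \quad \exists \hat{l} \in \widehat{\mathrm{Fld}}(\widehat{\mathrm{Ty}}(\mu(o_s))).\ \mu(o_t) \in \widehat{\mathrm{Trgts}}(\hat{\sigma}(\mu(o_s).\hat{l})) \wedge l \in \gamma_L(\hat{l})
\end{aligned}
$$

The embed predicate makes sure that all of the objects and pointers of the concrete heap are present in the abstract heap graph, connecting corresponding abstract nodes, and that the store and labels in the abstract graph respect the concrete store and labels. The embedding must also preserve any variable mappings.

$$\mathrm{Typing}(\mu, \mathrm{Ob}, \widehat{\mathrm{Ob}}) = \forall n \in \widehat{\mathrm{Ob}},\ o \in \mu^{-1}(n).\ \mathrm{Ty}(o) \in \widehat{\mathrm{Ty}}(n)$$

The typing relation guarantees that the type $\mathrm{Ty}(o)$ for every concrete object $o$ is in the set of types of the abstract node $\widehat{\mathrm{Ty}}(n)$ associated with $o$.

$$
\begin{aligned}
\mathrm{Injective}(\mu, \mathrm{Env}, \sigma, \mathrm{Ob}, \widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}}) =\ & \forall n_s, n_t \in \widehat{\mathrm{Ob}},\ \hat{l} \in \widehat{\mathrm{Fld}}(\widehat{\mathrm{Ty}}(n_s)).\ \widehat{\mathrm{Inj}}(\hat{\sigma}(n_s.\hat{l})) \Rightarrow \\
& (\hat{l} \neq [] \Rightarrow \mathrm{inj}(\mu^{-1}(n_s), \mu^{-1}(n_t), l, \sigma)) \\
& \wedge\ (\hat{l} = [] \Rightarrow \mathrm{inj}_{[]}(\mu^{-1}(n_s), \mu^{-1}(n_t), \sigma))
\end{aligned}
$$

The injectivity relation guarantees that every pointer set marked as injective corresponds to injective (and array injective as needed) pointers between the concrete source and target regions of the heap.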
$$
\begin{aligned}
\mathrm{Shape}(\mu, \mathrm{Env}, \sigma, \mathrm{Ob}, \widehat{\mathrm{Env}}, \hat{\sigma}, \widehat{\mathrm{Ob}}) =\ & \forall n \in \widehat{\mathrm{Ob}}.\ (\widehat{\mathrm{Sh}}(n) = \mathrm{tree} \Rightarrow \mathrm{tree}(\mu^{-1}(n, \sigma))) \\
& \wedge\ (\widehat{\mathrm{Sh}}(n) = \mathrm{none} \Rightarrow \mathrm{none}(\mu^{-1}(n, \sigma)))
\end{aligned}
$$

The shape relation guarantees that for every node $n$, the concrete subgraph $\mu^{-1}(n, \sigma)$ abstracted by node $n$ satisfies the corresponding concrete shape predicates.

2.4 Example Heap

Figure 1(a) shows a snapshot of the concrete heap from a simple program that manipulates expression trees. An expression tree consists of binary nodes for Add, Sub, and Mult expressions, and leaf nodes for Constants and Variables. The local variable exp (rectangular box) points to an expression tree consisting of 4 interior binary expression objects, 2 Var, and 2 Const objects. The local variable env points to an array representing an environment of Var objects that are shared with the expression tree.

Fig. 1. Concrete and Abstract Heap. (a) A Concrete Heap. (b) Corresponding Abstract Heap.

Figure 1(b) shows the corresponding normal form (see section 3) abstract heap for this concrete heap. To ease discussion we label each node in a graph with a unique node id ($id). The abstraction summarizes the concrete objects into three regions. The regions are represented by the nodes in the abstract heap graph: (1) a node representing all interior recursive objects in the expression tree (Add, Mult, Sub), (2) a node representing the two Var objects, and (3) a node representing the two Const objects. The edges represent possible sets of non-null cross region pointers associated with the given abstract labels. Details about the order and branching structure of expression nodes are absent but other more general properties are still present. For example, the fact that there is no sharing or cycles among the interior expression nodes is apparent in the abstract graph by looking at the self-edge representing the pointers between objects in the interior of the expression tree. The label tree{l,r} on the self-edge expresses that pointers stored in the l and r fields of the objects represented by node 1 form a tree structure.
The abstract graph maintains another useful property of the expression tree, namely that no Const object is referenced from multiple expression objects. On the other hand, several expression objects might point to the same Var object. The abstract graph shows this possible non-injectivity using wide orange colored edges (if color is available), whereas normal edges indicate injective pointers. Similarly the edge from node 4 (the env array) to the set of Var objects represented by node 2 is injective (not shaded and wide). This implies that there is no aliasing between the pointers stored in the array, i.e., every index in the array contains a pointer to a unique object. Additionally, the abstract heap, via a combination of reachability, shape, and sharing information, shows there is no aliasing on any distinct pair of paths starting from exp and ending with a dereference of the r field. This can be deduced from the fact that node 1 is a tree layout, so there is no aliasing internally on either the l or r fields, and that both outgoing r edges are injective (narrow and unshaded). Since we know all paths through the tree do not alias (lead to different objects) this implies the final dereferences of the r fields, which can only contain injective pointers to Const or Var objects, do not alias either.

3 Normal Form

Given the definitions for the abstract heap it is clear that the domain is infinite. This allows substantial flexibility when defining the transfer functions and more precise results when analyzing straight line blocks of code. However, it is problematic when defining the merge/equality operations and can result in the final analysis having an unacceptably large computational cost. To prevent this we define an efficiently computable normal form, $O((N+E) \cdot \log(N))$ where $N$ is the number of nodes in the abstract heap graph and $E$ is the number of edges. The normal form ensures that the set of normal form abstract heaps for any given program is finite and that the abstract heaps in this set can easily be merged and compared.
The normal form leverages the idea that locally (within a basic block or method call) invariants can be broken and subtle details are critical to program behavior, but before/after these local components invariants should be restored. The basis for the normal form, and the selection of what are important properties to preserve, comes from studies of the runtime heap structures produced in object-oriented programs [31,2]. Thus we know that, in general, these definitions are well suited to capturing the fundamental structural properties of the heap that are of interest while simplifying the structure of abstract heaps and discarding superfluous details.

Definition 1 (Normal Form). We say that the abstract heap is in normal form iff:
1. All nodes are reachable from a variable or static field.
2. All recursive structures are summarized (Definition 2).
3. All equivalent successors are summarized (Definition 4).
4. All variable/global equivalent targets are summarized (Definition 5).

That is, there are no unreachable nodes and structurally the abstract heap represents the congruence closure of the recursive structure, equivalent successor, and equivalent target relations.

While the normal form definition is fundamentally driven by heuristics derived from empirical studies of the heap structures in real programs (and thus one could imagine a number of variants) there are three key properties that it possesses: (1) the resulting abstract heap graph has a bounded depth, (2) each node has a bounded out degree, and (3) for each node the possible targets of the abstract addresses associated with it are unique with respect to the label and the types in the target nodes. The first two properties ensure that the number of abstract heaps in the normal form set is finite, while the third allows us to define efficient merge and compare operations (section 4).
3.1 Equivalence Partitions

As each of the properties (recursive structures, ambiguous successors, and ambiguous targets) is defined in terms of congruence between abstract nodes, the transformation of an abstract heap into the corresponding normal form is fundamentally the computation of a congruence closure over the nodes in the abstract heap followed by merging the resulting equivalence sets. Thus, we build a map from the abstract nodes to equivalence sets (partitions) using a Tarjan union-find structure. Formally $\Pi : \widehat{\mathrm{Ob}} \to \{\pi_1, \ldots, \pi_k\}$ where $\pi_i \in 2^{\widehat{\mathrm{Ob}}}$ and $\{\pi_1, \ldots, \pi_k\}$ are a partition of $\widehat{\mathrm{Ob}}$. The union-find structure can also be used to maintain the set of all the types associated with the nodes in a partition ($\bigcup_{n \in \pi} \widehat{\mathrm{Ty}}(n)$). Initially each node is placed in its own singleton partition (i.e., $\forall n \in \widehat{\mathrm{Ob}}.\ \Pi(n) = \{n\}$).

The first step in computing the normal form is to identify any nodes that may be parts of unbounded depth structures. This is accomplished by examining the type system for the program that is under analysis and identifying all the types that are part of the same recursive type definitions. This is a commonly used technique [4,28,9] and ensures that any heap graph produced has a finite depth. We say types $\tau_1$ and $\tau_2$ are recursive ($\tau_1 \sim \tau_2$) if they are part of the same recursive type definition.

Definition 2 (Recursive Structure). Given two partitions $\pi_1$ and $\pi_2$ we define the recursive structure congruence relation as:

$$
\begin{aligned}
\pi_1 \equiv_{\Pi}^{r} \pi_2 \Leftrightarrow\ & \exists \tau_1 \in \bigcup_{n_1 \in \pi_1} \widehat{\mathrm{Ty}}(n_1),\ \tau_2 \in \bigcup_{n_2 \in \pi_2} \widehat{\mathrm{Ty}}(n_2).\ \tau_1 \sim \tau_2 \\
& \wedge\ \exists n \in \pi_1,\ \hat{l} \in \widehat{\mathrm{Fld}}(\widehat{\mathrm{Ty}}(n)).\ \widehat{\mathrm{Trgts}}(\hat{\sigma}(n.\hat{l})) \cap \pi_2 \neq \emptyset
\end{aligned}
$$

The other part of the normal form computation is to identify any partitions that have equivalent successors and variables that have equivalent targets. Both of these operations depend on the notion of a successor partition, which is based on the underlying structure of the abstract heap graph: $\pi_1$ is a successor of $\pi_2$ on $\hat{l}$ $\Leftrightarrow \exists n_2 \in \pi_2.\ \widehat{\mathrm{Trgts}}(\hat{\sigma}(n_2.\hat{l})) \cap \pi_1 \neq \emptyset$.
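The partition map $\Pi$ and the recursive structure relation of Definition 2 can be sketched with a small union-find that carries the per-partition type set. This is a hedged illustration, not the paper's implementation: the class `Partition`, the function `recursive_structure`, the `edges` encoding (node id to label to target node ids), and the `rec_group` table that assigns each type to the name of its recursive type definition are all hypothetical.

```python
class Partition:
    """Union-find over node ids that also tracks the type set of each partition."""

    def __init__(self, node_types):
        self.parent = {n: n for n in node_types}
        self.size = {n: 1 for n in node_types}
        self.types = {n: set(ts) for n, ts in node_types.items()}

    def find(self, n):
        root = n
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[n] != root:          # path compression
            self.parent[n], n = root, self.parent[n]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:      # union by size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.types[ra] |= self.types.pop(rb)   # merge the partition type sets
        return ra

    def members(self, n):
        root = self.find(n)
        return {m for m in self.parent if self.find(m) == root}


def recursive_structure(part, edges, rec_group, a, b):
    """Definition 2 for the partitions containing nodes a and b."""
    pa, pb = part.find(a), part.find(b)
    groups_a = {rec_group[t] for t in part.types[pa] if t in rec_group}
    groups_b = {rec_group[t] for t in part.types[pb] if t in rec_group}
    if not (groups_a & groups_b):              # no tau1 ~ tau2
        return False
    # Some node of the first partition has a field that may point into the second.
    return any(part.find(t) == pb
               for n in part.members(a)
               for targets in edges.get(n, {}).values()
               for t in targets)


part = Partition({1: {"Add"}, 2: {"Mult"}, 3: {"Const"}})
edges = {1: {"l": {2}, "r": {3}}, 2: {"r": {3}}}
rec_group = {"Add": "Expr", "Mult": "Expr", "Sub": "Expr"}  # assumed type table
```

In this example nodes 1 and 2 are related because their types share a recursive definition and node 1 points to node 2, whereas node 3 (a leaf type assumed not to be recursive) is not related to either.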
Definition 3 (Partition Compatibility). We define the relation $\mathrm{Compatible}(\pi_1, \pi_2)$ as:

$$\mathrm{Compatible}(\pi_1, \pi_2) \Leftrightarrow \bigcup_{n' \in \pi_1} \widehat{\mathrm{Ty}}(n') \cap \bigcup_{n' \in \pi_2} \widehat{\mathrm{Ty}}(n') \neq \emptyset$$

Definition 4 (Equivalent Successors). Given $\pi_1$, $\pi_2$ which are successors of $\pi$ on labels $\hat{l}_1$, $\hat{l}_2$ we define the relation equivalent successors on them as: $\pi_1 \equiv_{\Pi}^{s} \pi_2 \Leftrightarrow \hat{l}_1 = \hat{l}_2 \wedge \mathrm{Compatible}(\pi_1, \pi_2)$.

Definition 5 (Equivalent on Targets). Given a root $r$ (a variable or a static field) and two partitions $\pi_1$, $\pi_2$ where $r$ refers to a node in $\pi_1$ and a node in $\pi_2$ we define the equivalent targets relation as: $\pi_1 \equiv_{\Pi}^{t} \pi_2 \Leftrightarrow \mathrm{Compatible}(\pi_1, \pi_2) \wedge (r \text{ is a static field} \vee \pi_1, \pi_2 \text{ only have local var predecessors})$.

Using the recursive structure relation and the equivalent successor (target) relations we can efficiently compute the congruence closure over an abstract heap, producing the corresponding normal form abstract heap (Definition 1). This computation can be done via a standard worklist algorithm that merges partitions that contain equivalent nodes and can be done in $O((N+E) \cdot \log(N))$ time where $N$ is the number of abstract nodes in the initial abstract heap, and $E$ is the number of abstract addresses in the heap.

3.2 Computing Summary Nodes

After partitioning the nodes in the graph with the congruence closure computation we need to merge all the nodes in each partition into a summary node. The resulting summary node should safely summarize the properties of all the nodes in the partition.
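To illustrate how the closure proceeds, the sketch below repeatedly merges partitions related by Definition 4 (equivalent successors under the Compatible check of Definition 3) until nothing changes. It is a deliberately naive fixpoint loop, not the $O((N+E) \cdot \log(N))$ worklist algorithm described above, and it omits the recursive structure and equivalent target relations. The function name and the `types`/`edges` encodings are hypothetical.

```python
def merge_equivalent_successors(types, edges):
    """Return a map node id -> representative after closing under Definition 4.

    types: node id -> set of type names
    edges: node id -> {abstract label: set of target node ids}
    """
    rep = {n: n for n in types}

    def find(n):
        while rep[n] != n:
            n = rep[n]
        return n

    def part_types(r):
        # Union of the type sets of every node currently in partition r.
        return set().union(*(types[n] for n in types if find(n) == r))

    changed = True
    while changed:
        changed = False
        # Successor partitions of each (source partition, label) pair.
        succ = {}
        for n, fields in edges.items():
            for label, targets in fields.items():
                succ.setdefault((find(n), label), set()).update(
                    find(t) for t in targets)
        for parts in succ.values():
            ordered = sorted(parts)
            for i, p1 in enumerate(ordered):
                for p2 in ordered[i + 1:]:
                    r1, r2 = find(p1), find(p2)
                    # Same source and label, so only Compatible remains to check.
                    if r1 != r2 and part_types(r1) & part_types(r2):
                        rep[r2] = r1
                        changed = True
    return {n: find(n) for n in types}


types = {1: {"Add"}, 2: {"Const"}, 3: {"Const"}, 4: {"Var"}}
edges = {1: {"l": {1}, "r": {2, 3, 4}}}
result = merge_equivalent_successors(types, edges)
```

For this input the two Const nodes reached through `r` collapse into one partition, while the Var node stays separate because its type set does not overlap with theirs.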
Similarly, we may need to update target and injectivity information for the summary nodes in the abstract store. Given a node partition ($\pi$) that we want to replace with a new summary node ($n_s$) we use the following to compute the abstract properties for the summary node and new abstract store $\hat{\sigma}_s$:

$$
\begin{aligned}
n_s &= (\hat{\iota}_{\mathrm{frsh}},\ \sqcup_{\mathrm{type}}(\pi),\ \sqcup_{\mathrm{shape}}(\pi),\ \{[\hat{l} \mapsto \hat{a}_{\hat{l}}] \mid \hat{l} \in \widehat{\mathrm{Fld}}(\sqcup_{\mathrm{type}}(\pi)),\ \hat{a}_{\hat{l}} \text{ a fresh address}\}) \\
&\quad \text{where } \hat{\iota}_{\mathrm{frsh}} \text{ is a fresh node identifier in } \mathrm{NID} \\
\hat{\sigma}_s &= \mathrm{MergeStore}(\hat{\sigma}_s, \hat{l}, \pi) \text{ for each } \hat{l} \in \widehat{\mathrm{Fld}}(\sqcup_{\mathrm{type}}(\pi)) \\
\sqcup_{\mathrm{type}}(\pi) &= \bigcup_{n \in \pi} \widehat{\mathrm{Ty}}(n)
\end{aligned}
$$

Once this merge is complete we can update the information on the abstract addresses associated with each variable in $\widehat{\mathrm{Env}}$ by replacing any nodes in the target sets with the appropriate newly created summary nodes.

Shape. The shape information is non-trivial to merge as it depends both on the shapes of the individual nodes that are being grouped and also on the connectivity properties between them. We first perform a traversal of the subgraph of the partition and the (non-self) abstract targets between them. Then, based on the discovery of back, cross, or tree references (in a graph theoretic sense) and whether any of these abstract storage locations are not injective, we compute the shape as $\sqcup_{\mathrm{shape}}(\pi) = \mathrm{struct}(\pi) \sqcup \bigsqcup_{n \in \pi} \widehat{\mathrm{Sh}}(n)$ where:

$$
\mathrm{struct}(\pi) = \begin{cases}
\mathrm{none} & \text{if no internal edges exist} \\
\mathrm{tree} & \text{if } \forall n \in \pi,\ \hat{l} \in \widehat{\mathrm{Fld}}(\widehat{\mathrm{Ty}}(n)).\ \neg\widehat{\mathrm{Inj}}(\hat{\sigma}(n.\hat{l})) \wedge n.\hat{l} \text{ a TreeEdge in } \pi \\
& \quad \wedge\ (\widehat{\mathrm{Sh}}(n) = \mathrm{none} \vee \forall n' \in \widehat{\mathrm{Trgts}}(\hat{\sigma}(n.\hat{l})) \cap \pi.\ \widehat{\mathrm{Sh}}(n') = \mathrm{none}) \\
\mathrm{any} & \text{otherwise}
\end{cases}
$$

Injectivity and Abstract Targets.
Given a mapping from the partitions to the new summary nodes, $\Phi : \mathrm{Img}(\Pi) \to \{n_{s_1}, \ldots, n_{s_k}\}$, then for each label, $\hat{l}$, and abstract address, $\hat{a}_{\hat{l}}$, that may appear in a summary node, $n_s$, we set the values in the abstract store as:

$$
\begin{aligned}
\mathrm{MergeStore}(\hat{\sigma}_s, \hat{l}, \pi) &= \hat{\sigma}_s + [\hat{a}_{\hat{l}} \mapsto (\mathrm{inj}, \mathrm{trgts})] \text{ where} \\
\mathrm{trgts} &= \{\Phi(\Pi(n')) \mid n' \in \bigcup_{n \in \pi} \widehat{\mathrm{Trgts}}(\hat{\sigma}(n.\hat{l}))\} \\
\mathrm{inj} &= \forall n \in \pi.\ \widehat{\mathrm{Inj}}(\hat{\sigma}(n.\hat{l})) \wedge \forall n' \in \pi \setminus \{n\}.\ \widehat{\mathrm{Trgts}}(\hat{\sigma}(n.\hat{l})) \cap \widehat{\mathrm{Trgts}}(\hat{\sigma}(n'.\hat{l})) = \emptyset
\end{aligned}
$$

Injectivity is the logical conjunction of the injectivity of all the source label locations, and the requirement that the respective target sets of the nodes that are merged do not overlap. In the case where the target sets do overlap, i.e., two distinct nodes have abstract labels/addresses that contain the same node, the resulting address may no longer be associated only with injective pointers. Thus, the injectivity value is conservatively set to false (i.e., not injective). The