Abstracting Runtime Heaps for Program Understanding MarkMarron1 CesarSanchez1,2 ZhendongSu3 ManuelFahndrich4 1IMDEASoftwareInstitute 2CSIC 3UCDavis 4MicrosoftResearch {mark.marron,cesar.sanchez}@imdea.org,[email protected],[email protected] 2 Abstract. Modern programming environments provide extensive support for 1 inspecting,analyzing,andtestingprogramsbasedonthealgorithmicstructureof 0 aprogram.Unfortunately,supportforinspectingandunderstandingruntimedata 2 structuresduringexecutionistypicallymuchmorelimited.Thispaperprovidesa generalpurposetechniqueforabstractingandsummarizingentireruntimeheaps. n a Wedescribetheabstractheapmodelandtheassociatedalgorithmsfortransforming J aconcreteheapdumpintothecorrespondingabstractmodelaswellasalgorithms 6 formerging,comparing,andcomputingchangesbetweenabstractmodels.The abstractmodelisdesignedtoemphasizehigh-levelconceptsaboutheap-baseddata ] structures,suchasshapeandsize,aswellasrelationshipsbetweenheapstructures, L suchassharingandconnectivity.Wedemonstratetheutilityandcomputational P tractabilityoftheabstractheapmodelbybuildingamemoryprofiler.Wethenuse s. thistooltocheckfor,pinpoint,andcorrectsourcesofmemorybloatfromasuite c ofprogramsfromDaCapo. [ 1 v 1 Introduction 7 2 Modernprogrammingenvironmentsprovideexcellentsupportforvisualizingandde- 3 bugging code, but inspecting and understanding the high-level structure of the data 1 . manipulated at runtime by said code is typically not well supported. Visualizing en- 1 tire runtime heap graphs is a non-trivial problem, as the number of nodes and edges 0 istypicallysolargethatdisplayingthesegraphsdirectly—evenwithexcellentgraph 2 1 visualizationtools—resultsinuselessjumblesofnodesandedges.Asaresult,littleof : interestcanbegleanedfromsuchvisualizations. v Inthispaper,weproposeanabstractdomainforruntimeheapgraphsthatcaptures i X manyfundamentalpropertiesofdatastructuresontheheap,suchasshape,connectivity, r andsharing,butabstractsawayotheroftenlessusefuldetails.Theabstractheapgraphs a wecomputearebothsmallenoughtovisualizeandnavigate,andatthesametimeprecise enoughtocaptureessentialinformationusefulininteractivedebuggingandmemory profilingscenarios.Further,theabstractheapscanbecomputedefficientlyfromasingle concreteheapandfurthermerged/comparedwithotherabstractheapgraphs.fromacross asetofprogramruns,orfrommultipleprogrampointsinordertogetanevenmore generalviewoftheheapconfigurationsthatoccurduringprogramexecution. Example. Figure 1(a) shows a heap snapshot of a simple program that manipulates expressiontrees.AnexpressiontreeconsistsofbinarynodesforAdd,Sub,andMult, andleafnodesforConstantsandVariables.Thelocalvariableexp(rectangularbox) (a) AConcreteHeap. (b) CorrespondingAbstractHeap. Fig.1.Aconcreteheapandcorrespondingabstraction. pointstoanexpressiontreeconsistingof4interiorbinaryexpressionobjects,2Var,and 2Constobjects.Localvariableenvpointstoanarrayrepresentinganenvironmentof Varobjectsthataresharedwiththeexpressiontree. Figure1(b)showstheabstractheapproducedbyourtoolsfromthisconcreteheap withthedefaultvisualizationmode.1Theabstractionsummarizestheconcreteobjects into three distinct summary nodes in the abstract heap graph: (1) an abstract node representingallinteriorrecursiveobjectsintheexpressiontree(Add,Mult,Sub),(2)an abstractnoderepresentingthetwoVarobjects,and(3)anabstractnoderepresentingthe twoConstobjects.Specificdetailsabouttheorderandbranchingstructureofexpression nodesareabsentintheabstraction,butothermoregeneralpropertiesarestillpresent. Forexample,thefactthatthereisnosharingorcyclesamongtheinteriorexpression nodes is apparent in the abstract graph by looking at the self-edge representing the pointersbetweenobjectsintheinterioroftheexpressiontree.Thelabeltree{l,r}on theself-edgeexpressesthatpointersstoredinthelandrfieldsoftheobjectsinthis regionformatreestructure(i.e.,nosharingandnocycles). Theabstractgraphmaintainsanotherusefulpropertyoftheexpressiontree,namely thatnoConstobjectisreferencedfrommultipleexpressionobjects.Ontheotherhand, severalexpressionobjectsmightpointtothesameVarobject.Theabstractgraphshows possiblesharingusingwideorangecolorededges(ifcolorisavailable),whereasnormal edgesindicatenon-sharingpointers.Theabstractgraphshowspointernullityviafullvs. dashedlines–inourexampleallpointers,exceptintheenvironmentarray,arenon-null. Rudimentary information on the number of objects represented by each node is encodedintheshading.Nodesthatalwaysabstractasingleobjectaregivenawhite backgroundwhilenodeswhichrepresentmultipleobjectsareshaded(silverifcoloris available).Sizeinformationofarraysandothercontainersisencodedbyannotatingthe typelabelwiththecontainersize(Var[3]toindicateanarrayisoflength3). 1Additionalinformationcanbeobtainedbyhoveringoverthenodes/edgesorbyrestylingfora specifictaskasinourcasestudiesinsection6. 2 Overview. Thispaperaddressestheproblemofturninglargeconcreteruntimeheapsinto compactabstractheapswhileretainingmanyinterestingpropertiesoftheoriginalheapin theabstraction.Ourabstractionissafeinthesensethatpropertiesstatedontheabstract heapgraphalsoholdinthecorrespondingconcreteheaps.Toachievethisabstraction safety, we adopt the theory for the design of abstract domains developed in abstract interpretation[7,25].Thetheoryofabstractinterpretationprovidesageneralframework for(1)defininganabstractdomainandrelatingittopossibleconcreteprogramstates and(2)amethodfortakinganabstractdomainandcomputinganover-approximation ofthecollectingsemanticsforagivenprogramasastaticanalysis.Thestaticanalysis component of the abstract interpretation framework is not relevant here, as we are interestedinabstractingruntimeheaps.However,theframeworkforconstructingthe abstractdomains,aswellasthepropertiesofoperationsforcomparing((cid:118))andmerging ((cid:116)(cid:101)) abstract domain elements, allows us to formally describe the relationship of our abstractheapgraphstotheirconcretecounterparts,andtoobtainsafeoperationsfor comparingandsummarizingheapsfromdifferentprogrampointsordifferentprogram runsinasemanticallymeaningfulway.Theseguaranteesprovideconfidencethatall inferencesmadebyexaminingtheabstractmodelarevalid. Ourabstractheapdomainencodesafixedsetofheappropertiesidentifiedinprevious workonstaticheapanalysis[5,10,20]thatarefundamentalpropertiesofheapsandcan becomputedefficiently.Thesepropertiesincludethesummarizationofrecursiveand compositedatastructures,theassignmentofshapeinformationtothesestructuresand injectivityoffields(giventwodistinctobjectsdoesthefieldfineachobjectpointto a distinct target). The abstraction is also able to provide information on the number andtypesofobjectsinthevariousstructures,aswellasnullityinformation.Ourfocus on a fixed set of heap properties (as opposed to user defined properties) enables the abstractiontobecomputedefficientlyintimeO((Ob+Pt)∗log(Ob)),whereObisthe numberofobjectsandPtisthenumberofpointersintheconcreteheap. Thecontributionsofthispaperare: – Theabstractdomainforheapgraphsanditsconcretizationfunctionformalizingthe saferelationshiptoconcreteheaps. – Anefficientalgorithmforcomputingtheabstractionandalgorithmsforcomparing andjoiningabstractheaps. – Graphicalrepresentationsofabstractheapgraphsthatallowon-demandcollapsing orexpansionofsub-structures,allowingaformofsemanticzoom[8]fromavery abstractviewoftheheapdowntothelevelofindividualobjects. – Theconstructionofageneralpurposeheapmemoryprofilerandanalysistoolthat augmentsthebasicabstractionwithspecializedsupportforprofilingandidentifying commonmemoryproblemsinaprogram. – Aqualitativeevaluationofthevisualizationandmemoryprofilerintrackingdown andidentifyingsolutionstomemoryinefficienciesinarangeofprograms(uptoa 25%reductioninmemoryuse). 3 2 AbstractHeapGraph Webeginbyformalizingconcreteprogramheapsandtherelevantpropertiesofconcrete heapsthatwillbecapturedbytheabstraction.Later,wedefinetheabstractheapgraph andformallyrelatetheabstractiontoitsconcreteheapcounterpartsusingaconcretization (γ)functionfromtheframeworkofabstractinterpretation. 2.1 ConcreteHeaps Forthepurposesofthispaper,wemodeltheruntimestateofaprogramasanenvironment, mappingvariablestovalues,andastore,mappingaddressestovalues.Werefertoan instanceofanenvironmenttogetherwithastoreasaconcreteheap.Formally,aconcrete heapisalabeleddirectedgraph(root,null,Ob,Pt,Ty),wherethenodesareformedby thesetofheapobjects(Ob)andtheedges(Pt)correspondtopointers.Weassumea distinguishedheapobjectroot∈Obwhosefieldsarethevariablesfromtheenvironment. Thisrepresentationavoidsdealingwithdistinctsetsofvariablelocationsandmakesthe formalizationmoreuniform.WealsoassumeadistinguishedobjectnullamongObto modelnullpointers.ThesetofpointersPt⊆Ob×Ob×Labelconnectasourceobjectto atargetobjectwithapointerlabelfromLabel.Theselabelsareeitheravariablename(if thesourceobjectisroot),afieldname(ifthesourceobjectisaheapobject),oranarray index(ifthesourceobjectisanarray).Finally,Ty:Ob→Typeisamapthatassigns aconcreteprogramtypetoeachobject.WeassumetheconcretesetoftypesinType p containsatleastobjecttypesandarraytypes.Weusethenotationo −→o toindicate 1 2 thatobjecto referstoo viapointerlabel p. 1 2 AregionofmemoryC⊆Ob\{null,root}isasubsetoftheconcreteheapobjects, notcontainingtherootnodeornull.ItishandytodefinethesetofpointersP(C ,C ) 1 2 crossingfromaregionC toaregionC as: 1 2 p P(C ,C )={o −→o ∈Pt|o ∈C ,o ∈C } 1 2 1 2 1 1 2 2 2.2 ConcreteHeapProperties Wenowformalizethesetofconcretepropertiesofobjects,pointers,andentireregions oftheheapthatwelaterusetocreatetheabstractheapgraph. Type. ThesetoftypesassociatedwitharegionCistheunionofalltypesoftheobjects intheregion:{Ty(o)|o∈C}. Cardinality. ThecardinalityofaregionCisthenumberofobjectsintheregion|C|. Nullity. Apointero →o isanullpointerifo =nullandnon-nullpointerifo (cid:54)=null. 1 2 2 2 Injectivity. GiventworegionsC andC ,wesaythatpointerslabeled pfromC toC 1 2 1 2 p p areinjective,writteninj(C ,C ,p),ifforallpairsofpointerso −→t ando −→t drawn 1 2 1 1 2 2 fromP(C ,C ),o (cid:54)=o ⇒t (cid:54)=t .Inwords,thepointerslabeled pfromtwodistinct 1 2 1 2 1 2 objectso ando pointtodistinctobjectst andt . 1 2 1 2 4 Shape. WecharacterizeregionsofmemoryCbyshapeusingstandardgraphtheoretic notionsoftreesandgeneralgraphs.Foradditionalprecision,weconsidertheshapeof subgraphsformedfromC,andP(C,C)↓ ,i.e.,thesubgraphconsistingofobjectsfromC L andpointerswithlabelsl∈Lonly.Thisway,wecandescribe,forexample,thatatree structurewithparentpointersisstillatreeifweonlyconsidertheleftandrightpointers, butnottheparentpointers. – Thepredicateany(C,L)issimplytrueforanygraph.Weuseitonlytoclarifyshapes invisualizationsthatdon’tsatisfythemorerestrictivetreeproperty. – Thepredicatetree(C,L)holds,ifP(C,C)↓ isacyclicandthesubgraphP(C,C)↓ L L doesnotcontainanycrossedges. 2.3 HeapGraphAbstraction Anabstractheapgraphisaninstanceofstorageshapegraphs[5].Moreprecisely,the abstractheapgraphsusedinthispaperaretuples: (root,null,Ob#,Pt#,Ty#,Cd#,Ij#,Sh#) whereOb# isasetofabstractnodes(eachofwhichabstractsaregionoftheconcrete heap),andPt#⊆Ob#×Ob#×Label# isasetofgraphedges,eachofwhichabstracts asetofpointers.Edgesareannotatedwithlabelsfrom,Label#,andarethefieldlabels andthespeciallabel[].Thespeciallabel[]abstractstheindicesofallarrayorcontainer elements(i.e.,arraysmashing). WedistinguisharootnodeinOb#formodelingthevariableenvironmentasfieldson root.Anotherdistinguishednodenullisusedtorepresentthenullpointer.Theremaining parts of an abstract heap (Ty#,Cd#,Ij#,Sh#) capture abstract properties of the heap graph.Ty#:Ob#(cid:55)→2Typemapsabstractnodestothesetoftypesoftheconcretenodes representedbytheabstraction.Cd#:Ob#(cid:55)→Intervalrepresentsthecardinalityofeach abstractedregion.Cd#mapseachabstractnodentoanumericalinterval[l,u]∈Interval, wherethelowerboundlisanaturalnumber,anduisanaturalnumberor∞. TheabstractinjectivityIj#:Pt#→boolexpresseswhetherthesetofpointersrepre- sentedbyanabstractedgeisinjective.Finally,theabstractshapeSh#isasetoftuples (n,L,s)∈Ob#×2Label#×{tree,any}indicatingtheshapesofaregionrepresentedbyn withedgesrestrictedtoL. 2.4 AbstractionRelation Wearenowreadytoformallyrelatetheabstractheapgraphtoitsconcretecounterparts byspecifyingwhichheapsareintheconcretizationofanabstractheap: (root,null,Ob,Pt,Ty)∈γ(root,null,Ob#,Pt#,Ty#,Cd#,Ij#,Sh#)⇔ ∃µ.Embed(µ,Ob,Pt,Ob#,Pt#) ∧Typing(µ,Ob,Ty,Ob#,Ty#)∧Counting(µ,Ob,Ob#,Cd#) ∧Injective(µ,Pt,Pt#,Ij#)∧Shape(µ,Pt,Pt#,Sh#) 5 Aconcreteheapisaninstanceofanabstractheap,ifthereexistsanembeddingµ:Ob→ Ob# satisfyingthegraphembedding,typing,counting,injectivity,andshaperelation betweenthegraphs.Theauxiliarypredicatesaredefinedasfollows. Embed(µ,Ob,Pt,Ob#,Pt#)⇔µ(root)=root∧µ(null)=null ∧∀o −→p o ∈Pt.∃l.µ(o )→−l µ(o )∈Pt#∧p∈γ (l) 1 2 1 2 L Theembedpredicatemakessurethatalledgesoftheconcretegrapharepresentinthe abstractgraph,connectingcorrespondingabstractnodes,andthattheedgelabelinthe abstractgraphencompassestheconcreteedgelabel.Theembeddingmappingµ must alsomapthespecialobjectsrootandnulltotheirexactabstractcounterparts. Typing(µ,Ob,Ty,Ob#,Ty#)⇔∀o∈Ob.Ty(o)∈Ty#(µ(o)) ThetypingrelationguaranteesthatthetypeTy(o)foreveryconcreteobjectoisinthe setoftypesTy#(µ(o))oftheabstractnodeµ(o)ofo. Counting(µ,Ob,Ob#,Cd#)⇔∀n∈Ob#.|µ−1(n)|∈Cd#(n) Thecountingrelationguaranteesthatforeachabstractnoden,thesetofconcretenodes µ−1(n)abstractedbynhasacardinalityinthenumericintervalCd#(n). Injective(µ,Pt,Pt#,Ij#)⇔ ∀(n ,n ,l)∈Pt#.Ij#(n ,n ,l)⇒∀p∈γ (l).inj(µ−1(n ),µ−1(n ),p) 1 2 1 2 L 1 2 Theinjectivityrelationguaranteesthateverypointersetmarkedasinjectivecorresponds toinjectivepointersbetweentheconcretesourceandtargetregionsoftheheap. Shape(µ,Pt,Pt#,Sh#)⇔∀(n,L,tree)∈Sh#.tree(µ−1(n),γ (L)) L Finally,theshaperelationguaranteesthatforeveryabstractshapetuple(n,L,s),the concretesubgraphµ−1(n)abstractedbynodenrestrictedtolabelsLsatisfiesthecorre- spondingconcreteshapepredicates(treeandimplicitlyany). 2.5 VisualRepresentationofAbstractHeapGraphs Intheiconographyforourabstractgraphvisualizations,thescreenshotsinFigure1(a), Figure1(b),andsection6,weleverageanumberofconventionstoconveyinformation. Anedge(root,o,p)whosesourceistherootnoderepresentsthecontentofvariable p.Insteadofdrawingarootnodewithsuchedges,wesimplydrawavariablenode p andanunlabelededgetoo.Thus,therootnodeisneverdrawn,asitdoesnotappearas thetargetofanyedgeinconcreteorabstractgraphs. Thesetofabstracttypesofanabstractnodeisrepresentedasthelabeloftheabstract node.Shapeinformationisrepresentedaslabelsontherecursiveselfedgesofabstract nodes.Anabstractnodewithcardinality1isrepresentedbyawhitebackground.Other cardinalitiesarerepresentedwithshadedabstractnodes. We do not draw explicit edges which only point to null. If an edge is associated withalabelthatcontainsbothpointerstonullandpointerstootherheapobjectswe foldthepossibilityintotheedgebyusingadashededgeinsteadofafulledge.Finally, injectiveedgesarerepresentedwithnormalthinedges,whereasnon-injectiveedgesare representedbywideedges(andifcolorisavailablearealsohighlightedinorange). 6 3 ComputingtheAbstraction This section describes the computation of the abstract graph from a given concrete heap.Thetransformationisperformedinthreephases.1)recursivedatastructuresare identifiedandcollapsedbasedonidentifyingcyclesinthetypedefinitions,2)nodesthat representobjectsinthesamelogicalheapregionbasedonequivalentedgesoriginating thesameabstractnodearemerged,andfinally3)abstractpropertieslikecardinality, injectivity,andshapearecomputedfortheabstractedgesandnodes. 3.1 Partition(µ)Computation Initially,weassociatewitheachconcreteobjecto anabstractpartitionn representing i i anequivalenceusingaTarjanunion-findstructure.Themappingµ fromconcreteobjects to abstract partitions is given at any point in time by: µ(o)=ecr(n), i.e., by the i i equivalenceclassoftheoriginaln associatedwitho.Theunion-findstructuremaintains i i thereversemappingµ−1providingthesetofconcreteobjectsabstractedbyanode.The abstracttypemapTy#canbemaintainedefficientlyintheunion-findstructureaswell. Figure2(a)showstheinitialstateoftheseequivalencepartitionsforourexample fromFigure1(a)(onepartitionperobject,plustheroots,andaspecialpartitionfornull). Eachnodeislabeledwithitspartitionidandthetypesoftheobjectsinthatpartition. Thefirstabstractionidentifiespartsoftheheapgraphthatrepresentunboundeddepth recursivedatastructures.Thebasicapproachconsistsofexaminingthetypeinformation intheprogramandtheheapconnectivityproperties[2,19,9]andensuresthatanyheap graphproducedhasafinitedepth.Wesaytypesτ andτ arerecursive(τ ∼τ )ifthey 1 2 1 2 arepartofthesamerecursivetypedefinition. Definition1 (SameDataStructureObjects).Twodistinctobjectso ,o arepartof 1 2 p thesamedatastructureifthereisareferenceo −→o intheheapandthetypesofthe 1 2 twoobjectsareinthesamedatastructureTy(o )∼Ty(o ). 1 2 Therecursivecomponentsarethusidentifiedbyvisitingeachpointero →o inthe i j heapandifo ando areinthesamedatastructureaccordingtoDefinition1,thenwe i j unionthecorrespondingabstractnodesn andn . i j Figure2(b)showstheresultofmergingSameDataStructureNodesontheinitial partitionsshowninFigure2(a).Thealgorithmidentifiedobjects1,2,4,5(theAdd,Sub, andMultobjectsfromtheinterioroftheexpressiontree)asbeingpartoftherecursive datastructureandreplacedthemwithasinglerepresentativesummarynode. Next we group objects based on predecessor partitions. The motivation for this abstractioncanbeseeninFigure2(b)whereVarobjectsinpartitions7and8represent “variablesintheenvironment”.There’snoneedtodistinguishthemastheyareboth referencedfromtheenvironmentarray.Similarly,thetwoconstantobjectsreferenced fromtherecursivecomponentbothrepresent“constantsintheexpressiontree”. l Definition2 (EquivalentonAbstractPredecessors).Giventwopointerso →− o and 1 2 o(cid:48) −→l(cid:48) o(cid:48) where µ(o )= µ(o(cid:48)), we say that their target nodes are equivalent when- 1 2 1 1 ever: The labels agree l =l(cid:48), and the target nodes have some types in common, i.e., Ty#(µ(o ))∩Ty#(µ(o(cid:48)))(cid:54)=0/. 2 3 7 (a) InitialPartition. (b) MergeSameDataStructure. (c) MergePredecessors. Fig.2.Stepsinabstractioncomputation. Thealgorithmforgroupingequivalentobjectsisbasedonaworklistwheremerging two partitions may create new opportunities for merging. The worklist consists of pointersthatmayhaveequivalenttargetobjects.Whenprocessingapointerfromthe worklist we check if we need merge any partitions and, as needed, we merge these partitions.Finally,allpointersincidenttothemergedpartitionsareaddedtotheworklist. DuetothepropertiesoftheTarjanunion-findalgorithm,eachpointercanenterthework listatmostlog(N)times,whereNisthenumberofabstractpartitionsthatcanbemerged, andE isthenumberofpointers.ThusthecomplexityofthisstepisO(E∗log(N)). Figure 2(c) shows the result of performing the required merge operations on the partitions from Figure 2(b). The algorithm has merged the Var regions into a new summaryregion(sincetheobjectsrepresentedbypartitions7and8inFigure2(b)are referredtofromthesamearray).SimilarlytheConstpartitionsfromFigure2(b)have beenmergedastheyarebothstoredinthesamerecursivestructure(theexpressiontree). Figure2(c)differsfromtheabstractgraphinFigure1(b)wherethereisonlyone edgebetweentheexpressiontreenodeandthevariablesnode.Thereasonisthat,despite theunderlyingabstractionbeingamulti-graph,ourvisualizationapplicationcollapses multi-edges as they frequently lead to poor graph layouts and rarely provide useful informationtothedeveloper.Also,notethatthereareexplicitreferencestonullandthat thesewerenotmergedsinceweassociatenotypeswiththenullobject. 3.2 AbstractPropertyComputation Type,Cardinality,andNullity. TheabstracttypemapTy#hasalreadybeencomputedas partoftheunion-findoperationonabstractnodes.Similarly,theunion-findoperation computestheexactcardinality,whichresultsinapreciseintervalvalue[i,i]ifanode abstractsexactlyiobjects.Thenullityinformationisrepresentedasexplicitedgestothe nullabstractobject. l Injectivity. The Injectivity information for an abstract edge n →− n is computed by 1 2 iteratingoverallpointersfromobjectso representedbyn toobjectso represented i 1 j byn withlabel pcompatiblewithl.Wedetermineifeveryconcretetargetobjectis 2 referencedatmostonce,inwhichcasetheabstractedgeisinjective.Otherwise,theedge isnotinjective. 8 Shape. Thefundamentalobservationthatenablesinterestingshapepredicatestobepro- ducedfortheabstractgraphsisthattheshapepropertiesarerestrictedtothesubgraphs representedbyanabstractnode.Inaddition,weallowtheexaminationofavarietyof furthersubgraphsbyrestrictingthesetoflabelsconsideredinthesubgraph.Restricting the label set allows e.g., to determine that the {l,r} edges in a tree actually form a tree, even though there are also parent pointers p, which if included would allow no interesting shape property to be determined. Selecting the particular subsets of edge labelstoconsiderinthesubgraphselectionisbasedonheuristics.Wecanstartwithall labelstogetanoverallshapeandusethatcomputationtoguesswhichlabelstothrow outandtryagain.Forsmallsetsoflabels,allcombinationscanbetried. AfterpartitioningtheheapasshowninFigure2(c)thefinalmapfortheobjectsis: µ−1=(cid:8)n (cid:55)→{o ,o ,o ,o },n (cid:55)→{o ,o },n (cid:55)→{o ,o },n (cid:55)→{o }(cid:9) 1 1 2 4 5 3 3 6 7 7 8 9 9 Thus,forFigure1(b)wedeterminetheabstractedgerepresentingthecrosspartition l l pointersetn →− n isnotinjective,sinceitabstractsthetwoconcretepointerso →− o 1 7 4 7 l and o →− o both refer to the same Var object o . On the other hand, since the two 5 7 7 Constobjectso ,o aredistinct,thealgorithmwilldeterminethatedgerepresenting 3 6 r thecrosspartitionpointersetn →− n isinjective.TheShapecomputationforthenode 1 3 representingpartition1requiresatraversalofthefourobjects.Astherearenocrossor backedgesthelayoutforthisistree{l,r}. 4 MergeandComparisonOperations Manyprogramanalysisandunderstandingtasksrequiretheabilityto1)accumulate abstractgraphs,and2)compareabstractgraphs(bothfromthesameprogramexecution andacrossexecutions).Forexample,tosupportcomputingdifferencesintheheapstate duringprofilingactivitiesorforcomputinglikelyheapinvariants.Sowecannotsimply trackobjectidentitiesandusethemtocontrolthemergeandcompareoperations.Thus, thedefinitionsmustbeentirelybasedontheabstractgraphstructure. 4.1 Compare Formally,theorderbetweentwoabstractgraphsg (cid:118)g canbedefinedviaourabstrac- 1 2 tionrelationfromsubsection2.4as:g (cid:118)g ⇔∀h.h∈γ(g )⇒h∈γ(g ). 1 2 1 2 However,thisisnotdirectlycomputable.Instead,weimplementaO(E)timeap- proximationofthisrelationthatfirstdeterminesthestructuralequalityoftheabstract graphsbycomputinganisomorphism,followedbyanimplicationcheckthatallabstract edgeandnodepropertiesing covertheequivalentnodeandedgepropertiesofg . 2 1 To efficiently compute the subgraph isomorphism between g and g we use a 1 2 property of the abstract graphs established by Definition 2. From this definition we knowthateverypairofoutedgesfromanodeeitherdifferinthelabelorhavethesame labelbutnon-overlappingsetsoftypesinthenodestheyreferto.Thus,tocomputean isomorphismbetweentwographswecansimplystartpairingthelocalandglobalroots 9 andthenfromeachpairmatchupedgesbasedontheirlabelandtypesets,leadingto newpairings.Thiseitherresultsinanisomorphismmap,oritresultsinapairofnodes reachablefromtherootsalongthesamepaththathaveincompatibleedges.Anysuch edgedifferencescanthenbereported.Withthesubgraphisomorphismφ,wedefinethe orderingrelation: g (cid:118) g ⇔∀n∈Ob#.Ty#(n)⊆Ty#(φ(n)) 1 φ 2 1 1 2 ∧∀n∈Ob#.Cd#(n)(cid:118)Cd#(φ(n))∧∀φ(e)∈Pt#.Ij#(φ(e))⇒Ij#(e) 1 1 2 2 2 1 ∧∀(φ(n),L ,s )∈Sh#.∃(n,L ,s )∈Sh#.L ⊆L ∧s (cid:118)s 2 2 2 1 1 1 2 1 1 2 Notehowabstractshapepredicatesarecontra-variantinthelabelsetL.Inotherwords, if a shape property holds for the subgraph based on L , then it holds for the smaller 1 subgraphbasedonthesmallersetL . 2 4.2 Merge Themergeoperationtakestwoabstractgraphsandproducesanewabstractgraphthat isanoverapproximationofalltheconcreteheapstatesthatarerepresentedbythetwo input graphs. In the standard abstract interpretation formulation this is typically the leastelementthatisanoverapproximationofbothmodels.However,tosimplifythe computationwedonotenforcethisproperty(formallywedefineanupperapproximation insteadofajoin).Ourapproachistoleveragetheexistingdefinitionsfromtheabstraction functioninthefollowingsteps. Giventwoabstractheapgraphs,g andg oftheformg =(root,null,Ob#,Pt#, 1 2 i i i i i Ty#,Cd#,Ij#,Sh#)wecandefinethegraph,g ,thatistheresultoftheirmergeasfollows. i i i i 3 Firstweproducetheunionofthetwographsbysimplyaddingallnodesandedgesfrom bothgraphs.Oncewehavetakentheunionofthetwographswemergethevariable/static rootsthathavethesamenames.ThenweuseDefinition1andDefinition2tozipdown thegraphmergingnodesandedgesuntilnomorechangesareoccurring.Duringthe mergewebuilduptwomappingsη :g →g andη :g →g fromnodes(edges)in 1 1 3 2 2 3 theoriginalgraphs,g andg respectively,tothenodes(edges)inthemergedgraph. 1 2 Usingthesemappings,wedefineupperapproximationsofallthegraphproperties: Ty#(n)= (cid:91) Ty#(n )∪ (cid:91) Ty#(n ) 3 1 1 2 2 n1∈η1−1(n) n2∈η2−1(n) Cd#(n)= ∑ Cd#(n )(cid:116) ∑ Cd#(n ) 3 1 1 2 2 n1∈η1−1(n) n2∈η2−1(n) Ij#(e)=(|η−1(e)|=|{n |n →−l n ∈η−1(e)}|) 3 1 2 1 2 1 ∧(|η−1(e)|=|{n |n →−l n ∈η−1(e)}|) 2 2 1 2 2 ∧ (cid:94) Ij#(e )∧ (cid:94) Ij#(e ) 1 1 2 2 e1∈η1−1(e) e2∈η2−1(e) Thesetoftypesassociatedwiththeresultisjusttheunionofalltypesabstractedbythe nodeinbothgraphs.Thecardinalityismorecomplicatedtocompute.Itcomputesthe 10