
Closing the Gap – Formally Verifying Dynamically Typed Programs like Statically Typed Ones Using Hoare Logic – Extended Version –

Björn Engelmann, Ernst-Rüdiger Olderog, and Nils Erik Flick

University of Oldenburg, Germany
bjoern.engelmann@uni-oldenburg.de
ernst.ruediger.olderog@informatik.uni-oldenburg.de
flick@informatik.uni-oldenburg.de

arXiv:1501.02699v1 [cs.PL] 12 Jan 2015

Abstract. Dynamically typed object-oriented languages enable programmers to write elegant, reusable and extensible programs. However, with the current methodology for program verification, the absence of static type information creates significant overhead. Our proposal is two-fold: First, we propose a layer of abstraction hiding the complexity of dynamic typing when provided with sufficient type information. Since this essentially creates the illusion of verifying a statically-typed program, the effort required is equivalent to the statically-typed case. Second, we show how the required type information can be efficiently derived for all type-safe programs by integrating a type inference algorithm into Hoare logic, yielding a semi-automatic procedure allowing the user to focus on those typing problems really requiring his attention. While applying type inference to dynamically typed programs is a well-established method by now, our approach complements conventional soft typing systems by offering formal proof as a third option besides modifying the program (static typing) and accepting the presence of runtime type errors (dynamic typing).

1 Introduction

Dynamically typed programming languages refrain from restricting their programs to ensure operations are only applied to suitable operands. While this allows experienced programmers to write more elegant, concise and reusable code, it has the obvious drawback that type errors may occur at runtime.
Recently, object-oriented dynamically typed languages like Python, Ruby and JavaScript have been gaining popularity on the server side as well (Ruby on Rails, node.js) and are used even for business-critical [27] and safety-critical [3] applications. Unfortunately, despite the growing need for correctness guarantees, the lack of type information causes a large overhead in formal methods like Hoare logic and severely decreases the effectiveness of automatic reasoning engines compared to the statically-typed setting (see Section 2.1).

There are two ways to deal with this problem:

1) Annotation: Most contemporary approaches to verifying dynamically typed programs ask the user to manually supply the needed type information in loop invariants and method contracts [12,23,29,21]. For larger programs, this induces significant overhead. We argue that manually supplying type information for all variables is not only tedious, but also often unnecessary, as most of this information could have been inferred automatically.

2) Translation: Obviously, translating the dynamically typed program into an equivalent statically typed version (see footnote 1) and then using a Hoare logic for statically-typed programs (like [2,5]) for verification is also possible. In such a translation process, type inference algorithms like [17,11] are usually of significant help. Note, however, that gradual typing [26,4] is not useful in this context, as such Hoare logics require the entire program to be well-typed prior to verification. Additionally, this approach removes any benefit of dynamic typing since it is equivalent to verifying a statically typed language with type inference.

We propose to get the best of both worlds by integrating an automatic type safety verifier with Hoare logic into a semi-automatic procedure and using the derived type information to reduce overhead and enable effective automated reasoning about dynamically typed programs just like with statically typed ones.
In the context of soft typing [6], our approach can also be understood as offering proofs of type safety as a third option besides rewriting the program (static typing) and runtime checks (dynamic typing).

Concretely, in this paper we describe two components:

1) A layer of abstraction that, given suitable type information, abstracts from the complexities of dynamic typing and hence reduces the verification of dynamically typed programs to that of statically typed ones. This also works with partial type information on a per-expression basis (see Section 2.1).

2) A construction for complementing a Hoare logic with an automatic type safety verifier, yielding a semi-automatic procedure for deriving type information with the following properties (see Section 2.4):
– Automation – only typing problems beyond the reach of the automatic verifier require manual intervention.
– Completeness relative to the Hoare logic – if the Hoare logic is complete, then type information can be derived for all type-safe programs (see Section 5.2).
– Bidirectional exchange of results – automatically derived type information can be used in Hoare logic proofs and, vice versa, proof results are used by the automatic verifier to increase precision.

Together, these two components form a novel verification system that makes the additional effort required to verify a dynamically typed program proportional to the total complexity of hard typing problems in this program. Unlike in annotation-based approaches, programs with only trivial typing problems require no additional effort, and unlike in translation-based approaches, all type-safe programs can be verified.

Footnote 1: After this translation, the static type system should be able to ensure the absence of type errors, unlike in the embeddings discussed in [15]. Finding such an equivalent version is undecidable in general and hence requires manual effort (see Section 2.2).
This paper constitutes our first step towards connecting the (relatively) complete Hoare logics [2,5] and advanced reasoning engines developed for statically-typed object-oriented languages with the advancing automatic type safety verifiers for dynamically typed languages [21,29,9,17]. In this extended version, proofs for all theorems and lemmas can be found in Appendix E.

Notation: $\vec{p}$ is a sequence $p_1,\ldots,p_n$ where $n$ is obvious from context or does not matter, $\{\vec{p}\}$ the smallest set containing all its elements and $\vec{a}\,\vec{b}$ sequence concatenation. $\overset{\Delta}{=}$ means "is defined as" and $\mathbb{N}_n \overset{\Delta}{=} \{0,\ldots,n\}$, $\mathbb{N}^1_n \overset{\Delta}{=} \{1,\ldots,n\}$.

2 Overview / Motivation

We will first discuss how correctness proofs can be simplified using sufficient type information and then how this information can be derived.

2.1 Static- vs. Dynamically-typed Hoare Logic

Apart from the additional need to establish type safety, there are other differences between Hoare logic for dynamically typed and statically typed languages ($HL_d$ and $HL_s$). The latter (like [2,5]) usually share a type system between programming and assertion language: the assertion $x > 8$ denotes the set of states where the value of a numeric program variable $x$ is larger than 8. In $HL_d$ (like [12]) however, as types are not statically known, all variables are of type $O$ (object). The assertion $x > 8$ is hence meaningless, as $>$ is not defined for type $O$. In this setting, a similar set of states can be denoted by the assertion (see footnote 2) $\exists i.\ N(x,i) \wedge i > 8$, which can be automatically derived from $x > 8$ given sufficient type information (the fact that the object referenced by $x$ always represents a number).

Furthermore, $HL_s$ usually include side-effect-free (pure) program expressions ($e$) into the assertion language, allowing efficient reasoning using proof rules like

  $\{q[u := e]\}\ u := e\ \{q\}$

Here, $q[u := e]$ denotes the substitution of all occurrences of a variable $u$ by $e$ in the assertion $q$.
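The substitution $q[u := e]$ in the assignment rule can be tried out mechanically. The following is only a minimal sketch in Python, not part of the paper's formal development: it assumes that assertions and pure expressions are written as Python expression strings, and the helper names `substitute` and `wp_assign` are hypothetical.

```python
import ast


class _Substitute(ast.NodeTransformer):
    """Replace every occurrence of the variable `var` by the expression `expr`."""

    def __init__(self, var, expr):
        self.var = var
        self.expr = expr

    def visit_Name(self, node):
        if node.id == self.var:
            # Parse a fresh copy of the expression for each occurrence.
            # The returned node is not visited again, so `expr` may mention `var`.
            return ast.parse(self.expr, mode="eval").body
        return node


def substitute(assertion, var, expr):
    """Compute assertion[var := expr] on Python expression strings (hypothetical helper)."""
    tree = ast.parse(assertion, mode="eval")
    tree = ast.fix_missing_locations(_Substitute(var, expr).visit(tree))
    return ast.unparse(tree)  # requires Python 3.9 or later


def wp_assign(var, expr, post):
    """Precondition of `var := expr` for postcondition `post`, assuming `expr` is pure."""
    return substitute(post, var, expr)


pre = wp_assign("x", "x + 5", "x > 8")
print(pre)
```

The sketch is only sound under the assumption the text states for $HL_s$: the assigned expression must be free of side effects, otherwise moving it from the program into the assertion changes its meaning.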
This rule allows directly deducing weakest preconditions over assignments like

  $\{x+5 > 8\}\ x := x+5\ \{x > 8\}$   (1)

by letting the expression $e$ traverse the boundary between program and logic. In $HL_d$, this is not possible since program expressions could have side-effects. While a subset of side-effect-free methods can be defined, identifying such pure expressions requires type information. Without it, establishing a property equivalent to (1) requires at least 6 rule applications.

Footnote 2: The precise meaning of $N(x,i)$ will be explained in Subsection 4.2.

This observation is given significance by the fact that usually most expressions only involve immutable data types like numbers and strings. Regarding them as side-effecting operations on general object-structures not only complicates proofs, but also significantly decreases the effectiveness of automated reasoning engines. For instance, assertions can often be efficiently established by SMT solvers over Presburger arithmetic while no similar decision procedure exists for arbitrary operations over general object-structures.

Section 4 will show how type information can be used to counter these problems and create the illusion of proving a statically typed program.

2.2 Providing Type Information

Sufficient type information for dynamically typed programs is uncomputable in general (Section 5.1). However, a number of good approximations exist [17,11] that we will refer to as automatic type safety verifiers.

It is known that many dynamically typed programs only occasionally diverge from what would also be possible in static typing disciplines (see footnote 3) and consequently, that the output of such algorithms is usually sufficient for typing most of their subexpressions [17, Section 5][11, Section 6].

If the entire program can be typed by a sound automatic verifier, then $HL_s$ could be applied. However, the whole point of dynamic typing is the possibility to go beyond the limits of such automatic procedures (type systems).
Approaches to verifying these languages thus must also be able to operate under less ideal circumstances. The following example will illustrate this point.

Footnote 3: Advanced dynamic features like mixins, traits, method update and dynamic class hierarchies increase the complexity of type inference. However, in this paper we aim to study the problem of dynamic typing in isolation and leave them as future work.

2.3 The Evaluator Example

Figure 1 depicts a dynamically typed program evaluating arithmetic expressions. While crafted to provide a hard typing problem, its use of ad-hoc data structures is not uncommon in Ruby, Python or JavaScript.

The class Evaluator has two methods parse() and calc(). The former parses a string and stores the resulting parse tree in the instance variable @tree, while the latter evaluates a given parse tree (defaulting to @tree) over a given environment (a mapping from variable names (strings) to integers).

The example is hard to type because the parse trees are represented as ad-hoc constructions of nested lists. Numeric constants VALUE, VAR and OP in the first element distinguish value-, variable- and operation nodes. The types of the remaining list elements depend on these node types: the second element is numeric (the value) for value-nodes, a string (the variable name to be looked up in the environment) for var-nodes and numeric (representing the operation to be performed) for op-nodes. Only op-nodes use nesting: further list elements are sub-parse-trees that are to be recursively evaluated to operands.

    class Evaluator {
      method parse(str) { ... }
      method calc(env, tree = @tree) {
        if tree[0] = VALUE then tree[1]
        elseif tree[0] = VAR then env[tree[1]]
        elseif tree[0] = OP then
          if tree[1] = ADD then calc(env, tree[2]) + calc(env, tree[3])
          elseif ...
          else nil fi
        else nil fi
      }
    }
    new Evaluator().parse(input).calc(ENV)

Fig. 1. Relevant part of the evaluator example source code

Typing this example requires deducing precise types for heterogeneous lists from propositions (like tree[0] = VALUE) about their first element. To the best of our knowledge there is no automatic procedure able to establish such implications. Also note that the typing problem can be made even harder: allowing an arbitrary number of operands in op-nodes, returning strings instead of null, etc. This example will be used to demonstrate our technique.

2.4 Semi-Automation

[Figure 2: diagram with the components Automatic Type Safety Verifier, Translation, Filter and Trusted Assumptions, connecting the program π and its typing ty with the high and low layers of the proof; the filter passes assertions of the form ⟦x ∈ T⟧.]

Fig. 2. Overview of the concept

In the concept depicted in Figure 2, the correctness proof is split into two "layers" (see Section 6.3). While the user (supported by a theorem prover) derives his proof in the higher layer, the lower layer contains type information and is created and modified solely by the automatic type safety verifier. For this purpose, the typings (ty) derived for the program π by the verifier are translated into proofs (see Section 6.2). While the information contained in this lower layer proof is already useful for supporting the user's higher-layer proof (see Section 4), the user may at any time decide to refine it by deriving more precise type information in the higher layer. This information is filtered to make it interpretable for the verifier and then supplied as trusted assumptions to refine the lower-layer type information (see Sections 6.1, 6.3).

Note that deriving type information and using the layer of abstraction is not a strict 2-step process. The latter requires the former only on a per-expression basis, allowing an interleaving of steps. Concretely, the layer of abstraction applies to all expressions proven type-safe (see Section 4). Expressions with open typing problems may be included at any time by proving them type-safe. This interleaving is possible as our refinements are monotonic (see Section 6.3).
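For readers who want to experiment with the evaluator example of Figure 1, its calc method can be approximated in Python. This is a sketch under assumptions: the concrete values of the tags VALUE, VAR, OP and of the operation code ADD are invented here, parse() is omitted, and the operations elided by "..." in the figure are left out.

```python
# Node tags and operation code; the concrete numbers are assumptions for this sketch.
VALUE, VAR, OP = 0, 1, 2
ADD = 0


def calc(env, tree):
    """Evaluate a parse tree given as nested lists over the environment `env`."""
    if tree[0] == VALUE:
        return tree[1]            # tree[1] is a number
    elif tree[0] == VAR:
        return env[tree[1]]       # tree[1] is a variable name (string)
    elif tree[0] == OP:
        if tree[1] == ADD:        # tree[1] is an operation code
            return calc(env, tree[2]) + calc(env, tree[3])
        return None               # further operations are elided in Figure 1
    return None


# The expression 3 + x as a nested list.
tree = [OP, ADD, [VALUE, 3], [VAR, "x"]]
result = calc({"x": 4}, tree)
```

The comments make the typing problem visible: whether tree[1] is a number, a string or an operation code depends on the value found in tree[0], which is the kind of implication the text describes as beyond the reach of known automatic procedures.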
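3 Setting

3.1 Model Languages: Dyn and Stat

To explain our methodology in a setting facilitating formal proof we introduce a pair of minimalistic programming languages that differ only in the fact that one is dynamically typed (dyn) while the other uses a static type system with type inference (stat). Like their real-world siblings, the two are imperative, class-based object-oriented languages including inheritance, method renaming, dynamic dispatch and constructors. However, they do not support advanced dynamic features like a dynamic class hierarchy, method update or eval().

Syntax of dyn (with $u \in V_L$, $@x \in V_I$, $C \in \mathcal{C}$, $m \in \mathcal{M}$):

  $\mathit{Prog}_d \ni \pi ::= \overrightarrow{\mathit{class}}\ S$
  $\mathit{Class}_d \ni \mathit{class} ::= \texttt{class}\ C < C\ \{\overrightarrow{\mathit{meth}}\}$
  $\mathit{Meth}_d \ni \mathit{meth} ::= \texttt{method}\ m(\vec{u})\{S\} \mid \texttt{rename}\ m\ m$
  $\mathit{Stmt}_d \ni S ::= S; S \mid e$
  $\mathit{Expr}_d \ni e ::= \texttt{null} \mid u \mid @x \mid \texttt{this} \mid e == e \mid e\ \texttt{is\_a?}\ C \mid e.m(\vec{e}) \mid \texttt{new}\ C(\vec{e}) \mid u := e \mid @x := e \mid \texttt{if}\ e\ \texttt{then}\ S\ \texttt{else}\ S\ \texttt{fi} \mid \texttt{while}\ e\ \texttt{do}\ S\ \texttt{od}$

Syntax of stat – coincides with dyn, except for (with $c_\varepsilon \in \mathit{Cnst}_s$, $op(\vec{e_\varepsilon}) \in \mathit{Op}_s$):

  $\mathit{Stmt}_s \ni S ::= S; S \mid e_\varepsilon \mid u := e_\varepsilon.m(\vec{e_\varepsilon}) \mid u := \texttt{new}\ C(\vec{e_\varepsilon}) \mid u := e_\varepsilon \mid @x := e_\varepsilon \mid \texttt{if}\ e_\varepsilon\ \texttt{then}\ S\ \texttt{else}\ S\ \texttt{fi} \mid \texttt{while}\ e_\varepsilon\ \texttt{do}\ S\ \texttt{od}$
  $\mathit{Expr}_s \ni e_\varepsilon ::= \texttt{null} \mid u \mid @x \mid \texttt{this} \mid e_\varepsilon == e_\varepsilon \mid e_\varepsilon\ \texttt{is\_a?}\ C \mid c_\varepsilon \mid op(\vec{e_\varepsilon}) \mid \texttt{if}\ e_\varepsilon\ \texttt{then}\ e_\varepsilon\ \texttt{else}\ e_\varepsilon\ \texttt{fi}$

Syntactic sugar in dyn:

  $e_1 \oplus e_2 \overset{\Delta}{=} e_1.m_\oplus(e_2)$
  $\texttt{if}\ e\ \texttt{then}\ S\ \texttt{fi} \overset{\Delta}{=} \texttt{if}\ e\ \texttt{then}\ S\ \texttt{else null fi}$
  $\texttt{false} \overset{\Delta}{=} \texttt{new bool(null)}$, $\texttt{true} \overset{\Delta}{=} \texttt{false.not()}$
  $0 \equiv \texttt{new num(null)}$, $n \overset{\Delta}{=} (n-1).\texttt{succ()}$

Basic data types in stat:

  $\mathit{true}, \mathit{false} \in \mathit{Cnst}_s$, $\neg : \mathbb{B} \mapsto \mathbb{B} \in \mathit{Op}_s$
  $\wedge, \vee, \rightarrow\ : \mathbb{B} \times \mathbb{B} \mapsto \mathbb{B} \in \mathit{Op}_s$
  $0, 1, \ldots \in \mathit{Cnst}_s$, $=\ : \mathbb{N} \times \mathbb{N} \mapsto \mathbb{B} \in \mathit{Op}_s$
  $+, *, \mathit{div} : \mathbb{N} \times \mathbb{N} \mapsto \mathbb{N} \in \mathit{Op}_s$

Fig. 3. Syntax of dyn and stat

Syntax: The syntax of both dyn and stat is depicted in Figure 3. In dyn, method bodies consist of statements ($S$) which in contrast to expressions ($e$) can contain sequential composition. Expressions are composed of null, the only constant, local and instance variables (prefixed with @), the self-reference this, operators for object identity and dynamic type checks, method and constructor calls, assignments, conditionals and while loops.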
Note that equality (=) is desugared to a (class-specific) method call, while object identity (==) is a built-in operation yielding true iff the two expressions refer to the same object (we stipulate that null == null yields true).

Each class except the predefined class object must specify a parent class whose methods are inherited. The inheritance relation must be acyclic. Every class thus transitively inherits from object. Inherited methods may be overridden or renamed (using rename). Like in actual dynamically typed languages, inheritance is mere code reuse and can be removed using an automatic expansion step [22]. In the following, we assume this step to be completed and do not concern ourselves any further with inheritance or renaming.

Semantics: Both dyn and stat programs consist of a main statement $S$ and sets of classes $\mathcal{C}$, methods $\mathcal{M}$ and variables $V = V_L \uplus V_I$ where $V_L$ and $V_I$ are the sets of local and instance variables respectively. While each class $C \in \mathcal{C}$ has a subset of method declarations $\mathcal{M}_C \subseteq \mathcal{M}$ and instance variables $V_C \subseteq V_I$, every method $C.m \in \mathcal{M}$ has a subset of local variables $V_{C.m} \subseteq V_L$ used in its method body $S_{C.m}$. $V_S = \{\texttt{this}, r\} \subset V_L$ is a set of special variables. While this references the current object and is not allowed to be assigned to in programs, $r$ holds the result of the last evaluated expression and cannot be used in programs.

Dyn's value domain is the set of all objects $D_d = D_O$ and its type system is the lattice of union types represented as sets of class names $\{C_1,\ldots,C_n\} \in \mathcal{T}_d = 2^{\mathcal{C}}$ with the subset-ordering $\subseteq$ (see Figure 4). The null value is contained in every such type. Stat on the contrary distinguishes basic data types $\mathcal{T}_s = \{O, N, B, S, L, M, \ldots\}$ and its value domain $D_s \cong \biguplus_{T} D_T$ includes objects, numbers, booleans, strings, lists and finite maps. We omit definitions of states, state update etc. as they are standard.
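The union-type lattice of dyn described above can be illustrated with ordinary finite sets. The following Python sketch is only an illustration under assumptions: the class names in CLASSES are invented, None stands in for the null value, and the function names are hypothetical.

```python
# A hypothetical finite set of class names; the paper leaves the classes abstract.
CLASSES = frozenset({"num", "bool", "list", "C1", "C2"})

TOP = CLASSES            # the type containing every class
BOTTOM = frozenset()     # the empty type, inhabited only by null


def leq(t1, t2):
    """Lattice ordering: subset inclusion on sets of class names."""
    return t1 <= t2


def join(t1, t2):
    """Least upper bound: set union."""
    return t1 | t2


def meet(t1, t2):
    """Greatest lower bound: set intersection."""
    return t1 & t2


def has_type(class_of_value, t):
    """Membership test; `class_of_value` is None for null, which every type contains."""
    return class_of_value is None or class_of_value in t


num_or_bool = join(frozenset({"num"}), frozenset({"bool"}))
```

With this representation, the ordering, joins and meets of the lattice reduce to the subset, union and intersection operations on frozenset, and the special role of null shows up as the extra case in has_type.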
To keep track of instance-class relationships we use class references and for every class $C$ introduce a distinct object $\rho_C$ as well as a special instance variable @c such that $o.@c = \rho_C$ iff $o$ is an instance of class $C$. Using @c in programs is not permitted.

[Figure 4: two lattice diagrams. Recoverable labels for dyn: ⊤ = {num, ..., C_n} = O, {num, bool}, {num}, {bool}, {list}, {C_1}, {C_2}, ..., {C_n}, ⊥ = ∅ = Null. Recoverable labels for stat: ⊤, {C_1, ..., C_n} = O, N, B, L, {C_1}, {C_2}, ..., {C_n}, ∅ = Null, ⊥.]

Fig. 4. Type lattices of dyn (left) and stat (right)

Comparing Dyn with Stat: Dyn is a pure object-oriented language (objects are the only values) while stat has basic data types. However, both provide the same constants and pure (i.e. side-effect-free) operations on them. Dyn desugars them to constructor and method calls (see Figure 3), while stat (as usual in statically typed languages) provides them built-in ($c_\varepsilon$ and $op(\vec{e_\varepsilon})$ in Figure 3).

Also, stat expressions are pure. Side-effects are only allowed in statements, which must only have pure subexpressions. This is not a restriction, as every dyn-expression can be transformed into a sequence of stat statements by recursively (and in the order of evaluation) replacing subexpressions $e$ by fresh local variables $u$ and prepending the assignment $u := e;$.

Every stat program is also a dyn program that evaluates to (an object-oriented version of) the same result. The only reason that the opposite direction does not hold is the language restriction imposed by stat's static type system.

Type Errors: Contrary to stat, which rejects programs deemed unsafe at compile time, dyn allows every syntactically correct program to be executed and raises type errors at runtime when
– a method call is not supported by its receiver (in this arity) or
– a condition of a conditional or while loop is not boolean.

While "message not understood" errors are fundamentally linked to type-checking in class-based OO languages, dynamically typed languages often allow conditions to be of arbitrary type.
Nevertheless, the second error condition models a common error class where a built-in operation supports a fixed set of types.

Many dynamically typed languages raise type errors when accessing variables prior to assignment. We will leave this as future work and consider all local (instance) variables to be initialized to null prior to method executions (on instantiation). Also, type errors are often treated as exceptions, allowing interception and handling. For simplicity, we will consider them as fatal.

3.2 Hoare Logic

The presentation of dyn and stat's program logics closely follows [2,1]. We start by introducing the assertion language (Figure 5). Essentially, it is weak second order logic, extended with the same constants $c_\varepsilon$, operations $op(\vec{l})$ and types used in stat. It will be used to reason about both dyn and stat, however.

Assertions contain typed logical expressions ($l$). Such expressions consist of typed logical variables, local/instance program variables $u$ / $l.@x$ (of type $O$ in dyn and of some type $T \in \mathcal{T}_s$ in stat / same, with $l$ being of type $O$) including this, typed constants and typed operations. Contrary to program expressions, logical expressions can access instance variables of objects other than this.

Logical expressions may only occur as parts of well-typed equations. Following [5], undefined operations like dereferencing a null value or accessing a sequence with an index out of bounds ($l[n]$ with $n \geq |l|$) yield a null value and equality is non-strict with respect to such values (null = null is true) to ensure a two-valued logic. Assertions are boolean combinations of such equations allowing quantification over finite sequences of elements of basic types.
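The treatment of undefined operations in the assertion language can be mimicked in a few lines of Python. This is an informal sketch only: None plays the role of the null value, and the helper names `index` and `equal` are hypothetical.

```python
def index(seq, n):
    """Total sequence access l[n]: yields null (None) for a null sequence or n >= |l|."""
    if seq is None or n is None or not (0 <= n < len(seq)):
        return None
    return seq[n]


def equal(a, b):
    """Non-strict equality: always yields a boolean, and null = null is true."""
    return a == b


in_bounds = index([10, 20, 30], 1)
out_of_bounds = index([10, 20, 30], 3)
```

Because every operation returns some value and equality never fails, each equation evaluates to either true or false, which is the two-valued behaviour the text aims for.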
We also introduce the following abbreviation for making reasoning about runtime types more convenient:

  $\mathit{Asrt} \ni p, q ::= [\![ l \in T ]\!]$, $T \in \mathcal{T}_d$
  $[\![ l \in \{C_1,\ldots,C_n\} ]\!] \overset{\Delta}{=} l \neq \mathit{null} \rightarrow [\, l.@c = \rho_{C_1} \vee \ldots \vee l.@c = \rho_{C_n} \,]$

The reader may verify that the following implications hold:

  $[\![ l \in T_1 ]\!] \wedge [\![ l \in T_2 ]\!] \rightarrow [\![ l \in T_1 \sqcap T_2 ]\!]$
  $[\![ l \in T_1 ]\!] \vee [\![ l \in T_2 ]\!] \rightarrow [\![ l \in T_1 \sqcup T_2 ]\!]$
  $[\![ l \notin T ]\!] \rightarrow [\![ l \in \top \setminus T ]\!]$
  $l_1 = l_2 \rightarrow \exists T.\ [\![ l_1 \in T ]\!] \wedge [\![ l_2 \in T ]\!]$

Selected differences between the Hoare-style axiomatic semantics for dyn and stat are contrasted in Figure 6. While the semantics for stat are standard (see footnote 4), the rules for dyn were modeled after [12]. Omitted rules are listed in Appendix B. In Hoare triples $\{p\}\ S\ \{q\}$, the special variable $r$ is only allowed in the postcondition $q$ and denotes the return value of $S$. The rules will be analyzed in the next section.

Footnote 4: They closely follow other Hoare logics for statically typed languages [2,5,1].

  $p, q \in \mathit{Asrt} ::= l = l \mid \neg p \mid p \wedge p \mid \exists v : T^*.\ p$, with $T \in \mathcal{T}_s$
  $l \in \mathit{LExp} ::= v \mid u \mid l.@x \mid \mathit{null} \mid \texttt{this} \mid \texttt{if}\ l\ \texttt{then}\ l\ \texttt{else}\ l\ \texttt{fi} \mid l = l \mid |l| \mid l[l] \mid c_\varepsilon \mid op(\vec{l})$

  with the usual abbreviations: $p \vee q \overset{\Delta}{=} \neg(\neg p \wedge \neg q)$, $p \rightarrow q \overset{\Delta}{=} \neg p \vee q$, $p \leftrightarrow q \overset{\Delta}{=} p \rightarrow q \wedge q \rightarrow p$, $\exists v : T.\ p \overset{\Delta}{=} \exists v' : T^*.\ |v'| = 1 \wedge p[v/v'[0]]$, $\forall v : T.\ p \overset{\Delta}{=} \neg \exists v : T.\ \neg p$

Fig. 5. Syntax of the assertion language

Hoare logic rules for dyn and stat:

RULE: Assignment (ASGN)
  dyn:  $\dfrac{\{p\}\ e\ \{q[u := r]\}}{\{p\}\ u := e\ \{q\}}$
  stat: $\{p[u := e_\varepsilon]\}\ u := e_\varepsilon\ \{p\}$

RULE: Conditional (COND) (marked parts: type-safe partial correctness)
  dyn:  $\dfrac{\{p\}\ e\ \{r \wedge \mathit{bool\_test}\} \quad \{r \wedge b\}\ S_1\ \{q\} \quad \{r \wedge \neg b\}\ S_2\ \{q\}}{\{p\}\ \texttt{if}\ e\ \texttt{then}\ S_1\ \texttt{else}\ S_2\ \texttt{fi}\ \{q\}}$
        where $b$ is a predicate and $\mathit{bool\_test} \overset{\Delta}{=} [\![ r \in \{\mathit{bool}\} ]\!] \wedge B(r, b)$
  stat: $\dfrac{\{p \wedge b_\varepsilon\}\ S_1\ \{q\} \quad \{p \wedge \neg b_\varepsilon\}\ S_2\ \{q\}}{\{p\}\ \texttt{if}\ b_\varepsilon\ \texttt{then}\ S_1\ \texttt{else}\ S_2\ \texttt{fi}\ \{q\}}$

RULE: Method Call (METH)
  dyn:  $\dfrac{\{p_i\}\ e_i\ \{p_{i+1}[u_i := r]\}\ \text{for}\ i \in \mathbb{N}_n \quad \{p_{n+1}\}\ u_0.m(u_1,\ldots,u_n)\ \{q\}}{\{p_0\}\ e_0.m(e_1,\ldots,e_n)\ \{q\}}$
        where $u_i \in V_L$ fresh, $u_i \notin \mathit{var}(e_j) \cup \mathit{change}(e_j)$ for all $i, j \in \mathbb{N}_n$
  stat: $\dfrac{\{p\}\ u := u_0.m(u_1,\ldots,u_n)\ \{q\}}{\{p[u_0,\ldots,u_n := e_{\varepsilon 0},\ldots,e_{\varepsilon n}]\}\ u := e_{\varepsilon 0}.m(e_{\varepsilon 1},\ldots,e_{\varepsilon n})\ \{q\}}$

RULE: Recursion (REC) (dyn and stat) (marked parts: type-safe partial correctness)
  $\dfrac{A \vdash \{p\}\ S\ \{q\}, \quad A \vdash \{p_i\}\ \texttt{begin local this}, \vec{u}_i := v'_i, \vec{v}_i;\ S_i\ \texttt{end}\ \{q_i\},\ i \in \mathbb{N}^1_n, \quad p_i \rightarrow [\![ v'_i \in \{C_i\} ]\!],\ i \in \mathbb{N}^1_n}{\{p\}\ S\ \{q\}}$
  where $\texttt{method}\ m_i(\vec{u}_i)\{S_i\} \in \mathcal{M}_{C_i}$, $A \equiv \{p_1\}\ v'_1.m_1(\vec{v}_1)\ \{q_1\}, \ldots, \{p_n\}\ v'_n.m_n(\vec{v}_n)\ \{q_n\}$

Fig. 6. Comparison of dynamically typed and statically typed Hoare logic rules

4 Layer of Abstraction

Let us compare the proof rules given in Figure 6. Obviously, the dyn rules are more complicated than their stat counterparts. Analyzing their differences, one can identify three core reasons why reasoning about dynamically typed programs is more complex than reasoning about statically typed ones.

1. Type safety: In Figure 6, the parts ensuring type safety are marked. Such type safety preconditions are unnecessary in statically typed languages.

2. Mapping objects to values: Hoare logic for dynamically typed languages often uses predicates to map between program objects and logical values. For instance, the COND rule has to use the predicate $B()$ to establish a correspondence between the program expression $e$ and the logical expression $b$ of type $B$. This additional layer of indirection not only reduces readability, but also hinders substitutions for pure expressions (see next paragraph).

3. Side-effecting expressions: In the stat rules ASGN and COND, pure program expressions $e_\varepsilon$ and $b_\varepsilon$ are directly used in logical assertions. Here, the clever design choice of a shared type system pays off. Unfortunately, dynamic typing forces us to relinquish this benefit, as the types of expressions are not statically known and impure expressions are ill-suited for logical reasoning. Observe also how dyn's METH rule models the evaluation order using a sequence of intermediate predicates $p_i$, which would not be necessary for pure expressions. However, since dyn treats operations as method calls, the METH rule needs to be applied even for pure operations like $+$, $<$, $\wedge$, etc., making properties of assignments and conditionals even more tedious to derive.

The following sections will explain how the layer of abstraction mitigates these issues.

4.1 Type Safety Preconditions

As already mentioned, the fact that type errors are runtime events in dynamically typed languages gives rise to the following notion of correctness:
