Automated proof-producing abstraction of C code

David Greenaway

School of Computer Science and Engineering
University of New South Wales
Sydney, Australia

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

August 2014

Abstract

Before software can be formally reasoned about, it must first be represented in some form of logic. There are two approaches to carrying out this translation: the first is to generate an idealised representation of the program, convenient for reasoning about. The second, safer approach is to perform a precise, conservative translation, at the cost of burdening verification efforts with low-level implementation details.

In this thesis, we present methods for bridging the gap between these two approaches. In particular, we describe algorithms for automatically abstracting low-level C code semantics into a higher level representation. These translations include simplifying program control flow, converting finite machine arithmetic into idealised integers, and translating the byte-level C memory model to a split heap model. The generated abstractions are easier to reason about than the input representations, which in turn increases the productivity of formal verification techniques. Critically, we guarantee soundness by automatically generating proofs that our abstractions are correct. Previous work carrying out such transformations has either done so using unverified translations, or required significant manual proof engineering effort.

Our algorithms are implemented in a new tool named AutoCorres, built on the Isabelle/HOL interactive theorem prover. We demonstrate the effectiveness of our abstractions in a number of case studies, and show the scalability of AutoCorres by translating real-world programs consisting of tens of thousands of lines of code. While our work focuses on a subset of the C programming language, we believe most of our algorithms are also applicable to other imperative languages, such as Java or C#.

Publication List

This thesis is partly based on work described in the following publications:

• D. Greenaway, J. Andronick and G. Klein. ‘Bridging the Gap: Automatic Verified Abstraction of C’. In: Proceedings of the 3rd International Conference on Interactive Theorem Proving. Volume 7406. LNCS. 2012, pages 99–115. doi: 10.1007/978-3-642-32347-8_8.
• D. Greenaway, J. Lim, J. Andronick and G. Klein. ‘Don’t Sweat the Small Stuff: Formal Verification of C Code Without the Pain’. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2014, pages 429–439. doi: 10.1145/2594291.2594296.

Acknowledgements

I owe many thanks to my wonderful supervisors Gerwin Klein, June Andronick, and Kevin Elphinstone, for their wise advice, the hours willingly spent poring over my poorly written drafts, and their timely words of encouragement.

I would also like to thank all the people, past and present, in the NICTA Trustworthy Systems group. You ensured that my PhD was never a lonely experience, but provided a constant source of laughter, ideas, and high quality coffee beans. I am particularly indebted to Andrew Boyton (a constant encouragement); Matthew Fernandez, Corey Lewis, and Rohan Ben Jacob Rao (my willing lab rats); Peter Gammie and Toby Murray (for their wise counsel); Japheth Lim (a partner in crime); Thomas Sewell (my oracle for all things Isabelle); and the countless other people who have helped bounce ideas and point me in the right direction.

I would also like to thank Lars Noschinski and Christine Rizkallah for their valuable feedback and boundless patience while using early versions of AutoCorres.

Finally, I would like to thank Hui Yee Greenaway, not only for her long hours of wading through the terrible prose in this thesis, but for her love, encouragement and support; and Zoë Greenaway, for reminding me of what is actually important.

Contents

1 Introduction
  1.1 From source code to logic
  1.2 Thesis objectives and contributions
    • Summary of thesis contributions
  1.3 Document overview

2 Related work
  2.1 C verification
    • Automatic verification of C
    • Semi-automatic verification of C
    • Interactive verification of C
  2.2 Abstraction of low-level semantics
  2.3 Summary

3 Background
  3.1 The C programming language
    • Features of C
    • Undefined and implementation-defined behaviour
  3.2 The Isabelle/HOL theorem prover
    • Interacting with Isabelle
    • Isabelle’s meta-logic
    • Notation
  3.3 The Simpl language
  3.4 Translating C into Isabelle/HOL
    • Translation overview
    • Converting C types to Isabelle/HOL types
    • Generation of state types
    • Generation of Simpl
  3.5 Summary

4 From deep to shallow embeddings
  4.1 Reasoning in deep and shallow embeddings
  4.2 Cock et al.’s monadic framework
    • Introducing the state monad
    • The state monad
    • Reasoning about the state monad
    • Modelling abrupt termination
    • Reasoning about the exception monad
  4.3 Monadic loops
    • Reasoning about the while-loop combinator
  4.4 Converting Simpl to a monadic representation
    • Proving conversion
    • Function calls and recursion
  4.5 Structural simplifications of monadic programs
    • Peephole optimisations
    • Exception elimination
  4.6 Related work
  4.7 Conclusion
  4.8 Summary

5 Local variable lifting
  5.1 Lifting local variables out of the program’s state
    • Analysing existing local variable usage
    • Utilising monadic return values
    • Generating an L2 specification
    • Proving correspondence between L1 and L2
    • Proving the L2 specification
  5.2 Further program optimisations
  5.3 Type strengthening
  5.4 Polishing and final theorem
  5.5 Conclusion
  5.6 Summary

6 Word abstraction
  6.1 Reasoning about word arithmetic
  6.2 Word abstraction
  6.3 Performing the abstraction
    • High-level overview
    • Implementation in Isabelle/HOL
  6.4 Word abstraction examples
    • Maximum of two integers
    • Absolute value
    • Primality testing
  6.5 Extending the rule set