Cogent: Certified Compilation for a Functional Systems Language

Liam O’Connor, Christine Rizkallah, Zilin Chen, Sidney Amani, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Alex Hixon, Gabriele Keller, Toby Murray, Gerwin Klein
NICTA, Sydney, Australia
University of New South Wales, Australia

arXiv:1601.05520v1 [cs.PL] 21 Jan 2016

Abstract

We present a self-certifying compiler for the Cogent systems language. Cogent is a restricted, polymorphic, higher-order, and purely functional language with linear types and without the need for a trusted runtime or garbage collector. It compiles to efficient C code that is designed to interoperate with existing C functions. The language is suited for layered systems code with minimal sharing such as file systems or network protocol control code.

For a well-typed Cogent program, the compiler produces C code, a high-level shallow embedding of its semantics in Isabelle/HOL, and a proof that the C code correctly implements this embedding. The aim is for proof engineers to reason about the full semantics of real-world systems code productively and equationally, while retaining the interoperability and leanness of C.

We describe the formal verification stages of the compiler, which include automated formal refinement calculi, a switch from imperative update semantics to functional value semantics formally justified by the linear type system, and a number of standard compiler phases such as type checking and monomorphisation. The compiler certificate is a series of language-level meta proofs and per-program translation validation phases, combined into one coherent top-level theorem in Isabelle/HOL.

Categories and Subject Descriptors F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages

Keywords verification, semantics, linear types

1. Introduction

Imagine writing low-level systems code in a purely functional language and then reasoning about this code equationally and productively in an interactive theorem prover. Imagine doing this without the need for a trusted compiler, runtime or garbage collector and letting this code interoperate with native C parts of the system, including your own efficiently implemented and formally verified additional data types and operations.

Cogent achieves this goal by certified compilation from a high-level, pure, polymorphic, functional language with linear types, specifically designed for certain classes of systems code. For a given well-typed Cogent program, the compiler will produce a high-level shallow embedding of the program's semantics in Isabelle/HOL [Nipkow and Klein 2014], and a theorem that connects this shallow embedding to the C code that the compiler produces: any property proved of the shallow embedding is guaranteed to hold for the generated C.

The compilation target is C, because C is the language most existing systems code is written in, and because with the advent of tools like CompCert [Leroy 2006, 2009] and gcc translation validation [Sewell et al. 2013], C is now a language with well understood semantics and existing formal verification infrastructure.

If C is so great, why not verify C systems code directly? After all, there is an ever growing list of successes [Klein et al. 2009, 2014; Gu et al. 2015; Beringer et al. 2015] in this space. The reason is simple: verification of manually written C programs remains expensive. Just as high-level languages increase programmer productivity, they should also increase verification productivity. Certifying compilation of a language with verification-friendly semantics is a key step in achieving this goal for Cogent.

The state of the art for certified compilation of a full featured functional language is CakeML [Kumar et al. 2014], which covers an entire ML dialect. Cogent is targeted at a substantially different point in the design space. CakeML includes a verified runtime and garbage collector, while Cogent works hard to avoid these so it can be applicable to low-level embedded systems code. CakeML covers full Turing-complete ML with complex semantics that works well for code written in theorem provers. Cogent is a restricted language of total functions with intentionally simple semantics that are easy to reason about equationally. CakeML is great for application code; Cogent is great for systems code, especially layered systems code with minimal sharing such as the control code of file systems or network protocol stacks. Cogent is not designed for systems code with closely-coupled, cross-cutting sharing, such as microkernels.

Cogent's main restrictions are the (purposeful) lack of recursion and iteration and its linear type system. The former ensures totality, which is important for both systems code correctness as well as for a simple shallow representation in higher-order logic. The latter is important for memory management and for making the transition from imperative C semantics to functional value semantics. Even in the restricted target domains of Cogent, real programs will of course contain some amount of iteration. This is where Cogent's integrated foreign function interface comes in: the engineer provides her own verified data types and iterator interfaces in C and uses them seamlessly in Cogent, including in formal reasoning.

Cogent is restricted, but it is not a toy language. We have used it to implement two efficient full-scale Linux file systems: a custom Flash file system and an implementation of standard Linux ext2. We plan to report on the experience with these implementations in separate work. The focus of this paper is what can be learned from Cogent about the formal verification of certifying compilation.

In particular, this paper discusses in detail the following contributions: a) the self-certifying Cogent compiler and language; b) the formal semantics of the Cogent language and the switch from imperative update semantics to functional value semantics formally justified by the linear type system (§3); c) the top-level compiler certificate (§4.1), which is a series of language-level meta proofs and per-program translation validation phases; d) the verification stages that make up the correctness theorem (§4), including automated refinement calculi, formally verified type checking, A-normalisation, and monomorphisation; and e) the lessons learned in this project on functional language formalisation and compiler correctness proofs (§5).

2. Overview

Our aim in this paper is to build a self-certifying compiler from Cogent to efficient C code, such that a proof engineer can reason equationally about its semantics in Isabelle/HOL and apply the compiler theorem to derive properties about the generated C code. Formally, the certificate theorem is a refinement statement between the shallow embedding and the C code. This generated C code can be compiled by CompCert. It also falls into the subset of the gcc translation validation tool by Sewell et al. [2013], whose theorem would compose directly with our compiler certificate.¹

Shallow embeddings are nice for the human user, but they do not provide much syntactic structure for constructing the compiler theorem. Therefore, the compiler also generates a deep embedding for each Cogent program to use in the internal proof chain. There are two semantics for this deep embedding: (1) a formal functional value semantics where programs evaluate to values and (2) a formal imperative update semantics where programs manipulate references to mutable global state.

[Figure 1: A detailed overview of the verification chain. The diagram stacks the program representations, from bottom to top: C, Simpl code (generated by the C parser), monadic code (generated by AutoCorres), monomorphic deep embedding (update and value semantics), polymorphic deep embedding (value semantics), shallow embedding, and neat shallow embedding (both in HOL). The refinement arrows between them are numbered 1 to 6; arrow 7 covers ADT verification and high-level correctness proofs.]

Fig. 1 shows an overview of the program representations generated by the compiler and the break-down of the automatic refinement proof that makes up the compiler certificate. The program representations are, from the bottom of Fig. 1: the C code, the semantics of the C code expressed in Isabelle/Simpl [Schirmer 2006], the same expressed as a monadic functional program [Greenaway et al. 2012, 2014], a monomorphic A-normal deep embedding of the Cogent program, a polymorphic A-normal deep embedding of the same, an A-normal shallow embedding, and finally a 'neat' shallow embedding of the Cogent program that is syntactically close to the Cogent input of the compiler. Most of the theorems assume that the Cogent program is well-typed, which is discharged automatically in Isabelle with type inference information from the compiler.

The solid arrows on the right-hand side of the figure represent refinement proofs and the labels on these arrows correspond to the numbers in the following description. The only arrow that is not formally verified is the one crossing from C code into Isabelle/HOL at the bottom of Fig. 1; this is the C-to-Isabelle parser [Tuch et al. 2007], which is a mature verification tool used in a number of large-scale verifications. As mentioned, it could additionally be checked by translation validation. We briefly describe each intermediate theorem, starting with the Simpl code at the bottom of the figure. For well-typed Cogent programs, we automatically prove:

1. Theorem: The Simpl code produced by the C parser corresponds to a monadic representation of the C code. The proof is generated using an adjusted version of the AutoCorres tool.
2. Theorem: The monadic program terminates and is a refinement of the monomorphic Cogent deep embedding under the update semantics.
3. Theorem: If a Cogent deep embedding evaluates in the update semantics then it evaluates to the same result in the value semantics. This is a known consequence of linear type systems [Hofmann 2000], but to our knowledge it is the first mechanised proof of such a property, esp. for a full-scale language.
4. Theorem: If a monomorphic Cogent deep embedding evaluates in the value semantics then the polymorphic deep embedding evaluates equivalently in the value semantics.
5. Theorem: If the polymorphic Cogent deep embedding evaluates in the value semantics then the Cogent shallow embedding evaluates to a corresponding shallow Isabelle/HOL value.
6. Theorem: The A-normal shallow embedding is (extensionally) equal in Isabelle/HOL to a syntactically neater shallow embedding, which is more convenient for human reasoning. This human-friendly shallow embedding corresponds to the Cogent code before the compiler's A-normalisation phase.

Arrow 7 indicates verification of user-supplied abstract data types (ADTs) implemented in C and further manual high-level proofs on top of the human-friendly shallow embedding. These are enabled by the previous steps, but are not part of this paper.

In §4 we define in more detail the relations that formally link the values (and states, when applicable) that these programs evaluate to. Steps (3) and (4) are general properties about the language and we therefore prove them manually once and for all. Steps (1), (2), (5), and (6) are generated by the compiler for every program. The proof for step (1) is generated by AutoCorres. For steps (2) and (5) we define compositional refinement calculi that ease the automation of these proofs. Step (6), the correctness of A-normalisation, is straightforward to prove via rewriting because at this stage we can already use equational reasoning.

3. Language

In this section we formally define Cogent, including its linear type system, its two dynamic semantics (update and value) mentioned earlier in §2, and the refinement theorem between them. We begin the section by walking through an example Cogent program.

3.1 Example

Fig. 2 shows an excerpt of our Cogent ext2 implementation. The example uses not all, but many features of the language.
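As background for the two dynamic semantics named at the start of this section, the following Python sketch performs the same record update twice: once as a pure operation that returns a new record (in the spirit of the value semantics) and once as a destructive write through a pointer into a store (in the spirit of the update semantics). It is a toy illustration only, not Cogent and not the paper's formalisation; the names `put_value`, `put_update`, `corresponds` and the dictionary-based store are hypothetical.

```python
def put_value(record, field, new_value):
    """Value-semantics flavour: build a fresh record, leave the input untouched."""
    updated = dict(record)
    updated[field] = new_value
    return updated


def put_update(store, pointer, field, new_value):
    """Update-semantics flavour: overwrite the heap object behind the pointer."""
    store[pointer][field] = new_value
    return pointer


def corresponds(store, pointer, value):
    """The heap object behind `pointer` has the same contents as the pure `value`."""
    return store[pointer] == value


# The same record, once as a pure value and once as a heap object at address 0.
node_value = {"ptr": 7, "fr": 1, "to": 5}
store = {0: dict(node_value)}
node_ptr = 0

# Linear use: the old record is consumed by the update and not read again.
new_value = put_value(node_value, "to", 9)
new_ptr = put_update(store, node_ptr, "to", 9)
agree_after_linear_use = corresponds(store, new_ptr, new_value)

# A non-linear use (reading the old name again) would expose the difference:
stale_value_view = node_value["to"]         # the pure value is unchanged
stale_pointer_view = store[node_ptr]["to"]  # the heap object was overwritten
```

As long as the old reference is never used after the update, which is the discipline a linear type system enforces, the two views stay in correspondence; the last two lines show the disagreement that an aliased read would reveal.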
¹ At the time of writing, Cogent's occasionally larger stack frames lead to gcc emitting memcpy() calls that, while conceptually straightforward to handle, the translation validator does not yet cover.

2016/1/22

     1  type ExSt
     2  type UArray a
     3  type Opt a = <None () | Some a>
     4  type Node = #{mbuf:Opt Buf, ptr:U32, fr:U32, to:U32}
     5  type Acc = (ExSt, FsSt, VfsInode)
     6  type Cnt = (UArray Node,
     7    (U32, Node, Acc, U32, UArray Node) -> (Node, Acc))
     8
     9  uarray_create: all (a :< E). (ExSt, U32)
    10    -> <Success (ExSt, UArray a) | Err ExSt>
    11
    12  ext2_free_branch: (U32, Node, Acc, U32)
    13    -> (Node, Acc, <Expd Cnt | Iter ()>)
    14  ext2_free_branch (depth,nd,(ex,fs,inode),mdep) =
    15    if depth + 1 < mdep
    16    then
    17      uarray_create[Node] (ex,nd.to-nd.fr) !nd
    18      | Success (ex, children) =>
    19        let nd_t { mbuf } = nd
    20        and (children, (ex, inode, _, mbuf)) =
    21          uarray_map_no_break #{
    22            arr = children,
    23            f = ext2_free_branch_entry,
    24            acc = (ex, inode, node_t.fr, mbuf),
    25            ... } !nd_t
    26        and nd = nd_t { mbuf }
    27        in (nd, (ex, fs, inode),
    28            Expd (children, ext2_free_branch_cleanup))
    29      | Err ex -> (nd, (ex,fs,inode), Iter ())
    30    else ...

Figure 2: Cogent example

The first line in Fig. 2 shows the Cogent side of the foreign function interface. It declares an abstract Cogent data type ExSt, implemented in C. Line 2 shows a parametric abstract type, and line 9 shows a corresponding abstract function uarray_create(), also implemented in C. Note that this abstract function is polymorphic, with a kind constraint E (see §3.2) on type argument a.

The integration of such foreign functions is seamless on the Cogent side, but naturally has requirements on the corresponding C code. The C side must respect the Cogent type system, and, for example, keep all shared state internal to the abstract type to comply with linearity constraints. It must also be terminating and implement the user-supplied semantics that appear in the corresponding shallow embedding of the Cogent program in Isabelle/HOL; ideally the user should provide a formal proof to discharge the corresponding assumption of the compiler certificate theorem.

Abstract functions can be higher-order and provide the iteration constructs that are intentionally left out from core Cogent. E.g. on line 21, uarray_map_no_break() implements a map iterator for arrays. In our file system applications we have found it sufficient to provide a small library of iterators for types such as arrays. We also interfaced to an existing mature red-black tree implementation.

Returning to the example in Fig. 2, lines 3-7 show basic type constructors and declarations of variants, records and tuples using type variables and the primitive type U32. For instance, type Cnt is defined as a pair of UArray Node and a function type. Types in Cogent are structural [Pierce 2002], i.e. two types with the same structure but different names are intensionally equal.

Moreover, line 17 calls the abstract polymorphic function uarray_create(), instantiated with type argument Node. The !nd notation temporarily turns a linear object of type Node into a read-only one (see §3.3.1). The two basic, non-linear fields to and fr in type Node can directly be accessed read-only using projection functions. Lines 18 and 29 are pattern matches on the result of the function invocation. Line 19 shows surface syntax for Cogent's linear take construct (see §3.3.3), accessing and binding the mbuf field of nd to the name mbuf (punning as in Haskell), as well as binding the rest of the record to the name nd_t.

The linear type system tracks that the field mbuf is logically absent in nd_t. It also tracks that nd on line 19 has been used, so cannot be accessed again. Thus the programmer is safe to bind a new object to the same name nd (on line 26) without worrying about name shadowing. Line 26 shows surface syntax for put, the dual to take, which re-establishes the mbuf fields in the example.

3.2 Types and Kinding

Wadler [1990] first noted that linear types can be used as a way to safely model mutable state and similar effects while maintaining a purely functional semantics. Hofmann [2000] later proved Wadler's intuition by showing that, for a linear language, imperative C code can implement a simple set-theoretic semantics. We use linear types for two reasons: to ensure safe handling of heap-allocated objects, without the need for runtime support, and to allow us to assign to Cogent programs a simple, equational, purely functional semantics implemented via mutable state and imperative effects.

The type structure and associated syntax of Cogent is presented in Fig. 3. Our type system is loosely based on the polymorphic $\lambda^{URAL}$ of Ahmed et al. [2005]. We restrict this polymorphism to be rank-1 and predicative, in the style of ML, to permit easy implementation by specialisation with minimal performance penalty.

    prim. types      $t ::= \mathtt{U8} \mid \mathtt{U16} \mid \mathtt{U32} \mid \mathtt{U64} \mid \mathtt{Bool}$
    types            $\tau, \rho ::= \alpha \mid \alpha! \mid () \mid t \mid T\ \overline{\tau}\ m \mid \tau \to \rho \mid \langle \overline{C\ \tau} \rangle \mid \{\overline{f :: \tau?}\}\ m$
    field types      $\tau? ::= \tau \mid \cancel{\tau}$
    permissions      $P = \{D, S, E\}$
    kinds            $\kappa \subseteq P$
    polytypes        $\pi ::= \forall(\overline{\alpha :_K \kappa}).\ \tau$
    modes            $m ::= \text{Read-only} \mid \text{Writable} \mid \text{Unboxed}$
    type variables   $\ni \alpha, \beta$
    abs. type names  $\ni T, U$
    kind context     $\Delta ::= \overline{\alpha :_K \kappa}$
    type context     $\Gamma ::= \overline{x : \tau}$

    (overbar indicates lists, i.e. zero or more)

    Weakening, $\Delta \vdash \Gamma_1 \overset{\text{weak}}{\rightsquigarrow} \Gamma_2$:
    $$\frac{\text{for each } i:\ \Delta \vdash \tau_i :_K \{D\}}{\Delta \vdash \overline{x_i : \tau_i}, \Gamma \overset{\text{weak}}{\rightsquigarrow} \Gamma}$$

    Contraction, $\Delta \vdash \Gamma_1 \rightsquigarrow \Gamma_2 \boxplus \Gamma_3$:
    $$\frac{\text{for each } i:\ \Delta \vdash \tau_i :_K \{S\}}{\Delta \vdash \overline{x_i : \tau_i}, \Gamma_1, \Gamma_2 \rightsquigarrow \overline{x_i : \tau_i}, \Gamma_1 \boxplus \overline{x_i : \tau_i}, \Gamma_2}$$

Figure 3: Type Structure of Cogent & structural context operations

To ease implementation, and to eliminate any direct dependency on a heap allocator, we require that all functions be defined on the top-level. This eliminates the need for linear function types: any top-level function can be shared freely because it cannot capture any local variables, let alone linear ones.

We include a set of primitive integer types (U8, U16 etc.). Records $\{\overline{f :: \tau?}\}\ m$ comprise (1) a sequence of fields $f :: \tau?$, where $\cancel{\tau}$ is the type of an inaccessible field, and (2) a mode $m$ (see §3.3.3 and §3.2.1 for a more detailed description). We also have polymorphic variants $\langle \overline{C\ \tau} \rangle$, a generalised sum type in the style of OCaml, the mechanics of which are briefly described in §3.3.2. Abstract types $T\ \overline{\tau}\ m$ are also parametrised by modes. We omit product types from this presentation; they are desugared into unboxed records.

The most obvious similarity to $\lambda^{URAL}$ is our use of kinds to determine if a type may be freely shared or discarded, as opposed to earlier linear type systems, such as that of Wadler [1990], where a type's linearity is encoded directly into its syntactic structure. Kinds in Cogent are sets of permissions, denoting whether a variable of that type may be discarded without being used (D), shared freely and used multiple times (S), or safely bound in a let! expression (E). A linear type, values of which must be used exactly once, has a kind that excludes D and S, and so forbids it being discarded or shared. We discuss let! expressions in §3.2.2.

Another similarity to $\lambda^{URAL}$ is that we explicitly represent the context operations of weakening and contraction, normally relegated to structural rules, as explicit judgements: $\Delta \vdash \Gamma \overset{\text{weak}}{\rightsquigarrow} \Gamma'$ for weakening (discarding assumptions) and $\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2$ for contraction (duplicating them). The rules for these judgements are presented in Fig. 3. For a typing assumption to be discarded (respectively duplicated), the type must have kind $\{D\}$ (resp. $\{S\}$).

The full kinding rules for the types of Cogent are given in Fig. 4. Basic types such as () or U8, as well as functions, are simply passed by value and do not contain any heap references, so they may be given any kind. Kinding for structures and abstract functions is discussed shortly in §3.2.1.

    Kinding of types, $\Delta \vdash \tau :_K \kappa$:
    $$\frac{}{\Delta \vdash () :_K \kappa}\ \textsc{KUnit} \qquad \frac{}{\Delta \vdash t :_K \kappa}\ \textsc{KPrim} \qquad \frac{}{\Delta \vdash \tau \to \rho :_K \kappa}\ \textsc{KFun}$$
    $$\frac{(\alpha :_K \kappa') \in \Delta \quad \kappa \subseteq \kappa'}{\Delta \vdash \alpha :_K \kappa}\ \textsc{KVar} \qquad \frac{(\alpha :_K \kappa') \in \Delta \quad \kappa \subseteq \text{bang}(\kappa')}{\Delta \vdash \alpha! :_K \kappa}\ \textsc{KVar!}$$
    $$\frac{\text{for each } i:\ \Delta \vdash \tau_i :_K \kappa}{\Delta \vdash \langle \overline{C_i\ \tau_i} \rangle :_K \kappa}\ \textsc{KVariant}$$
    $$\frac{m :_K \kappa' \quad \kappa \subseteq \kappa' \quad \text{for each } i:\ \Delta \vdash \tau_i :_K \kappa}{\Delta \vdash T\ \overline{\tau_i}\ m :_K \kappa}\ \textsc{KAbs}$$
    $$\frac{m :_K \kappa' \quad \kappa \subseteq \kappa' \quad \text{for each } \tau_i \text{ not taken}:\ \Delta \vdash \tau_i :_K \kappa}{\Delta \vdash \{\overline{f_i :: \tau?_i}\}\ m :_K \kappa}\ \textsc{KRec}$$

    Kinding of modes, $m :_K \kappa$:
    $$\text{Read-only} :_K \{D, S\} \qquad \text{Writable} :_K \{E\} \qquad \text{Unboxed} :_K \{D, S, E\}$$

    $\text{bang}(\cdot) : \tau \to \tau$
    $$\begin{aligned}
    \text{bang}(\alpha) &= \alpha! \\
    \text{bang}(\alpha!) &= \alpha! \\
    \text{bang}(()) &= () \\
    \text{bang}(t) &= t \\
    \text{bang}(T\ \overline{\tau_i}\ m) &= T\ \overline{\text{bang}(\tau_i)}\ \text{bang}(m) \\
    \text{bang}(\tau \to \rho) &= \tau \to \rho \\
    \text{bang}(\langle \overline{C_i\ \tau_i} \rangle) &= \langle \overline{C_i\ \text{bang}(\tau_i)} \rangle \\
    \text{bang}(\{\overline{f_i :: \tau?_i}\}\ m) &= \{\overline{f_i :: \text{bang}(\tau?_i)}\}\ \text{bang}(m)
    \end{aligned}$$

    $\text{bang}(\cdot) : \kappa \to \kappa$
    $$\text{bang}(\kappa) = \begin{cases} \kappa & \text{if } \{D, S\} \subseteq \kappa \\ \{D, S\} & \text{otherwise} \end{cases}$$

    $\text{bang}(\cdot) : m \to m$
    $$\text{bang}(\text{Read-only}) = \text{Read-only} \qquad \text{bang}(\text{Writable}) = \text{Read-only} \qquad \text{bang}(\text{Unboxed}) = \text{Unboxed}$$

Figure 4: Kinding rules for Cogent types and the bang(·) operator

A type may have multiple kinds, as a nonlinear type assumption may be used linearly, never being shared and being used exactly once. Therefore, a type with a permissive kind, such as $\{D, S\}$, would be an acceptable instantiation of a type variable of kind $\emptyset$, as we are free to waive permissions that are included in a kind. We can prove formally by straightforward rule induction:

Lemma 1 (Waiving rights). If $\Delta \vdash \tau :_K \kappa$ and $\kappa' \subseteq \kappa$, then $\Delta \vdash \tau :_K \kappa'$.

This result allows for a simple kind-checking algorithm, not immediately apparent from the rules. For example, the maximal kind of an unboxed structure with two fields of type $\tau_1$ and $\tau_2$ respectively can be computed by taking the intersection of the computed maximal kinds of $\tau_1$ and $\tau_2$. This result ensures that this intersection is also a valid kind for $\tau_1$ and $\tau_2$.

3.2.1 Kinding for Records and Abstract Types

Recall that Cogent may be extended with abstract types, implemented in C, which we write as $T\ \overline{\tau_i}\ m$ in our formalisation. We allow abstract types to take any number of type parameters $\tau_i$, where each specific instance corresponds to a distinct C type. For example, a List abstract type, parameterised by its element type, would correspond to a family of C List types, each one specialised to a particular concrete element type. Because the implementations of these types are user supplied, the user is free to specialise implementations based on these type parameters, for example representing an array of boolean values as a bitstring, so long as they can show that every different operation implementation is a refinement of the same user-supplied CDSL semantics for that operation.

Values of abstract types may be represented by references to heap data structures. Specifically, an abstract type or structure is stored on the heap when its associated storage mode $m$ is not "Unboxed". For boxed records and abstract types, the storage mode distinguishes between those that are "Writable" vs. "Read-only". The same is true for record types, written $\{\overline{f :: \tau?}\}\ m$, which are discussed in more detail in §3.3.3.

The storage mode $m$ affects the maximal kind that can be assigned to the type. For example, an unboxed structure with two components of type U8 is freely shareable, but if the structure is instead stored on the heap, then a writable reference to that structure must be linear. Thus, the type given to such references has the "Writable" mode, whose kind is $\{E\}$, thereby preventing such a reference from being assigned a nonlinear kind such as $\{D, S\}$.

3.2.2 Kinding and bang

Like Wadler [1990], we allow linear values to be shared read-only in a limited scope. This is useful for practical programming in a language with linear types, as it makes our types more informative. For example, to write a function to determine the size of a (linear) buffer object, a naive approach would be to write a function:

    $\text{size} : \mathtt{Buf} \to \mathtt{U32} \times \mathtt{Buf}$

This function has a cumbersome additional return value just so that the linear argument is not discarded. Further, the type above does not express the fact that the input buffer and output buffer are identical; this would need to be established by additional proof. To address this problem, we include a type operator bang(·), in the style of Wadler's ! operator, which changes all writable modes in a type to read-only ones. The full definition of bang(·) is in Fig. 4. We can therefore write the type of our function as:

    $\text{size} : \text{bang}(\mathtt{Buf}) \to \mathtt{U32}$

For any valid type $\tau$, the kind of $\text{bang}(\tau)$ will be nonlinear, which means that our size function no longer needs to be encumbered by the extra return value. This kinding result is formally stated as:

Lemma 2 (Kinding for bang(·)). For any type $\tau$, if $\Delta \vdash \tau :_K \kappa$ then $\Delta \vdash \text{bang}(\tau) :_K \text{bang}(\kappa)$.

To integrate this type operator with parametric polymorphism, we borrow a trick from Odersky's Observer types [Odersky 1992], and tag type variables that have been made read-only, using the syntax $\alpha!$. Whenever a variable $\alpha$ is instantiated to some concrete type $\tau$, we also replace $\alpha!$ with $\text{bang}(\tau)$. The lemma above ensures that our kinding rule for such tagged variables is sound, and enables us to prove the following:

Lemma 3 (Type instantiation preserves kinds). For any type $\tau$, $\overline{\alpha_i :_K \kappa_i} \vdash \tau :_K \kappa$ implies $\Delta \vdash \tau[\overline{\rho_i / \alpha_i}] :_K \kappa$ when, for each $i$, $\Delta \vdash \rho_i :_K \kappa_i$.

3.3 Expressions and Typing

While Cogent features a rich surface syntax, due to space constraints, we only document the (full) core language in Fig. 5 to which the surface syntax is desugared.

    primops         $o \in \{\mathtt{+}, \mathtt{*}, \mathtt{/}, \mathtt{<=}, \mathtt{==}, \mathtt{||}, \mathtt{<<}, \ldots\}$
    literals        $\ell \in \{\mathtt{123}, \mathtt{True}, \mathtt{'a'}, \ldots\}$
    expressions     $e ::= x \mid () \mid f[\overline{\tau}] \mid o(\overline{e}) \mid e_1\ e_2$
                    $\mid \textbf{let}\ x = e_1\ \textbf{in}\ e_2$
                    $\mid \textbf{let!}\ (\overline{y})\ x = e_1\ \textbf{in}\ e_2$
                    $\mid \textbf{if}\ e_1\ \textbf{then}\ e_2\ \textbf{else}\ e_3$
                    $\mid \ell \mid \textbf{cast}\ t\ e \mid \textbf{promote}\ \langle \overline{C\ \tau} \rangle\ e$
                    $\mid \textbf{case}\ e_1\ \textbf{of}\ C\ x \to e_2\ \textbf{else}\ y \to e_3$
                    $\mid \textbf{esac}\ e \mid C\ e$
                    $\mid \{\overline{f = e}\} \mid e.f \mid \textbf{put}\ e_1.f := e_2$
                    $\mid \textbf{take}\ x\ \{f = y\} = e_1\ \textbf{in}\ e_2$
    function def.   $d ::= \langle f :: \pi,\ f\ x = e \rangle \mid \langle f :: \pi,\ \blacksquare \rangle$
    programs        $P ::= \overline{d}$
    function names  $\ni f, g$
    variables       $\ni x, y$
    constructors    $\ni A, B, C$
    record fields   $\ni f, g$

    $\text{primopType}(\cdot) : o \to \overline{t} \times t$   (primop types)
    $\text{funDef}(\cdot) : f \to d$   (definition environment)
    $|\cdot| : t \to \mathbb{N}$   (maximum value)

Figure 5: Syntax of Cogent programs (after desugaring)

Fig. 6 shows the typing rules for Cogent expressions. Many of these are standard for any linear type system. We will discuss here the rules for let!, where we have taken a slightly different approach to established literature, and the rules for the extensions we have made to the type system, such as variants and record types.

3.3.1 Typing for let!

On the expression level, the programmer can use let! expressions, in the style of Wadler [1990], to temporarily convert variables of linear types to their read-only equivalents, allowing them to be freely shared. In this example, we wish to copy a buffer $b_2$ onto a buffer $b_1$ only when $b_2$ will fit inside $b_1$.

    $\textbf{let!}\ (b_1, b_2)\ ok = (\text{size}(b_2) < \text{size}(b_1))\ \textbf{in}$
    $\textbf{if}\ ok\ \textbf{then}\ \text{copy}(b_1, b_2)\ \textbf{else}\ \ldots$

Note that even though $b_1$ and $b_2$ are used multiple times, they are only used once in a linear context. Inside the let! binding, they have been made temporarily nonlinear. Our kind system ensures these read-only, shareable references inside let! bindings cannot "escape" into the outside context. For example, the expression $\textbf{let!}\ (b)\ b' = b\ \textbf{in}\ \text{copy}(b, b')$ would violate the invariants of the linear type system, and ruin the purely functional abstraction that linear types allow, as both $b$ and $b'$ would refer to the same object, and a destructive update to $b$ would change the shareable $b'$.

We are able to use the existing kind system to handle these safety checks with the inclusion of the E permission, for Escapable, which indicates that the type may be safely returned from within a let!. We ensure, via the typing rules of Fig. 6, that the left hand side of the binding ($ok$ in the example) has the E permission, which excludes temporarily nonlinear references via bang(·) (see Fig. 4). Our solution is as powerful as Odersky's, but we encode the restrictions in the kind system directly, not as side-condition constraints that recursively descend into the structure of the binding's type.

3.3.2 Typing for Variants

A variant type $\langle \overline{C_i\ \tau_i} \rangle$ is a generalised sum type, where each alternative is distinguished by a unique data constructor $C_i$. The order in which the constructors appear in the type is not important. One can create a variant type with a single alternative simply by invoking a constructor, e.g. Some 255 might be given the type $\langle \mathtt{Some\ U8} \rangle$. The original value of 255 can be retrieved using the esac construct. The set of alternatives is enlarged by using promote expressions that are automatically inserted by the type-checker of the surface language, which uses subtyping to infer the type of a given variant. A similar trick is used for numeric literals and cast.

In order to pattern match on a variant, we provide a case construct that attempts to match against one constructor. If the constructor does not match, it is removed from the type and the reduced type is provided to the else branch. In this way, a traditional multi-way pattern match can be desugared by nesting:

    case x of
      A a → e_a
      B b → e_b
      C c → e_c

becomes

    case x of A a → e_a
    else x′ → case x′ of B b → e_b
              else x″ → let c = esac x″ in e_c

Note that because the typing rule for esac only applies when only one alternative remains, our pattern matching is necessarily total.

3.3.3 Typing for Records

Some care is needed to reconcile record types and linear types. Assume that Object is a type synonym for an (unboxed) record type containing an integer and two (linear) buffers.

    $\mathtt{Object} = \{\text{size} :: \mathtt{U32},\ b_1 :: \mathtt{Buf},\ b_2 :: \mathtt{Buf}\}\ \text{Unboxed}$

Let us say we want to extract the field $b_1$ from an Object. If we extract just a single Buf, we have implicitly discarded the other buffer $b_2$. But we can't return the entire Object along with the Buf, as this would introduce aliasing. Our solution is to return along with the Buf an Object where the field $b_1$ cannot be extracted again, and reflect this in the field's type, written as $b_1 :: \cancel{\mathtt{Buf}}$. This field extractor, whose general form is $\textbf{take}\ x\ \{f = y\} = e_1\ \textbf{in}\ e_2$, operates as follows: given a record $e_1$, it binds the field $f$ of $e_1$ to the variable $y$, and the new record to the variable $x$ in $e_2$. Unless the type of the field $f$ has kind $\{S\}$, that field will be marked as unavailable, or taken, in the type of the new record $x$.

Conversely, we also introduce a put operation, which, given a record with a taken field, allows a new value to be supplied in its place. The expression $\textbf{put}\ e_1.f := e_2$ returns the record in $e_1$ where the field $f$ has been replaced with the result of $e_2$. Unless the type of the field $f$ has kind $\{D\}$, that field must already be taken, to avoid accidentally destroying our only reference to a linear resource.

Unboxed records can be created using a simple struct literal $\{\overline{f_i = e_i}\}$. We also allow records to be stored on the heap to minimise unnecessary copying, as unboxed records are passed by value. These boxed records are created by invoking an externally-defined C allocator function. For these allocation functions, it is often convenient to allocate a record with all fields already taken, to indicate that they are uninitialised. Thus a function for allocating Object-like records might return values of type: $\{\text{size} :: \cancel{\mathtt{U32}},\ b_1 :: \cancel{\mathtt{Buf}},\ b_2 :: \cancel{\mathtt{Buf}}\}\ \text{Writable}$.

For any nonlinear record (that is, (1) read-only boxed records, which cannot have linear fields, as well as (2) unboxed records without linear fields) we also allow traditional member syntax $e.f$ for field access. The typing rules for all of these expressions are given in Fig. 6.

    Expression typing, $\Delta; \Gamma \vdash e : \tau$:
    $$\frac{\Delta \vdash \Gamma \overset{\text{weak}}{\rightsquigarrow} x : \tau}{\Delta; \Gamma \vdash x : \tau}\ \textsc{Var} \qquad \frac{}{\Delta; \Gamma \vdash () : ()}\ \textsc{Unit} \qquad \frac{\ell < |t|}{\Delta; \Gamma \vdash \ell : t}\ \textsc{Literal}$$
    $$\frac{\Delta; \Gamma \vdash \overline{e_i} :^{*} \overline{t_i} \quad \text{primopType}(o) = (\overline{t_i}, t)}{\Delta; \Gamma \vdash o(\overline{e_i}) : t}\ \textsc{PrimOp} \qquad \frac{\Delta; \Gamma \vdash e : t' \quad |t'| \le |t|}{\Delta; \Gamma \vdash \textbf{cast}\ t\ e : t}\ \textsc{Cast}$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta; \Gamma_1 \vdash e_1 : \rho \to \tau \quad \Delta; \Gamma_2 \vdash e_2 : \rho}{\Delta; \Gamma \vdash e_1\ e_2 : \tau}\ \textsc{App}$$
    $$\frac{\text{funDef}(f) = \langle \forall(\overline{\alpha_i :_K \kappa_i}).\ \tau \to \tau',\ \_ \rangle \quad \text{for each } i:\ \Delta \vdash \rho_i :_K \kappa_i}{\Delta; \Gamma \vdash f[\overline{\rho_i}] : (\tau \to \tau')[\overline{\rho_i / \alpha_i}]}\ \textsc{Fun}$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta; \Gamma_1 \vdash e_1 : \langle A\ \rho \mid \overline{C_i\ \tau_i} \rangle \quad \Delta; x : \rho, \Gamma_2 \vdash e_2 : \tau \quad \Delta; y : \langle \overline{C_i\ \tau_i} \rangle, \Gamma_2 \vdash e_3 : \tau}{\Delta; \Gamma \vdash \textbf{case}\ e_1\ \textbf{of}\ A\ x \to e_2\ \textbf{else}\ y \to e_3 : \tau}\ \textsc{Case}$$
    $$\frac{\Delta; \Gamma \vdash e : \tau}{\Delta; \Gamma \vdash C\ e : \langle C\ \tau \rangle}\ \textsc{Cons} \qquad \frac{\Delta; \Gamma \vdash e : \langle \overline{B\ \rho} \rangle \quad \overline{B\ \rho} \subseteq \overline{C\ \tau}}{\Delta; \Gamma \vdash \textbf{promote}\ \langle \overline{C\ \tau} \rangle\ e : \langle \overline{C\ \tau} \rangle}\ \textsc{Prom} \qquad \frac{\Delta; \Gamma \vdash e : \langle C\ \tau \rangle}{\Delta; \Gamma \vdash \textbf{esac}\ e : \tau}\ \textsc{Esac}$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta; \Gamma_1 \vdash e_1 : \rho \quad \Delta; x : \rho, \Gamma_2 \vdash e_2 : \tau}{\Delta; \Gamma \vdash \textbf{let}\ x = e_1\ \textbf{in}\ e_2 : \tau}\ \textsc{Let}$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta \vdash \rho :_K \{E\} \quad \Delta; \overline{v_i : \text{bang}(\tau_i)}, \Gamma_1 \vdash e_1 : \rho \quad \Delta; \overline{v_i : \tau_i}, x : \rho, \Gamma_2 \vdash e_2 : \tau}{\Delta; \overline{v_i : \tau_i}, \Gamma \vdash \textbf{let!}\ (\overline{v_i})\ x = e_1\ \textbf{in}\ e_2 : \tau}\ \textsc{Let!}$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad m \ne \text{Read-only} \quad \Delta; \Gamma_1 \vdash e_1 : \{\overline{g_i :: \tau?_i}, f :: \rho, \overline{g_j :: \tau?_j}\}\ m \quad \Delta; x : \{\overline{g_i :: \tau?_i}, f :: \cancel{\rho}, \overline{g_j :: \tau?_j}\}\ m,\ y : \rho,\ \Gamma_2 \vdash e_2 : \tau}{\Delta; \Gamma \vdash \textbf{take}\ x\ \{f = y\} = e_1\ \textbf{in}\ e_2 : \tau}\ \textsc{Take}_1$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta \vdash \rho :_K \{S\} \quad m \ne \text{Read-only} \quad \tau?_k = \rho \quad \Delta; \Gamma_1 \vdash e_1 : \{\overline{f_i :: \tau?_i}\}\ m \quad \Delta; x : \{\overline{f_i :: \tau?_i}\}\ m,\ y : \rho,\ \Gamma_2 \vdash e_2 : \tau}{\Delta; \Gamma \vdash \textbf{take}\ x\ \{f_k = y\} = e_1\ \textbf{in}\ e_2 : \tau}\ \textsc{Take}_2$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad m \ne \text{Read-only} \quad \Delta; \Gamma_1 \vdash e_1 : \{\overline{g_i :: \tau?_i}, f :: \cancel{\rho}, \overline{g_j :: \tau?_j}\}\ m \quad \Delta; \Gamma_2 \vdash e_2 : \rho}{\Delta; \Gamma \vdash \textbf{put}\ e_1.f := e_2 : \{\overline{g_i :: \tau?_i}, f :: \rho, \overline{g_j :: \tau?_j}\}\ m}\ \textsc{Put}_1$$
    $$\frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad m \ne \text{Read-only} \quad \tau?_k = \rho \quad \Delta; \Gamma_1 \vdash e_1 : \{\overline{f_i :: \tau?_i}\}\ m \quad \Delta \vdash \rho :_K \{D\} \quad \Delta; \Gamma_2 \vdash e_2 : \rho}{\Delta; \Gamma \vdash \textbf{put}\ e_1.f_k := e_2 : \{\overline{f_i :: \tau?_i}\}\ m}\ \textsc{Put}_2$$
    $$\frac{\Delta \vdash \{\overline{g_i :: \rho?_i}, f :: \tau, \overline{g_j :: \rho?_j}\}\ m :_K \{S\} \quad \Delta; \Gamma \vdash e : \{\overline{g_i :: \rho?_i}, f :: \tau, \overline{g_j :: \rho?_j}\}\ m}{\Delta; \Gamma \vdash e.f : \tau}\ \textsc{Member} \qquad \frac{\Delta; \Gamma \vdash \overline{e_i} :^{*} \overline{\tau_i}}{\Delta; \Gamma \vdash \{\overline{f_i = e_i}\} : \{\overline{f_i :: \tau_i}\}\ \text{Unboxed}}\ \textsc{Struct}$$

    List typing, $\Delta; \Gamma \vdash \overline{e} :^{*} \overline{\tau}$:
    $$\frac{\Delta \vdash \Gamma \overset{\text{weak}}{\rightsquigarrow} \emptyset}{\Delta; \Gamma \vdash \varepsilon :^{*} \varepsilon}\ \textsc{L}\varepsilon \qquad \frac{\Delta \vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2 \quad \Delta; \Gamma_1 \vdash e : \tau \quad \Delta; \Gamma_2 \vdash \overline{e_i} :^{*} \overline{\tau_i}}{\Delta; \Gamma \vdash e\ \overline{e_i} :^{*} \tau\ \overline{\tau_i}}\ \textsc{LC}$$

Figure 6: Typing rules for Cogent

3.3.4 Type Specialisation

As mentioned earlier, we implement parametric polymorphism by specialising code to avoid paying the performance penalties of other approaches such as boxing. This means that polymorphism in our language is restricted to predicative rank-1 quantifiers.

This allows us to specify dynamic objects, such as our value typing relations (see §3.4.1) and our dynamic semantics (see §3.4), in terms of simple monomorphic types, without type variables. Thus, in order to evaluate a polymorphic program, each type variable must first be instantiated to a monomorphic type. We show that typing of the instantiated program follows from the typing of the polymorphic program, if the type instantiation used matches the kinds of the type variables.

Lemma 4 (Type specialisation). $\overline{\alpha_i :_K \kappa_i}; \Gamma \vdash e : \tau$ implies $\Delta; \Gamma[\overline{\rho_i / \alpha_i}] \vdash e[\overline{\rho_i / \alpha_i}] : \tau[\overline{\rho_i / \alpha_i}]$ when, for each $i$, $\Delta \vdash \rho_i :_K \kappa_i$.

The above lemma is sufficient to show the monomorphic instantiation case, by setting $\Delta = \varepsilon$ (the empty context). This lemma is a key ingredient for the refinement link between polymorphic and monomorphic deep embeddings (see §4.5).

3.4 Dynamic Semantics

Fig. 8 defines the big-step evaluation rules for the value semantics of Cogent. The relation $V \vdash e \Downarrow_v v$ states that under environment $V$, the expression $e$ evaluates to a resultant value $v$. These values are documented in Fig. 7. In many ways, the semantics is entirely typical of a purely functional language, albeit with some care to handle abstract function calls appropriately. This is intentional, since our goal is to automatically produce a purely functional shallow embedding from this semantics.

As functions must be defined on the top level, our function values $\langle\langle \lambda x.\ e \rangle\rangle$ consist only of an unevaluated expression, which is evaluated when the function is applied. Abstract function values, written $\langle\langle \text{abs.}\ f \mid \overline{\tau} \rangle\rangle$, are instead passed more indirectly, as a pair of the function name and a list of the types used to instantiate any type variables. When an abstract function value $\langle\langle \text{abs.}\ f \mid \overline{\tau} \rangle\rangle$ is applied, the user-supplied semantics $[\![f]\!]_v$ are invoked, which is simply a function from input value to output value.

    Value Semantics
    values            $v ::= \ell \mid ()$
                      $\mid \langle\langle \lambda x.\ e \rangle\rangle$   (function values)
                      $\mid \langle\langle \text{abs.}\ f \mid \overline{\tau} \rangle\rangle$   (abstract functions)
                      $\mid C\ v$   (variant values)
                      $\mid \{\overline{f = v}\}$   (records)
                      $\mid a_v$   (abstract values)
    environments      $V ::= \overline{x \mapsto v}$
    abstract values   $a_v$
    $[\![\cdot]\!]_v : f \to (v \to v)$   (abstract function semantics)

    Update Semantics
    u. sem. values    $u ::= \ell \mid ()$
                      $\mid \langle\langle \lambda x.\ e \rangle\rangle$   (function values)
                      $\mid \langle\langle \text{abs.}\ f \mid \overline{\tau} \rangle\rangle$   (abstract functions)
                      $\mid C\ u$   (variant values)
                      $\mid \{\overline{f = u}\}$   (records)
                      $\mid a_u$   (abstract values)
                      $\mid p$   (pointers)
    environments      $U ::= \overline{x \mapsto u}$
    pointers          $p$
    sets of pointers  $r, w$
    abstract values   $a_u$
    stores            $\mu : p \rightharpoonup u$
    $[\![\cdot]\!]_u : f \to (u \times \mu \to u \times \mu)$   (abstract function semantics)

Figure 7: Definitions for Value and Update Semantics

The update semantics, by contrast, is much more imperative. The semantic rules can also be found in Fig. 8, with associated definitions in Fig. 7. This semantics is also an evaluation semantics, written $U \vdash e \mid \mu \Downarrow_u u \mid \mu'$ in the style of Pierce [2002]. Values in the update semantics may now be pointers, written $p$, to values in a mutable store or heap $\mu$. This mutable store is modelled as a partial function from a pointer to an update semantics value.

Most of the rules in Fig. 8 only differ from the value semantics in that they thread the store $\mu$ through the evaluation of the program. However, the key differences arise in the treatment of records and of abstract types, which may now be represented as boxed structures, stored on the heap. In particular, note that the rule UPut$_2$ destruc-

type $\tau$. The sets $r$ and $w$ contain all pointers accessible from the value $u$ that are read-only and writable respectively. We use this to encode the uniqueness property ensured by linear types as explicit non-aliasing constraints in the rules for the correspondence relation, which are given in Fig. 9. Read-only pointers may alias other read-only pointers, but writable pointers do not alias any other pointer, whether read-only or writable.

Because our correspondence relation includes types, it naturally implies a value typing relation for both value semantics (written $v : \tau$) and update semantics (written $u \mid \mu : \tau\ [\text{ro}: r\ \ \text{rw}: w]$). In fact, the rules for both relations can be derived from the rules in Fig. 9 simply by erasing either the value semantics parts (highlighted like this) or the update semantics parts (highlighted like this). As we ultimately prove preservation for this correspondence relation across evaluation, this same erasure strategy can be applied to our proofs to produce a type preservation proof for either semantics.

Formalising uniqueness. With this correspondence relation, we can prove our intuitions about linear types. For example, the following lemma, which shows that we do not discard any unique writable reference via weakening, makes use of the fact that a value is only given a discardable type when it contains no writable pointers.

Lemma 5 (Weakening respects environment typing). If $U \mid \mu : V : \Gamma\ [\text{ro}: r\ \ \text{rw}: w]$ and $\vdash \Gamma \overset{\text{weak}}{\rightsquigarrow} \Gamma'$ then there exists $r' \subseteq r$ such that $U \mid \mu : V : \Gamma'\ [\text{ro}: r'\ \ \text{rw}: w]$.

We also prove a similar lemma about our context splitting judgement, which uses the fact that a value is only given a shareable type when it contains no writable pointers to conclude that the two output contexts give access to non-aliasing sets of writable pointers.

Lemma 6 (Splitting respects environment typing). If $U \mid \mu : V : \Gamma\ [\text{ro}: r\ \ \text{rw}: w]$ and $\vdash \Gamma \rightsquigarrow \Gamma_1 \boxplus \Gamma_2$ then there exist $r_1, r_2$ and $w_1, w_2$ where $r = r_1 \cup r_2$ and $w = w_1 \cup w_2$, such that $U \mid \mu : V : \Gamma_1\ [\text{ro}: r_1\ \ \text{rw}: w_1]$ and $U \mid \mu : V : \Gamma_2\ [\text{ro}: r_2\ \ \text{rw}: w_2]$ and $w_1 \cap w_2 = \emptyset$.

In addition, we prove our main intuition about bang(·), necessary for showing refinement for let! expressions.

Lemma 7 (bang(·) makes writable read-only). If $u \mid \mu : v : \tau\ [\text{ro}: r\ \ \text{rw}: w]$ then $u \mid \mu : v : \text{bang}(\tau)\ [\text{ro}: r \cup w\ \ \text{rw}: \emptyset]$.

Dealing with mutable state. We define a framing relation which specifies exactly how evaluation may affect the mutable store $\mu$. Given an input set of writable pointers $w_i$, an input store $\mu_i$, an output set of pointers $w_o$ and an output store $\mu_o$, the relation, written $w_i \mid \mu_i\ \text{frame}\ w_o \mid \mu_o$, ensures three properties for any pointer $p$:

Inertia: If $p \notin w_i \cup w_o$, then $\mu_i(p) = \mu_o(p)$.
tivelyupdatestheheap,insteadofcreatinganewrecordvalue,and i o i o thesemanticsofabstractfunctions(cid:126)·(cid:127)umayalsomodifytheheap. Leakfreedom Ifp∈wiandp(cid:60)wo,thenµo(p)=⊥. Freshallocation Ifp(cid:60)w andp∈w ,thenµ(p)=⊥. 3.4.1 Update-ValueRefinementandTypePreservation i o i Framingimpliesthatourcorrespondencerelation,forbothvalues In order to show that the update semantics is a refinement of the andenvironments,isunaffectedbyunrelatedstoreupdates: value semantics, we must exploit the information given to us by Cogent’slineartypesystem.Atypicalrefinementapproachtore- Lemma8(Unrelatedupdates). Assumetwounrelatedpointersets latethetwosemanticswouldbetodefineacorrespondencerelation w∩w =∅ andthat w |µframew |µ(cid:48),then 1 1 2 between update semantics states and value semantics values, and showthatanupdatesemanticsevaluationimpliesacorresponding • If u|µ: v:τ [ro:rrw:w] then u|µ(cid:48): v:τ [ro:rrw:w] and valuesemanticsevaluation.However,suchastatementisnottrue w∩w =∅. 2 ifaliasingexists,asadestructiveupdate(from,say,put)wouldre- • If U|µ: V :Γ [ro:rrw:w] then U|µ(cid:48): V :Γ [ro:rrw:w] sultinmultiplevaluesbeingchangedintheupdatesemanticsbut and w∩w =∅. 2 notnecessarilyinthevaluesemantics.Asourtypesystemforbids aliasingofwritablereferences,wemustincludethisinformationin Refinement and preservation With the above lemmas and defi- ourcorrespondencerelation.Writtenasu|µ:v:τ[ro:rrw:w], nitions,weareabletoproverefinementbetweenthevalueandthe this relation states that the update semantics value u with store µ update semantics. This of course requires us to assume the same corresponds to the value semantics value v, which both have the forthesemanticsgiventoabstractfunctions,(cid:126)·(cid:127) and(cid:126)·(cid:127) . 
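The three framing properties stated in this section (inertia, leak freedom, fresh allocation) can be checked mechanically on finite stores. The following is a minimal Python sketch, not part of the Cogent formalisation: the names `frame`, `lookup` and `_BOTTOM`, the dictionary encoding of the store, and the explicit `universe` of pointers to inspect are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

# Sentinel standing in for an undefined store entry (the bottom symbol in the text).
_BOTTOM = object()


def lookup(store: Dict[Hashable, object], p: Hashable) -> object:
    """Model the partial store: a missing key means the pointer is unallocated."""
    return store.get(p, _BOTTOM)


def frame(w_in: Set[Hashable], mu_in: Dict[Hashable, object],
          w_out: Set[Hashable], mu_out: Dict[Hashable, object],
          universe: Iterable[Hashable]) -> bool:
    """Check the three framing properties for every pointer in `universe`."""
    for p in universe:
        if p not in w_in and p not in w_out:
            # Inertia: pointers outside both writable sets keep their contents.
            if lookup(mu_in, p) != lookup(mu_out, p):
                return False
        if p in w_in and p not in w_out:
            # Leak freedom: a writable pointer that disappears must be freed.
            if lookup(mu_out, p) is not _BOTTOM:
                return False
        if p not in w_in and p in w_out:
            # Fresh allocation: a new writable pointer was previously unallocated.
            if lookup(mu_in, p) is not _BOTTOM:
                return False
    return True


# Hypothetical evaluation: frees pointer 1, allocates pointer 3, leaves 2 alone.
before = {1: "a", 2: "b"}
after = {2: "b", 3: "c"}
ok = frame({1}, before, {3}, after, range(1, 5))

# Same evaluation, but it also overwrites pointer 2, which it never owned.
bad = frame({1}, before, {3}, {2: "changed", 3: "c"}, range(1, 5))
```

Under these assumptions `ok` holds and `bad` does not, because the second run violates inertia at pointer 2. The sketch only inspects the pointers it is given, whereas the relation in the text quantifies over all pointers.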
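Value semantics: $V \vdash e \Downarrow_v v$
\begin{gather*}
\frac{(x \mapsto v) \in V}{V \vdash x \Downarrow_v v}\ \textsc{VVar}
\qquad
\frac{}{V \vdash () \Downarrow_v ()}\ \textsc{V()}
\qquad
\frac{}{V \vdash \ell \Downarrow_v \ell}\ \textsc{VLit}
\qquad
\frac{V \vdash e \Downarrow_v \ell}{V \vdash \mathbf{cast}\ t\ e \Downarrow_v \ell}\ \textsc{VCast}
\\[1ex]
\frac{\mathit{funDef}(f) = \langle f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \rho,\ f\ x = e \rangle}
     {V \vdash f[\overline{\tau_i}] \Downarrow_v \langle\!\langle \lambda x.\ e[\overline{\tau_i/\alpha_i}] \rangle\!\rangle}\ \textsc{VFunC}
\qquad
\frac{\mathit{funDef}(f) = \langle f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \rho,\ \blacksquare \rangle}
     {V \vdash f[\overline{\tau_i}] \Downarrow_v \langle\!\langle \mathbf{abs}.\ f \mid \overline{\tau_i} \rangle\!\rangle}\ \textsc{VFunA}
\\[1ex]
\frac{V \vdash e_1 \Downarrow_v \langle\!\langle \lambda x.\ e \rangle\!\rangle \qquad V \vdash e_2 \Downarrow_v v' \qquad (x \mapsto v') \vdash e \Downarrow_v v}
     {V \vdash e_1\ e_2 \Downarrow_v v}\ \textsc{VAppC}
\qquad
\frac{V \vdash e_1 \Downarrow_v \langle\!\langle \mathbf{abs}.\ f \mid \overline{\tau_i} \rangle\!\rangle \qquad V \vdash e_2 \Downarrow_v v' \qquad \llbracket f \rrbracket_v\ v' = v}
     {V \vdash e_1\ e_2 \Downarrow_v v}\ \textsc{VAppA}
\\[1ex]
\frac{\text{for each } i:\ V \vdash e_i \Downarrow_v \ell_i}
     {V \vdash o(\overline{e_i}) \Downarrow_v o(\overline{\ell_i})}\ \textsc{VPrimOp}
\qquad
\frac{V \vdash e \Downarrow_v v}{V \vdash C\ e \Downarrow_v C\ v}\ \textsc{VCons}
\qquad
\frac{V \vdash e \Downarrow_v C_k\ v}{V \vdash \mathbf{promote}\ \langle \overline{C_i\ \tau_i} \rangle\ e \Downarrow_v C_k\ v}\ \textsc{VProm}
\\[1ex]
\frac{V \vdash e_1 \Downarrow_v v' \qquad x \mapsto v', V \vdash e_2 \Downarrow_v v}
     {V \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \Downarrow_v v}\ \textsc{VLet}
\qquad
\frac{V \vdash e_1 \Downarrow_v v' \qquad x \mapsto v', V \vdash e_2 \Downarrow_v v}
     {V \vdash \mathbf{let!}\ (\overline{y})\ x = e_1\ \mathbf{in}\ e_2 \Downarrow_v v}\ \textsc{VLet!}
\\[1ex]
\frac{V \vdash e_1 \Downarrow_v C\ v' \qquad x \mapsto v', V \vdash e_2 \Downarrow_v v}
     {V \vdash \mathbf{case}\ e_1\ \mathbf{of}\ C\ x \to e_2\ \mathbf{else}\ y \to e_3 \Downarrow_v v}\ \textsc{VCase}_1
\qquad
\frac{V \vdash e_1 \Downarrow_v B\ v' \qquad B \neq C \qquad y \mapsto (B\ v'), V \vdash e_3 \Downarrow_v v}
     {V \vdash \mathbf{case}\ e_1\ \mathbf{of}\ C\ x \to e_2\ \mathbf{else}\ y \to e_3 \Downarrow_v v}\ \textsc{VCase}_2
\\[1ex]
\frac{V \vdash e \Downarrow_v C\ v}{V \vdash \mathbf{esac}\ e \Downarrow_v v}\ \textsc{VEsac}
\qquad
\frac{\text{for each } i:\ V \vdash e_i \Downarrow_v v_i}{V \vdash \{\overline{f_i = e_i}\} \Downarrow_v \{\overline{f_i = v_i}\}}\ \textsc{VStr}
\qquad
\frac{V \vdash e \Downarrow_v \{\overline{f_i = v_i}\}}{V \vdash e.f_k \Downarrow_v v_k}\ \textsc{VMem}
\\[1ex]
\frac{V \vdash e_1 \Downarrow_v \{\overline{f_i = v_i}\} \qquad x \mapsto \{\overline{f_i = v_i}\}, y \mapsto v_k, V \vdash e_2 \Downarrow_v v}
     {V \vdash \mathbf{take}\ x\ \{f_k = y\} = e_1\ \mathbf{in}\ e_2 \Downarrow_v v}\ \textsc{VTake}
\\[1ex]
\frac{V \vdash e_1 \Downarrow_v \{\overline{f_i = v_i}\} \qquad V \vdash e_2 \Downarrow_v v'_k \qquad \text{for each } i \neq k:\ v'_i = v_i}
     {V \vdash \mathbf{put}\ e_1.f_k := e_2 \Downarrow_v \{\overline{f_i = v'_i}\}}\ \textsc{VPut}
\end{gather*}

Update semantics: $U \vdash e \mid \mu \Downarrow_u u \mid \mu'$
\begin{gather*}
\frac{(x \mapsto u) \in U}{U \vdash x \mid \mu \Downarrow_u u \mid \mu}\ \textsc{UVar}
\qquad
\frac{U \vdash e \mid \mu \Downarrow_u \ell \mid \mu'}{U \vdash \mathbf{cast}\ t\ e \mid \mu \Downarrow_u \ell \mid \mu'}\ \textsc{UCast}
\\[1ex]
\frac{\mathit{funDef}(f) = \langle f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \rho,\ f\ x = e \rangle}
     {U \vdash f[\overline{\tau_i}] \mid \mu \Downarrow_u \langle\!\langle \lambda x.\ e[\overline{\tau_i/\alpha_i}] \rangle\!\rangle \mid \mu}\ \textsc{UFunC}
\qquad
\frac{\mathit{funDef}(f) = \langle f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \rho,\ \blacksquare \rangle}
     {U \vdash f[\overline{\tau_i}] \mid \mu \Downarrow_u \langle\!\langle \mathbf{abs}.\ f \mid \overline{\tau_i} \rangle\!\rangle \mid \mu}\ \textsc{UFunA}
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u u' \mid \mu_1 \qquad x \mapsto u', U \vdash e_2 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{ULet}
\qquad
\frac{U \vdash e_1 \mid \mu \Downarrow_u u' \mid \mu_1 \qquad x \mapsto u', U \vdash e_2 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{let!}\ (\overline{y})\ x = e_1\ \mathbf{in}\ e_2 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{ULet!}
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u \langle\!\langle \lambda x.\ e \rangle\!\rangle \mid \mu_1 \qquad U \vdash e_2 \mid \mu_1 \Downarrow_u u' \mid \mu_2 \qquad (x \mapsto u') \vdash e \mid \mu_2 \Downarrow_u u \mid \mu_3}
     {U \vdash e_1\ e_2 \mid \mu \Downarrow_u u \mid \mu_3}\ \textsc{UAppC}
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u \langle\!\langle \mathbf{abs}.\ f \mid \overline{\tau_i} \rangle\!\rangle \mid \mu_1 \qquad U \vdash e_2 \mid \mu_1 \Downarrow_u u' \mid \mu_2 \qquad \llbracket f \rrbracket_u (u', \mu_2) = (u, \mu_3)}
     {U \vdash e_1\ e_2 \mid \mu \Downarrow_u u \mid \mu_3}\ \textsc{UAppA}
\\[1ex]
\frac{U \vdash e \mid \mu \Downarrow_u C_k\ u \mid \mu'}{U \vdash \mathbf{promote}\ \langle \overline{C_i\ \tau_i} \rangle\ e \mid \mu \Downarrow_u C_k\ u \mid \mu'}\ \textsc{UProm}
\qquad
\frac{U \vdash e \mid \mu \Downarrow_u C\ u \mid \mu'}{U \vdash \mathbf{esac}\ e \mid \mu \Downarrow_u u \mid \mu'}\ \textsc{UEsac}
\\[1ex]
\frac{U \vdash \overline{e_i} \mid \mu \Downarrow_u^{*} \overline{u_i} \mid \mu'}{U \vdash \{\overline{f_i = e_i}\} \mid \mu \Downarrow_u \{\overline{f_i = u_i}\} \mid \mu'}\ \textsc{UStr}
\qquad
\frac{U \vdash e \mid \mu \Downarrow_u \{\overline{f_i = u_i}\} \mid \mu'}{U \vdash e.f_k \mid \mu \Downarrow_u u_k \mid \mu'}\ \textsc{UMem}_1
\qquad
\frac{U \vdash e \mid \mu \Downarrow_u p \mid \mu' \qquad \mu'(p) = \{\overline{f_i = u_i}\}}{U \vdash e.f_k \mid \mu \Downarrow_u u_k \mid \mu'}\ \textsc{UMem}_2
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u C\ u' \mid \mu_1 \qquad x \mapsto u', U \vdash e_2 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{case}\ e_1\ \mathbf{of}\ C\ x \to e_2\ \mathbf{else}\ y \to e_3 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{UCase}_1
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u B\ u' \mid \mu_1 \qquad B \neq C \qquad y \mapsto (B\ u'), U \vdash e_3 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{case}\ e_1\ \mathbf{of}\ C\ x \to e_2\ \mathbf{else}\ y \to e_3 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{UCase}_2
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u \{\overline{f_i = u_i}\} \mid \mu_1 \qquad x \mapsto \{\overline{f_i = u_i}\}, y \mapsto u_k, U \vdash e_2 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{take}\ x\ \{f_k = y\} = e_1\ \mathbf{in}\ e_2 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{UTake}_1
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u p \mid \mu_1 \qquad \mu_1(p) = \{\overline{f_i = u_i}\} \qquad x \mapsto p, y \mapsto u_k, U \vdash e_2 \mid \mu_1 \Downarrow_u u \mid \mu_2}
     {U \vdash \mathbf{take}\ x\ \{f_k = y\} = e_1\ \mathbf{in}\ e_2 \mid \mu \Downarrow_u u \mid \mu_2}\ \textsc{UTake}_2
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u \{\overline{f_i = u_i}\} \mid \mu_1 \qquad U \vdash e_2 \mid \mu_1 \Downarrow_u u'_k \mid \mu_2 \qquad \text{for each } i \neq k:\ u'_i = u_i}
     {U \vdash \mathbf{put}\ e_1.f_k := e_2 \mid \mu \Downarrow_u \{\overline{f_i = u'_i}\} \mid \mu_2}\ \textsc{UPut}_1
\\[1ex]
\frac{U \vdash e_1 \mid \mu \Downarrow_u p \mid \mu_1 \qquad U \vdash e_2 \mid \mu_1 \Downarrow_u u'_k \mid \mu_2 \qquad \mu_2(p) = \{\overline{f_i = u_i}\} \qquad \text{for each } i \neq k:\ u'_i = u_i}
     {U \vdash \mathbf{put}\ e_1.f_k := e_2 \mid \mu \Downarrow_u p \mid \mu_2(p := \{\overline{f_i = u'_i}\})}\ \textsc{UPut}_2
\end{gather*}

Figure 8: Cogent Value and Update Semantics (some straightforward rules omitted for brevity)

Assumption 1. Let $f$ be an abstract function with type signature $f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \tau'$, and $\overline{\rho_i}$ be an instantiation of the type variables $\overline{\alpha_i}$ such that for each $i$, $\vdash \rho_i :_{K} \kappa_i$. Let $u$ and $v$ be update- and value-semantics values such that $u \mid \mu : v : \tau[\overline{\rho_i/\alpha_i}]\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$. The user-supplied meaning of $f$ in each semantics gives $\llbracket f \rrbracket_v\ v = v'$ and $\llbracket f \rrbracket_u (u, \mu) = (u', \mu')$. Then, there exists $r' \subseteq r$ and $w'$ such that $u' \mid \mu' : v' : \tau'[\overline{\rho_i/\alpha_i}]\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w']$ and $w \mid \mu\ \mathbf{frame}\ w' \mid \mu'$.

We first prove that the correspondence relation is preserved when both semantics evaluate from corresponding environments. By erasing one semantics, this becomes a type preservation theorem for the other. Due to space constraints, we omit the details of the proof in this paper, but the full proof is available in our Isabelle/HOL formalisation.

Theorem 1 (Preservation of types and correspondence). If $\varepsilon; \Gamma \vdash e : \tau$ and $U \mid \mu : V : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$ and $V \vdash e \Downarrow_v v$ and $U \vdash e \mid \mu \Downarrow_u u \mid \mu'$, then there exists $r' \subseteq r$ and $w'$ such that $u \mid \mu' : v : \tau\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w']$ and $w \mid \mu\ \mathbf{frame}\ w' \mid \mu'$.

In order to prove refinement, we must show that every evaluation on the concrete update semantics has a corresponding evaluation in the abstract value semantics. While Theorem 1 already gets us most of the way there, we still need to prove that the value semantics can evaluate whenever the update semantics does.

Lemma 9 (Upward-propagation of evaluation). If $\varepsilon; \Gamma \vdash e : \tau$ and $U \mid \mu : V : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$ and $U \vdash e \mid \mu \Downarrow_u u \mid \mu'$, then there exists a $v$ such that $V \vdash e \Downarrow_v v$.

Composing this lemma and Theorem 1, we can now easily prove our desired refinement statement.

Theorem 2 (Value $\Rightarrow$ Update refinement). If $\varepsilon; \Gamma \vdash e : \tau$ and $U \mid \mu : V : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$ and $U \vdash e \mid \mu \Downarrow_u u \mid \mu'$, then there exists a value $v$ and pointer sets $r' \subseteq r$ and $w'$ such that $V \vdash e \Downarrow_v v$, and $u \mid \mu' : v : \tau\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w']$ and $w \mid \mu\ \mathbf{frame}\ w' \mid \mu'$.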
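To see why the refinement results above depend on the non-aliasing constraints, it can help to run a toy version of put under both semantics. The Python sketch below is an illustration only: `Ptr`, `put_value`, `put_update` and `corresponds` are hypothetical names, the correspondence check ignores types and the read-only and writable pointer sets entirely, and the aliased bindings at the end are exactly what Cogent's linear type system would reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """A pointer into the update-semantics store."""
    addr: int


def put_value(record: dict, field: str, new) -> dict:
    """Value semantics (cf. VPut): build a fresh record, leave the old one alone."""
    updated = dict(record)
    updated[field] = new
    return updated


def put_update(store: dict, p: Ptr, field: str, new) -> Ptr:
    """Update semantics (cf. UPut2): overwrite the boxed record in the store."""
    store[p][field] = new
    return p


def corresponds(u, store: dict, v) -> bool:
    """Toy correspondence: follow pointers in the store, then compare structurally."""
    if isinstance(u, Ptr):
        return corresponds(store[u], store, v)
    if isinstance(u, dict):
        return (isinstance(v, dict) and u.keys() == v.keys()
                and all(corresponds(u[k], store, v[k]) for k in u))
    return u == v


store = {Ptr(0): {"size": 1}}
x_u, x_v = Ptr(0), {"size": 1}   # the same record in update and value semantics
y_u, y_v = x_u, x_v              # an alias; ill-typed in Cogent, allowed in this toy

x_u2 = put_update(store, x_u, "size", 2)
x_v2 = put_value(x_v, "size", 2)

unique_ok = corresponds(x_u2, store, x_v2)  # the updated record still corresponds
alias_ok = corresponds(y_u, store, y_v)     # the alias has silently diverged
```

In this sketch the result of the put corresponds across the two semantics, while the aliased binding no longer does, since the store was overwritten in place but the value-semantics record was copied. That divergence is the situation the writable-pointer disjointness conditions are meant to exclude.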
Values: $u \mid \mu : v : \tau\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$
\begin{gather*}
\frac{\ell < |t|}{\ell \mid \mu : \ell : t\ [\mathrm{ro}: \emptyset\ \ \mathrm{rw}: \emptyset]}\ \textsc{RLit}
\qquad
\frac{}{() \mid \mu : () : ()\ [\mathrm{ro}: \emptyset\ \ \mathrm{rw}: \emptyset]}\ \textsc{RUnit}
\\[1ex]
\frac{u \mid \mu : v : \tau\ [\mathrm{ro}: r\ \ \mathrm{rw}: w] \qquad C\ \tau \in \overline{C_i\ \tau_i}}
     {C\ u \mid \mu : C\ v : \langle \overline{C_i\ \tau_i} \rangle\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}\ \textsc{RVariant}
\qquad
\frac{\emptyset;\ x : \tau \vdash e : \rho}
     {\langle\!\langle \lambda x.\ e \rangle\!\rangle \mid \mu : \langle\!\langle \lambda x.\ e \rangle\!\rangle : \tau \to \rho\ [\mathrm{ro}: \emptyset\ \ \mathrm{rw}: \emptyset]}\ \textsc{RFunC}
\\[1ex]
\frac{\mathit{funDef}(f) = \langle f :: \forall(\overline{\alpha_i :_{K} \kappa_i}).\ \tau \to \tau',\ \blacksquare \rangle}
     {\langle\!\langle \mathbf{abs}.\ f \mid \overline{\rho_i} \rangle\!\rangle \mid \mu : \langle\!\langle \mathbf{abs}.\ f \mid \overline{\rho_i} \rangle\!\rangle : (\tau \to \tau')[\overline{\rho_i/\alpha_i}]\ [\mathrm{ro}: \emptyset\ \ \mathrm{rw}: \emptyset]}\ \textsc{RFunA}
\\[1ex]
\frac{\overline{u_i} \mid \mu :^{*} \overline{v_i} :^{*} \overline{\tau_i}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}
     {\{\overline{f_i = u_i}\} \mid \mu : \{\overline{f_i = v_i}\} : \{\overline{f_i :: \tau_i}\}\ \text{Unboxed}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}\ \textsc{RRecU}
\qquad
\frac{a_u \mid \mu :_{A} a_v :_{A} T\ \overline{\tau}\ \text{Unboxed}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}
     {a_u \mid \mu : a_v : T\ \overline{\tau}\ \text{Unboxed}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}\ \textsc{RAU}
\\[1ex]
\frac{\mu(p) = \{\overline{f_i = u_i}\} \qquad \overline{u_i} \mid \mu :^{*} \overline{v_i} :^{*} \overline{\tau_i}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}
     {p \mid \mu : \{\overline{f_i = v_i}\} : \{\overline{f_i :: \tau_i}\}\ \text{Writable}\ [\mathrm{ro}: r\ \ \mathrm{rw}: \{p\} \cup w]}\ \textsc{RRecW}
\qquad
\frac{\mu(p) = a_u \qquad a_u \mid \mu :_{A} a_v :_{A} T\ \overline{\tau}\ \text{Writable}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]}
     {p \mid \mu : a_v : T\ \overline{\tau}\ \text{Writable}\ [\mathrm{ro}: r\ \ \mathrm{rw}: \{p\} \cup w]}\ \textsc{RAW}
\\[1ex]
\frac{\mu(p) = \{\overline{f_i = u_i}\} \qquad \overline{u_i} \mid \mu :^{*} \overline{v_i} :^{*} \overline{\tau_i}\ [\mathrm{ro}: r\ \ \mathrm{rw}: \emptyset]}
     {p \mid \mu : \{\overline{f_i = v_i}\} : \{\overline{f_i :: \tau_i}\}\ \text{Read-only}\ [\mathrm{ro}: \{p\} \cup r\ \ \mathrm{rw}: \emptyset]}\ \textsc{RRecR}
\qquad
\frac{\mu(p) = a_u \qquad a_u \mid \mu :_{A} a_v :_{A} T\ \overline{\tau}\ \text{Read-only}\ [\mathrm{ro}: r\ \ \mathrm{rw}: \emptyset]}
     {p \mid \mu : a_v : T\ \overline{\tau}\ \text{Read-only}\ [\mathrm{ro}: \{p\} \cup r\ \ \mathrm{rw}: \emptyset]}\ \textsc{RAR}
\end{gather*}

Abstract values: $a_u \mid \mu :_{A} a_v :_{A} T\ \overline{\tau}\ m\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$ (rules for abstract types are user provided)

Lists of values: $\overline{u} \mid \mu :^{*} \overline{v} :^{*} \overline{\tau^{?}}\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$
\begin{gather*}
\frac{}{\varepsilon \mid \mu :^{*} \varepsilon :^{*} \varepsilon\ [\mathrm{ro}: \emptyset\ \ \mathrm{rw}: \emptyset]}\ \textsc{RL}_1
\qquad
\frac{\overline{u_i} \mid \mu :^{*} \overline{v_i} :^{*} \overline{\tau^{?}_i}\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w']}
     {u\,\overline{u_i} \mid \mu :^{*} v\,\overline{v_i} :^{*} \cancel{\tau}\,\overline{\tau^{?}_i}\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w']}\ \textsc{RL}_3
\\[1ex]
\frac{\begin{array}{c}
u \mid \mu : v : \tau\ [\mathrm{ro}: r\ \ \mathrm{rw}: w] \qquad \overline{u_i} \mid \mu :^{*} \overline{v_i} :^{*} \overline{\tau^{?}_i}\ [\mathrm{ro}: r'\ \ \mathrm{rw}: w'] \\
w \cap (r' \cup w') = \emptyset \qquad w' \cap (r \cup w) = \emptyset
\end{array}}
{u\,\overline{u_i} \mid \mu :^{*} v\,\overline{v_i} :^{*} \tau\,\overline{\tau^{?}_i}\ [\mathrm{ro}: r \cup r'\ \ \mathrm{rw}: w \cup w']}\ \textsc{RL}_2
\end{gather*}

Environments: $U \mid \mu : V : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]$
\begin{gather*}
\frac{\begin{array}{c}
\text{for each } x_i : \tau_i \in \Gamma:\quad (x_i \mapsto u_i) \in U \qquad (x_i \mapsto v_i) \in V \qquad u_i \mid \mu : v_i : \tau_i\ [\mathrm{ro}: r_i\ \ \mathrm{rw}: w_i] \\
\text{for each } j, k \text{ where } j \neq k:\quad w_j \cap (r_k \cup w_k) = \emptyset
\end{array}}
{U \mid \mu : V : \Gamma\ [\mathrm{ro}: \textstyle\bigcup_i r_i\ \ \mathrm{rw}: \bigcup_i w_i]}\ \textsc{REnv}
\end{gather*}

Figure 9: Value Typing and Refinement. For value typing rules, erase the update-semantics parts for value semantics, and the value-semantics parts for update semantics.

4. Verification

With the formal semantics of Cogent available, this section describes each of the proof steps that make up the compiler certificate, depicted in Fig. 1 in §2.

4.1 Top-Level Theorem

We start by describing the top-level theorem that forms the program certificate, emitted by the compiler. Recall that for a well-typed Cogent program, the compiler produces C code, a shallow embedding in Isabelle/HOL, and a refinement proof between them.

We say a C program correctly implements its Cogent shallow embedding if the following holds: (i) the C program terminates with defined execution; and (ii) if the initial C state and Cogent store are related, and the input values of the programs are related, then their output values are related.

This means, the compiler correctness theorem states that a value relation is preserved. This relation is concrete and can be inspected. In §3.4.1, we introduced a value typing relation between update semantics and value semantics. At each other refinement stage in the following sections, we will introduce a further relation between values of the two respective programs. By composing these value relations, we get the value relation $\mathcal{V}$ between the result $v_m$ of the C program $p_m$ and the shallow embedding $s$ by going through the intermediate update semantics value $u$ and value semantics result $v$. Note that the relation in §3.4.1 also depends on a Cogent store $\mu$. The C state and Cogent store are related using the state relation $R$, defined in detail in §4.3.

Let $\lambda e.\ M_e\ r\ e$ and $\lambda v.\ M_v\ r\ v$ (defined in §4.5) be two functions that monomorphise expressions and (function) values, respectively, using a rename function $r$ provided by the compiler. Further, let $R$ be a state relation, $s$ a shallow embedding, $e$ a monomorphic deep embedding, $p_m$ a C program, $\mu$ a Cogent store and $\sigma$ a C state. Then we define correspondence as follows:

If $(\exists r\ w.\ U \mid \mu : V : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w])$ and $(\mu, \sigma) \in R$, then $p_m$ successfully terminates starting at $\sigma$; and after executing $p_m$, for any resulting value $v_m$ and state $\sigma'$, there exist $\mu'$, $u$, and $v$ such that:
\[
(\mu', \sigma') \in R \ \wedge\ U \vdash e \mid \mu \Downarrow_u u \mid \mu' \ \wedge\ V \vdash e \Downarrow_v M_v\ r\ v \ \wedge\ \mathcal{V}\ r\ \mu'\ v_m\ u\ v\ s
\]

Theorem 3. Given a Cogent function $f$ that takes $x$ of type $\tau$ as input, let $p_m$ be its generated C code, $s$ its shallow embedding, and $e$ its deep embedding. Let $v_m$ be an argument of $p_m$, and $u$ and $v$ be the update and value semantics arguments, of appropriate type, for $f$. If $r$ is injective, then
\[
\forall \mu\ \sigma.\ \mathcal{V}\ r\ \mu\ v_m\ u\ v\ s \longrightarrow \textit{correspondence}\ r\ R\ (s\ v_s)\ (M_e\ r\ e)\ (p_m\ v_m)\ U\ V\ \Gamma\ \mu\ \sigma
\]
where $U = (x \mapsto u)$, $V = (x \mapsto v)$, and $\Gamma = (x \mapsto \tau)$.

This top-level refinement theorem additionally assumes that abstract functions in the program adhere to their specification and that their behaviour remains the same when they are monomorphised.

Intuitively, this theorem states that for related input values, all programs in the refinement chain evaluate to related output values. This can of course be used to deduce that there exist intermediate programs through which the C code and its shallow embedding are directly related. The proof engineer does not need to care what those intermediate programs are.

4.2 Well-typedness

Before we present each refinement step, we briefly describe the well-typing theorems that are used in these steps.

The Cogent compiler proves, via an automated Isabelle/HOL tactic, that the monomorphic deep embedding of the input program is well-typed. Specifically, the compiler defines $\mathit{funDef}(\cdot)$ in Isabelle/HOL and proves that each Cogent function $f\ x = e$ is well-typed in accordance with its type as given by $\mathit{funDef}(\cdot)$. Polymorphic well-typing is derived generically in the monomorphisation proof in §4.5.

Theorem 4 (Typing). Let $f$ be a (monomorphic) Cogent function, where $\mathit{funDef}(f) = \langle f :: \tau \to \tau',\ f\ x = e \rangle$. Then $\varepsilon;\ x : \tau \vdash e : \tau'$.

Because, as we will see in §4.3, proving refinement requires access to the typing judgements for program sub-expressions and not just for the top level, the Cogent compiler also instructs Isabelle to store all of the intermediate typing judgements established during type checking. These theorems are stored in a tree structure, isomorphic to the type derivation tree for the Cogent program. Each node is a typing theorem for a sub-expression of the program.

4.3 From C to Cogent Monomorphic Deep Embedding

This section describes the first three transformations from Fig. 1 in §2. In the first step, the C code is converted to Simpl [Schirmer 2006] by the C-to-Isabelle parser [Tuch et al. 2007], used in the seL4 project [Klein et al. 2009]. This step is kept as simple as possible and makes no effort to abstract from the details of C.

The second step in Fig. 1, which is the first link in the formal refinement chain, applies a modified version of the AutoCorres tool to produce a monadic shallow embedding of the C code semantics, and additionally proves that the Simpl C semantics is a refinement of the monadic shallow embedding. We modify AutoCorres to make its output more predictable by switching off its control-flow simplification and forcing it to always output the shallow embedding in the nondeterministic state monad of Cock et al. [2008]. In this monad, computation is represented by functions of type $\textit{state} \Rightarrow (\alpha \times \textit{state})\ \textit{set} \times \textit{bool}$. Here $\textit{state}$ is the global state of the C program, including global variables, while $\alpha$ is the return-type of the computation. A computation takes as input the global state and returns a set, $\textit{results}$, of pairs with new state and result value. Additionally the computation returns a boolean, $\textit{failed}$, indicating whether it failed (e.g. whether there was undefined behaviour).

While AutoCorres was designed to facilitate manual reasoning about C code, here we use it as the foundation for automatically proving correspondence to the Cogent input program. One of the main benefits AutoCorres gives us is a typed memory model. Specifically, the state of the AutoCorres monadic representation contains a set of typed heaps, each of type $32\ \textit{word} \Rightarrow \alpha$, one for each type $\alpha$ used on the heap in the C input program.

Proving that the AutoCorres-generated monadic embedding never fails implies that the C code is type- and memory-safe, and is free of undefined behaviour [Greenaway et al. 2014]. We prove non-failure as a side-condition of the refinement statement from the AutoCorres shallow embedding to the Cogent monomorphic deep embedding in its update semantics, essentially using Cogent's type system to guarantee C memory safety during execution.

This refinement proof is the third step in Fig. 1. To phrase the refinement statement we first define how deeply-embedded Cogent values and types relate to their corresponding monadic shallowly-embedded C values. The value-mapping is captured by the value relation val-rel, defined in Isabelle/HOL automatically by the Cogent compiler using ad hoc overloading. val-rel is defined separately for each Cogent program because the types used in the shallow C embedding depend on those used in the C program as, e.g., C structs are represented directly as Isabelle/HOL records.

The type relation type-rel is used to determine, for a Cogent value $v$ of type $\tau$, which typed heap in the state of the monadic shallow embedding $v$ should appear in. As with val-rel it is defined automatically for each Cogent program.

Given val-rel and type-rel for a particular Cogent program, the state relation $R$ defines the correspondence between the store $\mu$ over which the Cogent update semantics operates, and the state $\sigma$ of the monadic shallow embedding.

Definition 1 (Monad-to-Update State Relation). $(\mu, \sigma) \in R$ if and only if: for all pointers $p$ in the domain of $\mu$, there exists a value $v$ in the appropriate heap of $\sigma$ (as defined by type-rel) at location $p$, such that $\textit{val-rel}\ \mu(p)\ v$ holds.

With $R$ and val-rel, we define refinement generically between a monadic computation $p_m$ and a Cogent expression $e$, evaluated under the update semantics. We denote the refinement predicate corres. Because $R$ changes for each Cogent program, we parameterise corres by an arbitrary state relation $R$. It is parameterised also by the typing context $\Gamma$ and the environment $U$, as well as by the initial update semantics store $\mu$ and monadic shallow embedding state $\sigma$.

Definition 2 (Monad-to-Update Correspondence).
\[
\begin{array}{l}
\textit{corres}\ R\ e\ p_m\ U\ \Gamma\ \mu\ \sigma = \\
\quad (\exists r\ w.\ U \mid \mu : \Gamma\ [\mathrm{ro}: r\ \ \mathrm{rw}: w]) \longrightarrow \\
\quad (\mu, \sigma) \in R \longrightarrow \\
\quad (\neg \textit{failed}\ (p_m\ \sigma)\ \wedge \\
\quad \ (\forall v_m\ \sigma'.\ (v_m, \sigma') \in \textit{results}\ (p_m\ \sigma) \longrightarrow \\
\quad \ \ (\exists \mu'\ u.\ U \vdash e \mid \mu \Downarrow_u u \mid \mu' \ \wedge\ (\mu', \sigma') \in R \ \wedge\ \textit{val-rel}\ u\ v_m)))
\end{array}
\]

The definition states that if the state relation $R$ holds initially, then the monadic computation $p_m$ cannot fail and, moreover, for all executions of $p_m$ there must exist a corresponding execution under the update semantics of the expression $e$ such that the final states are related by $R$ and val-rel holds between their results. AutoCorres proves automatically that: $\neg \textit{failed}\ (p_m\ \sigma) \longrightarrow \textit{results}\ (p_m\ \sigma) \neq \emptyset$.

Refinement Proof The refinement proof is automatic in Isabelle, driven by a set of syntax-directed corres rules, one for each Cogent construct. The proof procedure makes use of the fact that the Cogent term is in A-normal form to reduce the number of cases that need to be considered and to simplify the higher-order unification problems that some of the proof rules pose to Isabelle.

This refinement theorem does not need an explicit formal assumption of well-typedness of the Cogent program. The proof tactic will simply fail for programs that are not well-typed.

Fig. 10 depicts two corres rules, one for expressions $x$ that are variables and the other for $\mathbf{let}\ x = a\ \mathbf{in}\ b$. These correspond respectively to the two basic monadic operations return, which yields values, and >>=, for sequencing computations.

Observe that the rule Corres-Let is compositional: to prove that $\mathbf{let}\ x = a\ \mathbf{in}\ b$ corresponds to $a'$ >>= $b'$ the rule involves proving that (1) $a$ corresponds to $a'$ and (2) that $b$ corresponds to $b'$ when each are executed over corresponding results $v_u$ and $v_m$ (e.g. as yielded by $a$ and $a'$ respectively). This compositionality significantly simplifies the automation of the correspondence proof. The typing assumptions of Corres-Let are discharged by appealing to the type theorem tree generated by the compiler (see §4.2).

The rules for some of the other constructs, such as take, put, and case, contain non-trivial assumptions about $R$ and about the types used in the program. Once a program and its $R$ are fixed, a set of simpler rules is automatically generated by specialising the generic corres rules for each of these constructs to the particular $R$ and types used in the input program. This in effect discharges the non-trivial assumptions of these rules once-and-for-all, allowing the automated proof of correspondence to proceed efficiently.

Conceptually, the refinement proof proceeds bottom-up, starting with the leaf functions of the program and ending with the top-level