
Proving uniformity and independence by self-composition and coupling

Gilles Barthe¹, Thomas Espitau², Benjamin Grégoire³, Justin Hsu⁴, and Pierre-Yves Strub⁵

¹ IMDEA Software Institute
² Sorbonne Universités, UPMC Paris 6
³ Inria
⁴ University of Pennsylvania
⁵ École Polytechnique

arXiv:1701.06477v2 [cs.PL] 1 Apr 2017

Abstract

Proof by coupling is a classical proof technique for establishing probabilistic properties of two probabilistic processes, like stochastic dominance and rapid mixing of Markov chains. More recently, couplings have been investigated as a useful abstraction for formal reasoning about relational properties of probabilistic programs, in particular for modeling reduction-based cryptographic proofs and for verifying differential privacy. In this paper, we demonstrate that probabilistic couplings can be used for verifying non-relational probabilistic properties. Specifically, we show that the program logic pRHL, whose proofs are formal versions of proofs by coupling, can be used for formalizing uniformity and probabilistic independence. We formally verify our main examples using the EasyCrypt proof assistant.

1 Introduction

Uniformity and probabilistic independence are two of the most useful and commonly encountered properties when analyzing randomized computations. Uniform distributions are a central building block of randomized algorithms. Arguably the simplest non-trivial distribution, the coin flip, is a uniform distribution over two values. Given access to uniform samples, there are known transformations for converting the samples to simulate more complex distributions, like Gaussian or Laplacian distributions. Conversely, turning samples from various non-uniform distributions into uniform samples is an active area of research.

Probabilistic independence is no less useful. The probability of a conjunction of independent events can be decomposed as a product of probabilities of individual events, each of which can then be analyzed in isolation.
Independent random variables are also needed to apply more sophisticated mathematical tools, like concentration inequalities.

Given these and other applications, it is not surprising that researchers have investigated different methods of reasoning about uniformity and independence. For instance, Pearl and Paz [18] develop an axiomatic theory based on graphoids for modeling conditional independence in probability theory. However, proving uniformity and independence by program verification remains a challenging task. Most verification techniques for probabilistic programs do not treat these properties as first-class assertions, and rely on reasoning principles that are cumbersome to use. Often, the only way to prove uniformity or independence is to prove exact values for the probability of specific events.

For example, consider a formal system for proving properties of the form

$$\Pr_{[\![s]\!]_m}[E] = p,$$

which capture the fact that the event $E$ has probability $p$ in the distribution $[\![s]\!]_m$ obtained by executing the randomized program $s$ on some initial memory $m$ (many existing systems use this idea, e.g. [9, 13, 15, 17, 19, 20]). Suppose that we want to prove that a program variable $x$ of some finite type $A$ is uniformly distributed in the output distribution $[\![s]\!]_m$. The only way to show this property is to analyze the probability of each output: for every $a \in A$, prove that

$$\Pr_{[\![s]\!]_m}[x = a] = \frac{1}{|A|}.$$

For independence, the situation is similar. Assume that we want to prove that the two program variables $x$ and $y$ of respective types $A$ and $B$ are (probabilistically) independent in the output distribution $[\![s]\!]_m$. This can be done by exhibiting functions $f, g, h$ such that for every $a \in A$ and $b \in B$, we have:

$$\Pr_{[\![s]\!]_m}[x = a] = f(a), \qquad \Pr_{[\![s]\!]_m}[y = b] = g(b), \qquad \Pr_{[\![s]\!]_m}[x = a \land y = b] = h(a, b).$$

Then, independence between $x$ and $y$ follows by proving that $h(a, b) = f(a) \cdot g(b)$ for every $a \in A$ and $b \in B$.

While these approaches work in theory, they can be laborious in practice. It may be awkward to express the probability of $x = a$, and the functions $f$, $g$ and $h$ may be difficult to produce.

The main contribution of this paper is an alternative method based on probabilistic couplings for proving uniformity and independence. Probabilistic couplings are a classical method for proving sophisticated probabilistic properties (e.g., stochastic dominance, rapid mixing of Markov chains, and more [16, 21, 22]). More recently, couplings have been used to reason about relational properties of probabilistic programs, notably differential privacy [5, 6]. Here we show that uniformity and independence properties can also be verified using coupling, despite being non-relational properties. As a consequence, our verification method inherits the many advantages of reasoning by couplings: compositional reasoning, and no need to reason directly about probabilistic events. Concretely, we show how uniformity and independence can be captured in the relational program logic pRHL [1].

In summary, our main contributions are novel methods to prove uniformity and independence properties of probabilistic programs. We prove the soundness of the methods and demonstrate their usefulness on a class of case studies.

Detailed Contributions

Uniformity. Suppose we have a program $s$ with a program variable $x$ ranging over a finite set $A$, and we want to show that $x$ is distributed uniformly over $A$ after executing $s$. Rather than computing the probability $\Pr_{[\![s]\!]_m}[x = a]$ for each $a \in A$, it suffices to show that the probabilities of any two outputs are equal:

$$\forall a_1, a_2 \in A.\ \Pr_{[\![s]\!]_m}[x = a_1] = \Pr_{[\![s]\!]_m}[x = a_2].$$

Now, we can view uniformity as a relational property: if we consider two runs of $s$, then the probability of $x$ being $a_1$ in the first run should be equal to the probability of $x$ being $a_2$ in the second run. In pRHL, this property is described by the following judgment:

$$\forall a_1, a_2 \in A.\ \models s \sim s : \phi \Longrightarrow x\langle 1\rangle = a_1 \iff x\langle 2\rangle = a_2$$

where the assertion $\phi$ asserts that the initial states are equal.

Independence. Proving probabilistic independence is more involved. We show how to prove independence in two different ways. Assume that we want to prove that the program variables $x$ and $y$ of respective finite types $A$ and $B$ are independent.

First, if the pair $\langle x, y\rangle$ is uniformly distributed over $A \times B$, then $x$ and $y$ are independent (and are themselves uniformly distributed). Indeed, assume that for all $a \in A$ and $b \in B$ we have $\Pr_{[\![s]\!]_m}[x = a \land y = b] = \frac{1}{|A| \cdot |B|}$. Then we have

$$\Pr_{[\![s]\!]_m}[x = a] = \sum_{b \in B} \Pr_{[\![s]\!]_m}[x = a \land y = b] = \frac{1}{|A|}.$$

A similar argument applies to the probability that $y = b$, from which independence follows. Thus, our first method of proving independence is by reduction to proving uniformity.

This approach is simple to use, but it only applies to proving independence of uniform random variables. A more expressive, but also slightly more complicated approach is to express probabilistic independence as a property of a modified version of the program, without any requirement on uniformity.
More specifically, independence of $x$ and $y$ can be derived from the equality between the probabilities of $x = a \land y = b$ and $x_1 = a \land y_2 = b$, where in the first case the probability is taken over the output of the original program $s$, and in the second case the probability is taken over the output of the program $s_1; s_2$, where $s_1$ and $s_2$ are renamings of $s$ (we call $s_1; s_2$ a self-composition of $s$ [2, 12]).

The reason is not hard to see. Since the composed programs operate on disjoint memory, the final combined output distribution models two independent runs of the original program. So, the probability $\Pr_{[\![s_1; s_2]\!]_{m_1 \uplus m_2}}[x_1 = a \land y_2 = b]$, where $m_1 \uplus m_2$ is the disjoint union of two copies of $m$, is equal to the product of $\Pr_{[\![s_1]\!]_{m_1}}[x_1 = a]$ and $\Pr_{[\![s_2]\!]_{m_2}}[y_2 = b]$. Since $s_1$ and $s_2$ are just renamed versions of the original program $s$, these probabilities are in turn equal to $\Pr_{[\![s]\!]_m}[x = a]$ and $\Pr_{[\![s]\!]_m}[y = b]$ in the original program.

Our encoding casts independence as a relational property between a program $s$ and its self-composition $s_1; s_2$, a property which can be directly expressed in pRHL:

$$\forall a \in A, b \in B.\ \models s \sim s_1; s_2 : \phi \Longrightarrow (x\langle 1\rangle = a \land y\langle 1\rangle = b) \iff (x_1\langle 2\rangle = a \land y_2\langle 2\rangle = b)$$

where the precondition $\phi$ captures the initial conditions. We show that our approach extends to independence and conditional independence of sets of program variables.

Outline

Section 2 and Section 3 provide the relevant mathematical background and introduce the setting of our work. Section 4, Section 5 and Section 6 respectively address the case of uniformity, independence, and conditional independence. In each case we demonstrate our method using classic examples of randomized algorithms. We conclude the paper with a discussion of alternative techniques for verifying these properties.
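The self-composition encoding of independence described in the detailed contributions can be checked by brute force on small examples. The sketch below is only an illustration, not the pRHL proof method: it represents the output distribution of a program as a Python dictionary over states `(x, y)`, builds the output of the self-composition as the product of that distribution with itself, and compares the two probabilities for every pair of values. The two toy distributions `two_coins` and `copied`, and all function names, are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def self_compose(mu):
    """Output distribution of s1;s2, modelled as two independent runs of s.

    A state of the composition is a pair (state of copy 1, state of copy 2).
    """
    return {(m1, m2): p1 * p2
            for (m1, p1), (m2, p2) in product(mu.items(), mu.items())}


def prob(mu, event):
    """Probability of an event (a predicate on states) in mu."""
    return sum(p for m, p in mu.items() if event(m))


def independent_by_self_composition(mu, A, B):
    """Check Pr_s[x=a and y=b] == Pr_{s1;s2}[x1=a and y2=b] for all a, b."""
    comp = self_compose(mu)
    return all(
        prob(mu, lambda m: m == (a, b))
        == prob(comp, lambda mm: mm[0][0] == a and mm[1][1] == b)
        for a in A for b in B
    )


half = Fraction(1, 2)
# Hypothetical output distributions over states (x, y).
two_coins = {(x, y): half * half for x in (0, 1) for y in (0, 1)}  # independent
copied = {(0, 0): half, (1, 1): half}                              # y copies x

print(independent_by_self_composition(two_coins, (0, 1), (0, 1)))
print(independent_by_self_composition(copied, (0, 1), (0, 1)))
```

The first distribution passes the check and the second fails it: for `copied`, the original program gives probability 1/2 to `x = 0 and y = 0`, while the self-composition gives 1/4 to `x1 = 0 and y2 = 0`.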
2 Mathematical Background

For the sake of simplicity, we restrict ourselves to discrete (countable) sub-distributions.

Definition 1. A sub-distribution over a set $A$ is defined by a mass function $\mu : A \to \mathbb{R}^+$, which gives the probability of the unitary events $a \in A$. This mass function must be such that $\sum_{a \in A} \mu(a)$ is well-defined and its weight satisfies $|\mu| \triangleq \sum_{a \in A} \mu(a) \le 1$. In particular, the support of the sub-distribution $\mathrm{supp}(\mu) \triangleq \{a \in A \mid \mu(a) \neq 0\}$ is discrete. When $|\mu|$ is equal to 1, we call $\mu$ a distribution. We let $\mathbb{D}(A)$ denote the set of sub-distributions over $A$. An event over $A$ is a predicate over $A$. The probability of an event $E$ in a sub-distribution $\mu$, written $\Pr_{x \sim \mu}[E]$, is defined as $\sum_{\{x \in A \mid E(x)\}} \mu(x)$.

When working with sub-distributions over tuples, the probabilistic versions of the usual projections on tuples are called marginals. For distributions over pairs, we define the first and second marginals $\pi_1(\mu)$ and $\pi_2(\mu)$ of a distribution $\mu$ over $A \times B$ by $\pi_1(\mu)(a) \triangleq \sum_{b \in B} \mu(a, b)$ and $\pi_2(\mu)(b) \triangleq \sum_{a \in A} \mu(a, b)$. We are now ready to formally define coupling.

Definition 2. Let $A_1$ and $A_2$ be two sets, and let $\Psi \subseteq A_1 \times A_2$. A $\Psi$-coupling for two sub-distributions $\mu_1, \mu_2$ resp. over $A_1$ and $A_2$ is a sub-distribution $\mu \in \mathbb{D}(A_1 \times A_2)$ such that $\pi_1(\mu) = \mu_1$ and $\pi_2(\mu) = \mu_2$ and $\mathrm{supp}(\mu) \subseteq \Psi$. We write $\blacktriangleleft_{\Psi} \langle \mu_1 \,\&\, \mu_2 \rangle$ to denote the existence of a $\Psi$-coupling.

In addition to the general definition, we shall also consider a special case of coupling: specifically, we say that $(\mu_1, \mu_2)$ are $f$-coupled if $f : A_1 \to A_2$ is a bijection such that $\mu_1(x) = \mu_2(f(x))$ for every $x \in A_1$. In this case, we write $f \blacktriangleleft \langle \mu_1 \,\&\, \mu_2 \rangle$.

Previous works establish a number of basic facts about couplings, see e.g. Barthe et al. [1, 5]. In particular, one useful consequence of couplings is that they can show that the probability of one event is at most the probability of another.

Lemma 3 (Fundamental lemma of coupling). Let $E_1$ and $E_2$ be predicates over $A_1$ and $A_2$, and let $\Psi \triangleq \{(x_1, x_2) \mid (x_1 \in E_1) \Rightarrow (x_2 \in E_2)\}$. If $\blacktriangleleft_{\Psi} \langle \mu_1 \,\&\, \mu_2 \rangle$, then $\Pr_{x_1 \sim \mu_1}[E_1] \le \Pr_{x_2 \sim \mu_2}[E_2]$.

One can immediately derive a variant of the lemma where $\iff$ and $=$ are used in place of $\Rightarrow$ and $\le$ respectively.

The following lemma provides a converse to the fundamental lemma of coupling in the special case where we are interested in proving the equality of two distributions.

Lemma 4. For every $\mu_1, \mu_2 \in \mathbb{D}(A)$, the following are equivalent:
• $\mu_1 = \mu_2$;
• for every $a \in A$, $\Pr_{x \sim \mu_1}[x = a] = \Pr_{x \sim \mu_2}[x = a]$;
• for every $a \in A$, $\blacktriangleleft_{\Psi_a} \langle \mu_1 \,\&\, \mu_2 \rangle$ where $\Psi_a \triangleq \{(x_1, x_2) \mid x_1 = a \iff x_2 = a\}$;
• $\blacktriangleleft_{\Psi_A} \langle \mu_1 \,\&\, \mu_2 \rangle$ where $\Psi_A \triangleq \{(x_1, x_2) \mid x_1 = x_2\}$.

We note that the third item (existence of liftings for pointwise equality) is often easier to establish than the last item (existence of lifting for equality), since one can choose the coupling for each possible value of $a$, rather than showing a single coupling for all values of $a$.

3 Setting

We will work with a simple probabilistic imperative language. Probabilistic assignments are of the form $x \leftarrow_{\$} g$, which assigns a value sampled according to the distribution $g$ to the program variable $x$. The syntax of statements is defined by the grammar:

$$s ::= \mathsf{skip} \mid \mathsf{abort} \mid x \leftarrow e \mid x \leftarrow_{\$} g \mid s; s \mid \mathsf{if}\ e\ \mathsf{then}\ s\ \mathsf{else}\ s \mid \mathsf{while}\ e\ \mathsf{do}\ s$$

where $x$, $e$ and $g$ respectively range over (typed) variables in $X$, expressions in $E$ and distributions in $D$. To ensure that the set of states is countable, we require that there are finitely many variables $X$. As usual $E$ is defined inductively from $X$ and a set $F$ of simply typed function symbols.
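Returning briefly to Section 2: Definition 2 and the equality variant of Lemma 3 can be tested directly on small finite distributions. The following sketch is an illustration under assumptions of our own: distributions are Python dictionaries with exact rational masses, the relation $\Psi$ is a Python predicate, and the two distributions are biased coins with opposite biases, coupled through the bijection `f(v) = 1 - v`. None of these names come from the paper.

```python
from fractions import Fraction


def marginals(mu):
    """First and second marginals of a sub-distribution over pairs."""
    m1, m2 = {}, {}
    for (a, b), p in mu.items():
        m1[a] = m1.get(a, 0) + p
        m2[b] = m2.get(b, 0) + p
    return m1, m2


def clean(mu):
    """Drop zero-mass entries so that dictionaries compare by support."""
    return {k: p for k, p in mu.items() if p != 0}


def is_coupling(mu, mu1, mu2, psi):
    """Definition 2: marginals are mu1 and mu2, and the support lies in psi."""
    m1, m2 = marginals(mu)
    in_psi = all(psi(a, b) for (a, b), p in mu.items() if p != 0)
    return clean(m1) == clean(mu1) and clean(m2) == clean(mu2) and in_psi


def prob(mu, event):
    return sum(p for v, p in mu.items() if event(v))


p = Fraction(1, 3)
mu1 = {0: 1 - p, 1: p}        # coin returning 1 with probability p
mu2 = {0: p, 1: 1 - p}        # coin returning 1 with probability 1 - p

f = lambda v: 1 - v           # bijection with mu1(v) == mu2(f(v))
f_coupling = {(v, f(v)): q for v, q in mu1.items()}
product_mu = {(a, b): qa * qb for a, qa in mu1.items() for b, qb in mu2.items()}

psi = lambda a, b: (a == 1) == (b == 0)   # "x1 = 1  iff  x2 = 0"

print(is_coupling(f_coupling, mu1, mu2, psi))   # f-coupling satisfies psi
print(is_coupling(product_mu, mu1, mu2, psi))   # right marginals, wrong support
print(prob(mu1, lambda a: a == 1) == prob(mu2, lambda b: b == 0))
```

The independent product has the correct marginals but puts mass outside $\Psi$, so it is not a $\Psi$-coupling; the coupling induced by `f` is one, and the two event probabilities agree, as the equality variant of Lemma 3 requires.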
In this paper, distributions used for sampling are either uniform distributions over a finite type $A$, or the Bernoulli distribution with parameter $p$, which we denote by $\mathrm{Bern}(p)$. We assume that expressions and statements are typed in the usual way.

We assume we are given a set-theoretical interpretation for every type and operator of the language. We define a state as a type-preserving mapping from variables to values, and we let State denote the set of states. The set of states is equipped with the usual functions for reading and writing a value; we use $m(x)$ to denote the value of $x$ in $m$, and $m[x := v]$ to denote state update, in this case the state obtained from $m$ by updating the value of $x$ with $v$.

One can equip $\mathbb{D}(\mathrm{State})$ with a monadic structure, using the Dirac distributions $\delta_x$ for the unit and distribution expectation $\mathbb{E}_{x \sim \mu}[M(x)]$ for the bind, where

$$\mathbb{E}_{x \sim \mu}[M(x)] : x \mapsto \sum_{a} \mu(a) \cdot M(a)(x).$$

The semantics of expressions and distribution expressions is parametrized by a state $m$, and is defined in the usual way where we require all distribution expressions to be interpreted as proper distributions (sub-distributions with weight 1).

$$[\![\mathsf{skip}]\!]_m = \delta_m$$
$$[\![\mathsf{abort}]\!]_m = 0$$
$$[\![x \leftarrow e]\!]_m = \delta_{m[x := [\![e]\!]_m]}$$
$$[\![x \leftarrow_{\$} g]\!]_m = \mathbb{E}_{v \sim [\![g]\!]_m}[\delta_{m[x := v]}]$$
$$[\![s_1; s_2]\!]_m = \mathbb{E}_{\xi \sim [\![s_1]\!]_m}[[\![s_2]\!]_\xi]$$
$$[\![\mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2]\!]_m = \text{if } [\![e]\!]_m \text{ then } [\![s_1]\!]_m \text{ else } [\![s_2]\!]_m$$
$$[\![\mathsf{while}\ b\ \mathsf{do}\ s]\!]_m = \lim_{n \to \infty} [\![(\mathsf{if}\ b\ \mathsf{then}\ s)^{[n]}; \mathsf{if}\ b\ \mathsf{then}\ \mathsf{abort}]\!]_m$$

where $s^{[n]} \triangleq s; \ldots; s$ ($n$ times).

Figure 1: Denotational semantics of programs

Definition 5 (Semantics of statements).
• The semantics $[\![s]\!]_m$ of a statement $s$ w.r.t. some initial state $m$ is a sub-distribution over states, and is defined by the clauses of Fig. 1.
• The (lifted) semantics $[\![s]\!]_\mu$ of a statement $s$ w.r.t. some initial sub-distribution $\mu \in \mathbb{D}(\mathrm{State})$ over states is a sub-distribution over states, and is defined as $[\![s]\!]_\mu \triangleq \mathbb{E}_{m \sim \mu}[[\![s]\!]_m]$.

A basic and highly important property of probabilistic programs is termination. We say that a program $s$ is lossless if for every initial memory $m$, $|[\![s]\!]_m| = 1$. By now, there are many sophisticated techniques for proving losslessness even for languages that allow both probabilistic sampling and non-determinism (including recent advances by Chatterjee et al. [10, 11], Ferrer Fioriti and Hermanns [14]). These techniques are capable of showing losslessness for all of our examples (in some cases with a high degree of automation), so throughout the paper, we assume that all programs are lossless. This assumption is used in the rules of pRHL and the characterizations of uniformity and independence.

3.1 Self-Composition of Programs

For every program $s$ and $n \in \mathbb{N}$, we let $s^{\langle n\rangle}$ denote the $n$-fold self-composition of $s$, i.e. $s^{\langle n\rangle} \triangleq s_1; \ldots; s_n$, where each $s_\imath$ is a copy of $s$ where all variables are tagged with a superscript $\imath$.

In order to state the main property of self-composition, we define the self-composition of a state; given a state $m$, we define its $n$-fold self-composition $m^{\langle n\rangle}$ as the state from $X^{\langle n\rangle}$ to values, where $X^{\langle n\rangle} \triangleq \{x^\imath \mid x \in X, 1 \le \imath \le n\}$, such that for every $x$ and $\imath$, $m^{\langle n\rangle}(x^\imath) \triangleq m(x)$. Given a state $m$ from $X^{\langle n\rangle}$, we denote by $m_\imath$ the $\imath$-th projection of $m$.

Proposition 6. For every program $s$ and state $m$, we have

$$\Pr_{[\![s^{\langle n\rangle}]\!]_{m^{\langle n\rangle}}}\Big[\bigwedge_{1 \le \imath \le n} E_\imath^\imath\Big] = \prod_{1 \le \imath \le n} \Pr_{[\![s]\!]_m}[E_\imath]$$

where the event $E^\imath$ is defined by $E^\imath(m') \triangleq E(\pi_\imath(m'))$ for every $\imath$ and every self-composed state $m'$, and $\pi_\imath$ is the projection from a self-composed state to its $\imath$-th component.

3.2 Probabilistic Relational Hoare Logic

Probabilistic Relational Hoare Logic (pRHL) is a program logic for reasoning about relational properties of probabilistic programs. Its judgments are of the form $\models s_1 \sim s_2 : \phi \Longrightarrow \psi$, where $s_1$ and $s_2$ are commands and the pre-condition $\phi$ and the post-condition $\psi$ are relational assertions, i.e. first-order formulae built over generalized expressions. The latter are similar to expressions, except that each variable is tagged with $\langle 1\rangle$ or $\langle 2\rangle$ to indicate the execution that it belongs to; we call the two executions left and right. Generalized expressions are interpreted w.r.t. a pair $(m_1, m_2)$ of states, where the interpretations of the tagged variables $x\langle 1\rangle$ and $x\langle 2\rangle$ are $m_1(x)$ and $m_2(x)$ respectively. We write $(m_1, m_2) \models \phi$ to denote that the interpretation of the assertion $\phi$ w.r.t. $(m_1, m_2)$ is valid.

Definition 7. A judgment $\models s_1 \sim s_2 : \phi \Longrightarrow \psi$ is valid iff for all states $m_1$ and $m_2$, $(m_1, m_2) \models \phi$ implies $\blacktriangleleft_{\{(m_1', m_2') \mid (m_1', m_2') \models \psi\}} \langle [\![s_1]\!]_{m_1} \,\&\, [\![s_2]\!]_{m_2} \rangle$.

Fig. 2 presents the main rules of the logic; see Barthe et al. [1, 7] for the full system. The logic includes two-sided rules, which operate on both programs, and one-sided rules, which operate on a single program (left or right).
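The denotational semantics of Fig. 1 can be turned into a small executable sketch that computes $[\![s]\!]_m$ as a dictionary from states to exact probabilities. This is an illustration under several assumptions, not the paper's formal development: statements are encoded as tagged Python tuples, expressions and distributions as Python functions of the state, and the limit in the `while` clause is replaced by a finite number of unrollings (`fuel`), so loops yield a sub-distribution whose missing mass corresponds to the `abort` in the unrolled program. The test program anticipates the biased-coin loop of Section 4.2.

```python
from fractions import Fraction


def freeze(m):
    """Hashable representation of a state (a dict from variables to values)."""
    return tuple(sorted(m.items()))


def bind(mu, k):
    """Monadic bind: expectation of k(state) over the sub-distribution mu."""
    out = {}
    for st, p in mu.items():
        for st2, q in k(dict(st)).items():
            out[st2] = out.get(st2, 0) + p * q
    return out


def sem(s, m, fuel=20):
    """Sub-distribution [[s]]_m, with `while` cut off after `fuel` iterations."""
    tag = s[0]
    if tag == "skip":
        return {freeze(m): Fraction(1)}
    if tag == "abort":
        return {}                                   # null sub-distribution
    if tag == "assign":                             # ("assign", x, e)
        _, x, e = s
        return {freeze({**m, x: e(m)}): Fraction(1)}
    if tag == "sample":                             # ("sample", x, g)
        _, x, g = s
        return {freeze({**m, x: v}): p for v, p in g(m).items()}
    if tag == "seq":                                # ("seq", s1, s2)
        _, s1, s2 = s
        return bind(sem(s1, m, fuel), lambda m2: sem(s2, m2, fuel))
    if tag == "if":                                 # ("if", b, s1, s2)
        _, b, s1, s2 = s
        return sem(s1 if b(m) else s2, m, fuel)
    if tag == "while":                              # ("while", b, body)
        _, b, body = s
        done, live = {}, {freeze(m): Fraction(1)}
        for _ in range(fuel):
            for st, p in live.items():
                if not b(dict(st)):                 # guard false: loop exits
                    done[st] = done.get(st, 0) + p
            running = {st: p for st, p in live.items() if b(dict(st))}
            live = bind(running, lambda m2: sem(body, m2, fuel))
        for st, p in live.items():                  # guard still true: aborted
            if not b(dict(st)):
                done[st] = done.get(st, 0) + p
        return done
    raise ValueError(tag)


def bern(p):
    # Assumed convention: 0 with probability p, 1 with probability 1 - p.
    return lambda m: {0: p, 1: 1 - p}


p = Fraction(1, 3)
body = ("seq", ("sample", "x", bern(p)), ("sample", "y", bern(p)))
prog = ("seq", ("assign", "x", lambda m: 0),
        ("seq", ("assign", "y", lambda m: 0),
         ("while", lambda m: m["x"] == m["y"], body)))

out = sem(prog, {}, fuel=20)
pr_x0 = sum(q for st, q in out.items() if dict(st)["x"] == 0)
pr_x1 = sum(q for st, q in out.items() if dict(st)["x"] == 1)
print(pr_x0 == pr_x1, float(pr_x0 + pr_x1))
```

With a bias of 1/3, each iteration leaves the loop with probability 4/9, so after 20 unrollings the computed weight is $1 - (5/9)^{20}$, and the masses of `x = 0` and `x = 1` are exactly equal at every unrolling depth.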
The [Conseq] rule is the rule of consequence, and reflects that validity is preserved by weakening the post-condition and strengthening the pre-condition. The [Case] rule allows proving a judgment by case analysis; specifically, the validity of a judgment with pre-condition $\Phi$ can be established from the validity of two judgments, one where the pre-condition is strengthened with $\Xi$ and the other where the pre-condition is strengthened with $\neg\Xi$. The [Struct] rule allows replacing programs by provably equivalent programs. The rules for proving program equivalence are given in Fig. 3, and manipulate judgments of the form $\Phi \vdash c \equiv c'$, where $\Phi$ is a relational assertion. The first rule ([While-Split]) splits a single loop into two loops (the first running while $e'$ is true, and the second running for the remaining iterations); this transformation is useful for selecting different couplings in different program iterations. The second rule ([Swap]) reorders two instructions, as long as they modify disjoint variables. This allows us to couple sampling instructions that may come from two different parts of the two programs.

Moving on to the two-sided rules, the [Seq] rule for sequential composition simply reflects the compositional property of couplings. The [Assg] rule is standard. The [Rand] rule informally takes a coupling between the two distributions used for sampling in the left and right programs, and requires that every element in the support of the coupling validates the post-condition. The rule is parametrized by a bijective function $f$ from the domain of the first distribution to the domain of the second distribution. This bijection gives us the freedom to specify the relation between the two samples when we couple them. The [Cond] rule states that two synchronized if statements can be related if their respective branches are also related. The [While] rule is the standard while rule adapted to pRHL.
Note that we require the guards of the two commands to be equal (so in particular the two loops must make the same number of iterations), and $\Psi$ plays the role of the while loop invariant as usual.

The one-sided rules [Assg-L], [Rand-L], [Cond-L] and [While-L] are similar to their two-sided variants, but only operate on the left program. The full system includes mirrored versions of each one-sided rule, for reasoning about the right program.

Throughout the paper, we often assert that the left and the right copies of a state are equal. This is captured by the relational assertion $\mathrm{EqMem} \triangleq \bigwedge_{x \in X} x\langle 1\rangle = x\langle 2\rangle$. We also often assert cross-equality on $n$-fold composition of states: $\mathrm{EqMem}^{\langle p\rangle, \langle q\rangle} \triangleq \bigwedge_{x \in X, 1 \le \imath \le p, 1 \le \jmath \le q} x^\imath\langle 1\rangle = x^\jmath\langle 2\rangle$.

$$\frac{\models s_1 \sim s_2 : \Phi \Longrightarrow \Psi \qquad \Phi' \Longrightarrow \Phi \qquad \Psi \Longrightarrow \Psi'}{\models s_1 \sim s_2 : \Phi' \Longrightarrow \Psi'}\ [\textsc{Conseq}]$$

$$\frac{\models s_1 \sim s_2 : \Phi \Longrightarrow \Psi \qquad \Phi \vdash s_1 \equiv s_1' \qquad \Phi \vdash s_2 \equiv s_2'}{\models s_1' \sim s_2' : \Phi \Longrightarrow \Psi}\ [\textsc{Struct}]$$

$$\frac{\models s_1 \sim s_2 : \Phi \land \Xi \Longrightarrow \Psi \qquad \models s_1 \sim s_2 : \Phi \land \neg\Xi \Longrightarrow \Psi}{\models s_1 \sim s_2 : \Phi \Longrightarrow \Psi}\ [\textsc{Case}]$$

$$\frac{\models s_1 \sim s_2 : \Phi \Longrightarrow \Xi \qquad \models s_1' \sim s_2' : \Xi \Longrightarrow \Psi}{\models s_1; s_1' \sim s_2; s_2' : \Phi \Longrightarrow \Psi}\ [\textsc{Seq}]$$

$$\frac{\Phi \triangleq \Psi[e_1\langle 1\rangle / x_1\langle 1\rangle,\ e_2\langle 2\rangle / x_2\langle 2\rangle]}{\models x_1 \leftarrow e_1 \sim x_2 \leftarrow e_2 : \Phi \Longrightarrow \Psi}\ [\textsc{Assg}]$$

$$\frac{f \blacktriangleleft \langle g_1 \,\&\, g_2 \rangle \qquad \Phi \triangleq \forall v.\ \Psi[v / x_1\langle 1\rangle,\ f(v) / x_2\langle 2\rangle]}{\models x_1 \leftarrow_{\$} g_1 \sim x_2 \leftarrow_{\$} g_2 : \Phi \Longrightarrow \Psi}\ [\textsc{Rand}]$$

$$\frac{\Phi \Longrightarrow e_1 = e_2 \qquad \models s_1 \sim s_2 : \Phi \land e_1 \Longrightarrow \Psi \qquad \models s_1' \sim s_2' : \Phi \land \neg e_1 \Longrightarrow \Psi}{\models \mathsf{if}\ e_1\ \mathsf{then}\ s_1\ \mathsf{else}\ s_1' \sim \mathsf{if}\ e_2\ \mathsf{then}\ s_2\ \mathsf{else}\ s_2' : \Phi \Longrightarrow \Psi}\ [\textsc{Cond}]$$

$$\frac{\models s_1 \sim s_2 : \Psi \land e_1\langle 1\rangle \land e_2\langle 2\rangle \Longrightarrow \Psi \land e_1\langle 1\rangle = e_2\langle 2\rangle}{\models \mathsf{while}\ e_1\ \mathsf{do}\ s_1 \sim \mathsf{while}\ e_2\ \mathsf{do}\ s_2 : \Psi \land e_1\langle 1\rangle = e_2\langle 2\rangle \Longrightarrow \Psi \land \neg e_1\langle 1\rangle \land \neg e_2\langle 2\rangle}\ [\textsc{While}]$$

$$\frac{\Phi \triangleq \Psi[e_1\langle 1\rangle / x_1\langle 1\rangle]}{\models x_1 \leftarrow e_1 \sim \mathsf{skip} : \Phi \Longrightarrow \Psi}\ [\textsc{Assg-L}]$$

$$\frac{\Phi \triangleq \forall v_1 \in \mathrm{supp}(g_1),\ \Psi[v_1 / x_1\langle 1\rangle]}{\models x_1 \leftarrow_{\$} g_1 \sim \mathsf{skip} : \Phi \Longrightarrow \Psi}\ [\textsc{Rand-L}]$$

$$\frac{\models s_1 \sim s_2 : \Phi \land e_1\langle 1\rangle \Longrightarrow \Psi \qquad \models s_1' \sim s_2 : \Phi \land \neg e_1\langle 1\rangle \Longrightarrow \Psi}{\models \mathsf{if}\ e_1\ \mathsf{then}\ s_1\ \mathsf{else}\ s_1' \sim s_2 : \Phi \Longrightarrow \Psi}\ [\textsc{Cond-L}]$$

$$\frac{\models s_1 \sim \mathsf{skip} : \Psi \land e_1\langle 1\rangle \Longrightarrow \Psi}{\models \mathsf{while}\ e_1\ \mathsf{do}\ s_1 \sim \mathsf{skip} : \Psi \Longrightarrow \Psi \land \neg e_1\langle 1\rangle}\ [\textsc{While-L}]$$

Figure 2: Proof rules (selection)

$$\frac{}{\Phi \vdash \mathsf{while}\ e\ \mathsf{do}\ s \equiv \mathsf{while}\ e \land e'\ \mathsf{do}\ s;\ \mathsf{while}\ e\ \mathsf{do}\ s}\ [\textsc{While-Split}] \qquad \frac{\mathrm{var}(s_1) \cap \mathrm{var}(s_2) = \emptyset}{\Phi \vdash s_1; s_2 \equiv s_2; s_1}\ [\textsc{Swap}]$$

Figure 3: Equivalence rules (selection)

4 Uniformity

Reasoning about probabilistic programs often requires establishing that a set of program variables (each ranging over a finite type) is uniformly distributed:

Definition 8. A set $X = \{x_1, \ldots, x_n\}$ of program variables of finite types $A_1, \ldots, A_n$ is uniformly distributed in a distribution $\mu \in \mathbb{D}(\mathrm{State})$ iff for every $(a_1, \ldots, a_n) \in A_1 \times \ldots \times A_n$:

$$\Pr_\mu\Big[\bigwedge_{1 \le i \le n} x_i = a_i\Big] = \prod_{1 \le i \le n} \frac{1}{|A_i|}$$

Note that the definition of uniformity (and as we will see in later sections, the definition of independence) naturally extends to sets of expressions, and so do our characterizations.

4.1 Characterization

The following proposition characterizes uniformity in terms of couplings.

Proposition 9 (Uniformity by coupling). Let $X = \{x_1, \ldots, x_n\}$ be a set of variables of respective finite types $A_1, \ldots, A_n$. For every program $s$, the following are equivalent:

1. for every state $m$, $X$ is uniformly distributed in $[\![s]\!]_m$;
2. for every two tuples $(a_1, \ldots, a_n), (a_1', \ldots, a_n') \in A_1 \times \cdots \times A_n$, we have

$$\models s \sim s : \mathrm{EqMem} \Longrightarrow \Big(\bigwedge_{1 \le i \le n} x_i\langle 1\rangle = a_i\Big) \iff \Big(\bigwedge_{1 \le i \le n} x_i\langle 2\rangle = a_i'\Big).$$

Proof. [1. ⇒ 2.] Let $m$ be a memory and assume that $X$ is uniformly distributed in $[\![s]\!]_m$. Let $(a_1, \ldots, a_n), (a_1', \ldots, a_n') \in A_1 \times \cdots \times A_n$. We denote by $f : \mathrm{State} \to \mathrm{State}$ the bijection defined by

$$f(m) = m[x_i \leftarrow a_i']_{1 \le i \le n} \quad \text{if } \forall i.\ m[x_i] = a_i$$
$$f(m) = m[x_i \leftarrow a_i]_{1 \le i \le n} \quad \text{if } \forall i.\ m[x_i] = a_i'$$
$$f(m) = m \quad \text{otherwise.}$$

Let $\eta \in \mathbb{D}(\mathrm{State} \times \mathrm{State})$ be the distribution defined by $\eta(m_1, m_2) = [\![s]\!]_m(m_1)$ if $m_2 = f(m_1)$, and $\eta(m_1, m_2) = 0$ otherwise. We prove that $\eta$ is a $\Psi$-coupling for $[\![s]\!]_m$, where

$$\Psi \triangleq \Big(\bigwedge_{1 \le i \le n} x_i\langle 1\rangle = a_i\Big) \iff \Big(\bigwedge_{1 \le i \le n} x_i\langle 2\rangle = a_i'\Big).$$

Regarding the marginals, we have:

$$\pi_1(\eta)(m_1) = \sum_{m_2} \eta(m_1, m_2) = \eta(m_1, f(m_1)) = [\![s]\!]_m(m_1)$$
$$\pi_2(\eta)(m_2) = \sum_{m_1} \eta(m_1, m_2) = \eta(f^{-1}(m_2), m_2) = [\![s]\!]_m(f^{-1}(m_2)) = [\![s]\!]_m(f(m_2)) = [\![s]\!]_m(m_2),$$

the last equality being a consequence of $X$ being uniformly distributed in $[\![s]\!]_m$. Moreover, for $(m_1, m_2) \in \mathrm{supp}(\eta)$, we have $m_2 = f(m_1)$. Thus, $m_1[x_i] = a_i$ iff $m_2[x_i] = a_i'$, and $m_1, m_2 \models \Psi$.

[2. ⇒ 1.] Let $(a_1, \ldots, a_n), (a_1', \ldots, a_n') \in A_1 \times \cdots \times A_n$ and assume that $\models s \sim s : \mathrm{EqMem} \Longrightarrow \Psi$, where $\Psi$ is defined as in the previous case. Since $m, m \models \mathrm{EqMem}$, by Lemma 3 we have:

$$\Pr_{[\![s]\!]_m}\Big[\bigwedge_{1 \le i \le n} x_i = a_i\Big] = \Pr_{[\![s]\!]_m}\Big[\bigwedge_{1 \le i \le n} x_i = a_i'\Big],$$

showing that $X$ is uniform in $[\![s]\!]_m$.

By expressing uniformity as a coupling property, we can use pRHL to prove uniformity.
To demonstrate the technique, we consider classical examples from the theory of randomized algorithms.

4.2 Simulating a Fair Coin

This example considers a process for simulating a fair coin using a biased coin. The idea is simple: 1) toss the coin twice; 2) if the two outcomes differ, return the value of the first coin; 3) if the two outcomes match, repeat from step 1. The algorithm does not require the bias of the coin to be known, as long as it is some constant bias and there is positive probability of returning 0 and 1. This process can be modelled by the program $s$ from Fig. 4, where $0 < p < 1$ is a real parameter modeling the probability of the biased coin to return 0 (tail).

    x ← 0;
    y ← 0;
    while x = y do
      x ←$ Bern(p);
      y ←$ Bern(p);

Figure 4: Bernoulli uniformizer

Our goal is to establish the trivial judgment $\{\top\}\ s\ \{\top\}$ and the following pRHL judgment:

$$\models s \sim s : \top \Longrightarrow x\langle 1\rangle \iff \neg x\langle 2\rangle$$

By the fundamental lemma of coupling, this implies that $\Pr_{[\![s]\!]_m}[x = 1] = \Pr_{[\![s]\!]_m}[x = 0]$, and hence that $x$ is uniformly distributed upon termination. The proof proceeds by establishing the following invariant:

$$x\langle 2\rangle = \text{if } x\langle 1\rangle = y\langle 1\rangle \text{ then } y\langle 2\rangle \text{ else } \neg x\langle 1\rangle$$

Validity of the invariant entails that the desired postcondition holds when the program exits, as the invariant and the negation of the loop guard both hold.

The invariant holds when entering the loop, so we only need to prove that it is preserved by the loop body. The proof proceeds as follows: first, we swap the two random assignments on the right, leading to the judgment:

$$\models (x \leftarrow_{\$} \mathrm{Bern}(p);\ y \leftarrow_{\$} \mathrm{Bern}(p)) \sim (y \leftarrow_{\$} \mathrm{Bern}(p);\ x \leftarrow_{\$} \mathrm{Bern}(p)) : \phi' \Longrightarrow \phi$$

where $\phi$ denotes the loop invariant and $\phi'$ denotes its strengthening by the loop guard; we do not need the precondition, since the values are freshly sampled in the body. Next, we apply the [Rand] rule twice, with the identity bijection.
The required pre-condition

$$\forall v_1, v_2,\ v_2 = (\text{if } v_1 = v_2 \text{ then } v_1 \text{ else } \neg v_1)$$

is clearly true.

4.3 Cyclic Random Walk

Consider a random walk over a cyclic path composed of $n$ nodes labeled $0, 1, \ldots, n-1$: starting from position 0, at each step, we flip a fair coin over $\{-1, 1\}$ and update the position accordingly to the result of the coin flip. To take into account that we are on a cyclic structure, all arithmetical operations are in the cyclic ring $\mathbb{Z}/n\mathbb{Z}$, i.e. are performed modulo $n$. At each iteration, when moving between two contiguous positions over the circle, we consider that the random walk visited the arc between the two nodes. We want to show that the last visited arc is uniformly distributed. Fig. 5 (left) gives a graphical representation of the random walk, where $c$ is the random walk position and the dashed arc is the last visited arc when $c$ moved from 0 to 1.

    d ← 0; c ← 0; f ← 0; l ← 0;
    while l + 1 ≠ f do
      d ←$ U{−1, 1};
      if c = l ∧ d = 1 then l ← l + 1;
      if c = f ∧ d = −1 then f ← f − 1;
      c ← c + d;
    ret ← (l, l + 1)

Figure 5: Cyclic random walk (left: the cycle with nodes 0, 1, 2, ..., n−1 and the markers c, f, l; right: the program above)

This process can be seen as a simple version of an algorithm that samples a uniformly random spanning tree on a graph; when the graph is a cycle, a spanning tree visits all but one of the edges. While Broder [8] analyzes the general problem, we can verify uniformity for the cyclic random walk with couplings.

[Diagram: the two walks on the arcs (a, a+1) and (b, b+1), in the "sync'ed" and "anti-sync'ed" configurations.]

The proof proceeds as follows. We imagine executing two random walks, from the same initial position. The goal is to couple the walks so that $(a, a+1)$ is the last arc in the first walk if and only if $(b, b+1)$ is the last arc in the second walk. If we can show this property for all $a, b$, then this coupling argument shows that any two arcs have the same probability of being the last arc, hence the last arc must be uniformly distributed.
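Before turning to the coupling proof, the uniformity claim for the program of Fig. 5 can be sanity-checked numerically. The sketch below is not the paper's method: it simply propagates the exact one-step transition probabilities of the instrumented walk over its states `(c, f, l)` and accumulates the distribution of the returned arc `ret = (l, l+1)`. The function name, the truncation after `steps` iterations, and the use of floating-point masses are assumptions of this example; the loop guard is taken to be `l + 1 != f` modulo `n`.

```python
def last_arc_distribution(n, steps=2000):
    """Approximate distribution of ret = (l, l+1) for the walk on n nodes.

    States are triples (c, f, l); all arithmetic is modulo n. Mass that has
    not left the loop after `steps` iterations is dropped (it is negligible
    for small n).
    """
    live = {(0, 0, 0): 1.0}
    done = {}
    for _ in range(steps):
        nxt = {}
        for (c, f, l), p in live.items():
            if (l + 1) % n == f:                      # loop guard is false
                arc = (l, (l + 1) % n)
                done[arc] = done.get(arc, 0.0) + p
                continue
            for d in (-1, 1):                         # fair coin over {-1, 1}
                l2 = (l + 1) % n if (c == l and d == 1) else l
                f2 = (f - 1) % n if (c == f and d == -1) else f
                key = ((c + d) % n, f2, l2)
                nxt[key] = nxt.get(key, 0.0) + p / 2
        live = nxt
    return done


dist = last_arc_distribution(5)
for arc in sorted(dist):
    print(arc, round(dist[arc], 6))
```

For `n = 5` each of the five arcs receives mass within rounding error of 1/5, which is consistent with the uniformity property that the coupling argument establishes for every `n`.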
To describe the coupling informally, we first execute asynchronously the two random walks until they eventually synchronize respectively on the arcs $(a, a+1)$ and $(b, b+1)$. At that point, we are in one of the following cases: either the random walks synchronize on the same side of the arcs $(a, a+1)$ and $(b, b+1)$, or they synchronize on opposite sides. (These cases are depicted on the left diagrams above, where the arc we want to synchronize on is dashed.) From that point, we execute the two processes resp. in lock-step (if they synchronized on the same side) or anti-lock-step (if they did not).

[Diagram: case (i) and case (ii) for the two walks on the arcs (a, a+1) and (b, b+1).]

At some point, both processes will visit the other side of the arcs $(a, a+1)$ and $(b, b+1)$, and since they execute in (anti)-lock-step, these events will occur synchronously. At that point, either the processes finished their walk and they resp. return the arcs $(a, a+1)$ and $(b, b+1)$ as their result (case (i) of the right diagram above), or they have other nodes to visit and so they will not resp. return the arcs $(a, a+1)$ and $(b, b+1)$ (case (ii) of the same diagram).

We now detail the formal proof. Consider the program of Fig. 5, where all arithmetical operations are done modulo $n$. This algorithm instruments the random walk with two points $f$ and $l$ representing the range $[f, l]$ (using clockwise ordering) of all the points that have been visited by the walk. When all nodes of the cycle have been visited (i.e. when $l + 1 = f$), the arc between $l$ and $l + 1$ is the only arc that has not been visited by the walk. Let s be the
