On Higher-Order Probabilistic Subrecursion (Long Version)∗ Flavien Breuvart Ugo Dal Lago Agathe Herrou 7 1 Abstract 0 We study the expressive power of subrecursive probabilistic higher-order calculi. More 2 specifically,weshowthatendowingaveryexpressivedeterministiccalculuslikeG¨odel’sTwith n various forms of probabilistic choice operators may result in calculi which are not equivalent a as for the class of distributions they give rise to, although they all guarantee almost-sure J termination. Along the way, we introduce a probabilistic variation of the classic reducibility 7 technique, and we prove that the simplest form of probabilistic choice leaves the expressive 1 power of T essentially unaltered. The paper ends with some observations about functional expressivity: expectedly, all the considered calculi represent precisely the functions which T ] O itself represents. L s. 1 Introduction c [ Probabilistic models are more and more pervasive in computer science and are among the most 1 powerfulmodelingtoolsinmanyareaslikecomputervision[20],machinelearning[19]andnatural v language processing [17]. Since the early times of computation theory [8], the very concept of an 6 algorithm has been itself generalised from a purely deterministic process to one in which certain 8 7 elementary computation steps can have a probabilistic outcome. This has further stimulated 4 researchin computation and complexity theory [11], but also in programming languages [21]. 0 Endowingprogramswithprobabilisticprimitives(e.g. anoperatorwhichmodelssamplingfrom . 1 a distribution) poses a challenge to programming language semantics. Already for a minimal, im- 0 perative probabilistic programming language, giving a denotational semantics is nontrivial [16]. 7 When languages also have higher-order constructs, everything becomes even harder [14] to the 1 pointofdisruptingmuchofthebeautifultheoryknowninthedeterministiccase[1]. Thishasstim- : v ulated research on denotational semantics of higher-order probabilistic programming languages, Xi with some surprising positive results coming out recently [9, 4]. Notmuchisknownaboutthe expressivepowerofprobabilistic higher-ordercalculi,asopposed r a to the extensive literature on the same subject about deterministic calculi (see, e.g. [24, 23]). What happens to the class of representable functions if one enrich, say, a deterministic λ-calculus X with certain probabilistic choice primitives? Are the expressive power or the good properties of X somehow preserved? These questions have been given answers in the case in which X is the pure, untyped, λ-calculus [6]: in that case, univesality continues to hold, mimicking what happensinTuringmachines[22]. ButwhatifX isoneofthemanytypedλ-calculiensuringstrong normalisation for typed terms [12]? Butletusdoastepback,first: whenshouldahigher-orderprobabilisticprogrambeconsidered terminating? The questioncanbegivenasatisfactoryanswerbeinginspiredby,e.g.,recentworks on probabilistic termination in imperative languages and term rewrite systems [18, 2]: one could ask the probability ofdivergenceto be 0, calledalmost sure termination property,orthe stronger positive almost sure termination, in which one requires the averagenumber of evaluation steps to be finite. That termination is desirable property, even in a probabilistic setting can be seen, e.g. ∗TheauthorsarepartiallysupportedbyANRproject14CE250005ELICAandANRproject12IS02001PACE. 1 in the fieldof languageslike Church and Anglican,in whichprogramsare oftenassumedto be almost surely terminating, e.g. when doing inference by MH algorithms [13]. In this paper, we initiate a study on the expressive power of terminating higher-order calculi, in particular those obtained by endowing G¨odel’s T with various forms of probabilistic choice operators. In particular, three operators will be analyzed in this paper: • A binary probabilistic operator ⊕ such that for every pair of terms M,N, the term M ⊕N evaluates to either M or N, each with probability 1. This is a rather minimal option, which, 2 however,guarantees universality if applied to the untyped λ-calculus [6] (and, more generally, to universal models of computation [22]). • A combinator R, which evaluates to any natural number n ≥0 with probability 1 . This is 2n+1 the natural generalization of ⊕ to sampling from a distribution having countable rather than finite support. This apparently harmless generalization (which is absolutely non-problematic inauniversalsetting)hasdramaticconsequencesinasubrecursivescenario,aswewilldiscover soon. • A combinatorX such that for everypair of values V,W, the term XV W evaluates to either W or V(XV W), eachwith probability 1. The operatorX canbe seen as a probabilistic variation 2 onP C F ’sfixpointcombinator. As such,Xispotentiallyproblematictotermination,givingrise to infinite trees. This way, various calculi can be obtained, like T⊕, namely a minimal extension of T, or the full calculus T⊕,R,X, in which the three operators are all available. In principle, the only obvious fact about the expressive power of the above mentioned operators is that both R and X are at least as expressiveas ⊕: binarychoice canbe easily expressedby either R orX. Less obviousbut still easy to prove is the equivalence between R and X in presence of a recursive operator (see Section 3.3). But how about, say, T⊕ vs. TR? Traditionally, the expressivities of such languages are compared by looking at the set of func- tions f ∶ N → N defined by typable programs M ∶ NAT → NAT. However, in probabilistic setting, programs M ∶ NAT → NAT computes functions from natural numbers into distributions of nat- ural numbers. In order to fit usual criterions, we need to fix a notion of observation. There are at least two relevant notions of observations, corresponding to two randomised programming paradigms, namely Las Vegas and Monte Carlo observations. The main question, then, consists inunderstandinghowtheobtainedclassesrelatetoeachother,andtothe classofT-representable functions. Alongtheway,however,wemanagetounderstandhowtocapturetheexpressivepower of probabilistic calculi per se. This paper’s contributions can be summarised as follows: • We first take a look at the full calculus T⊕,R,X, and prove that it enforces almost-sure termi- nation, namely that the probability of termination of any typable term is 1. This is done by appropriatelyadaptingthewell-knownreducibilitytechnique[12]toaprobabilisticoperational semantics. We then observe that while T⊕,R,X cannot be positively almost surely terminating, T⊕ indeed is. This already shows that there must be a gap in expressivity. This is done in Section 3. • In Section 4, we look more precisely at the expressive power of T⊕, proving that the mere presence of probabilistic choice does not add much to the expressive power of T: in a sense, probabilistic choice can be “lifted up” to the ambient deterministic calculus. • We look at other fragments of T⊕,R,X and at their expressivity. More specifically, we will prove that (the equiexpressive)TR and TX representprecisely what T⊕ cando at the limit, in a sense which will be made precise in Section 3. This part, which is the most challenging, is done in Section 5. • Section6 is devotedto provingthatboth forMonte Carlo andfor Las Vegas observations,the class of representable functions of TR coincides with the T-representable ones. 2 Probabilistic Choice Operators, Informally AnytermofG¨odel’sTcanbeseenasapurelydeterministiccomputationalobjectwhosedynamics is finitary, due to the well-knownstrong normalizationtheorem (see, e.g., [12]). In particular, the 2 apparent non-determinism due to multiple redex occurrences is completely harmless because of confluence. Inthispaper,indeed,weevenneglectthisproblem,andworkwithareductionstrategy, namely weak call-by-value reduction (keeping in mind that all what we will say also holds in call- by-name). Evaluationof a T-termM of type NAT canbe seen as a finite sequence of terms ending in the normal form n of M (see Figure 1). More generally, the unique normal form of any term T term M will be denoted as M . Noticeably, T is computationally very powerful. In particular, the T-representablefunctions from N to N coincide with the functions which are provablytotal in Peano’s arithmetic [12]. J K As we already mentioned, the most natural way to enrich deterministic calculi and turn them into probabilistic ones consists in endowing their syntax with one or more probabilistic choice operators. Operationally,eachof them models the essentially stochastic process of sampling from a distribution and proceeding depending on the outcome. Of course, one has many options here as for which of the various operators to grab. The aim of this work is precisely to study to which extend this choice have consequences on the overallexpressive power of the underlying calculus. Suppose, for example, that T is endowed with the binary probabilistic choice operator ⊕ de- scribed in the Introduction, whose evaluation corresponds to tossing a fair coin and choosing one of the two arguments accordingly. The presence of ⊕ has indeed an impact on the dynamics of the underlying calculus: the evaluation of any term M is not deterministic anymore, but can be modeled as a finitely branching tree (see, e.g. Figure 3 for such a tree when M is (3⊕4)⊕2). The fact that all branches of this tree have finite height (and the tree is thus finite) is intuitive, and a proof of it can be given by adapting the well-known reducibility proof of termination for T. In this paper, we in fact prove much more, and establish that T⊕ can be embedded into T. If ⊕ is replaced by R, the underlying tree is not finitely branching anymore, but, again, there is not (at least apparently) any infinitely long branch, since each of them can somehow be seen as a T computation (see Figure 2 for an example). What happens to the expressive power of the obtained calculus? Intuition tells us that the calculus should not be too expressive viz. T⊕. If ⊕ is replaced by X, on the other hand, the underlying tree is finitely branching, but its height can well be infinite. Actually, X and R are easily shown to be equiexpressive in presence of higher- order recursion, as we show in Section 3.3. On the other hand, for R and ⊕, no such encoding is available. Nonetheless, TR can still be somehow encoded embedded into T (just that we need an infinite structure) as we will detail in Section 5. From this embeding, we can show that applying Monte Carlo or Las Vegas algorithmon T⊕,X,R results do not add any expressive power to T, This is done in Section 6. 3 The Full Calculus T⊕,R,X All along this paper, we work with a calculus T⊕,R,X whose terms are the ones generated by the following grammar: M,N,L ∶∶ x ∣ λx.M ∣ M N ∣ M,N ∣ π ∣ π = ⟨ ⟩ 1 2 ∣ rec ∣ 0 ∣ S ∣ M ⊕N ∣ R ∣ X. Please observe the presence of the usual constructs from the untyped λ-calculus, but also of primitive recursion, constants for natural numbers, pairs, and the three choice operators we have described in the previous sections. As usual, terms are taken modulo α-equivalence. Terms in which no variable occurs free are, as usual, dubbed closed, and are collected in the set T⊕,R,X. A value is simply a closed term from C the following grammar U,V ∶∶ λx.M ∣ π ∣ π ∣ U,V ∣ rec ∣ 0 ∣ S ∣ S V ∣ X, = 1 2 ⟨ ⟩ and the set of values is T⊕,R,X. Extended values are (not necessarily closed) terms generated by V the same grammar as values with the addition of variables. Closed terms that are not values are 3 called reducible and their set is denoted T⊕,R,X. A context is a term with a unique hole: R C ∶= ⋅ ∣ λx.C ∣ C M ∣ M C ∣ ⟨C,M⟩ ∣ ⟨M,C⟩ ∣ C⊕M ∣ M ⊕C We write T⊕,R,X forLtMhe set of all such contexts. L⋅M Termination of G¨odel’s T is guaranteed by the presence of types, which we also need here. Types are expressions generated by the following grammar A,B ∶∶=NAT ∣ A→B ∣ A×B. Environmental contexts areexpressionsofthe formΓ=x1∶A1,...,xn∶An, whiletyping judgments are of the form Γ⊢M∶A. Typing rules are given in Figure 5. From now on, only typable terms will be considered. We denote by T⊕,R,X A the set ofterms of type A, and similarly for T⊕,R,X A ( ) C ( ) and T⊕,R,X A . We use the shortcut n for values of type NAT: 0 is already part of the language of V ( ) terms, while n+1 is simply S n. 3.1 Operational Semantics While evaluating terms in a deterministic calculus ends up in a value, the same process leads to a distribution of values when performed on terms in a probabilistic calculus. Formalizing all this requiressomecare,butcanbe donefollowingoneofthe manydefinitionsfromthe literature(e.g., [6]). GivenacountablesetX,adistribution onX isaprobabilistic(sub)distributionoverelements L of X: L,M,N ∈D(X)={f ∶X → 0,1 ∑ f x ≤1 [ ]∣ ( ) } x∈X We are especially concerned with distributions over terms here. In particular, a distribution of type A is simply an element of D T⊕,R,X A . The set D T⊕,R,X is ranged over by metavari- ( ( )) ( V ) ables like , , . We will use the pointwise order on distributions, which turns them into an U V W ≤ ωCPO. Moreover, we use the following notation for Dirac’s distributions over terms: M ∶= { } M ↦1 . Thesupportofadistributionisindicatedas ; wealsodefinethereducible {N ↦0 if M ≠N} ∣M∣ andVvhaaluveesaunppoborvtisoufrsaagnmdenntastausra∣Ml m∣Rea∶n=i∣nMg:∣∩foTr⊕Ra,Rn,Xyand∣MD∣VX∶=a∣Mnd∣∩YT⊕V,XR,X,.tNheontionsYlikxeMRanxd M M∈ ⊆ M =M if x Y and Y x 0 otherwise. ( ) ( ) ( ) ∈ M = ( ) Assyntacticsugar,weuseintegralnotationstomanipulatedistributions,i.e.,foranyfamilyof distributions(NM)M∈T⊕,R,X ∶D(T⊕,R,X)T⊕,R,X,theexpression∫MNM.dM standsfor∑M∈T⊕,R,XM(M)⋅ (by abuse of notation, we may define only for M , since the others are not used NM NM ∈ M ∣ ∣ anyway). Thenotationcanbeeasilyadapted,e.g.,tofamiliesofrealnumbers(pM)M∈T⊕,R,X andto other kinds of distributions. We indicate as C the push-forward distribution C M dM M ∫M{ } induced by a context C, and as the norm 1dM of . Remark, finally, that we have the useful equality M d∣∣MM.∣∣The intLegra∫MlMcan be maMnipulated as usual integraLls wMhich M = ∫M{ } respect the less usual equation: dN dN dM (1) ∫(∫MNMdM)LN = ∫M(∫NM LN ) Reduction rules of T⊕,R,X are given by Figure 6. For simplicity, we use the notation M →? M for M → , i.e., M → whenever M is reducible and M whenever M is a value. The M M M= { } { } notation permit to rewrite rule r- as: ∈ ( ) ∀M ∈ M, M →?NM ∣ ∣ (r-∈) → .dM M ∫MNM 4 M →⋯→n 3⊕4 ⊕2 X S 3 ( ) 1 1 Figure 1: A Reduction in T 1 1 2 2 2 2 3 S X S 3 ( ) R 3⊕4 2 12 12 4 SS X S 3 1 1 ( ) 12 41 18 116 2 2 12 12 0 1 2 3 4 3 4 5 ⋱ Figure 2:A Reduction in TR Figure 3:A Reduction in T⊕ Figure 4:A Reduction in TX Γ,x A M B Γ M A B Γ N A ∶ ⊢ ∶ ⊢ ∶ → ⊢ ∶ Γ,x A x A Γ λx.M A B Γ M N B ∶ ⊢ ∶ ⊢ ∶ → ⊢ ∶ Γ 0 NAT Γ S NAT NAT Γ rec A NAT A A NAT A ⊢ ∶ ⊢ ∶ → ⊢ ∶ ×( → → )× → Γ M A Γ N B ⊢ ∶ ⊢ ∶ Γ M,N A B Γ π1 A B A Γ π2 A B B ⊢⟨ ⟩∶ × ⊢ ∶( × )→ ⊢ ∶( × )→ Γ M A Γ N A ⊢ Γ∶ M⊕N ∶⊢A ∶ Γ R∶NAT Γ X∶ A A ×A A ⊢ ⊢ ⊢ ( → ) → Figure 5: Typing Rules. M (r-β) →M (r-@L) λx.M V M V x M V V [ / ]} M ( ) →{ → N M M →N (r-@R) →M (r-⟨⋅⟩L) →M (r-⟨⋅⟩R) M N M M,N ,N V,M V, N M M → ⟨ ⟩→⟨ ⟩ ⟨ ⟩→⟨ ⟩ (r-rec0) (r-recS) rec U,V,0 U rec U,V,S n V n rec U,V,n ⟨ ⟩→{ } ⟨ ⟩→{ ( ⟨ ⟩)} (r-π1) (r-π2) (r-R) π1 ⟨V,U⟩→{V} π2 ⟨V,U⟩→{U} R→{n↦ 2n1+1}n∈ N (r-⊕) (r-X) M 1 V X V,W 1 M⊕N →{N ↦↦ 212} X⟨V,W⟩→{ ( W⟨ ⟩) ↦↦ 212} M , M V , = V ∀ ∈ MR NM ∀ ∈ MV NV ∣ ∣ → ∣ ∣ { } (r-∈) .dM M→∫MNM Figure 6: Operational Semantics. 5 Notice that the reduction → (resp. →?) is deterministic. We can easily define →n (resp. →≤n) as the nth exponentiation of → (resp. →?) and →∗ as the reflexive and transitive closure of → (and →?). In probabilistic systems, we might wantto consider infinite reductions such as the ones induced by X λx.x 0 which reduces to 0 , but in an infinite number of steps. Remark that ( ) { } for any value V, and whenever → , it holds that V V . As a consequence, we can M N M ≤N ( ) ( ) proceed as follows: Definition1. LetM beatermandlet Mn n∈N betheuniquedistributionfamilysuchthatM →≤n ( ) . The evaluation of M is the value distribution Mn M ∶={V ↦nl→im∞Mn(V)}∈D(T⊕V,R,X). The success of is its probaJbiliKty of normalization, which is formally defined as the norm of its M evaluation,i.e.,Succ M ∶= M . M∆nV standsfor V ↦Mn V −Mn−1 V ,thedistributions ( ) ∣∣ ∣∣ { ( ) ( )} of values reachable in exactly n steps. The averagereduction length from M is then J K M ∶= ∑ n⋅ M∆nV ∈ N ∪ +∞ [ ] ( ∣∣ ∣∣) { } n Notice that, by Rule r- , the evaluation is continuous: ∈ ( ) M dM. M =∫ M Any term M of type NAT → NAT repreJsentKs a funJctioKn g ∶ N → D N iff for every n,m it holds that g n m M n m . ( ) = ( )( ) ( ) J K 3.2 Almost-Sure Termination We now have all the necessary ingredients to specify a quite powerful notion of probabilistic com- putation. When, precisely, should such a process be considered terminating? Do all probabilistic branches (see figures 1-4) need to be finite? Or should we stay more liberal? The literature on the subjectisunanimouslypointingtothenotionofalmost-sure termination: aprobabilisticcom- putation should be considered terminating if the set of infinite computation branches, although not necessarily empty, has null probability [18, 10, 15]. This has the following incarnation in our setting: Definition 2. A term M is said to be almost-surely terminating (AST) iff Succ M 1. = ( ) This section is concerned with proving that T⊕,R,X indeed guarantees almost-sure termination. This will be done by adapting Girard-Tait’s reducibility technique. Before giving the main result, some auxiliary lemmas are necessary. Lemma 3. If it exist, the reduction of any integral dM → , is the integral dM L ∫MNM L ∫MLM such that → for any M . NM LM ∈ M ∣ ∣ Proof. • Let such that → for any M . Then in order to derive (LM)M∈∣M∣ NM LM ∈ ∣M∣ → , the only rule we can apply is r- so that there is ′ such NM LM ( ∈) (LM,N)M∈∣M∣,N∈∣NM∣ that LM = ∫NML′M,NdN and N →? L′M,N. Notice that for N ∈ ∣NM∣ ∩ ∣NM′∣, the de- terminism of →? gives that ′ ′ , thus we can define ′ without ambiguity for LM,N = LM′,N LN any N dM . Then we have N →? ′ and ′ dN ∈ ⋃M∈∣M∣∣NM∣ = ∣∫MNM ∣ LN ∫∫MNMdMLN = ′ dN dM dM (we use Eq (1)). Thus, by applying rule r- we get ∫M(∫NMLN ) = ∫MLM ( ∈) dM → dM. ∫MNM ∫MLM • Conversely,weassumethat dM → . Inordertoderiveit,theonlyrulewecanapplyis ∫MNM L r- sothatthereis ′ suchthat ′ dN ′ dN dM ( ∈) (LN)N∈∣∫MNMdM∣ L=∫(∫MNMdM)LN =∫M(∫NM LN ) and N →?L′N. We conclude by setting LM ∶=∫NM L′NdN. 6 Lemma 4. The nth reductions of any integral dM →n , is (if it exists) the integral L ∫MNM L dM such that →n for any M . ∫MLM NM LM ∈∣M∣ Proof. By induction on n: • If n 0 then . = LM =N • Otherwise, dM → ′→n−1 . ByLemma 3,the firststepis possible iff ′ ′ dM ∫MNM L L L =∫MLM with → ′ for any M . By IH, the remaining steps are then possible iff NdMM suLchMthat ′ →n−1∈ ∣M∣for any M . L = ∫MLM LM LM ∈∣M∣ Lemma 5. If (M N)→n L+W then there is M,N ∈D(T⊕,R,X) and U,V ∈D(T⊕V,R,X) such that M →∗ M+U, N →∗N +V and U V →∗L′+W. ( ) Proof. By induction on n. Trivial if n 0. = If n 1, we proceed differently depending whether M and N are values or not. ≥ • If N is not a value then N →N′ and M N′ →n−1L+W. ( ) aNnodticLe =th∫aNt′(LMN′NdN′)′∶=in∫Nsu′{cMh aNw′}adyNth′.atUsfoinrgaLneymNm′a∈4∣,Nw′e∣,c(aMn dNec′o)m→p≤ons−e1WLN=′∫+NW′WNN′.′dBNy′ induction we have N′ →∗ NN′ +VN′ and M →∗ MN′ +UN′ with (UN′ VN′)→∗ L′N′ +WN′ for allN′∈∣N∣′. Summing upwe haveM∶=∫N′MN′dN′, U ∶=∫N′UN′dN′, N ∶=∫N′NN′dN′ and V ∶=∫N′VN′dN′. • If N is a value and M →M′ then M′ N →n−1L+W. Thus we can decompose the equation ( ) similarly along ′ and apply our IH. M • If both M and N are values it is trivial since M and N . U = V = { } { } Lemma 6. For every m,n N and every , D T⊕,R,X and , , D T⊕,R,X , whenever M →m M+U and N →nN +∈ V, we have UMVN≤∈M(N . ) U V W ∈ ( V ) Proof. By induction on m+n. J K J K Trivial if m+n=0. If n 1, we proceed differently depending whether M and N are values or not. ≥ • If N is not a value then N →N′→n−1N +V. UsingLemma4,wecandecomposeN =∫N′NN′dN′ andV =∫N′VN′dN′ insuchawaythatfor any N′ ∈ N′, N′ →≤n−1 NN′ +VN′ (with →≤n−1 dubbing either = or →n−1 depending whether ∣ ∣ N′ is a value or not). Then we get U V =sU (∫N′VN′dN′){=∫N′ U VN′ dN′≤∫N′ M N′ dN′= M N′ . J K J K J K J K • If N is a value and M → M′ then M′ →m−1 M+U. Thus we can decompose the equation similarly along ′ and apply our IH. M • If both M and N are values it is trivial since M and N . U = V = { } { } Thefollowingis acrucialintermediatesteptowardsTheorem8,the mainresultofthis section. Lemma 7. For any M,N: M N M N . In particular, if the application M N is almost- = surely terminating, so are M and N. J K JJ K J KK Proof. ≤ There is Ln,Wn n≥1 such that M N →nLn+Wn for all n and such that M N = lim ( ). (Applyin)g Lemma (5 )gives , , ′ D T⊕,R,X n Wn Mn Nn Ln ∈ and (Un,V)n ∈D(T⊕V,R,X) such that M →∗ Mn +Un, N →∗ Nn +Vn and (Un Vn) →∗ LJ′n( +WKn). Thus M and N with leading to the required inequality. ⋁nUn≤ ⋁nVn≤ Wn≤ Un Vn J K J K J K 7 ≥ Thereis Mn,Nn,Un,Vn n≥1suchthatM →nMn+UnandN →nNn+Vnforallnandsuch ( ) ( ) that M lim and N lim . Thisleadstotheequality M N lim . = n Un = n Vn = m,n Um Vn ( ) ( ) Finally, by Lemma 6, for any m,n, each approximant of is below M N , so is their Um Vn sup. J K J K JJ K J KK J K J K J K Theorem 8. The full system T⊕,R,X is almost-surely terminating (AST), i.e., ∀M ∈T⊕,R,X, Succ M =1. ( ) Proof. The proof is is based on the the notion of a reducible term, which is given as follows by induction on the structure of types: RedNAT∶= M ∈T⊕,R,X NAT M is AST { ( )∣ } RedA→B ∶= {M ∣∀V ∈RedA∩T⊕V,R,X, (M V)∈RedB} RedA×B ∶= M π1 M ∈RedA, π2 M ∈RedB { ∣( ) ( ) } Then we can observe that: • The reducibility candidates over Red are →-saturated: A byinductiononAwecanindeedshowthatifM → then Red iffM Red . M M ⊆ A ∈ A • Trivial for A NAT. ∣ ∣ = • If A B →C: then for all value V Red , M V → V , thus by IH M V Red = ∈ B M ∈ C ( ) ( ) ( ) iff V M′ V M′ Red ; which exactly means that Red iff M = ∈ M ⊆ C M ⊆ B→C ∣ ∣ { ∣ ∣ ∣} ∣ ∣ M Red . ∈ B→C • If A = A1×A2: then for i ∈ {1,2}, (πi M) → (πi M) so that by IH, (πi M) ∈ RedAi iff π Red ; which exactly means that Red iff M Red . • The(reidMuci)b⊆ility caAnididates over Red are preci∣sMely∣⊆the ASAT1×Ate2rms M∈suchAt1h×aAt2 M Red : A ⊆ A this goes by induction on A. • Trivial for A NAT. J K = • Let M Red : ∈ B→C remark that there is a value V Red , thus M V Red and M V is AST by IH; ∈ B ∈ C ( ) ( ) using Lemma 7 we get M AST and it is easy to see that if U M then U for ∈ ∈ M ∣ ∣ ∣ ∣ some M →∗ so that U Red by saturation. M ∈ B→C Conversely, let M be AST with M Red and let V Red JbeKa value: ⊆ B→C ∈ B ∣ ∣ by IH, for any U M Red J wKe have U V AST with an evaluation supported by ∈ ⊆ B→C ∣ ∣ ( ) elements of Red ; by Lemma 7 M V M V meaning that M V is AST and has C = an evaluation suppoJrtedK by elements of Red , so that we can conc(lude b)y IH. C • Let M Red : J K JJ K K ∈ A1×A2 then π M Red and π M is AST by IH; using Lemma 7 we get M AST and it is ( 1 )∈ A1 ( 1 ) easy to see that if U M then U for some M →∗ so that U Red by ∈ ∣ ∣ ∈ ∣M∣ M ∈ A1→A2 saturation. J K Conversely, let M be AST with M Red and let i 1, 2: ∣ ∣⊆ A1→A2 ∈{ } byIH,foranyU M Red wehave π U AST withanevaluationsupportedby elements of Red ∈;∣by L∣e⊆mma A71J→πA2KM π (Mi )meaning that π M is AST and has an evaluation suApipoJrteKd by elementis of R=edi , so that we can conc(luide b)y IH. Ai • Every term M such that x1 ∶ A1,..J.,xn ∶KAnJ ⊢JM K∶KB is a candidate in the sense that if V Red for every 1 i n, then M V x ,...,V x Red : i∈ Ai ≤ ≤ [ 1/ 1 n/ n]∈ B byinduction onthe type derivation. The onlydifficult casesarethe recursion,the application, and the X: • For the operator rec: We have to show that if U ∈RedA and V ∈RedNAT→A→A then for all n N , rec U V n Red . We proceed by induction on n: ∈ ∈ A • If n( 0: rec U)V 0→ U Red and we conclude by saturation. = ⊆ A • Otherwise: recU V (n{+}1)→V n rec U V n Red since rec U V n Red by ∈ A ∈ A IH and since n∈RedN and V ∈RedN(→A→A, we c)onclude by satu(ration. ) 8 • For the application: we have to show that if M Red and N Red then M N ∈ A→B ∈ A ∈ ( ) Red . But since N Red , this means that it is AST and for every V N , M V B ∈ A ∈ ∈ ∣ ∣ ( ) Red . In particular,by Lemma 7, we have M N M N so that M N is AST and B = M N M V Red . ( J )K • ∣For the o∣p⊆e⋃raVt∈o∣JrNXK∣:∣we have∣⊆to shoBw that fJor anyKvaJlue UJ RKKed and V Red if holds ∈ A→A ∈ A tJhat XKU V Red J. K ∈ A By a(n easy)induction on n, Un V Red since U0 V V Red and Un+1 V ∈ A = ∈ B = U Un V Red whenever U(n V R)ed and U Red . ∈ B ∈ B ∈ B→B Mo(reover),byaneasyinductiononnwehave X U V = 2n1+1 Un (X U V) +∑i≤n 2i1+1 Ui V : q y • trivial for n 0, J K J K = • we knowthat X U V = 21 V +21 V(X U V) andwe get V(X U V) = V X U V = 2n1+1 Un+1(X UJ V) +K ∑i≤nJ 2Ki1+1 UJi+1V byKLemma 7 anJd IH, whicKh iJs suJfficientKKto conclqude. y q y At the limit, we get X U V = ∑i∈N 2i1+1 Ui V . We can then conclude that (X U V) is AST (since each of the Ui V Red qare AyST and 1 1) and that M N J ( K ) ∈ B ∑i 2i+1 = ∣ ∣ = Ui V Red . ⋃i ⊆ A ∣ ∣ J K This concluqdes thye proof. Almost-sure termination could howeverbe seen as too weak a property: there is no guarantee abouttheaveragecomputationlength. Forthisreason,anotherstrongernotionisoftenconsidered, namely positive almost-sure termination: Definition 9. A term M is said to be positively almost-surely terminating (or PAST) iff the average reduction length M is finite. [ ] G¨odel’s T, when paired with R, is combinatorially too powerful to guarantee positive almost sure termination: Theorem 10. T⊕,R,X is not positively almost-surely terminating. Proof. Let Expo∶=rec 2 λxy.rec y λx.S y ∶NAT→NAT be the program computing n↦2n+1 2inn+t1im; tehe2na+v1e.raTgheerned(uEcxtpioo(nRl)en∶gNtAhTisist(choumsp)utin)g, with probability 2n1+1 the number 2n+1 in time 2n+1 [Expo R] = ∑2n+1 = +∞. n 3.3 On Fragments of T⊕,R,X: a Roadmap ThecalculusT⊕,R,X containsatleastfourfragments,namelyG¨odel’sTandthethreefragmentsT⊕, TR andTX correspondingtothethree probabilisticchoiceoperatorsweconsider. Itisthennatural to ask oneselves how these fragments relate to each other as for their respective expressive power. At the end of this paper, we will have a very clear picture in front of us. The firstsuchresultis the equivalence betweenthe apparentlydualfragmentsTR andTX. The embeddings are in fact quite simple: getting X from R only requires “guessing” the number of iterations via R and then use rec to execute them; conversely, capturing R from X is even easier: it corresponds to counting the recursive loops done by some executions of X: Proposition 11. TR and TX are both equiexpressive with T⊕,R,X. Proof. The calculus TR embeds the full system T⊕,R,X via the encoding:1 M ⊕N ∶= rec λz.N,λxyz.M,R 0; X ∶= λxy.rec y,λz.x,R . ⟨ ⟩ ⟨ ⟩ 1Notice that the dummyabstractions onz and the 0 at the end ensurethe correct reduction order by making λz.N avalue. 9 The fragment TX embeds the full system T⊕,R,X via the encoding: M ⊕N ∶= X λxy.M,λy.N 0; R ∶= X S,0 . ⟨ ⟩ ⟨ ⟩ In both cases,the embedding is compositionaland preservestypes. That the two embeddings are correct can be proved easily, see [3]. Notice how simulating X by R requiresthe presenceof recursion,while the converseis nottrue. The implications of this fact are intriguing, but lie outside the scope of this work. In the following, we will no longer consider TX nor T⊕,R,X but only TR, keeping in mind that all these are equiexpressive due to Proposition11. The rest of this paper, thus, will be concerned withunderstandingtherelativeexpressivepowerofthethreefragmentsT,T⊕,andTR. Cananyof the(obvious)strictsyntactical inclusionsbetweenthembeturnedintoastrictsemantic inclusion? Are the three systems equiexpressive? In order to compare probabilistic calculi to deterministic ones, several options are available. Themostcommononeistoconsidernotionsofobservationsovertheprobabilisticoutputs;thiswill bethepurposeofSection6. Inthissection,wewilllookatwhetheritispossibletodeterministically represent the distributions computed by the probabilistic calculus at hand. We say that the distribution M∈D N is finitely represented by2 f ∶N →B , if there exists a q such that for every ( ) k q it holds that f k 0 and ≥ = ( ) k↦f k M= { ( )} Moreover, the definition can be extended to families of distributions by requiring the Mn n existence of f ∶ N ×N →B , q∶N →N such that ∀k≥q n ,f n,k =0 an(d ) ( ) ( ) ( ) ∀n, Mn= k↦f n,k . { ( )} In this case, we say that the representation is parameterized. WewillseeinSection4thatthedistributionscomputedbyT⊕ areexactlythe(parametrically) finitely representable by T terms. In TR, however, distributions are more complex (infinite, non- rational). That is why only a characterization in terms of approximations is possible. More specifically, a distribution D NAT is said to be functionally represented by two functions M ∈ f ∶ N ×N →B and g ∶N →N iff fo(r ev)ery n∈N and for every k ≥g n it holds that f n,k =0 ( ) ( ) ( ) and 1 k∑∈N ∣ M(k) − f(n,k) ∣ ≤ n. Inotherwords,thedistribution canbeapproximatedarbitrarilywell,anduniformly,byfinitely M representableones. Similarly,wecandefineaparameterizedversionofthisdefinitionatfirstorder. InSection5 , weshow thatdistributions generatedby TR terms areindeed uniformlimits over those of T⊕; using our result on T⊕ this give their (parametric) functional representability in the deterministic T. 4 Binary Probabilistic Choice This section is concerned with two theoretical results on the expressive power of T⊕. Taken together, they tell us that this fragment is not far from T. 4.1 Positive Almost-Sure Termination TheaveragenumberofstepstonormalformisinfiniteinT⊕,R,X. Wewillprovethat,onthecontrary, T⊕ is positive almost-surely terminating. This will be done by adapting (and strengthening!) the reducibility-based result from Section 3.2. 2Here we denote B for binomial numbers n (where m,n ∈ N) and BIN for their representation in system T 2m encodedbypairsofnaturalnumbers. 10