Bayesian Inference using the Symmetric Monoidal Closed Category Structure

Kirk Sturtz

arXiv:1601.02593v3 [math.CT] 4 Feb 2016

Abstract

Using the symmetric monoidal closed category structure of the category of measurable spaces, in conjunction with the Giry monad, which we show is a strong monad, we analyze Bayesian inference maps and their construction in relation to the tensor product probability. This perspective permits the inference maps to be seen as a pullback construction.

1 Introduction

A theory for constructing Bayesian inference maps is developed by exploiting the symmetric monoidal closed category (SMCC) structure of the category of measurable spaces, Meas. Using this property the inference maps can be constructed as pullbacks.

While the construction of these maps requires the SMCC structure of Meas, the Bayesian inference problem itself is most naturally characterized within the Kleisli category of the Giry monad, $\mathcal{K}(\mathcal{G})$, where $\mathcal{G}$ denotes the Giry monad on the category of measurable spaces, Meas [4]. In this introduction we provide a general overview of this perspective, which permits one to quickly understand precisely what the Bayesian inference problem entails.

A Bayesian model is a pair $(P_X, P_{Y|X})$ consisting of a probability measure $P_X \in \mathcal{G}(X)$ and a (regular) conditional probability measure $P_{Y|X}\colon X \to \mathcal{G}(Y)$.¹ Thus at each point $x \in X$, $P_{Y|X}(\cdot \mid x)$ is a probability measure on $Y$, and this conditional probability $P_{Y|X}$ in turn determines the conditional probability $P_{X\times Y|X}$, defined as the probability measure on $X\times Y$ given at every $x \in X$ by the $\mathcal{K}(\mathcal{G})$-composite

$$P_{X\times Y|X}(\cdot \mid x) = \delta_{\Gamma_x} \star P_{Y|X}(\cdot \mid x)\colon \quad 1 \xrightarrow{P_{Y|X}(\cdot \mid x)} Y \xrightarrow{\delta_{\Gamma_x}} X\times Y,$$

giving the push forward probability measure $P_{X\times Y|X}(\cdot \mid x) = P_{Y|X}(\Gamma_x^{-1}(\cdot) \mid x)$, where

$$\Gamma_x\colon Y \to X\times Y, \quad y \mapsto (x, y)$$

is the constant graph map.

¹ A regular conditional probability is also commonly referred to as a kernel.
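On finite spaces this construction is elementary and can be sketched in code. The following Python fragment is our own illustration, not notation from the paper; the dictionaries and the helper `P_XY_given_X` are ad hoc names. It builds $P_{X\times Y|X}(\cdot \mid x)$ as the pushforward of $P_{Y|X}(\cdot \mid x)$ along the constant graph map $\Gamma_x$.

```python
# Illustrative finite Bayesian model (P_X, P_{Y|X}); all names are ours.
P_X = {"x0": 0.6, "x1": 0.4}                 # prior on X
P_Y_given_X = {                               # kernel X -> G(Y)
    "x0": {"y0": 0.9, "y1": 0.1},
    "x1": {"y0": 0.2, "y1": 0.8},
}

def P_XY_given_X(x):
    """P_{X x Y|X}(.|x): pushforward of P_{Y|X}(.|x) along Gamma_x,
    a probability on X x Y concentrated on the fibre {x} x Y."""
    return {(x, y): p for y, p in P_Y_given_X[x].items()}

joint_at_x0 = P_XY_given_X("x0")
# all mass lies on pairs whose first coordinate is x0
assert joint_at_x0[("x0", "y0")] == 0.9
assert abs(sum(joint_at_x0.values()) - 1.0) < 1e-12
```

In the measure-theoretic setting the same formula reads $P_{X\times Y|X}(\zeta \mid x) = P_{Y|X}(\Gamma_x^{-1}(\zeta) \mid x)$; the discrete fibre picture is what that pushforward computes.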
On the other hand, given $P_{X\times Y|Y}$ we have

$$P_{X|Y}(A \mid y) = (\delta_{\pi_X} \star P_{X\times Y|Y})(A \mid y) = P_{X\times Y|Y}(\pi_X^{-1}(A) \mid y),$$

where $\pi_X\colon X\times Y \to X$ is the coordinate projection map. These two processes are inverse to each other, and hence knowledge of either $P_{X|Y}$ or $P_{X\times Y|Y}$ solves the inference problem.

Given a Bayesian model $(P_X, P_{Y|X})$ we can construct the $\mathcal{K}(\mathcal{G})$-diagram with vertices $1$, $X$, $Y$, and $X\times Y$, arrows $P_X\colon 1\to X$, $P_Y\colon 1\to Y$, $P\colon 1\to X\times Y$, $P_{Y|X}\colon X\to Y$, $P_{X\times Y|X}\colon X\to X\times Y$, and the dashed arrows $P_{X|Y}\colon Y\to X$ and $P_{X\times Y|Y}\colon Y\to X\times Y$ to be determined (Diagram 1: The Bayesian inference problem), where by definition $P = P_{X\times Y|X} \star P_X$, and $P_Y = P\pi_Y^{-1}$ denotes the marginal probability measure on $Y$ given $P$. An inference map for the Bayesian model is a conditional probability $P_{X\times Y|Y}$ such that one can use $P_Y$ and $P_{X\times Y|Y}$, in lieu of $P_X$ and $P_{X\times Y|X}$, to determine the same joint probability measure $P$ on $X\times Y$,

$$P_{X\times Y|X} \star P_X = P = P_{X\times Y|Y} \star P_Y,$$

which is Bayes equation. Using the constant graph maps, Bayes equation can be written

$$\int_X P_{Y|X}(\Gamma_x^{-1}(\zeta) \mid x)\, dP_X = \int_Y P_{X|Y}(\Gamma_y^{-1}(\zeta) \mid y)\, dP_Y \quad \forall\, \zeta \in \Sigma_{X\times Y}.$$

Provided we restrict Meas to the category of Standard Spaces or Polish Spaces, the existence of inference maps is well known [3, 10]. It is possible to extend this class by using $\mathcal{D}$-kernels rather than kernels, which are regular conditional probabilities and which we have been, and will continue, to call conditional probabilities. Using $\mathcal{D}$-kernels requires an awkward category, and hence we assume Standard Spaces. This issue is discussed further in the appendix. Our choice of Standard Spaces is philosophical, based upon the idea that probability theory should be independent of any topological properties. A good source covering the essential aspects of Standard Spaces is [5, Chapter 2]. Faden [3, Proposition 5] shows that almost pre-standard spaces suffice.

The objective of this paper is to serve as a stepping stone to develop better computational methods for inference.
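In the finite discrete case the objects in Diagram 1 can be computed directly and Bayes equation checked numerically. The sketch below uses illustrative names only: it forms the joint $P = P_{X\times Y|X}\star P_X$, the marginal $P_Y$, and an inference map by elementary Bayes inversion, then verifies that both factorizations recover the same joint.

```python
# Finite sketch of Diagram 1; dictionary names are ours.
P_X = {"x0": 0.6, "x1": 0.4}
P_Y_given_X = {"x0": {"y0": 0.9, "y1": 0.1},
               "x1": {"y0": 0.2, "y1": 0.8}}

# joint P on X x Y: P(x, y) = P_{Y|X}(y|x) * P_X(x)
P = {(x, y): P_X[x] * p
     for x, ky in P_Y_given_X.items() for y, p in ky.items()}

# marginal P_Y = P pi_Y^{-1}
P_Y = {}
for (x, y), p in P.items():
    P_Y[y] = P_Y.get(y, 0.0) + p

# inference map P_{X|Y}(x|y) = P(x, y) / P_Y(y), defined where P_Y(y) > 0
P_X_given_Y = {y: {x: P[(x, y)] / P_Y[y] for x in P_X} for y in P_Y}

# Bayes equation: both factorizations recover the same joint P
for (x, y), p in P.items():
    assert abs(P_X_given_Y[y][x] * P_Y[y] - p) < 1e-12
```

Division by $P_Y(y)$ is only available in this discrete setting; the general construction must proceed measure-theoretically, which is the point of the pullback construction developed below.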
Towards that end, we believe it is necessary to exploit the SMCC structure of Meas, in conjunction with the strong monad structure of $\mathcal{G}$, to develop better computational schemes for inference. The basic structural maps, which are natural transformations, discussed herein provide the basis for such developments.

2 Some Background and Notation

A measurable function is called a map or a random variable. The σ-algebra associated with the product space $X\times Y$ is the product σ-algebra. If a probability on any space assumes only one of the two values $\{0, 1\}$ we call the probability deterministic. Every map $f\colon X\to Y$ induces a deterministic probability $\delta_f\colon X\to\mathcal{G}(Y)$, defined for all $B\in\Sigma_Y$ and $x\in X$ by

$$\delta_f(B \mid x) = \begin{cases} 1 & \text{iff } f(x)\in B \\ 0 & \text{otherwise.}\end{cases}$$

Using the graph of $f$, $\Gamma_f\colon X\to X\times Y$, which maps $x\mapsto (x, f(x))$, gives the deterministic measure

$$\delta_{\Gamma_f}(\zeta \mid x) = \begin{cases} 1 & \text{iff } (x, f(x))\in\zeta \\ 0 & \text{otherwise,}\end{cases}$$

which plays an essential role in the theory.

If a Bayesian model $(P_X, P_{Y|X})$ has a deterministic probability $P_{Y|X}$ then we say we have a deterministic Bayesian model, and write it as $(P_X, \delta_f)$, where $f\colon X\to Y$ is the map giving rise to the deterministic probability.² If $P_{Y|X}$ is not deterministic then we say the model $(P_X, P_{Y|X})$ is a nondeterministic Bayesian model.

The tensor product monoidal structure $(\mathrm{Meas}, \otimes, 1)$ on Meas, defined on the objects by

$$\otimes\colon \mathrm{Meas}\times\mathrm{Meas}\to\mathrm{Meas}, \quad (X, Y)\mapsto X\otimes Y,$$

where $X\otimes Y$ is the set $X\times Y$ with the σ-algebra generated by the family of constant graph maps $\Gamma_x\colon Y\to X\otimes Y$ and $\Gamma_y\colon X\to X\otimes Y$, makes Meas a symmetric monoidal closed category.³ Thus the evaluation maps $ev_{X,Y}\colon Y^X\otimes X\to Y$ are measurable, satisfying $ev_{X,Y}\circ\Gamma_{\lceil f\rceil} = f$ and $ev_{X,Y}\circ\Gamma_x = ev_x$, where the function spaces $Y^X$ are defined such that the point evaluation maps $ev_x$ are all measurable.

² In a countably generated measurable space every deterministic measure arises from a measurable function [2]. Consequently, under the assumption of Standard Spaces, every deterministic measure arises from a map $f$ and vice versa.
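The deterministic probabilities $\delta_f$ and $\delta_{\Gamma_f}$ are immediate to realize on finite or countable sets; the following sketch (function names are ours) simply encodes the two displayed case distinctions.

```python
# Deterministic probabilities induced by a map f; names are illustrative.
def delta_f(f, B, x):
    """delta_f(B|x): 1 iff f(x) in B, else 0."""
    return 1 if f(x) in B else 0

def delta_graph_f(f, zeta, x):
    """delta_{Gamma_f}(zeta|x): 1 iff (x, f(x)) in zeta, else 0."""
    return 1 if (x, f(x)) in zeta else 0

f = lambda x: x % 2                       # a map on integers, X -> Y = {0, 1}
assert delta_f(f, {1}, 3) == 1            # f(3) = 1 lies in B = {1}
assert delta_graph_f(f, {(3, 1)}, 3) == 1
assert delta_graph_f(f, {(3, 0)}, 3) == 0
```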
Recall that the product σ-algebra is a sub-σ-algebra of the tensor σ-algebra, and typically we employ the tensor product σ-algebra only when we require the use of the evaluation maps. Without the finer σ-algebra the tensor product σ-algebra provides, the evaluation maps are not measurable.

3 Some structural maps in $(\mathrm{Meas}, \otimes, 1)$

The monoidal closed structure of Meas, in conjunction with the strong monad structure of $\mathcal{G}$, provides several natural transformations which are indispensable tools. These basic "structural maps" are due to A. Kock [7, 8]. As we need to show $\mathcal{G}$ is a strong monad it is necessary to prove

Theorem 3.1. There is a natural transformation between the two functors

$$\cdot\otimes\mathcal{G}(\cdot),\;\; \mathcal{G}(\cdot\times\cdot)\colon \mathrm{Meas}\times\mathrm{Meas}\to\mathrm{Meas}$$

defined at component $(X, Y)$ by

$$\tau''_{X,Y}\colon X\otimes\mathcal{G}(Y)\to\mathcal{G}(X\times Y), \quad (x, Q)\mapsto Q\Gamma_x^{-1}.$$

³ Further details concerning the SMCC structure of Meas can be found in [9]. The necessary proofs follow readily from the definition of the final σ-algebra on the set $X\times Y$ via the constant graph maps.

Proof. To verify $\tau''$ is measurable, consider the composites of the constant graph maps $\Gamma_Q\colon X\to X\otimes\mathcal{G}(Y)$ and $\Gamma_x\colon \mathcal{G}(Y)\to X\otimes\mathcal{G}(Y)$ with $\tau''_{X,Y}$ followed by the evaluation maps $ev_\zeta\colon\mathcal{G}(X\times Y)\to[0,1]$. For a fixed $Q\in\mathcal{G}(Y)$, the composite mapping $x\mapsto Q\Gamma_x^{-1}(\zeta)$ is well known to be measurable [1, Proposition 5.1.2, p. 156]. On the other hand, for a fixed $x\in X$, the composite map $Q\mapsto Q\Gamma_x^{-1}(\zeta)$ is precisely the evaluation map $ev_{\Gamma_x^{-1}(\zeta)}$, and these maps generate the σ-algebra on $\mathcal{G}(Y)$. Thus, for every $\zeta\in\Sigma_{X\times Y}$,

$$\Gamma_x^{-1}\big({\tau''_{X,Y}}^{-1}(ev_\zeta^{-1}(U))\big)\in\Sigma_{\mathcal{G}(Y)} \quad \forall\, U\in\Sigma_I.$$

Since these are all measurable in $\mathcal{G}(Y)$, and $X\otimes\mathcal{G}(Y)$ has the largest σ-algebra such that all the constant graph maps are measurable, it follows that the argument of $\Gamma_x^{-1}$ in the above equation is measurable. Thus ${\tau''_{X,Y}}^{-1}(ev_\zeta^{-1}(U))\in\Sigma_{X\otimes\mathcal{G}(Y)}$. Now, using the fact that $\mathcal{G}(X\times Y)$ is generated by the evaluation maps $ev_\zeta$, the measurability of $\tau''_{X,Y}$ follows.
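For finite distributions the component $\tau''_{X,Y}(x, Q) = Q\Gamma_x^{-1}$ is simply the distribution on $X\times Y$ supported on the fibre $\{x\}\times Y$, and an instance of the naturality asserted in Theorem 3.1 can be checked numerically. A sketch, with helper names of our own choosing:

```python
# Finite-distribution sketch of the strength tau''_{X,Y}; names are ours.
def tau(x, Q):
    """tau''_{X,Y}(x, Q) = Q Gamma_x^{-1}: the distribution on X x Y
    placing Q's mass on the fibre {x} x Y."""
    return {(x, y): q for y, q in Q.items()}

def pushforward(h, Q):
    """G(h): Q -> Q h^{-1} for a finite distribution Q."""
    out = {}
    for a, q in Q.items():
        out[h(a)] = out.get(h(a), 0.0) + q
    return out

Q = {"y0": 0.3, "y1": 0.7}
f = {"x0": "x0'", "x1": "x1'"}            # a map X -> X'

# naturality in the first argument: G(f x 1_Y)(tau(x, Q)) == tau(f(x), Q)
lhs = pushforward(lambda xy: (f[xy[0]], xy[1]), tau("x0", Q))
rhs = tau(f["x0"], Q)
assert lhs == rhs
```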
To prove naturality in the first argument, note that for $f\colon X\to X'$ we obtain the Meas arrow $\mathcal{G}(f)\colon\mathcal{G}(X)\to\mathcal{G}(X')$ mapping $P\mapsto Pf^{-1}$. This gives the commutative square

$$\begin{array}{ccccc} X'\otimes\mathcal{G}(Y) & \xrightarrow{\;\tau''_{X',Y}\;} & \mathcal{G}(X'\otimes Y) & \xrightarrow{\;\mathcal{G}(id_{X'\times Y})\;} & \mathcal{G}(X'\times Y) \\ {\scriptstyle f\otimes 1_{\mathcal{G}(Y)}}\big\uparrow & & \big\uparrow{\scriptstyle\mathcal{G}(f\otimes 1_Y)} & & \big\uparrow{\scriptstyle\mathcal{G}(f\times 1_Y)} \\ X\otimes\mathcal{G}(Y) & \xrightarrow{\;\tau''_{X,Y}\;} & \mathcal{G}(X\otimes Y) & \xrightarrow{\;\mathcal{G}(id_{X\times Y})\;} & \mathcal{G}(X\times Y) \end{array}$$

which maps elements according to

$$(x, Q)\mapsto Q\Gamma_x^{-1}, \qquad (f(x), Q)\mapsto Q\Gamma_{f(x)}^{-1} = Q\Gamma_x^{-1}(f\otimes 1_Y)^{-1},$$

where the equality in the upper right hand corner follows from the observation that $(f\otimes 1_Y)\circ\Gamma_x = \Gamma_{f(x)}\colon Y\to X'\otimes Y$, together with the identity maps $X\otimes Y\to X\times Y$ and $X'\otimes Y\to X'\times Y$.

To prove naturality in the second argument, let $g\colon Y\to Y'$. (As the restriction to the subspace σ-algebra, $\Sigma_{X\times Y}\subseteq\Sigma_{X\otimes Y}$, should now be evident, it is not included in the following diagrams.) This gives us the commutative square

$$\begin{array}{ccc} X\otimes\mathcal{G}(Y') & \xrightarrow{\;\tau''_{X,Y'}\;} & \mathcal{G}(X\otimes Y') \\ {\scriptstyle 1_X\otimes\mathcal{G}(g)}\big\uparrow & & \big\uparrow{\scriptstyle\mathcal{G}(1_X\otimes g)} \\ X\otimes\mathcal{G}(Y) & \xrightarrow{\;\tau''_{X,Y}\;} & \mathcal{G}(X\otimes Y) \end{array}$$

which maps elements according to

$$(x, Q)\mapsto Q\Gamma_x^{-1}, \qquad (x, Qg^{-1})\mapsto Qg^{-1}\tilde\Gamma_x^{-1} = Q\Gamma_x^{-1}(1_X\otimes g)^{-1},$$

where the equality in the upper right hand corner follows from the observation that $(1_X\otimes g)\circ\Gamma_x = \tilde\Gamma_x\circ g$, with $\Gamma_x\colon Y\to X\otimes Y$ and $\tilde\Gamma_x\colon Y'\to X\otimes Y'$ the constant graph maps.

Using the fact that the monoidal structure is symmetric, we also have a natural transformation defined on components by

$$\tau'_{X,Y}\colon \mathcal{G}(X)\otimes Y\to\mathcal{G}(X\times Y), \quad (P, y)\mapsto P\Gamma_y^{-1}.$$

The superscript on $\tau$ is used to denote which coordinate the probability measures are given in.

The natural transformation

$$st_{X,Y}\colon Y^X\to\mathcal{G}(Y)^{\mathcal{G}(X)}, \quad f\mapsto\mathcal{G}(f)$$

can now be constructed using the natural transformation $\tau''$ via

$$\mathcal{G}(f) = \mathcal{G}(ev_{Y^X,X})\circ\tau''_{Y^X,X}\circ\Gamma_f\colon \quad \mathcal{G}(X)\xrightarrow{\;\Gamma_f\;} Y^X\otimes\mathcal{G}(X)\xrightarrow{\;\tau''_{Y^X,X}\;}\mathcal{G}(Y^X\times X)\xrightarrow{\;\mathcal{G}(ev_{Y^X,X})\;}\mathcal{G}(Y),$$

where $\Gamma\colon Y^X\to(Y^X\otimes\mathcal{G}(X))^{\mathcal{G}(X)}$, $f\mapsto\lceil\Gamma_f\rceil$, is the unit of the adjunction $\bullet\otimes\mathcal{G}(X)\dashv\bullet^{\mathcal{G}(X)}$, with $\Gamma_f$ the constant graph function with value $f$.
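The construction of $st$ via $\tau''$ can be illustrated on finite distributions, where both sides are computable: the composite $\mathcal{G}(ev_{Y^X,X})\circ\tau''_{Y^X,X}\circ\Gamma_f$ agrees with the direct pushforward $Pf^{-1}$. A sketch with our own helper names:

```python
# Finite check that the strength composite equals the direct pushforward.
def pushforward(h, P):
    """G(h): P -> P h^{-1} for a finite distribution P."""
    out = {}
    for a, p in P.items():
        out[h(a)] = out.get(h(a), 0.0) + p
    return out

def tau(g, P):
    """tau''_{Y^X,X}(g, P): distribution on pairs (g, x), all mass paired
    with the fixed function g (the image of Gamma_g)."""
    return {(g, x): p for x, p in P.items()}

f = lambda x: x * x
P = {-1: 0.25, 0: 0.25, 1: 0.5}

# Gamma_f pairs P with the point f; G(ev) pushes forward along (g, x) -> g(x)
via_strength = pushforward(lambda gx: gx[0](gx[1]), tau(f, P))
direct = pushforward(f, P)
assert via_strength == direct             # both give {1: 0.75, 0: 0.25}
```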
The equality $\mathcal{G}(f) = \mathcal{G}(ev_{Y^X,X})\circ\tau''_{Y^X,X}\circ\Gamma_f$ then follows from

$$\begin{aligned} \mathcal{G}(f)(\lceil P\rceil) &= \big(\mathcal{G}(ev_{Y^X,X})\circ\tau''_{Y^X,X}\circ\Gamma_f\big)(\lceil P\rceil) \\ &= \mathcal{G}(ev_{Y^X,X})\big(\tau''_{Y^X,X}(f, \lceil P\rceil)\big) \\ &= \mathcal{G}(ev_{Y^X,X})\big(P\tilde\Gamma_f^{-1}\big) \qquad (\text{note } \Gamma_f\neq\tilde\Gamma_f) \\ &= P\tilde\Gamma_f^{-1}ev_{Y^X,X}^{-1} \\ &= Pf^{-1}, \end{aligned}$$

where the last line follows from the fact that the constant graph map $\tilde\Gamma_f$ satisfies the equation $ev_{Y^X,X}\circ\tilde\Gamma_f = f$, expressed by the commutativity of the diagram

$$X\xrightarrow{\;\tilde\Gamma_f\;} Y^X\otimes X\xrightarrow{\;ev_{Y^X,X}\;} Y.$$

4 The construction of Bayesian inference maps

The construction of Bayesian inference maps using the symmetric monoidal closed category structure of Meas applies to both the deterministic and nondeterministic Bayesian models. For simplicity we first present the deterministic case and subsequently show the minor changes required to address the nondeterministic Bayesian model.

However, it is worth noting that in constructing Bayesian inference maps it suffices (for standard spaces) to consider the special case where the given conditional probability $P_{Y|X}$ is a deterministic probability. The general problem, as given in Diagram 1, can then be solved using the same approach on the Bayesian model $(P_{X\times Y}, \delta_{\pi_Y})$, expressed by the $\mathcal{K}(\mathcal{G})$-diagram (1) with vertices $1$, $X\times Y$, $(X\times Y)\times Y$, and $Y$, arrows $P_{X\times Y}\colon 1\to X\times Y$, $P_{X\times Y}\Gamma_{\pi_Y}^{-1}\colon 1\to(X\times Y)\times Y$, $P_Y = P_{X\times Y}\pi_Y^{-1}\colon 1\to Y$, $\delta_{\Gamma_{\pi_Y}}\colon X\times Y\to(X\times Y)\times Y$, $\delta_{\pi_Y}\colon X\times Y\to Y$, and the dashed arrow $P_{X\times Y|Y}\colon Y\to X\times Y$ to be determined, to construct the conditional probability $P_{X\times Y|Y}$, which in turn specifies $P_{X|Y}$.

4.1 Inference maps for deterministic models

Our generic deterministic Bayesian problem (base model) is represented by the $\mathcal{K}(\mathcal{G})$-diagram with vertices $1$, $X$, $Y$, and $X\times Y$, arrows $P_X\colon 1\to X$, $P_X\Gamma_f^{-1}\colon 1\to X\times Y$, $P_Y \triangleq P_Xf^{-1}\colon 1\to Y$, $P_{Y|X} = \delta_f\colon X\to Y$, $\delta_{\Gamma_f}\colon X\to X\times Y$, and dashed arrows $P_{X\times Y|Y}$ and $P_{X|Y}$ (Diagram 2: The generic deterministic Bayesian model), where the dashed arrows are the inference maps to be determined.

The inference maps can be constructed by taking the pullback of the Meas-diagram⁴
$$\begin{array}{ccc} \mathrm{Lim} & \xrightarrow{\;m\;} & \mathcal{G}(X\times Y)^Y \\ {\scriptstyle !}\big\downarrow & & \big\downarrow{\scriptstyle st_{Y,\mathcal{G}(X\times Y)}} \\ & & \mathcal{G}^2(X\times Y)^{\mathcal{G}(Y)} \\ & & \big\downarrow{\scriptstyle \mu_{X\times Y}^{\mathcal{G}(Y)}} \\ & & \mathcal{G}(X\times Y)^{\mathcal{G}(Y)} \\ & & \big\downarrow{\scriptstyle ev_{P_Y}} \\ 1 & \xrightarrow{\;\lceil P_X\Gamma_f^{-1}\rceil\;} & \mathcal{G}(X\times Y) \end{array}$$

where the map $ev_{P_Y}$ is evaluation at $P_Y$: any map $\lceil T\rceil\in\mathcal{G}(X\times Y)^{\mathcal{G}(Y)}$ is sent to $T(P_Y)$. The limit in this pullback diagram is the subset of all maps $P_{X\times Y|Y}\in\mathcal{G}(X\times Y)^Y$ which, upon application of the vertical arrows on the right hand side of the diagram, yield the composite

$$1\xrightarrow{\;\lceil P_Y\rceil\;}\mathcal{G}(Y)\xrightarrow{\;\mathcal{G}(P_{X\times Y|Y})\;}\mathcal{G}^2(X\times Y)\xrightarrow{\;\mu_{X\times Y}\;}\mathcal{G}(X\times Y),$$

which coincides with $P_X\Gamma_f^{-1}$ since the diagram is a pullback. The commutativity of this diagram is Bayes equation for the deterministic model. The fact that the diagram is a pullback is trivial, as Meas has all limits and the point $\lceil P_X\Gamma_f^{-1}\rceil$ is monic, so the limit corresponds to a subset of $\mathcal{G}(X\times Y)^Y$.

⁴ The notation $\lceil P_X\Gamma_f^{-1}\rceil$ is used to emphasize that the diagram is in Meas, rather than $\mathcal{K}(\mathcal{G})$ where probabilities are actual arrows.

The limit, Lim, in the pullback can be constructed, and characterized, using the tensor product probability $P_X\otimes P_Y$, or alternatively, it can be constructed "pointwise" using the family of tensor probabilities $P_X\otimes\delta_y = P_X\Gamma_y^{-1}$.⁵

Method 1: Using Radon-Nikodym derivatives. Note that $P_X\Gamma_f^{-1}\ll P_X\otimes P_Y$. On the basic rectangles $A\times B\in\Sigma_{X\times Y}$ we have $P_X\Gamma_f^{-1}(A\times B) = P_X(A\cap f^{-1}(B))\le\min(P_X(A), P_Y(B))$.⁶ Let $h$ be a Radon-Nikodym derivative for this absolute continuity condition,

$$P_X\Gamma_f^{-1}(\zeta) = \int_\zeta h\, d(P_X\otimes P_Y) = \int_Y\Big(\int_\zeta h\, d(\underbrace{P_X\Gamma_y^{-1}}_{=P_X\otimes\delta_y})\Big)\, dP_Y = \int_Y\Big(\int_{\Gamma_y^{-1}(\zeta)} h(\cdot, y)\, dP_X\Big)\, dP_Y. \quad (2)$$

As $P_X\Gamma_f^{-1}(X\times Y) = 1$ it follows that $\int_X h(\cdot, y)\, dP_X = 1$ $P_Y$-a.e. Suppose $V\in\Sigma_Y$ is such that $\int_X h(\cdot, y)\, dP_X = 1$ for all $y\in V$. Then

$$P_{X\times Y|Y}(\zeta \mid y) \triangleq \begin{cases} \int_\zeta h(\cdot, \cdot)\, d(P_X\Gamma_y^{-1}) & y\in V \\ P_X\Gamma_f^{-1}(\zeta) & y\notin V \end{cases}$$

defines a conditional probability, the inference map for the deterministic model, which is absolutely continuous with respect to $P_X\Gamma_y^{-1}$, and, by equation (2), gives the weighted sum
$$P_X\Gamma_f^{-1} = \int_Y P_{X\times Y|Y}(\cdot \mid y)\, dP_Y. \quad (3)$$

The sequence of vertical arrows in the pullback diagram shows that this expression makes sense without requiring evaluation on any measurable set in $X\times Y$, as the multiplication

⁵ In the standard theoretical approach to computing inference maps the Radon-Nikodym (RN) derivatives are computed with respect to measures on the space $Y$ rather than the product space $X\times Y$. Using RN derivatives, either directly or indirectly, on the product space is preferable, as subsequent arguments illustrate. (One need not "glue together" a family of RN derivatives.)

⁶ The probability $P_X\Gamma_f^{-1}$ is the push forward of the tensor product probability $P_X\otimes P_Y$ by the idempotent map $\Gamma_f\circ\pi_X$, i.e., $(P_X\otimes P_Y)(\Gamma_f\circ\pi_X)^{-1} = P_X\Gamma_f^{-1}$.
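In the discrete case Method 1 is fully computable: the Radon-Nikodym derivative $h$ is a ratio of point masses, and identity (3) can be verified directly. The following sketch (all names are ours) does this for a small deterministic model $(P_X, \delta_f)$.

```python
# Discrete sketch of Method 1 for a deterministic model; names are ours.
X = [0, 1, 2]
f = lambda x: x % 2
P_X = {0: 0.5, 1: 0.25, 2: 0.25}
P_Y = {y: sum(p for x, p in P_X.items() if f(x) == y)
       for y in {f(x) for x in X}}
# joint P_X Gamma_f^{-1}: mass P_X(x) on the graph point (x, f(x))
joint = {(x, f(x)): p for x, p in P_X.items()}

# RN derivative h(x, y) of the joint w.r.t. the product P_X (x) P_Y
h = {(x, y): joint.get((x, y), 0.0) / (P_X[x] * P_Y[y])
     for x in P_X for y in P_Y}

def P_XY_given_Y(y):
    """Inference map: density h(.,y) integrated against P_X (x) delta_y."""
    return {(x, y): h[(x, y)] * P_X[x] for x in P_X}

# identity (3): P_X Gamma_f^{-1} = integral over Y of P_{X x Y|Y}(.|y) dP_Y
mix = {}
for y, py in P_Y.items():
    for xy, p in P_XY_given_Y(y).items():
        mix[xy] = mix.get(xy, 0.0) + py * p
for xy, p in joint.items():
    assert abs(mix.get(xy, 0.0) - p) < 1e-12
```

Note the discrete density $h$ already exhibits the normalization property used above: $\int_X h(\cdot, y)\, dP_X = 1$ for every $y$ with $P_Y(y) > 0$, so no exceptional set $V$ is needed here.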
