Algebras of Information
A New and Extended Axiomatic Foundation

Prof. Dr. Jürg Kohlas,
Dept. of Informatics DIUF
University of Fribourg
CH – 1700 Fribourg (Switzerland)
E-mail: [email protected]
http://diuf.unifr.ch/tcs

Version: January 11, 2017
arXiv:1701.02658v1 [cs.IT] 10 Jan 2017

Contents

1 Introduction

I Labeled Algebras

2 Conditional Independence
  2.1 Quasi-Separoids
  2.2 Arithmetic of Partitions
  2.3 Families of Compatible Frames
3 Labeled Information Algebras
  3.1 Axiomatics
  3.2 Valuation Algebras
  3.3 Semiring Information Algebras
4 Local Computation
  4.1 Markov Trees
  4.2 Computing in Markov Trees
  4.3 Computation in Hypertrees

II Domain-Free Algebras

5 Domain-Free Information Algebras
  5.1 Unlabeling of Information
  5.2 Domain-Free Axiomatics
  5.3 Duality
6 Order of Information
  6.1 The Idempotent Case
  6.2 Regular Algebras
  6.3 Separative Algebras
7 Proper Information
  7.1 Ideal Completion
  7.2 Compact Algebras
  7.3 Duality For Compact Algebras
  7.4 Continuous Algebras
  7.5 Atomic Algebras

III Constructing New Algebras

8 Information Maps
  8.1 Continuous Maps
  8.2 Cartesian Closed Categories
9 Random Maps
  9.1 Simple Random Variables
  9.2 Random Mappings
  9.3 Random Variables
10 Allocations of Probability
  10.1 Algebra of Allocations of Probability
  10.2 Random Mappings and Allocations
11 Support Functions
  11.1 Characterisation
  11.2 Generating Support Functions
  11.3 Canonical Random Mappings
  11.4 Minimal Extensions
  11.5 The Boolean Case

References

Chapter 1

Introduction

The basic idea behind information algebras (Kohlas, 2003a; Kohlas & Schmid, 2014) is that information comes in pieces, each referring to a certain question, that these pieces can be combined or aggregated, and that the part relating to a given question can be extracted. This algebraic structure can be given different forms. Questions are often represented by a lattice of domains, and a popular model is based on the subset lattice of a set of variables. Pieces of information are then represented by valuations associated with these domains. This leads to an algebraic structure called valuation algebras (Kohlas, 2003a). The axiomatics of this algebraic structure was in essence proposed by (Shenoy & Shafer, 1990a).
Valuation algebras already have many important applications in Computer Science related to constraint systems, relational databases, different uncertainty formalisms like probability, belief functions, fuzzy sets and possibility measures, and many more; we refer to (Pouly & Kohlas, 2011). An important particular case of valuation algebras, from both a practical and a theoretical point of view, is that of idempotent valuation algebras, also called proper information algebras: the combination of a piece of information with itself or part of itself gives nothing new. This allows one to introduce an order between pieces of information reflecting information content. It also relates proper information algebras to domain theory (Kohlas, 2003a; Kohlas & Schmid, 2014).

The basic view of information as pieces which can be combined, which relate to questions and from which the part relating to given questions can be extracted, leads to two different but essentially equivalent algebraic structures, labeled and domain-free valuation algebras (Kohlas, 2003a; Kohlas & Schmid, 2014). The original proposal of an axiomatics in (Shenoy & Shafer, 1990a) was in labeled form; later (Shafer, 1991) proposed the domain-free form. However, for valuation algebras, the two forms are not fully equivalent: there are labeled forms which have no domain-free form and vice versa. An important contribution of this paper is to give a new axiomatic system in which there exists a full duality between these two forms.

The representation of questions by lattices of domains or even subsets of variables is unnecessarily restrictive and excludes important applications in Computer Science. Already early work on belief functions (Shafer, 1976) considered a reference structure for belief functions called a family of compatible frames. This is not covered by valuation algebras.
In this paper a much more general abstract framework for representing questions is proposed and, based on it, a new system of axioms for information algebras, covering the previous forms of valuation algebras and proper information algebras as special cases.

Originally, the theory of valuation algebras in (Shenoy & Shafer, 1990a) was motivated by the desire to generalise the local computation scheme for probabilities proposed in (Lauritzen & Spiegelhalter, 1988) to other formalisms of uncertainty, especially belief functions. This goal is maintained for the new algebraic structures presented here. We claim, however, that these algebraic structures moreover represent essential features of information in general, beyond particular uncertainty calculi. In probability theory, conditional independence structures between variables are essential for efficient local computation. It has long been known that structures of conditional independence can be generalised beyond probability (Studeny, 1993; Shenoy, 1994b; Studeny, 1995). In fact, we claim that conditional independence is a basic issue for information and information algebras in general. In (Dawid, 2001) a fundamental mathematical structure called a separoid is abstracted, underlying all the concepts of conditional independence and its applications. It is shown in this paper that an even weaker concept (called here quasi-separoid) is sufficient to allow for local computation schemes in the context of information algebras in appropriate conditional independence structures.

The basis of the theory of information algebras as developed here is the relation of conditional independence among questions or the domains representing them. In Chap. 2 it is argued that questions should be partially ordered according to their granularity, that is, the acuteness or coarseness of their possible answers. In fact, this partial order is required in the present theory to form a join-semilattice.
The join of two questions is the coarsest among all questions finer than both original questions; the join thus represents the combined question of the two original ones. In addition, a three-place relation among questions is required which describes the conditional independence of two questions, given a third one. This relation is required to satisfy four conditions, which are natural requirements for a concept of conditional independence. In fact a separoid, the usual concept for modelling conditional independence and irrelevance, satisfies (among others) these conditions. Therefore, a join-semilattice together with a three-place relation satisfying these conditions is called a quasi-separoid (or q-separoid). An important source of q-separoids are join-semilattices of partitions of some universe, and they form useful models of systems of questions. Somewhat more general than partitions are families of compatible frames (f.c.f). This notion has been introduced in (Shafer, 1976). Here a slightly modified version of this concept is proposed and it is shown how q-separoids arise from f.c.f. Both q-separoids of partitions and of f.c.f generalise the most often used multivariate model, where questions are represented by families of variables and their domains, as in Bayesian networks, belief functions, etc. In this last case, q-separoids become separoids, and this links our general theory to the more classical approach to valuation and information algebras.

Q-separoids model questions. In Chap. 3, pieces of information are added, each piece referring to an element of the q-separoid, that is, to a determined question. But information can be transported or extracted relative to other questions, and pieces of information can also be combined or aggregated. The corresponding operations are introduced and their required properties are stated as axioms. In particular, the operations of transport and combination are related to conditional independence.
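As a small illustration of the partition model of questions mentioned above, the following Python sketch computes the join of two partitions of a finite universe. It assumes that each question is encoded as a set of disjoint blocks (frozensets) and that the finer partition counts as the greater one; the names `join` and `is_finer` and the example universe are hypothetical and are not notation from this text.

```python
from itertools import product

def join(p, q):
    """Coarsest partition finer than both p and q: all non-empty block intersections."""
    return {a & b for a, b in product(p, q) if a & b}

def is_finer(p, q):
    """True if every block of p is contained in some block of q."""
    return all(any(a <= b for b in q) for a in p)

# Hypothetical universe {1, 2, 3, 4} with two questions about its elements.
parity = {frozenset({1, 3}), frozenset({2, 4})}
size = {frozenset({1, 2}), frozenset({3, 4})}

# The combined question distinguishes everything either question distinguishes.
combined = join(parity, size)
```

In this encoding the join of a question with itself returns the same question, which matches the semilattice reading of the join as the combined question.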
This then determines a labeled information algebra. For certain particular q-separoids, the axioms can be transformed into those of classical valuation algebras (Shenoy & Shafer, 1990a; Kohlas, 2003a). The latter appear in this way as particular cases of the general information algebras treated in this text. In relation to partition and f.c.f q-separoids, pieces of information may be represented by subsets of the universe or of frames. These set information algebras are important models of information algebras.

A general problem of information processing can be formulated in the framework of information algebras as combining a number of pieces of information and then extracting from the combination the part corresponding to one or several given questions. Formulated in this way, this may well be computationally infeasible. For probabilistic networks, (Lauritzen & Spiegelhalter, 1988) proposed a computational scheme which avoids this problem by organising the computations in such a way that combination and extraction can always be done on the small domains of the pieces of information involved in the combination. This is called local computation. In (Shenoy & Shafer, 1990a) it was shown that local computation can be applied much more generally to valuation algebras. Here we demonstrate that it can be used in the even more general framework of information algebras. The underlying structures of the domains of the information to be used in local computation are called in the literature join or junction trees, hypertrees or Markov trees. These are concepts which are defined relative to multivariate models. They determine certain structures of conditional independence. These structures also exist with respect to the much more general concept of q-separoids. This implies that local computation can be executed in Markov trees as defined relative to q-separoids.
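The gain from local computation can be sketched in the familiar multivariate, probabilistic special case. The Python sketch below is only an illustration under stated assumptions: potentials are tables over binary variables, combination is pointwise multiplication, extraction is summing out variables, and the helper names `combine` and `marginalize` as well as the numbers are hypothetical. It compares the global approach (combine everything, then extract) with a local one that works along the two domains {X, Y} and {Y, Z}.

```python
from itertools import product
from fractions import Fraction as F

def combine(f, g):
    """Pointwise product of two potentials; a potential is (variables, table)."""
    (fv, ft), (gv, gt) = f, g
    vs = tuple(dict.fromkeys(fv + gv))           # union of variables, order kept
    table = {}
    for vals in product((0, 1), repeat=len(vs)):  # binary variables assumed
        env = dict(zip(vs, vals))
        fa = tuple(env[v] for v in fv)
        ga = tuple(env[v] for v in gv)
        table[vals] = ft[fa] * gt[ga]
    return vs, table

def marginalize(f, keep):
    """Extract the part of f on the variables in `keep` by summing out the rest."""
    fv, ft = f
    vs = tuple(v for v in fv if v in keep)
    table = {}
    for vals, x in ft.items():
        key = tuple(val for v, val in zip(fv, vals) if v in keep)
        table[key] = table.get(key, 0) + x
    return vs, table

# Hypothetical potentials on the domains {X, Y} and {Y, Z}.
f = (("X", "Y"), {(0, 0): F(1, 2), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(1, 8)})
g = (("Y", "Z"), {(0, 0): F(1, 3), (0, 1): F(2, 3), (1, 0): F(3, 4), (1, 1): F(1, 4)})

# Global: build the table on {X, Y, Z} first, then extract the part on Z.
global_result = marginalize(combine(f, g), {"Z"})

# Local: extract from f what is relevant to Y, then work on {Y, Z} only.
local_result = marginalize(combine(marginalize(f, {"Y"}), g), {"Z"})
```

Both computations give the same table on Z, but the local one never builds a table on all three variables; with many variables this difference is what makes the computation feasible.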
However, whereas the concepts of join trees, hypertrees and Markov trees are equivalent in the particular case of multivariate models, in the sense that one notion may be transformed into another one and vice versa, this is no longer true in the case of general q-separoids. It turns out that the basic conditional independence structure for local computation in information algebras is the one of Markov trees. As labeled algebras, they are based on q-separoids.

Labeled information algebras are convenient for computations. But there is an alternative form of information algebras, namely domain-free information algebras. They are introduced in Part II. These algebras are better adapted to theoretical, algebraic considerations. Domain-free information algebras are derived from labeled ones by unlabeling (Chap. 5). As in a labeled algebra, there is the operation of combination or aggregation of information. The operation of transport of a piece of information from its domain to another domain becomes an operation of information extraction, by which the part of a piece of information relevant to a given domain is extracted. Conversely, labeled information algebras may be obtained from domain-free ones. This establishes a full duality between these two forms; they are the two sides of a coin.

Some pieces of information may be more informative than other ones. This is reflected by an order between the elements of an information algebra: a piece of information is more informative than a second one if it is obtained from the latter by combination with a third one. In the important case of idempotent information algebras, this establishes a natural partial order in the information algebra which respects combination and extraction: combined information of two or more pieces of information is more informative than each of its parts. Also, by extraction, information can only be lost.
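A minimal sketch of this order in a set model may help. It assumes, for illustration only, that a piece of information is a subset of a finite universe (the answers still considered possible), that combination is intersection, and that extraction relative to a question given as a partition is the union of all blocks meeting the set. These encodings and the names `combine`, `extract` and `leq` are assumptions of the sketch, not definitions taken from this text.

```python
def combine(phi, psi):
    """Combination in the assumed set model: keep the answers both pieces allow."""
    return phi & psi

def leq(phi, psi):
    """phi is at most as informative as psi: combining phi into psi adds nothing."""
    return combine(phi, psi) == psi

def extract(phi, question):
    """Part of phi relating to a question (a partition): union of the blocks meeting phi."""
    return frozenset().union(*(block for block in question if block & phi))

universe = frozenset(range(6))      # vacuous information: everything is possible
phi = frozenset({0, 1, 2})
psi = frozenset({1, 2, 3})
question = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

both = combine(phi, psi)            # frozenset({1, 2})
coarse = extract(both, question)    # frozenset({0, 1, 2, 3})
```

Here `leq(phi, both)` and `leq(psi, both)` hold, so the combination is more informative than each part, while `leq(coarse, both)` holds and `leq(both, coarse)` does not, so extraction has lost information.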
This partial order of information in idempotent information algebras is very important and is studied and exploited in later parts of the text. However, even in the general, non-idempotent case, the relation is of interest and important. It no longer determines a partial order, but only a preorder in the information algebra. This preorder, however, is no longer natural in general; in particular, extraction does not respect the order in all cases. Such preorders are also studied in semigroup theory, where regular semigroups are of particular interest. This notion can be extended to valuation algebras (Kohlas, 2003a) and even to information algebras as understood here. It turns out that in regular information algebras the preorder becomes natural. Even more general are separative information algebras, and in this framework the preorder is still natural. These questions of information order are discussed in Chap. 6.

For the rest of Part II and also Part III only idempotent information algebras are considered. As mentioned above, they can be considered as “proper” information algebras: the combination of a piece of information with part of it does not give new information. Mathematically speaking, the partial information order mentioned above adds an essential new element, which is exploited in Chap. 7. In the partial information order of proper information algebras, combination generates the supremum of the pieces of information combined. In this way the pieces of information determine a join-semilattice. Consistent collections of pieces of information then form an ideal in this semilattice. These ideals themselves form an information algebra, extending and completing the original one to a complete lattice. Furthermore, in computation only finite pieces of information can be processed, whereas general pieces of information may be approximated by finite ones.
This idea can be expressed by the well-known order-theoretic notion of finite or compact elements, and this idea is also the motivation of domain theory as a theory of computation (Gierz, 2003). In this respect, proper information algebras become particular instances of domains. As in this theory, algebraic and continuous information algebras can be defined and considered. The essential point which distinguishes the theory of proper information algebras from domain theory is the presence of the extraction operators and the fact that approximation of information takes place not only globally in the algebra but also locally on each domain of the underlying q-separoid. The discussion refers to domain-free information algebras, but the extension of duality covering labeled algebras is also addressed.

Most constructions of universal algebra for obtaining new algebras from existing ones apply to information algebras. But in Part III we restrict ourselves to constructions which make sense from the point of view of information. Again, we limit ourselves in this part to idempotent information algebras. So, in Chap. 8 we consider information maps, that is, maps which take elements from a first information algebra as input and map them to elements of a second information algebra as output. Such maps should be monotone: more information as input should result in more information as output. Such maps themselves represent information and, in fact, they form an information algebra. If the input comes from a continuous algebra, the information maps should be continuous, that is, respect convergence in some sense. It turns out that continuous maps form a continuous information algebra. Both general idempotent information algebras together with monotone maps and algebraic or continuous information algebras together with continuous maps determine Cartesian closed categories.

Information in practice is often uncertain.
Whether a piece of information is valid may depend on whether some assumptions are satisfied or not. This can be modelled by mappings from a space of assumptions to a proper information algebra. Some assumptions may be more likely to hold, more probable. So, it makes sense to assume the space of assumptions to be a probability space (Chap. 9). In this view, Chap. 9 becomes an abstract theory of probabilistic argumentation, generalising (Haenni et al., 2000; Kohlas, 2003b). It is also a generalisation of the theory of hints (Kohlas & Monney, 1995) and its application to statistical inference (Kohlas & Monney, 2007). Such random mappings again form information algebras, and according to various restrictions imposed on these maps different algebras may be obtained.

Random maps may be used to evaluate hypotheses according to their likelihood. For this purpose, the probability of the assumptions supporting a given hypothesis may be considered. At first, this poses some problems of measurability. However, following (Shafer, 1973; Shafer, 1979), from the probability space a probability algebra may be obtained and a so-called allocation of probability may be derived. This allows one to associate a numerical degree of support with any hypothesis which can be formulated within the information algebra or its ideal completion. Our interpretation of this notion by probabilistic argumentation differs from Shafer’s epistemological interpretation as partial belief; the mathematics, however, is the same. What is new is that idempotent information algebras are shown to provide the natural, general framework for such a theory, much more general than the classical set-based theory. In fact, allocations of probability may be defined independently of random mappings. They again represent information and can be given the structure of idempotent information algebras (Chap. 10).
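The degree of support described above can be sketched in the classical finite, set-based case. The sketch assumes a finite space of assumptions with given probabilities, a random mapping that sends each assumption to a subset of a universe, and that an assumption supports a hypothesis when the mapped set is contained in it. All names and numbers are hypothetical, and the measurability questions mentioned above do not arise in this finite setting.

```python
from fractions import Fraction as F

# Hypothetical assumptions with their probabilities (they sum to 1).
prob = {"a1": F(1, 2), "a2": F(3, 10), "a3": F(1, 5)}

# Random mapping: the piece of information that is valid under each assumption.
universe = frozenset({"red", "green", "blue"})
gamma = {
    "a1": frozenset({"red"}),
    "a2": frozenset({"red", "green"}),
    "a3": universe,                  # under a3 nothing is known
}

def support(hypothesis):
    """Probability of the assumptions whose information implies the hypothesis."""
    return sum((p for a, p in prob.items() if gamma[a] <= hypothesis), F(0))

sp_red = support(frozenset({"red"}))
sp_red_or_green = support(frozenset({"red", "green"}))
```

In this finite case the degree of support is monotone in the hypothesis: a weaker hypothesis is supported by at least the assumptions that support a stronger one.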
If the allocations are derived from random mappings, the algebra of allocations is homomorphic to the one of random mappings, in some cases even isomorphic. However, in the most general case, a random mapping carries more information than its associated allocation of probability.

The question arises whether any allocation of probability in an information algebra may be derived from a random mapping. This question is addressed in Chap. 11 in the context of support functions. This is an extension of the studies in (Shafer, 1973; Kohlas & Monney, 1994). Again it is shown that the natural mathematical context for this study is that of information algebras, and the question is answered in the positive. These last three chapters are in fact an extension of what is usually called Dempster-Shafer theory of evidence, which is usually discussed with respect to fields of sets instead of information algebras.

It must be emphasised that there are many subjects and issues regarding information algebras that are not addressed here. We hint at a few important ones.

Local computation is based on Markov trees associated with combinations of pieces of information. How can appropriate Markov trees be found for a given combination? In the multivariate setting, join trees (which in this case are also Markov trees) can be found by selecting a sequence of variable eliminations. An extensive literature presents heuristics for “good” elimination sequences. All this does not carry over to q-separoids in general, not even to partition- or f.c.f-separoids, and nothing is known so far about finding Markov trees in these general cases. So, this is a big open question, which is of course basic for the practical application of local computation beyond multivariate models.

Also, there are several architectures for local computation for probabilistic networks, which make use of some partial division (or information elimination).
These architectures also apply to general valuation algebras in the multivariate setting, if some appropriate concept of information elimination is assumed (Shenoy, 1994a). Mathematically speaking, what is re-