
ORDERING INFORMATION ON DISTRIBUTIONS

John van de Wetering

Master thesis

Department of Computer Science
Wolfson College
University of Oxford

July 2016

arXiv:1701.06924v1 [cs.LO] 23 Jan 2017

John van de Wetering, Ordering Information on Distributions © July 2016
Supervisor: Prof. Bob Coecke

ABSTRACT

This thesis details a class of partial orders on the space of probability distributions and the space of density operators which capture the idea of information content. Some links to domain theory and computational linguistics are also discussed. Chapter 1 details some useful theorems from order theory. In Chapter 2 we define a notion of an information ordering on the space of probability distributions and see that this gives rise to a large class of orderings. In Chapter 3 we extend the idea of an information ordering to the space of density operators and in particular look at the maximum eigenvalue order. We will discuss whether this order might be unique given certain restrictions. In Chapter 4 we discuss a possible application in distributional language models, namely in the study of entailment and disambiguation.

ACKNOWLEDGMENTS

I'd like to thank Bob Coecke for supervising me for the duration of my Master thesis work and for supplying me with a very interesting problem, Klaas Landsman for putting me in contact with the right people and for agreeing to be the second reader, and the people working at the Quantum Group for helping me with all the questions I had and for supplying a stimulating environment.

The text, theorems and proofs contained in this thesis are, unless otherwise noted, my own work. Some of the results in this thesis related to distributional semantics have been presented at the 2016 Workshop on the Intersection between NLP, Physics and Cognitive Science and will be published in the proceedings as part of an EPTCS volume [24].

– John van de Wetering
3rd of July 2016

CONTENTS

List of Figures
Introduction
1 Orderings
    1.1 Definitions
    1.2 Some examples
    1.3 Proving directed completeness
    1.4 Examples of upwards small posets
    1.5 The upperset map
    1.6 Convex uppersets
2 Ordering distributions
    2.1 Information Orders
    2.2 Examples
        2.2.1 Bayesian order
        2.2.2 Renormalised Löwner orders
    2.3 Basic properties
    2.4 Restricted Information Orders
    2.5 Classification of restricted information orders
        2.5.1 Polynomials and affine functions
        2.5.2 Classification
        2.5.3 Adding additional inequalities
        2.5.4 Changing parameters
        2.5.5 Antisymmetry and non-contradicting orders
        2.5.6 The maximum restricted order
        2.5.7 Entropy
    2.6 Domains
    2.7 Summary and conclusions
3 Ordering density operators
    3.1 Preliminaries
    3.2 Renormalising the Löwner order
    3.3 Composing systems
    3.4 Unicity of the maximum eigenvalue order
        3.4.1 The rank of operators
        3.4.2 Alternative derivations
        3.4.3 Last steps
        3.4.4 Graphical intuition
    3.5 Extending the order to positive operators
4 Entailment in distributional semantics
    4.1 Distributional natural language models
    4.2 Entailment and disambiguation
    4.3 Known methods
    4.4 Applications of Information Orders
        4.4.1 Smoothing
        4.4.2 Forms of grading
    4.5 Compositionality
Conclusion
Bibliography
A Classification of restricted orders

LIST OF FIGURES

Figure 1   Bayesian order on ∆3
Figure 2   Renormalised Löwner orders in ∆3
Figure 3   The unique information order on ∆2
Figure 4   Minimal structure of an information order
Figure 5   Comparison of $\sqsubseteq^3_{\max}$ and $\sqsubseteq^3_{-L}$
Figure 6   Increasing sequence with no approximation
Figure 7   Summary of properties of information orders
Figure 8   Diagonal matrices in DO(2) mapped by F+
Figure 9   DO(2) mapped into a cone
Figure 10  ∆3 mapped by F+
Figure 11  Smoothing in information orders

INTRODUCTION

One of the most fundamental ideas in mathematics (if not the most) is that of ordering objects.
Even before you can count numbers, you have to be able to say which number is bigger than another (idea stolen from [9]). One of the most fundamental ideas in science (if not the most) is that of information. Science is ultimately about the pursuit of more accurate knowledge about the world, and this knowledge is gained somehow via information transfer.

Combining these two fundamental ideas then gives rise to a natural question: what kind of order structure exists with relation to information content?

To start answering this question we have to define precisely what we mean by 'order structure' and 'information content'. The order structure we will take to be a partial order, some properties of which we will look at in Chapter 1. Information content we will take to be a collection of certain properties of a state an agent can be in. That is: certain states are more informative than others, and the agent prefers being in the more informative state. The precise question we wish to answer in this thesis will then be: is there a natural choice of partial order on the space of states (either classical or quantum) that orders the states according to their information content?

Note that we haven't actually defined yet what we mean by information content, and we will not give a complete definition of it in this thesis. We will skirt the issue by specifying, in Chapter 2, some minimal set of properties that a notion of information content should satisfy, and look at what kind of partial orders are compatible with these properties. An exact definition of information content is left as an exercise to the reader.

In classical physics, a state can be represented by a probability distribution over the different definite (pure) states a system can be in. A partial order over classical states will thus be a partial order on the space of probability distributions. In a similar vein, in quantum physics a state can be represented by a density matrix.
We will be looking at classical states (probability distributions) in Chapter 2, and at quantum states (density matrices) in Chapter 3.

A possible application of this theory is in computation. When we want to know if a certain computation is producing valuable output, we might want to check whether the information content of the state the process is in is actually increasing or not. A powerful way to study the behaviour of processes is by using a special kind of partial order called a domain. For this reason we will also show properties related to domain theory. Another application is in computational linguistics: some concepts in language are related to the information content present in words. This is studied in detail in Chapter 4.

1 ORDERINGS

1.1 Definitions

We'll start with the basic definitions related to orders.

Definition 1.1.1: A preorder $\sqsubseteq$ on a set $P$ is a binary relation which is
• Reflexive: $\forall x \in P: x \sqsubseteq x$.
• Transitive: $\forall x,y,z \in P: (x \sqsubseteq y \text{ and } y \sqsubseteq z) \implies x \sqsubseteq z$.
A partial order is a preorder that is also antisymmetric:
• if $x \sqsubseteq y$ and $y \sqsubseteq x$ then $x = y$.
A set which has a partial order defined on it is called a poset and is denoted as $(P, \sqsubseteq)$, or just as $P$ when it is clear which partial order we are referring to.

Definition 1.1.2: For an element $x$ in a poset $(P, \sqsubseteq)$ we define the upperset of $x$ as $\uparrow x = \{y \in P ; x \sqsubseteq y\}$ and conversely the downset of $x$ as $\downarrow x = \{z \in P ; z \sqsubseteq x\}$.

Definition 1.1.3: Let $S \subseteq P$ be a subset of a poset. The join (or supremum) of $S$, if it exists, is the smallest upper bound of $S$ and is denoted as $\bigvee S$. Conversely, the meet (or infimum) of $S$, if it exists, is the largest lower bound of $S$ and is denoted as $\bigwedge S$.

So if the meet and join of $S$ exist, we have $s \sqsubseteq \bigvee S$ for all $s \in S$ ($\bigvee S$ is an upper bound), and $\bigvee S \sqsubseteq p$ for all $p \in P$ that are upper bounds of $S$ ($\bigvee S$ is minimal), and the same with the directions reversed for $\bigwedge S$.

A particularly nice kind of poset is a domain.
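To make Definitions 1.1.1 to 1.1.3 concrete, here is a minimal Python sketch that checks the preorder and partial-order axioms for a relation on a finite set and computes uppersets, downsets, joins and meets by brute force. The function names and the divisibility example are illustrative assumptions, not taken from the thesis; the meet is computed as the join in the dual order (the order with every comparison reversed).

```python
from itertools import product


def is_preorder(elems, leq):
    """Reflexive and transitive (Definition 1.1.1)."""
    reflexive = all(leq(x, x) for x in elems)
    transitive = all(
        leq(x, z) for x, y, z in product(elems, repeat=3) if leq(x, y) and leq(y, z)
    )
    return reflexive and transitive


def is_partial_order(elems, leq):
    """A preorder that is also antisymmetric."""
    antisymmetric = all(
        x == y for x, y in product(elems, repeat=2) if leq(x, y) and leq(y, x)
    )
    return is_preorder(elems, leq) and antisymmetric


def upset(elems, leq, x):
    """The upperset of x: all y with x ⊑ y (Definition 1.1.2)."""
    return {y for y in elems if leq(x, y)}


def downset(elems, leq, x):
    """The downset of x: all z with z ⊑ x."""
    return {z for z in elems if leq(z, x)}


def join(elems, leq, subset):
    """Least upper bound of subset (Definition 1.1.3), or None if it does not exist."""
    uppers = [p for p in elems if all(leq(s, p) for s in subset)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None


def meet(elems, leq, subset):
    """Greatest lower bound, computed as the join in the dual order."""
    return join(elems, lambda a, b: leq(b, a), subset)


def divides(a, b):
    """Divisibility on non-zero integers: a ⊑ b iff a divides b."""
    return b % a == 0


divisors_of_12 = [1, 2, 3, 4, 6, 12]
```

On the divisors of 12 ordered by divisibility, the join of $\{4, 6\}$ is their least common multiple 12 and the meet is their greatest common divisor 2, while in $\{1, 2, 3\}$ the set $\{2, 3\}$ has no join at all. On $\{1, -1, 2, -2\}$ divisibility is only a preorder, since $2$ and $-2$ divide each other without being equal.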
In order to define what a domain is, we need some further definitions.

Definition 1.1.4: A subset $S \subseteq P$ of a poset $P$ is called (upwards) directed iff for all $x,y \in S$ there is a $z \in S$ such that $x \sqsubseteq z$ and $y \sqsubseteq z$.

A particular kind of directed subset is an increasing sequence. This is a sequence of elements $(a_i)_{i \in \mathbb{N}}$ such that $a_i \sqsubseteq a_j$ for $i \le j$.

Definition 1.1.5: For $x,y \in P$ we define $x \ll y$ iff for all directed subsets $S \subseteq P$ with existing supremum, $y \sqsubseteq \bigvee S$ implies that there exists an $s \in S$ such that $x \sqsubseteq s$. We call $\ll$ the approximation relation and say that $x$ approximates $y$. We denote $\mathrm{Approx}(y) = \{x \in P ; x \ll y\}$. We call the poset $P$ continuous iff $\mathrm{Approx}(y)$ is directed with supremum $y$, for all $y \in P$.

Definition 1.1.6: If all directed subsets of a poset $P$ have a join, we call $P$ directed complete and say that $P$ is a directed complete poset, which we will abbreviate to dcpo.

Definition 1.1.7: A poset $P$ is a domain if it is a dcpo and continuous.

Since we will often be talking about different partial orders on the same space, we will often say that a partial order itself is a dcpo/domain when it turns the underlying set into a dcpo/domain.

Domains are spaces that allow a natural way to talk about continuous approximation of elements [1]. This is why they are used when talking about, for instance, formal semantics of programming languages such as in [25]. We will not specifically use the theory of domains, but we will note it when certain partial orders have a dcpo or domain structure.

When talking about mathematical structures we are of course interested in the structure-preserving maps.

Definition 1.1.8: A map $f: (S, \sqsubseteq_S) \to (P, \sqsubseteq_P)$ between posets (or preorders) is called monotone if for all $a,b \in S$ with $a \sqsubseteq_S b$ we have $f(a) \sqsubseteq_P f(b)$. The map is called Scott-continuous iff it preserves all directed joins.
That is, if $D$ is a directed subset of $S$ whose join exists, we have $\bigvee f(D) = f(\bigvee D)$.

The relevant morphisms for posets are monotone maps, and for dcpos they are Scott-continuous maps. Note that Scott-continuous maps are always monotone. If a monotone map $f$ is bijective and its inverse is also monotone, then $f$ is called an order isomorphism and $S$ and $P$ are called order isomorphic.

Definition 1.1.9: Let $f: (S, \sqsubseteq_S) \to (P, \sqsubseteq_P)$ be a monotone map from a preordered set $S$ to a poset $P$. We call $f$ strict monotone iff for all $a,b \in S$ with $a \sqsubseteq_S b$ and $f(a) = f(b)$ we have $a = b$. If $f$ is furthermore Scott-continuous and $P$ a dcpo, then we call $f$ a measurement.

Note that if $S$ is a preorder and it allows a strict monotone map to a poset, then $S$ is a partial order. Indeed, $a \sqsubseteq_S b$ implies $f(a) \sqsubseteq_P f(b)$ and $b \sqsubseteq_S a$ implies $f(b) \sqsubseteq_P f(a)$; since $\sqsubseteq_P$ is antisymmetric we have $f(a) = f(b)$, which by strictness implies $a = b$, so that $\sqsubseteq_S$ is also antisymmetric. Any injective monotone map is also strict monotone, so strict monotonicity can be seen as a generalisation of injectivity.

1.2 Some examples

Partial orders occur everywhere in mathematics, so we could list hundreds of examples, but a few will hopefully suffice.

Example 1.2.1: For any set $X$ the powerset $\mathcal{P}(X)$ is a poset with the partial order given by inclusion. The maximal element is $X$ and the minimal element is the empty set. $\mathcal{P}(X)$ is in fact a complete lattice: all joins and meets exist and are given respectively by the union and the intersection of the sets.

Example 1.2.2: The real line $\mathbb{R}$ is a poset with $x \sqsubseteq y$ iff $x \le y$. In fact, it is totally ordered: for all $x \neq y$ in $\mathbb{R}$ we have either $x \sqsubseteq y$ or $y \sqsubseteq x$. $\mathbb{R}$ is also a lattice, with the join and meet of finite sets given by the maximal and minimal elements of the set.
For any poset $(P, \sqsubseteq)$ we denote the dual order as $(P^*, \sqsubseteq^*)$, which is given by $x \sqsubseteq^* y$ iff $y \sqsubseteq x$. Let $[0,\infty)^*$ be the restriction of $\mathbb{R}$ to the non-negative reals with the reversed order. The maximal element is $0$, and any directed set is a set of non-negative reals, hence bounded below by $0$ in $\mathbb{R}$, so its supremum in $[0,\infty)^*$ (its infimum in $\mathbb{R}$) is well defined. So $[0,\infty)^*$ contains all joins of directed sets, and it is a dcpo. We furthermore have $x \ll y$ iff $y < x$, which means that it is continuous, so $[0,\infty)^*$ is a domain.

Example 1.2.3: For a locally compact space $X$ its upper space is given by
$$UX = \{K \subseteq X ; \emptyset \neq K \text{ compact}\}.$$
When equipped with the reversed inclusion order, $A \sqsubseteq B$ iff $B \subseteq A$, it is a continuous dcpo, with the join of a directed subset given by the intersection (which is again a compact set, and guaranteed non-empty because of directedness), and $A \ll B$ iff $B \subseteq \mathrm{int}(A)$, where $\mathrm{int}(A)$ denotes the interior of a set. The maximal elements are the singletons, and $UX$ has a minimal element if and only if $X$ is compact, in which case $X$ is the minimal element. If $X$ is compact then $UX$ is compact, and if $X$ is a compact metric space then $UX$ is also a compact metric space, with the metric given by
$$d_{UX}(A,B) = \max\left\{ \sup_{a \in A} d_X(a,B),\ \sup_{b \in B} d_X(A,b) \right\}.$$

1.3 Proving directed completeness

There are some general methods to show that a partial order is directed complete. A useful one was given in [17, Theorem 2.2.1]:

Theorem 1.3.1: Given a poset $(P, \sqsubseteq)$ and a map $\mu: P \to [0,\infty)^*$ that is strict monotone and preserves the joins of directed sequences, we have the following:
• $P$ is a dcpo.
• $\mu$ is Scott-continuous.
• Every directed subset $S \subseteq P$ contains an increasing sequence whose supremum is $\bigvee S$.
• For all $x,y \in P$, $x \ll y$ iff for all increasing sequences $(a_i)$ with $y \sqsubseteq \bigvee a_i$ there is an $n$ such that $x \sqsubseteq a_n$.
• For all $x \in P$, $\mathrm{Approx}(x)$ is directed with supremum $x$ if and only if it contains an increasing sequence with supremum $x$.
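The hypothesis of Theorem 1.3.1 can be explored on a small example. In a finite poset every non-empty directed subset contains its own maximum, so directed joins exist trivially and what remains to check is strict monotonicity into $[0,\infty)^*$ (Definition 1.1.9). The sketch below does this by brute force for the powerset of a three-element set ordered by inclusion (Example 1.2.1). The map $\mu(A) = |X| - |A|$ and all function names are hypothetical choices for illustration, not constructions used in the thesis.

```python
from itertools import chain, combinations


def powerset(xs):
    """All subsets of xs, as frozensets, ordered by size."""
    xs = list(xs)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    ]


def subset(a, b):
    """Inclusion order on the powerset: a ⊑ b iff a ⊆ b."""
    return a <= b


def leq_dual_reals(r, s):
    """Order of [0, inf)*: r ⊑* s iff s <= r in the usual order of the reals."""
    return s <= r


def is_monotone(elems, leq, f, leq_target):
    """a ⊑ b must imply f(a) ⊑ f(b) in the target order."""
    return all(leq_target(f(a), f(b)) for a in elems for b in elems if leq(a, b))


def is_strict_monotone(elems, leq, f, leq_target):
    """Monotone, and a ⊑ b together with f(a) == f(b) forces a == b."""
    strict = all(a == b for a in elems for b in elems if leq(a, b) and f(a) == f(b))
    return is_monotone(elems, leq, f, leq_target) and strict


X = frozenset({1, 2, 3})
P = powerset(X)


def mu(a):
    """Hypothetical measurement: how many elements of X are still missing from a."""
    return len(X) - len(a)


def constant_zero(a):
    """Monotone but not strict: it cannot tell any two subsets apart."""
    return 0
```

Here the maximal element $X$ is sent to $0$, the maximal element of $[0,\infty)^*$. The constant map shows why strictness matters: it is monotone, but $\emptyset \subseteq \{1\}$ with equal values violates strictness, so it carries no information about how far an element is from being maximal.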
In short, such a map $\mu$ makes sure that $P$ is a dcpo and that, wherever you would normally have to work with a directed set, you can instead simplify to working with an increasing sequence.

We will now show that a certain class of topological posets has the same sort of properties. Results very similar to these can be found in [13, Chapter VI], although the results proven here are sometimes slightly more general. We also use different terminology.¹

Definition 1.3.2: Let $X$ be a Hausdorff topological space. It is called first countable if every point admits a countable neighbourhood basis. It is called separable if it contains a countable dense subset, and it is called sequentially compact iff any sequence contains a convergent subsequence.

$X$ being Hausdorff means that limits of nets and sequences are unique when they exist. First countable means that the topology can be understood in terms of sequences, instead of the more general nets. Separability ensures that the space isn't too large. Sequential compactness is a different notion of compactness; for metric spaces it is equivalent to compactness.

Definition 1.3.3: Let $(X, \sqsubseteq)$ be a poset with $X$ first countable Hausdorff. We call $\sqsubseteq$ upwards small iff for every $x \in X$, $\uparrow x$ is sequentially compact and $\downarrow x$ is closed. Dually, it is called downwards small if $\downarrow x$ is sequentially compact and $\uparrow x$ is closed. It is called small iff it is both downwards small and upwards small.

An upwards small poset has uppersets that are bounded in a certain sense. Indeed, for $X$ a subset of Euclidean space, sequential compactness is equivalent to being closed and bounded. Note that a sequentially compact subspace is always closed. Closedness of uppersets (or downsets) means that if we have a convergent sequence $a_n \to a$ and $x \sqsubseteq a_n$ (or $a_n \sqsubseteq x$) for all $n$, then $x \sqsubseteq a$ (or $a \sqsubseteq x$).
Note that if $X$ is sequentially compact, then a poset is upwards small if and only if it is downwards small. This definition also works for preorders, since the required properties have nothing to do with antisymmetry, but when not otherwise specified we will assume $\sqsubseteq$ to be a partial order. Upwards small partial orders turn out to interact really nicely with the topology. They are a special case of what in the literature is known as a pospace: a topological poset $(X, \sqsubseteq)$ where the graph of $\sqsubseteq$ is a closed subset of $X^2$.

Lemma 1.3.4: Let $(X, \sqsubseteq)$ be a first countable Hausdorff space with $\sqsubseteq$ upwards small. Then
• All increasing sequences have a join.
• All increasing sequences converge.
• The join of an increasing sequence is equal to its limit.

¹ The proofs given here are original because the author wasn't aware that these statements were already proven. Since the source mentioned is behind a paywall, the proofs here can be seen as a public service.
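Lemma 1.3.4 can be sanity-checked (not proven) numerically. On $[0,1]^2$ with the componentwise order, $\uparrow x = [x_1,1] \times [x_2,1]$ is compact and $\downarrow x = [0,x_1] \times [0,x_2]$ is closed, so the order is upwards small. The sketch below takes a hypothetical increasing sequence with limit $(1, 1/2)$ and checks, on a finite prefix with exact rational arithmetic, that the limit is an upper bound of every term while slightly smaller candidates are not, consistent with the limit being the join. The sequence and function names are assumptions made for illustration only.

```python
from fractions import Fraction


def leq(x, y):
    """Product order on [0,1]^2: x ⊑ y iff x_i <= y_i in every coordinate."""
    return all(xi <= yi for xi, yi in zip(x, y))


def a(n):
    """Hypothetical increasing sequence a_n = (n/(n+1), n/(2(n+1))) for n >= 1."""
    return (Fraction(n, n + 1), Fraction(n, 2 * (n + 1)))


def is_upper_bound(u, terms):
    """True if every term t satisfies t ⊑ u."""
    return all(leq(t, u) for t in terms)


terms = [a(n) for n in range(1, 1001)]
limit = (Fraction(1), Fraction(1, 2))  # coordinatewise limit as n -> infinity
```

A finite prefix can only show that particular candidates such as $(1, 499/1000)$ or $(999/1000, 1/2)$ fail to be upper bounds; that every strictly smaller candidate fails eventually is what the limit argument in the lemma supplies.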
