Graded Entailment for Compositional Distributional Semantics

Dea Bankova (Microsoft)
Bob Coecke, Martha Lewis, Dan Marsden (University of Oxford)
[email protected], {coecke,marlew,daniel.marsden}@cs.ox.ac.uk

Abstract

The categorical compositional distributional model of natural language provides a conceptually motivated procedure to compute the meaning of a sentence, given its grammatical structure and the meanings of its words. This approach has outperformed other models in mainstream empirical language processing tasks. However, until recently it has lacked the crucial feature of lexical entailment, as do other distributional models of meaning.

In this paper we solve the problem of entailment for categorical compositional distributional semantics. Taking advantage of the abstract categorical framework allows us to vary our choice of model. This enables the introduction of a notion of entailment, exploiting ideas from the categorical semantics of partial knowledge in quantum computation.

The new model of language uses density matrices, on which we introduce a novel robust graded order capturing the entailment strength between concepts. This graded measure emerges from a general framework for approximate entailment, induced by any commutative monoid. Quantum logic embeds in our graded order.

Our main theorem shows that entailment strength lifts compositionally to the sentence level, giving a lower bound on sentence entailment. We describe the essential properties of graded entailment, such as continuity, and provide a procedure for calculating entailment strength.

Categories and Subject Descriptors: Artificial Intelligence [Natural Language Processing]: Lexical Semantics

Keywords: Categorical Compositional Distributional Semantics, Computational Linguistics, Entailment, Density Operator
1. Introduction

Finding a formalization of language in which the meaning of a sentence can be computed from the meanings of its parts has been a long-standing goal in formal and computational linguistics.

Distributional semantics represents individual word meanings as vectors in finite-dimensional real vector spaces. Symbolic accounts of meaning, on the other hand, combine words via compositional rules to form phrases and sentences. These two approaches are in some sense orthogonal: distributional schemes have no obvious compositional structure, whereas compositional models lack a canonical way of determining the meaning of individual words.

In (Coecke et al. 2010), the authors develop the categorical compositional distributional model of natural language semantics. This model leverages the shared categorical structure of pregroup grammars and vector spaces to provide a compositional structure for distributional semantics. It has produced state-of-the-art results in measuring sentence similarity (Kartsaklis et al. 2012; Grefenstette and Sadrzadeh 2011), effectively describing aspects of human understanding of sentences.

A satisfactory account of natural language should incorporate a suitable notion of lexical entailment. Until recently, categorical compositional distributional models of meaning have lacked this crucial feature. In order to address the entailment problem, we exploit the freedom inherent in our abstract categorical framework to change models. We move from a pure state setting to a category used to describe mixed states and partial knowledge in the semantics of categorical quantum mechanics. Meanings are now represented by density matrices rather than simple vectors. We use this extra flexibility to capture the concept of hyponymy, where one word may be seen as an instance of another; for example, red is a hyponym of colour. The hyponymy relation can be associated with a notion of logical entailment. Some entailment is crisp, for example: dog entails animal. However, we may also wish to permit entailments of differing strengths. For example, the concept dog gives high support to the concept pet, but does not completely entail it: some dogs are working dogs. The hyponymy relation we describe here can account for these phenomena. We should also be able to measure entailment strengths at the sentence level. For example, we require that "Cujo is a dog" crisply entails "Cujo is an animal", but that the statement "Cujo is a dog" does not completely entail "Cujo is a pet". Again, the relation we describe here successfully captures this behaviour at the sentence level.

An obvious choice for a logic built upon vector spaces is quantum logic (Birkhoff and Von Neumann 1936). Briefly, this logic represents propositions about quantum systems as projection operators on an appropriate Hilbert space. These projections form an orthomodular lattice in which the distributive law fails in general, and the logical structure is inherited from the lattice structure in the usual way. In the current work, we propose an order that embeds the orthomodular lattice of projections, and so contains quantum logic. This order is based on the Löwner ordering, with propositions represented by density matrices. When this ordering is applied to density matrices with the standard trace normalization, no two distinct propositions are comparable, and therefore the Löwner ordering is useless as applied to density operators. The trick we use is to develop an approximate entailment relationship which arises naturally from any commutative monoid. We introduce this in general terms and describe conditions under which it gives a graded measure of entailment. This grading becomes continuous with respect to noise. Our framework is flexible enough to subsume the Bayesian partial ordering of Coecke and Martin (2011), and provides it with a grading.

Most closely related to the current work are the ideas in (Balkır 2014; Balkır et al. 2016, 2015). In that line of work, the authors develop a graded form of entailment based on von Neumann entropy, with links to the distributional inclusion hypotheses developed by Geffet and Dagan (2005). The authors show how entailment at the word level carries through to entailment at the sentence level; however, this is done without taking account of the grading. In contrast, the measure that we develop here provides a lower bound for the entailment strength between sentences, based on the entailment strength between words. Further, the measure presented here is applicable to a wider range of sentence types than in (Balkır et al. 2015). Some of the work presented here was developed in the first author's MSc thesis (Bankova 2015).

Density matrices have also been used in other areas of distributional semantics. They are exploited in (Kartsaklis 2015; Piedeleu 2014; Piedeleu et al. 2015) to encode ambiguity. Blacoe et al. (2013) use density operators to encode the contexts in which a word occurs, but do not use these operators in a compositional structure. Quantum logic has been applied to distributional semantics in (Widdows and Peters 2003), allowing queries of the form "suit NOT lawsuit": the vector for "suit" is projected onto the subspace orthogonal to "lawsuit". A similar approach, in the field of information retrieval, is described in Van Rijsbergen (2004), where document retrieval is modelled as a form of quantum logical inference.

The majorization preordering on density matrices has been used extensively in quantum information (Nielsen 1999); however, it cannot be turned into a partial order and is therefore of no use as an entailment relation.
1.1 Background

Within distributional semantics, word meanings are derived from text corpora using word co-occurrence statistics (Lund and Burgess 1996; Mitchell and Lapata 2010; Bullinaria and Levy 2007). Other methods for deriving such meanings may also be used. In particular, we can view the dimensions of the vector space as attributes of a concept, and experimentally determined attribute importance as the weighting on each dimension, as in (Hampton 1987; McRae et al. 2005; Vinson and Vigliocco 2008; Devereux et al. 2014). Distributional models of language have been shown to effectively model various facets of human meaning, such as similarity judgements (McDonald and Ramscar 2001), word sense discrimination (Schütze 1998; McCarthy et al. 2004) and text comprehension (Landauer and Dumais 1997; Foltz et al. 1998).

Entailment is an important and thriving area of research within distributional semantics. The PASCAL Recognising Textual Entailment Challenge (Dagan et al. 2006) has attracted a large number of researchers to the area and generated a number of approaches. Previous lines of research on entailment for distributional semantics investigate the development of directed similarity measures which can characterize entailment (Weeds et al. 2004; Kotlerman et al. 2010; Lenci and Benotto 2012). Geffet and Dagan (2005) introduce a pair of distributional inclusion hypotheses: if a word v entails another word w, then all the typical features of v will also occur with w; conversely, if all the typical features of v also occur with w, then v is expected to entail w. Clarke (2009) defines a vector lattice for word vectors, and a notion of graded entailment with the properties of a conditional probability. Rimell (2014) explores the limitations of the distributional inclusion hypothesis by examining the properties of those features that are not shared between words. An interesting approach in (Kiela et al. 2015) is to incorporate other modes of input into the representation of a word; measures of entailment are then based on the dispersion of a word representation, together with a similarity measure.

Attempts have also been made to incorporate entailment measures with elements of compositionality. Baroni et al. (2012) exploit the entailment relations between adjective-noun and noun pairs to train a classifier that can detect similar relations. They further develop a theory of entailment for quantifiers.

2. Categorical Compositional Distributional Meaning

Compositional and distributional accounts of meaning are unified in (Coecke et al. 2010), which constructs the meaning of a sentence from the meanings of its component parts using their syntactic structure.

2.1 Pregroup Grammars

In order to describe syntactic structure we use Lambek's pregroup grammars (Lambek 1999). This choice of grammar is not essential, and other forms of categorial grammar can be used, as argued in (Coecke et al. 2013). A pregroup (P, ≤, ·, 1, (−)^l, (−)^r) is a partially ordered monoid (P, ≤, ·, 1) in which each element p ∈ P has a left adjoint p^l and a right adjoint p^r such that the following inequalities hold:

  p^l · p ≤ 1 ≤ p · p^l    and    p · p^r ≤ 1 ≤ p^r · p    (1)

Intuitively, we think of the elements of a pregroup as linguistic types. The monoidal structure allows us to form composite types, and the partial order encodes type reduction. The right and left adjoints enable the introduction of types requiring further elements on their left or right respectively.

The pregroup grammar Preg_B over an alphabet B is freely constructed from the atomic types in B. In what follows we use the alphabet B = {n, s}, where the type s denotes a declarative sentence and n denotes a noun. A transitive verb can then be denoted n^r s n^l. If a string of words and their types reduces to the type s, the sentence is judged grammatical. The sentence "John kicks cats" is typed n (n^r s n^l) n, and can be reduced to s as follows:

  n (n^r s n^l) n ≤ 1 · s n^l n ≤ 1 · s · 1 ≤ s

This symbolic reduction can also be expressed graphically, as shown in figure 1. In this diagrammatic notation, the elimination of types by means of the inequalities n · n^r ≤ 1 and n^l · n ≤ 1 is denoted by a 'cup', while the fact that the type s is retained is represented by a straight wire.

[Figure 1: The transitive sentence "John kicks cats" in the graphical calculus, typed n (n^r s n^l) n.]
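As an illustration of this type reduction, the following is a minimal Python sketch (not from the paper). It encodes an atomic type with iterated adjoints as a pair (base, z), where z = 0 is the plain type, -1 a left adjoint and +1 a right adjoint, and greedily cancels adjacent pairs that instantiate inequality (1). The encoding and the greedy strategy are our own simplifying assumptions; a full pregroup parser would need more care, but they suffice for this example.

```python
def reduce_types(types):
    """Greedily cancel adjacent adjoint pairs until no reduction applies."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            (b1, z1), (b2, z2) = types[i], types[i + 1]
            # x^l x <= 1 or x x^r <= 1: same base, right neighbour one adjoint higher
            if b1 == b2 and z2 == z1 + 1:
                del types[i:i + 2]
                changed = True
                break
    return types

# n (n^r s n^l) n  ->  s : the sentence "John kicks cats" is grammatical
sentence = [("n", 0), ("n", 1), ("s", 0), ("n", -1), ("n", 0)]
print(reduce_types(sentence))  # [('s', 0)]
```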
2.2 Compositional Distributional Models

The symbolic account and the distributional approach are linked by the fact that they share the common structure of a compact closed category. This compatibility allows the compositional rules of the grammar to be applied in the vector space model. In this way we can map syntactically well-formed strings of words into one shared meaning space.

A compact closed category is a monoidal category in which for each object A there are left and right dual objects A^l and A^r, and corresponding unit and counit morphisms η^l : I → A ⊗ A^l, η^r : I → A^r ⊗ A, ε^l : A^l ⊗ A → I, ε^r : A ⊗ A^r → I, such that the following snake equations hold:

  (1_A ⊗ ε^l) ∘ (η^l ⊗ 1_A) = 1_A        (ε^r ⊗ 1_A) ∘ (1_A ⊗ η^r) = 1_A
  (ε^l ⊗ 1_{A^l}) ∘ (1_{A^l} ⊗ η^l) = 1_{A^l}    (1_{A^r} ⊗ ε^r) ∘ (η^r ⊗ 1_{A^r}) = 1_{A^r}

[Figure 4: The snake equations, depicted graphically as the straightening of wires.]

The underlying poset of a pregroup can be viewed as a compact closed category, with the monoidal structure given by the pregroup monoid and ε^l, η^l, η^r, ε^r the unique morphisms witnessing the inequalities of (1).

Distributional vector space models live in the category FHilb of finite-dimensional real Hilbert spaces and linear maps. FHilb is compact closed: each object V is its own dual, and the left and right unit and counit morphisms coincide. Given a fixed basis {|v_i⟩}_i of V, we define the unit

  η : ℝ → V ⊗ V :: 1 ↦ Σ_i |v_i⟩ ⊗ |v_i⟩

and counit

  ε : V ⊗ V → ℝ :: Σ_{ij} c_{ij} |v_i⟩ ⊗ |v_j⟩ ↦ Σ_i c_{ii}

Here we use the physicists' bra-ket notation; for details see (Nielsen and Chuang 2010).
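To make the FHilb unit and counit concrete, here is a minimal numpy sketch (not from the paper) that builds η and ε for a 3-dimensional space and checks the first snake equation numerically. The dimension and the test vector are arbitrary choices.

```python
import numpy as np

d = 3
eta = np.eye(d).reshape(d * d)   # eta: 1 |-> sum_i |v_i> (x) |v_i>, stored as a flat d*d vector

def snake(v):
    """Apply (1_V (x) eps) o (eta (x) 1_V) to a vector v in V."""
    # eta (x) 1_V : V -> V (x) V (x) V, stored as a (d, d, d) array
    w = np.einsum('a,i->ai', eta, v).reshape(d, d, d)
    # 1_V (x) eps : contract the last two factors; eps pairs equal basis indices
    return np.einsum('ijk,jk->i', w, np.eye(d))

v = np.array([1.0, 2.0, -0.5])
assert np.allclose(snake(v), v)  # the wire straightens out: snake = identity
print(snake(v))
```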
2.3 Graphical Calculus

The morphisms of compact closed categories can be expressed in a convenient graphical calculus (Kelly and Laplaza 1980), which we will exploit in the sequel. Objects are labelled wires, and morphisms are vertices with input and output wires. Composition consists of connecting output wires to input wires, and the tensor product is formed by juxtaposition, as shown in figure 2.

[Figure 2: The monoidal graphical calculus: composition, tensor product, and the transpose of a morphism.]

By convention the wire for the monoidal unit is omitted. The morphisms ε and η can then be represented by 'cups' and 'caps', as shown in figure 3, and the snake equations can be read as the straightening of wires, as shown in figure 4.

[Figure 3: The compact structure ε^l, ε^r, η^l, η^r depicted graphically as cups and caps.]

2.4 Grammatical Reductions in Vector Spaces

Following (Preller and Sadrzadeh 2011), reductions of the pregroup grammar may be mapped into the category FHilb of finite-dimensional Hilbert spaces and linear maps using an appropriate strong monoidal functor

  Q : Preg → FHilb

Strong monoidal functors automatically preserve the compact closed structure. For our example Preg_{n,s}, we must map the noun and sentence types to appropriate finite-dimensional vector spaces:

  Q(n) = N    Q(s) = S

Composite types are then constructed functorially using the corresponding structure in FHilb, and each morphism α in the pregroup is mapped to a linear map interpreting sentences of that grammatical type. Then, given word vectors |w_i⟩ with types p_i, and a type reduction α : p_1, p_2, ..., p_n → s, the meaning of the sentence w_1 w_2 ... w_n is given by:

  |w_1 w_2 ... w_n⟩ = Q(α)(|w_1⟩ ⊗ |w_2⟩ ⊗ ... ⊗ |w_n⟩)

For example, as described in section 2.1, transitive verbs have type n^r s n^l, and can therefore be represented in FHilb as elements of the rank-3 space N ⊗ S ⊗ N. The transitive sentence "John kicks cats" has type n (n^r s n^l) n, which reduces to the sentence type via ε^r ⊗ 1_s ⊗ ε^l. So if we represent |kicks⟩ by

  |kicks⟩ = Σ_{ijk} c_{ijk} |e_i⟩ ⊗ |s_j⟩ ⊗ |e_k⟩

then, using the definitions of the counits in FHilb, we have:

  |John kicks cats⟩ = (ε_N ⊗ 1_S ⊗ ε_N)(|John⟩ ⊗ |kicks⟩ ⊗ |cats⟩)
                    = Σ_{ijk} c_{ijk} ⟨John|e_i⟩ ⊗ |s_j⟩ ⊗ ⟨e_k|cats⟩
                    = Σ_j ( Σ_{ik} c_{ijk} ⟨John|e_i⟩ ⟨e_k|cats⟩ ) |s_j⟩

The category FHilb is in fact a †-compact closed category: a compact closed category with an additional dagger functor that is an identity-on-objects involution, satisfying natural coherence conditions. In the graphical calculus, the dagger operation "flips diagrams upside-down". In the case of FHilb the dagger sends a linear map to its adjoint, and this allows us to reason about inner products in a general categorical setting.

Meanings of sentences may be compared using the inner product to calculate the cosine distance between vector representations. So, if sentence s has vector representation |s⟩ and sentence s' has representation |s'⟩, their degree of synonymy is given by:

  ⟨s|s'⟩ / √(⟨s|s⟩⟨s'|s'⟩)

The abstract categorical framework we have introduced allows meanings to be interpreted not just in FHilb, but in any †-compact closed category. We will exploit this freedom when we move to density matrices. Detailed presentations of the ideas in this section are given in (Coecke et al. 2010; Preller and Sadrzadeh 2011), and an introduction to the relevant category theory is given in (Coecke and Paquette 2011).
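The map ε_N ⊗ 1_S ⊗ ε_N is just a pair of index contractions, so it can be sketched directly in numpy. The following minimal example (not from the paper) uses small random placeholder vectors rather than corpus-derived ones; the dimensions dN and dS are arbitrary.

```python
import numpy as np

dN, dS = 4, 3                              # noun and sentence space dimensions
rng = np.random.default_rng(0)

john  = rng.random(dN)                     # |John> in N
cats  = rng.random(dN)                     # |cats> in N
kicks = rng.random((dN, dS, dN))           # c_ijk for |kicks> in N (x) S (x) N

def transitive_sentence(subj, verb, obj):
    """|subj verb obj> = sum_j (sum_ik c_ijk <subj|e_i><e_k|obj>) |s_j>."""
    return np.einsum('i,ijk,k->j', subj, verb, obj)

def cosine(u, v):
    """Degree of synonymy <u|v> / sqrt(<u|u><v|v>)."""
    return u @ v / np.sqrt((u @ u) * (v @ v))

s1 = transitive_sentence(john, kicks, cats)
s2 = transitive_sentence(cats, kicks, john)
print(cosine(s1, s2))
```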
3. Density Matrices in Categorical Compositional Distributional Semantics

3.1 Positive Operators and Density Matrices

The methods outlined in section 2 can be applied in the richer setting of density matrices. Density matrices are used in quantum mechanics to express uncertainty about the state of a system. For a unit vector |v⟩, the projection operator |v⟩⟨v| onto the subspace spanned by |v⟩ is called a pure state. Pure states can be thought of as giving sharp, unambiguous information. In general, density matrices are given by a convex sum of pure states, describing a probabilistic mixture; states that are not pure are referred to as mixed states. Necessary and sufficient conditions for an operator ρ to encode such a probabilistic mixture are:

  • ∀v ∈ V. ⟨v|ρ|v⟩ ≥ 0
  • ρ is self-adjoint
  • ρ has trace 1

Operators satisfying the first two conditions are called positive operators. The third condition ensures that the operator represents a convex mixture of pure states. Relaxing this condition gives us different choices of normalization, which we outline in section 5.4.

In distributional models of meaning, we can consider the meaning of a word w to be given by a collection of unit vectors {|w_i⟩}_i, where each |w_i⟩ represents an instance of the concept expressed by the word. Each |w_i⟩ is weighted by p_i ∈ [0,1], such that Σ_i p_i = 1. These weights describe the meaning of w as a weighted combination of exemplars. Then the density operator

  ⟦w⟧ = Σ_i p_i |w_i⟩⟨w_i|

represents the word w. For example, a cat is a fairly typical pet, and a tarantula is less typical, so a simple density operator for the word pet might be:

  ⟦pet⟧ = 0.9 |cat⟩⟨cat| + 0.1 |tarantula⟩⟨tarantula|
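A minimal numpy sketch (not from the paper) of this construction: build ⟦pet⟧ from the two exemplars above and check the three conditions for a density matrix. The two orthonormal basis vectors standing in for |cat⟩ and |tarantula⟩ are a toy choice.

```python
import numpy as np

cat       = np.array([1.0, 0.0])
tarantula = np.array([0.0, 1.0])

def pure(v):
    """Projection |v><v| onto the (unit) vector v."""
    return np.outer(v, v)

pet = 0.9 * pure(cat) + 0.1 * pure(tarantula)

eigenvalues = np.linalg.eigvalsh(pet)
assert np.all(eigenvalues >= -1e-12)   # <v|rho|v> >= 0 for all v
assert np.allclose(pet, pet.T)         # self-adjoint (real symmetric here)
assert np.isclose(np.trace(pet), 1.0)  # trace 1: a convex mixture of pure states
print(pet)
```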
3.2 The CPM Construction

Applying Selinger's CPM construction (Selinger 2007) to FHilb produces a new †-compact closed category in which the states are positive operators. This construction has previously been exploited in a linguistic setting in (Kartsaklis 2015; Piedeleu et al. 2015; Balkır et al. 2016). Throughout this section C denotes an arbitrary †-compact closed category.

Definition 1 (Completely positive morphism). A C-morphism φ : A* ⊗ A → B* ⊗ B is said to be completely positive (Selinger 2007) if there exist C ∈ Ob(C) and k ∈ C(C ⊗ A, B) such that φ can be written in the form:

  (k_* ⊗ k) ∘ (1_{A*} ⊗ η_C ⊗ 1_A)

Identity morphisms are completely positive, and completely positive morphisms are closed under composition in C, leading to the following:

Definition 2. If C is a †-compact closed category then CPM(C) is the category with the same objects as C and the completely positive morphisms as its morphisms.

The †-compact structure required for interpreting language in our setting lifts to CPM(C):

Theorem 1. CPM(C) is also a †-compact closed category, and there is a functor E : C → CPM(C) acting on morphisms by k ↦ k_* ⊗ k. This functor preserves the †-compact closed structure and is faithful "up to a global phase" (Selinger 2007).

[Table 1: Diagrams in CPM(C) and their interpretation in C. In particular, E(ε) = ε_* ⊗ ε : A* ⊗ A* ⊗ A ⊗ A → I acts as |e_i⟩⊗|e_j⟩⊗|e_k⟩⊗|e_l⟩ ↦ ⟨e_i|e_k⟩⟨e_j|e_l⟩; E(η) = η_* ⊗ η : I → A ⊗ A ⊗ A* ⊗ A* acts as 1 ↦ Σ_{ij} |e_i⟩⊗|e_j⟩⊗|e_i⟩⊗|e_j⟩; and the tensor product of completely positive morphisms f_1 ⊗ f_2 is interpreted as a morphism A* ⊗ C* ⊗ C ⊗ A → B* ⊗ D* ⊗ D ⊗ B in C.]

3.3 Diagrammatic Calculus for CPM(C)

As CPM(C) is also a †-compact closed category, we can use the graphical calculus described in section 2.3. By convention, the diagrammatic calculus for CPM(C) is drawn using thick wires. The corresponding diagrams in C are given in table 1.

3.3.1 Sentence Meaning in the Category CPM(FHilb)

In the vector space model, the transition between syntax and semantics was achieved via a strong monoidal functor Q : Preg → FHilb. Language can be assigned semantics in CPM(FHilb) in an entirely analogous way via a strong monoidal functor:

  S : Preg → CPM(FHilb)

Definition 3. Let w_1 w_2 ... w_n be a string of words with corresponding grammatical types t_i in Preg_B. Suppose that the type reduction is given by r : t_1, ..., t_n → x for some x ∈ Ob(Preg_B). Let ⟦w_i⟧ be the meaning of word w_i in CPM(FHilb), i.e. a state of the form I → S(t_i). Then the meaning of w_1 w_2 ... w_n is given by:

  ⟦w_1 w_2 ... w_n⟧ = S(r)(⟦w_1⟧ ⊗ ... ⊗ ⟦w_n⟧)

We now have all the ingredients to derive sentence meanings in CPM(FHilb).

Example 1. We first show that the results from FHilb lift to CPM(FHilb). Let the noun space N be a real Hilbert space with basis vectors {|n_i⟩}_i, where for some i, |n_i⟩ = |Clara⟩ and for some j, |n_j⟩ = |beer⟩. Let the sentence space be another space S with basis {|s_i⟩}_i. The verb |likes⟩ is given by:

  |likes⟩ = Σ_{pqr} C_{pqr} |n_p⟩ ⊗ |s_q⟩ ⊗ |n_r⟩

The density matrices for the nouns Clara and beer are in fact pure states:

  ⟦Clara⟧ = |n_i⟩⟨n_i|    ⟦beer⟧ = |n_j⟩⟨n_j|

and similarly ⟦likes⟧ in CPM(FHilb) is:

  ⟦likes⟧ = Σ_{pqrtuv} C_{pqr} C_{tuv} |n_p⟩⟨n_t| ⊗ |s_q⟩⟨s_u| ⊗ |n_r⟩⟨n_v|

The meaning of the composite sentence is simply (ε_N ⊗ 1_S ⊗ ε_N) applied to (⟦Clara⟧ ⊗ ⟦likes⟧ ⊗ ⟦beer⟧), as shown in figure 5, with interpretation in FHilb shown in figure 6.

[Figure 5: A transitive sentence in CPM(C).]
[Figure 6: A transitive sentence in C with pure states.]

In terms of linear algebra, this corresponds to:

  ⟦Clara likes beer⟧ = φ(⟦Clara⟧ ⊗ ⟦likes⟧ ⊗ ⟦beer⟧) = Σ_{qu} C_{iqj} C_{iuj} |s_q⟩⟨s_u|

This is a pure state, corresponding to the vector Σ_q C_{iqj} |s_q⟩. However, in CPM(FHilb) we can work with more than the pure states.

Example 2. Let the noun space N be a real Hilbert space with basis vectors {|n_i⟩}_i. Let:

  |Annie⟩ = Σ_i a_i |n_i⟩    |Betty⟩ = Σ_i b_i |n_i⟩    |Clara⟩ = Σ_i c_i |n_i⟩
  |beer⟩ = Σ_i d_i |n_i⟩    |wine⟩ = Σ_i e_i |n_i⟩

and, with the sentence space S, define:

  |likes⟩ = Σ_{pqr} C_{pqr} |n_p⟩ ⊗ |s_q⟩ ⊗ |n_r⟩
  |appreciates⟩ = Σ_{pqr} D_{pqr} |n_p⟩ ⊗ |s_q⟩ ⊗ |n_r⟩

Then we can set:

  ⟦the sisters⟧ = 1/3 (|Annie⟩⟨Annie| + |Betty⟩⟨Betty| + |Clara⟩⟨Clara|)
  ⟦drinks⟧ = 1/2 (|beer⟩⟨beer| + |wine⟩⟨wine|)
  ⟦enjoy⟧ = 1/2 (|likes⟩⟨likes| + |appreciates⟩⟨appreciates|)

Then the meaning of the sentence s = "The sisters enjoy drinks" is given by:

  ⟦s⟧ = (ε_N ⊗ 1_S ⊗ ε_N)(⟦the sisters⟧ ⊗ ⟦enjoy⟧ ⊗ ⟦drinks⟧)

Diagrammatically, this is shown in figure 7. The impurity is indicated by the fact that the paired states are connected by wires (Selinger 2007).

[Figure 7: A transitive sentence in C with impure states.]
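Computationally, the CPM(FHilb) sentence map is again an index contraction, now over density matrices. The following minimal numpy sketch (not from the paper) reproduces the shape of example 2 with random placeholder data; the index convention ⟦verb⟧[p,q,r,t,u,v] for |n_p⟩⟨n_t| ⊗ |s_q⟩⟨s_u| ⊗ |n_r⟩⟨n_v| is our own bookkeeping choice.

```python
import numpy as np

dN, dS = 4, 3
rng = np.random.default_rng(1)

def pure(v):
    return np.outer(v, v)

annie, betty, clara = (rng.random(dN) for _ in range(3))
beer, wine          = (rng.random(dN) for _ in range(2))
like, appreciate    = (rng.random((dN, dS, dN)) for _ in range(2))

the_sisters = (pure(annie) + pure(betty) + pure(clara)) / 3
drinks      = (pure(beer) + pure(wine)) / 2
# [[enjoy]] lives on (N (x) S (x) N)^2, indexed (p, q, r, t, u, v)
enjoy = (np.einsum('pqr,tuv->pqrtuv', like, like)
         + np.einsum('pqr,tuv->pqrtuv', appreciate, appreciate)) / 2

def cpm_transitive(subj, verb, obj):
    """[[subj verb obj]][q,u] = sum subj[p,t] verb[p,q,r,t,u,v] obj[r,v]."""
    return np.einsum('pt,pqrtuv,rv->qu', subj, verb, obj)

sentence = cpm_transitive(the_sisters, enjoy, drinks)
print(np.linalg.eigvalsh(sentence))  # a positive operator on S, mixed in general
```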
4. Predicates and Entailment

If we consider a model of (non-deterministic) classical computation, a state of a set X is just a subset ρ ⊆ X. Similarly, a predicate is a subset A ⊆ X. We say that ρ satisfies A if ρ ⊆ A, which we write as ρ ⊩ A. Predicate A entails predicate B, written A ⊨ B, if for every state ρ:

  ρ ⊩ A ⇒ ρ ⊩ B

Clearly this is equivalent to requiring A ⊆ B.

4.1 The Löwner Order

As our linguistic models derive from a quantum mechanical formalism, positive operators form a natural analogue of subsets as our predicates. This follows ideas in (D'Hondt and Panangaden 2006) and earlier work in a probabilistic setting (Kozen 1983). Crucially, we can order positive operators (Löwner 1934).

Definition 4 (Löwner order). For positive operators A and B, we define:

  A ⊑ B ⟺ B − A is positive

If we consider this as an entailment relationship, we can follow our intuitions from the non-deterministic setting. First we introduce a suitable notion of satisfaction. For a positive operator A and a density matrix ρ, we define ρ ⊩ A as the positive real number tr(ρA). This generalizes satisfaction from a binary relation to a binary function into the positive reals. We then find that the Löwner order can equivalently be phrased in terms of satisfaction as follows:

Lemma 1 (D'Hondt and Panangaden (2006)). Let A and B be positive operators. A ⊑ B if and only if for all density operators ρ:

  ρ ⊩ A ≤ ρ ⊩ B

Linguistically, we can interpret this condition as saying that every noun, for example, satisfies predicate B at least as strongly as it satisfies predicate A.

4.2 Quantum Logic

Quantum logic (Birkhoff and Von Neumann 1936) views the projection operators on a Hilbert space as propositions about a quantum system. As the Löwner order restricts to the usual ordering on projection operators, we can embed quantum logic within the poset of projection operators, providing a direct link to existing theory.
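Both the Löwner order and the graded satisfaction tr(ρA) are easy to check numerically. A minimal numpy sketch (not from the paper), with small diagonal test operators of our own choosing:

```python
import numpy as np

def loewner_leq(A, B, tol=1e-12):
    """A ⊑ B iff B - A is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(B - A) >= -tol))

def satisfies(rho, A):
    """Graded satisfaction rho |- A = tr(rho A), a non-negative real."""
    return float(np.trace(rho @ A))

A = np.diag([1.0, 1.0, 0.0])
B = np.diag([1.0, 1.0, 1.0])
print(loewner_leq(A, B))                     # True: B - A is positive

rho = np.diag([0.2, 0.3, 0.5])               # a density operator
print(satisfies(rho, A), satisfies(rho, B))  # 0.5 <= 1.0, as lemma 1 predicts
```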
4.3 A General Setting for Approximate Entailment

We can build an entailment preorder on any commutative monoid, viewing the underlying set as a collection of propositions. We write

  A ⊨ B

and say that A entails B if there exists a proposition D such that:

  A + D = B

If our commutative monoid is the powerset of some set X, with union as the binary operation and the empty set as unit, then we recover the non-deterministic computation example of the previous section. If, on the other hand, we take our commutative monoid to be the positive operators on some Hilbert space, with addition of operators and the zero operator as the monoid structure, we recover the Löwner ordering.

In linguistics, we may ask ourselves: does dog entail pet? Naïvely, the answer is clearly no, since not every dog is a pet. This seems too crude for realistic applications, though: most dogs are pets, and so we might say dog entails pet to some extent. This motivates our need for an approximate notion of entailment. For a proposition E, we say that A entails B to the extent E if:

  A ⊨ B + E

We think of E as an error term; for instance, in our dogs and pets example, E adds back in the dogs that are not pets. Expanding definitions, we find that A entails B to extent E if there exists D such that:

  A + D = B + E    (2)

From this more symmetrical formulation it is easy to see that for arbitrary propositions A, B, proposition A trivially entails B to extent A, since by commutativity:

  A + B = B + A

It is therefore clear that the mere existence of a suitable error term is not sufficient for a weakened notion of entailment. If we restrict our attention to errors in a complete meet semilattice E_{A,B}, we can take the greatest lower bound of the E satisfying equation (2) as our canonical choice. Finally, if we wish to be able to compare entailment strengths globally, this can be achieved by choosing a partial order K of "error sizes" and monotone functions

  κ_{A,B} : E_{A,B} → K

sending errors to their corresponding size.

For example, if A and B are positive operators, we take our complete lattice of error terms E_{A,B} to be all operators of the form (1 − k)A for k ∈ [0,1], ordered by the size of 1 − k. We then take k as the strength of the entailment, and refer to it as k-hyponymy. In the case of finite sets A, B, we take E_{A,B} = P(A), and take the size of an error term E to be

  |E| / |A|

measuring "how much" of A (the part of A lying outside B) we have to supplement B with. In terms of conditional probability, the error size is then P(A | ¬B).

4.3.1 k-hyponymy Versus General Error Terms

The general error terms are strictly more general than the k-hyponymy case. Consider the positive operators with matrix representations:

  A = diag(1, 1, 0)    B = diag(1, 0, 1)

Predicate A cannot entail B with any positive strength k: B − kA is never a positive operator, since for the basis vector |e_2⟩ = (0, 1, 0)^T the quantity

  ⟨e_2| (B − kA) |e_2⟩ = −k

is always negative. We can, however, find a more general positive operator E such that A ⊑ B + E, since:

  diag(1, 1, 0) + diag(0, 0, 1) = diag(1, 0, 1) + diag(0, 1, 0)

i.e. with D = diag(0, 0, 1) and E = diag(0, 1, 0) we have A + D = B + E. Therefore the general error terms offer strictly more freedom than k-hyponymy.
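The counterexample above can be verified numerically. A minimal numpy sketch (not from the paper):

```python
import numpy as np

A = np.diag([1.0, 1.0, 0.0])
B = np.diag([1.0, 0.0, 1.0])
E = np.diag([0.0, 1.0, 0.0])     # the general error term from the text

def positive(M, tol=1e-12):
    """Positive semidefiniteness via the spectrum of a symmetric matrix."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

# B - kA is never positive: its middle diagonal entry is -k
for k in (0.01, 0.5, 1.0):
    assert not positive(B - k * A)

# but with the error term E we do have A ⊑ B + E
assert positive((B + E) - A)
print("general error terms succeed where k-hyponymy fails")
```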
5. Hyponymy in Categorical Compositional Distributional Semantics

Modelling hyponymy in the categorical compositional distributional semantics framework was first considered in (Balkır 2014), which introduced an asymmetric similarity measure on density matrices, called representativeness, based on quantum relative entropy. This can be used to translate hyponym-hypernym relations to the level of positive transitive sentences. Our aim here is to provide an alternative measure which relies only on the properties of density matrices and the fact that they are the states of CPM(FHilb). This enables us to quantify the strength of the hyponymy relationship, described as k-hyponymy. The measure of hyponymy that we use has two advantages over the representativeness measure. Firstly, it combines with the linear sentence maps, so that we can work with sentence-level entailment across a larger range of sentences. Secondly, due to the way it combines with linear maps, we can give a quantitative measure of sentence-level entailment based on the entailment strengths between words, whereas representativeness has not been shown to combine in this way.

5.1 Properties of Hyponymy

Before defining the concept of k-hyponymy, we list a couple of properties of hyponymy. We show later that these can be captured by our new measure.

• Asymmetry. If X is a hyponym of Y, this does not imply that Y is a hyponym of X. In fact, we may even assume that only one of these relationships is possible, and that they are mutually exclusive. For example, football is a type of sport, and hence football-sport is a hyponym-hypernym pair; however, sport is not a type of football.

• Pseudo-transitivity. If X is a hyponym of Y and Y is a hyponym of Z, then X is a hyponym of Z. However, if the hyponymy is not perfect, then we get a weakened form of transitivity. For example, dog is a hyponym of pet, and pet is a hyponym of things that are cared for; but not every dog is well cared-for, so the transitivity weakens. An outstanding question is where the entailment strength reverses. For example, dog imperfectly entails pet, and pet imperfectly entails mammal, but dog perfectly entails mammal.

The measure of hyponymy described above, named k-hyponymy, will be defined in terms of density matrices, the containers for word meanings. The idea is to define a quantitative order on density matrices which is not a partial order, but which gives an indication of the asymmetric relationship between words.

5.2 Ordering Positive Matrices

A density matrix can be used to encode the extent of precision that is needed when describing an action. In the sentence "I took my pet to the vet", we do not know whether the pet is a dog, a cat, a tarantula, and so on. The sentence "I took my dog to the vet" is more specific. We can think of the meaning of the word pet as represented by:

  ⟦pet⟧ = p_d |dog⟩⟨dog| + p_c |cat⟩⟨cat| + p_t |tarantula⟩⟨tarantula| + ...
  where ∀i. p_i ≥ 0 and Σ_i p_i = 1

We then wish to develop an order on density matrices so that dog, as represented by |dog⟩⟨dog|, is more specific than pet as represented by ⟦pet⟧. This ordering may then be viewed as an entailment relation, and we wish to show that entailment between words lifts to the level of sentences, so that the sentence "I took my dog to the vet" entails the sentence "I took my pet to the vet".

We now define our notion of approximate entailment, following the discussion of section 4.3:

Definition 5 (k-hyponym). We say that A is a k-hyponym of B for a given value of k in the range (0,1], and write A ⩽_k B, if:

  0 ⊑ B − kA

Note that such a k need not be unique or even exist at all; we will consider the interpretation and implications of this later on. Moreover, whenever we do have k-hyponymy between A and B, there is necessarily a largest such k.

Definition 6 (k-max hyponym). If A is a k-hyponym of B for some k ∈ (0,1], then there is necessarily a maximal possible such k. We denote it by k_max and define it to be the maximum value of k in the range (0,1] for which A ⩽_k B, in the sense that there does not exist k' ∈ (0,1] with k' > k and A ⩽_{k'} B.

In general, we are interested in the maximal value of k for which k-hyponymy holds between two positive operators. This k-max value quantifies the strength of the entailment between the two operators. In what follows, for an operator A we write A^+ for the corresponding Moore-Penrose pseudo-inverse and supp(A) for the support of A.

Lemma 2 (Balkır (2014)). Let A, B be positive operators. Then:

  supp(A) ⊆ supp(B) ⟺ ∃k. k > 0 and B − kA ≥ 0

Lemma 3. For positive self-adjoint matrices A, B such that supp(A) ⊆ supp(B), the operator B^+ A has non-negative eigenvalues.

We now give an expression for the optimal k in terms of the matrices A and B.

Theorem 2. For positive self-adjoint matrices A, B such that supp(A) ⊆ supp(B), the maximum k such that B − kA ≥ 0 is given by 1/λ, where λ is the maximum eigenvalue of B^+ A.
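A minimal numpy sketch (not from the paper) of the procedure in theorem 2: compute the Moore-Penrose pseudo-inverse of B, take the largest eigenvalue of B^+ A, and invert it. The dog/pet operators below are a toy choice with equal weights.

```python
import numpy as np

def k_max(A, B, tol=1e-12):
    """Maximal k with B - kA >= 0, assuming supp(A) ⊆ supp(B) (theorem 2)."""
    B_pinv = np.linalg.pinv(B)
    lam = np.max(np.real(np.linalg.eigvals(B_pinv @ A)))
    return 1.0 / lam if lam > tol else np.inf

dog = np.outer([1.0, 0.0], [1.0, 0.0])                    # |dog><dog|
pet = 0.5 * dog + 0.5 * np.outer([0.0, 1.0], [0.0, 1.0])  # equal-weight mixture
k = k_max(dog, pet)
print(k)                                                  # 0.5

# sanity check: pet - k*dog is still positive semidefinite
assert np.all(np.linalg.eigvalsh(pet - k * dog) >= -1e-9)
```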
5.3 Properties of k-hyponymy

Reflexivity. k-hyponymy is reflexive for k = 1: for any operator A, A − A = 0.

Symmetry. k-hyponymy is neither symmetric nor anti-symmetric. It is not symmetric, since

  diag(1, 0) ⩽_1 diag(1, 1)    but    diag(1, 1) ⋠_k diag(1, 0) for any k ∈ (0, 1]

but it is also not anti-symmetric, since

  diag(1, 1/2) ⩽_{1/2} diag(1/2, 1)    and    diag(1/2, 1) ⩽_{1/2} diag(1, 1/2)

Transitivity. k-hyponymy satisfies a version of transitivity. Suppose A ⩽_k B and B ⩽_l C. Then A ⩽_{kl} C, since

  kA ⊑ B and lB ⊑ C ⟹ klA ⊑ lB ⊑ C

by transitivity of the Löwner order. For the maximal values k_max, l_max, m_max such that A ⩽_{k_max} B, B ⩽_{l_max} C and A ⩽_{m_max} C, we have the inequality

  m_max ≥ k_max · l_max

Continuity. For A ⩽_k B, when there is a small perturbation to A, there is a correspondingly small decrease in the value of k. The perturbation must lie in the support of B, but can introduce off-diagonal elements.

Theorem 3. Given A ⩽_k B and a density operator ρ such that supp(ρ) ⊆ supp(B), for any ε > 0 we can choose a δ > 0 such that:

  A' = A + δρ ⟹ A' ⩽_{k'} B with |k − k'| < ε

5.4 Scaling

When comparing positive operators, in order to standardize the magnitudes resulting from calculations, it is natural to consider normalizing their trace so that we work with density operators. Unfortunately, this is a poor choice when working with the Löwner order, as distinct pairs of density operators are never ordered with respect to each other. Instead we consider bounding our operators to have maximum eigenvalue 1, as suggested in (D'Hondt and Panangaden 2006). With this normalization, the projection operators regain their usual ordering and we recover quantum logic as a suborder of our setting.

Our framework is flexible enough to support other normalization strategies; the optimal choice for linguistic applications is left to future empirical work. More interesting options are also possible. For example, we can embed the Bayesian order (Coecke and Martin 2011) within our setting via a suitable transformation on positive operators; this is described in more detail in appendix section A.1. Further theoretical investigations of this type are left to future work.

5.5 Examples

In this section we give three simple examples and illustrate the order for 2-dimensional matrices in the Bloch sphere.

Example 3. Consider the density matrix:

  ⟦pet⟧ = 1/2 |dog⟩⟨dog| + 1/2 |cat⟩⟨cat|

The entailment strength k such that k |dog⟩⟨dog| ≤ ⟦pet⟧ is 1/2.

Further, if two mixed states ρ, σ can both be expressed as convex combinations of the same two pure states, the extent to which one state entails the other can also be derived.

Example 4. For states

  ρ = r |ψ⟩⟨ψ| + (1−r) |φ⟩⟨φ|    σ = s |ψ⟩⟨ψ| + (1−s) |φ⟩⟨φ|

the entailment strength k such that kσ ≤ ρ is given by:

  k = r/s if r < s,    k = (1−r)/(1−s) otherwise

Example 5. Suppose that ⟦B⟧ = k_1 ⟦A⟧ + Σ_{i≠1} k_i ⟦X_i⟧. Then:

  ⟦A⟧ ⩽_k ⟦B⟧ for any k ≤ k_1

From the above example we notice that the value k_1 certainly gives us k_1-hyponymy between A and B, but it is possible that there exists a value l with l > k_1 for which we have l-hyponymy between A and B. Indeed, this happens whenever we have an l for which, for all |x⟩:

  (k_1 − l) ⟨x| ⟦A⟧ |x⟩ ≥ − Σ_{i≠1} k_i ⟨x| ⟦X_i⟧ |x⟩

Thus, k_1 may not be the maximum value for hyponymy between A and B. In general, however, we are interested in making the strongest assertion we can, and therefore in the maximum value of k, which we call the entailment strength. For matrices on ℝ², we can represent these entailment strengths visually using the Bloch sphere restricted to ℝ², the 'Bloch disc'.
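A minimal numpy sketch (not from the paper) checking example 4 for one choice of parameters; it assumes |ψ⟩ and |φ⟩ are orthonormal and reuses the theorem 2 formula for the maximal k. The weights r and s are arbitrary test values.

```python
import numpy as np

def k_max(A, B, tol=1e-12):
    """Maximal k with B - kA >= 0 (theorem 2)."""
    lam = np.max(np.real(np.linalg.eigvals(np.linalg.pinv(B) @ A)))
    return 1.0 / lam if lam > tol else np.inf

psi = np.array([1.0, 0.0])
phi = np.array([0.0, 1.0])
r, s = 0.3, 0.7

rho   = r * np.outer(psi, psi) + (1 - r) * np.outer(phi, phi)
sigma = s * np.outer(psi, psi) + (1 - s) * np.outer(phi, phi)

predicted = r / s if r < s else (1 - r) / (1 - s)
assert np.isclose(k_max(sigma, rho), predicted)   # maximal k with k*sigma <= rho
print(predicted)
```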
5.5.1 Representing the Order in the 'Bloch Disc'

The Bloch sphere (Bloch 1946) is a geometrical representation of quantum states. Very briefly, points on the sphere correspond to pure states, and points within the sphere to impure states. Since we consider matrices only over ℝ², we disregard the complex phase, which allows us to represent the pure states on a circle. A pure state cos(θ/2)|0⟩ + sin(θ/2)|1⟩ is represented by the vector (sin θ, cos θ) on the circle.

We can calculate the entailment factor k between any two points on the disc. For example, figure 8 shows contour maps of the entailment strengths towards the state with Bloch vector (3/4 sin(π/5), 3/4 cos(π/5)), using the maximum eigenvalue normalization.

[Figure 8: Entailment strengths in the Bloch disc for the state with Bloch vector (3/4 sin(π/5), 3/4 cos(π/5)); contours range from 0.2 up to 1.]
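The Bloch-disc picture can be sketched numerically as follows (not from the paper): a real 2×2 state is rebuilt from its Bloch vector, both operators are rescaled to have maximum eigenvalue 1 (our reading of the normalization used for figure 8, per section 5.4), and the entailment strength towards the fixed state is computed via theorem 2; values above 1 can be clipped to 1.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # real Pauli X
Z = np.array([[1.0, 0.0], [0.0, -1.0]])  # real Pauli Z

def from_bloch(x, z):
    """Real 2x2 density matrix with Bloch vector (x, z), |(x, z)| <= 1."""
    return 0.5 * (np.eye(2) + x * X + z * Z)

def max_eig_normalize(A):
    return A / np.max(np.linalg.eigvalsh(A))

def k_max(A, B):
    lam = np.max(np.real(np.linalg.eigvals(np.linalg.pinv(B) @ A)))
    return 1.0 / lam

target = from_bloch(0.75 * np.sin(np.pi / 5), 0.75 * np.cos(np.pi / 5))
probe  = from_bloch(0.2, 0.1)            # an arbitrary point on the disc
k = k_max(max_eig_normalize(probe), max_eig_normalize(target))
print(min(1.0, k))                       # entailment strength of probe into target
```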
6. Main Results on Compositionality

We now consider what happens when we have two sentences such that one of them contains one or more hyponyms of one or more words from the other. We will show that in this case the hyponymy is 'lifted' to the sentence level, and that the k-values are preserved in a very intuitive fashion. After considering a couple of specific sentence constructions, we generalise this result to account for a broad category of sentence patterns in the compositional distributional model.

6.1 k-hyponymy in Positive Transitive Sentences

A positive transitive sentence has the diagrammatic representation in CPM(FHilb) given in figure 7. The meaning of the sentence "subj verb obj" is given by

  (ε_N ⊗ 1_S ⊗ ε_N)(⟦subj⟧ ⊗ ⟦verb⟧ ⊗ ⟦obj⟧)

where the ε_N and 1_S morphisms are those of CPM(FHilb). We represent the subject and object by

  ⟦subj⟧ = Σ_{ik} a_{ik} |n_i⟩⟨n_k|    ⟦obj⟧ = Σ_{jl} b_{jl} |n_j⟩⟨n_l|

and let the verb be given by:

  ⟦verb⟧ = Σ_{pqrtuv} C_{pqrtuv} |n_p⟩⟨n_t| ⊗ |s_q⟩⟨s_u| ⊗ |n_r⟩⟨n_v|

Theorem 4. Let n_1, n_2, n_3, n_4 be nouns with corresponding density matrix representations ⟦n_1⟧, ⟦n_2⟧, ⟦n_3⟧ and ⟦n_4⟧, such that n_1 is a k-hyponym of n_2 and n_3 is an l-hyponym of n_4. Then:

  φ(n_1 verb n_3) ⩽_{kl} φ(n_2 verb n_4)

where φ = ε_N ⊗ 1_S ⊗ ε_N is the sentence meaning map for positive transitive sentences.

6.2 General Sentence k-hyponymy

The application of k-hyponymy to various phrase types holds in the same way. In this section we provide a general proof for varying phrase types. We adopt the following conventions:

• A positive phrase is a phrase in which individual words are upwardly monotone in the sense described by (MacCartney and Manning 2007). This means that, for example, the phrase does not contain any negations, including words like "not".

• The length of a phrase is the number of words in it, not counting definite and indefinite articles.

Theorem 5 (Generalised Sentence k-Hyponymy). Let Φ and Ψ be two positive phrases of the same length and grammatical structure, expressed in the same noun space N and sentence space S. Denote the nouns and verbs of Φ, in the order in which they appear, by ⟦A_1⟧, ..., ⟦A_n⟧, and similarly denote those of Ψ by ⟦B_1⟧, ..., ⟦B_n⟧. Suppose that ⟦A_i⟧ ⩽_{k_i} ⟦B_i⟧ with k_i ∈ (0,1] for each i ∈ {1, ..., n}. Finally, let φ be the sentence meaning map for both Φ and Ψ, so that φ(Φ) is the meaning of Φ and φ(Ψ) is the meaning of Ψ. Then:

  φ(Φ) ⩽_{k_1 ⋯ k_n} φ(Ψ)

so k_1 ⋯ k_n provides a lower bound on the extent to which φ(Φ) entails φ(Ψ).

Intuitively, this means that if (some of) the functional words of a sentence Φ are k-hyponyms of (some of) the functional words of sentence Ψ, then this hyponymy translates into sentence hyponymy. Upward-monotonicity is important here, and in particular implicit quantifiers. It might be objected that "dogs bark" should not imply "pets bark". If the implicit quantification is universal, then this is true; however, the universal quantifier is downward monotone, and therefore does not conform to the convention concerning positive phrases. If the implicit quantification is existential, then "some dogs bark" does entail "some pets bark", and the problem is averted. Discussion of the behaviour of quantifiers and other word types is given in (MacCartney and Manning 2007).

The quantity k_1 ⋯ k_n is not necessarily maximal, and indeed usually is not. As we only have a lower bound, zero entailment strength between a pair of components does not imply zero entailment strength between entire sentences. Results for phrases involving relative clauses may be found in appendix C.

Corollary 1. Consider two sentences Φ = ⊗_i ⟦A_i⟧ and Ψ = ⊗_i ⟦B_i⟧ such that for each i ∈ {1, ..., n} we have ⟦A_i⟧ ⊑ ⟦B_i⟧, i.e. there is strict entailment in each component. Then there is strict entailment between the sentences φ(Φ) and φ(Ψ).

We consider a concrete example.

Compositionality of k-hyponymy in a transitive sentence. Suppose we have a noun space N with basis {|e_i⟩}_i and a sentence space S with basis {|x_j⟩}_j. We consider the verbs nibble and scoff and the nouns cake and chocolate, with semantics:

  ⟦nibble⟧ = Σ_{pqrtuv} a_{pqr} a_{tuv} |e_p⟩⟨e_t| ⊗ |x_q⟩⟨x_u| ⊗ |e_r⟩⟨e_v|
  ⟦scoff⟧ = Σ_{pqrtuv} b_{pqr} b_{tuv} |e_p⟩⟨e_t| ⊗ |x_q⟩⟨x_u| ⊗ |e_r⟩⟨e_v|
  ⟦cake⟧ = Σ_{ij} c_i c_j |e_i⟩⟨e_j|
  ⟦chocolate⟧ = Σ_{ij} d_i d_j |e_i⟩⟨e_j|

which make these nouns and verbs pure states. The more general eat and sweets are given by:

  ⟦eat⟧ = 1/2 (⟦nibble⟧ + ⟦scoff⟧)
  ⟦sweets⟧ = 1/2 (⟦cake⟧ + ⟦chocolate⟧)

Then:

  ⟦scoff⟧ ⩽_{1/2} ⟦eat⟧    ⟦cake⟧ ⩽_{1/2} ⟦sweets⟧

We consider the sentences:

  s_1 = John scoffs cake
  s_2 = John eats sweets

The semantics of these sentences are:

  ⟦s_1⟧ = φ(⟦John⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦cake⟧)
  ⟦s_2⟧ = φ(⟦John⟧ ⊗ ⟦eats⟧ ⊗ ⟦sweets⟧)

and, as per theorem 5, we will show that ⟦s_1⟧ ⩽_{kl} ⟦s_2⟧ where kl = 1/2 × 1/2 = 1/4. Expanding ⟦s_2⟧ we obtain:

  ⟦s_2⟧ = φ(⟦John⟧ ⊗ 1/2(⟦nibbles⟧ + ⟦scoffs⟧) ⊗ 1/2(⟦cake⟧ + ⟦chocolate⟧))
        = 1/4 ( φ(⟦John⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦cake⟧) + φ(⟦John⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦chocolate⟧)
              + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦cake⟧) + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦chocolate⟧) )
        = 1/4 ⟦s_1⟧ + 1/4 ( φ(⟦John⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦chocolate⟧)
              + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦cake⟧) + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦chocolate⟧) )

Therefore:

  ⟦s_2⟧ − 1/4 ⟦s_1⟧ = 1/4 ( φ(⟦John⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦chocolate⟧)
              + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦cake⟧) + φ(⟦John⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦chocolate⟧) )

This difference is positive, by positivity of the individual terms and the fact that positivity is preserved under addition, tensor product and the sentence map. Therefore:

  ⟦s_1⟧ ⩽_{1/4} ⟦s_2⟧

as required. More examples may be found in appendix B.
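A minimal numpy sketch (not from the paper) checking this worked example: with ⟦eat⟧ = (⟦nibble⟧ + ⟦scoff⟧)/2 and ⟦sweets⟧ = (⟦cake⟧ + ⟦chocolate⟧)/2, the sentence operator for "John eats sweets" dominates 1/4 of the operator for "John scoffs cake". The word tensors are random placeholders and the index convention matches the earlier CPM sketch.

```python
import numpy as np

dN, dS = 4, 3
rng = np.random.default_rng(2)

def pure_noun(v):               # rank-1 noun operator |v><v|
    return np.outer(v, v)

def pure_verb(t):               # rank-1 verb operator on (N x S x N)^2
    return np.einsum('pqr,tuv->pqrtuv', t, t)

def sentence(subj, verb, obj):  # (eps_N (x) 1_S (x) eps_N) in CPM(FHilb)
    return np.einsum('pt,pqrtuv,rv->qu', subj, verb, obj)

john            = pure_noun(rng.random(dN))
cake, chocolate = (pure_noun(rng.random(dN)) for _ in range(2))
nibble, scoff   = (pure_verb(rng.random((dN, dS, dN))) for _ in range(2))
eat    = (nibble + scoff) / 2
sweets = (cake + chocolate) / 2

s1 = sentence(john, scoff, cake)    # John scoffs cake
s2 = sentence(john, eat, sweets)    # John eats sweets
# s2 - (1/4) s1 should be positive semidefinite, as theorem 5 predicts
print(np.all(np.linalg.eigvalsh(s2 - 0.25 * s1) >= -1e-9))   # True
```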
7. Conclusion

Integrating a logical framework with compositional distributional semantics is an important step in improving this model of language. By moving to the setting of density matrices, we have described a graded measure of entailment that may be used to describe the extent of entailment between two words represented within this enriched framework. This approach extends uniformly to provide entailment strengths between phrases of any type. We have also shown how a lower bound on the entailment strength between phrases of the same structure can be calculated from their components.

We can extend this work in several directions. Firstly, we can examine how narrowing down a concept using an adjective might operate. For example, we should have that "red car" entails "car". Other adjectives should not operate in this way, such as "former" in "former president".

Another line of inquiry is to examine how transitivity behaves. In some cases entailment can strengthen: we had that dog entails pet to a certain extent, and that pet entails mammal to a certain extent, but dog completely entails mammal.

Our framework supports different methods of scaling the positive operators representing propositions. Empirical work will be required to establish the most appropriate method in linguistic applications.

Sentences with non-identical structure must also be taken into account. One approach might be to look at the first stage in the sentence reductions at which an elementwise comparison can be made. For example, in the two sentences "John runs very slowly" and "Hungry boys sprint quickly", we can compare the noun phrases "John" and "Hungry boys", the verbs "runs" and "sprint", and the adverb phrases "very slowly" and "quickly". Further, the inclusion of negative words like "not", or negative prefixes, should be modelled.

Acknowledgments

Bob Coecke, Martha Lewis, and Daniel Marsden gratefully acknowledge funding from AFOSR grant "Algorithmic and Logical Aspects when Composing Meanings".

References

E. Balkır. Using density matrices in a compositional distributional model of meaning. Master's thesis, University of Oxford, 2014.
E. Balkır, D. Kartsaklis, and M. Sadrzadeh. Sentence entailment in compositional distributional semantics, 2015. To appear, ISAIM 2016.
E. Balkır, M. Sadrzadeh, and B. Coecke. Distributional sentence entailment using density matrices. In Topics in Theoretical Computer Science, pages 1–22. Springer, 2016.
D. Bankova. Comparing meaning in language and cognition: p-hyponymy, concept combination, asymmetric similarity. Master's thesis, University of Oxford, 2015.
M. Baroni, R. Bernardi, N.-Q. Do, and C.-C. Shan. Entailment above the word level in distributional semantics. In Proceedings of the 13th Conference of the European Chapter of the ACL, pages 23–32, 2012.
G. Birkhoff and J. Von Neumann. The logic of quantum mechanics. Annals of Mathematics, pages 823–843, 1936.
W. Blacoe, E. Kashefi, and M. Lapata. A quantum-theoretic approach to distributional semantics. In HLT-NAACL, pages 847–857, 2013.
F. Bloch. Nuclear induction. Physical Review, 70(7-8):460, 1946.
J. A. Bullinaria and J. P. Levy. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3):510–526, 2007.
D. Clarke. Context-theoretic semantics for natural language: an overview. In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics, pages 112–119. ACL, 2009.
B. Coecke and K. Martin. A partial order on classical and quantum states. In New Structures for Physics, pages 593–683. Springer, 2011.
B. Coecke and É. O. Paquette. Categories for the practising physicist. In New Structures for Physics, pages 173–286. Springer, 2011.
B. Coecke, M. Sadrzadeh, and S. Clark. Mathematical foundations for a compositional distributional model of meaning. arXiv:1003.4394, 2010.
B. Coecke, E. Grefenstette, and M. Sadrzadeh. Lambek vs. Lambek: Functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic, 164(11):1079–1100, 2013.
I. Dagan, O. Glickman, and B. Magnini. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, pages 177–190. Springer, 2006.
B. J. Devereux, L. K. Tyler, J. Geertzen, and B. Randall. The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior Research Methods, 46(4):1119–1127, 2014.
E. D'Hondt and P. Panangaden. Quantum weakest preconditions. Mathematical Structures in Computer Science, 16(03):429–451, 2006.
P. W. Foltz, W. Kintsch, and T. K. Landauer. The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25(2-3):285–307, 1998.
M. Geffet and I. Dagan. The distributional inclusion hypotheses and lexical entailment. In Proceedings of the 43rd Annual Meeting on ACL, pages 107–114. ACL, 2005.
E. Grefenstette and M. Sadrzadeh. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 1394–1404. ACL, 2011.
J. A. Hampton. Inheritance of attributes in natural concept conjunctions. Memory & Cognition, 15(1):55–71, 1987.
D. Kartsaklis. Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras. PhD thesis, University of Oxford, 2015.
D. Kartsaklis, M. Sadrzadeh, and S. Pulman. A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of COLING: Posters, pages 549–558, 2012.
M. Kelly and M. Laplaza. Coherence for compact closed categories. Journal of Pure and Applied Algebra, 19:193–213, 1980.
D. Kiela, L. Rimell, I. Vulić, and S. Clark. Exploiting image generality for lexical entailment detection. In Proceedings of the 53rd Annual Meeting of the ACL. ACL, 2015.
L. Kotlerman, I. Dagan, I. Szpektor, and M. Zhitomirsky-Geffet. Directional distributional similarity for lexical inference. Natural Language Engineering, 16(04):359–389, 2010.
D. Kozen. A probabilistic PDL. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 291–297. ACM, 1983.
J. Lambek. Type grammar revisited. In Logical Aspects of Computational Linguistics, pages 1–27. Springer, 1999.
T. K. Landauer and S. T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211, 1997.
A. Lenci and G. Benotto. Identifying hypernyms in distributional semantic spaces. In Proceedings of the First Joint Conference on Lexical and Computational Semantics and the Sixth International Workshop on Semantic Evaluation, pages 75–79. ACL, 2012.
K. Löwner. Über monotone Matrixfunktionen. Mathematische Zeitschrift, 38(1):177–216, 1934.
K. Lund and C. Burgess. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2):203–208, 1996.
B. MacCartney and C. D. Manning. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 193–200. ACL, 2007.
D. McCarthy, R. Koeling, J. Weeds, and J. Carroll. Finding predominant word senses in untagged text. In Proceedings of the 42nd Annual Meeting on ACL, page 279. ACL, 2004.
S. McDonald and M. Ramscar. Testing the distributional hypothesis: The influence of context on judgements of semantic similarity. In Proceedings of the 23rd Annual Conference of the Cognitive Science Society, pages 611–616, 2001.
K. McRae, G. S. Cree, M. S. Seidenberg, and C. McNorgan. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4):547–559, 2005.
J. Mitchell and M. Lapata. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429, 2010.
M. A. Nielsen. Conditions for a class of entanglement transformations. Physical Review Letters, 83(2):436, 1999.
M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
R. Piedeleu. Ambiguity in categorical models of meaning. Master's thesis, University of Oxford, 2014.
R. Piedeleu, D. Kartsaklis, B. Coecke, and M. Sadrzadeh. Open system categorical quantum semantics in natural language processing. arXiv:1502.00831, 2015.
A. Preller and M. Sadrzadeh. Bell states and negative sentences in the distributed model of meaning. Electronic Notes in Theoretical Computer Science, 270(2):141–153, 2011.
L. Rimell. Distributional lexical entailment by topic coherence. EACL 2014, page 511, 2014.
M. Sadrzadeh, S. Clark, and B. Coecke. The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, page ext044, 2013.
H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97–123, 1998.
P. Selinger. Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science, 170:139–163, 2007.
C. J. Van Rijsbergen. The Geometry of Information Retrieval, volume 157. Cambridge University Press, 2004.
D. P. Vinson and G. Vigliocco. Semantic feature production norms for a large set of objects and events. Behavior Research Methods, 40(1):183–190, 2008.
J. Weeds, D. Weir, and D. McCarthy. Characterising measures of lexical distributional similarity. In Proceedings of the 20th International Conference on Computational Linguistics, page 1015. ACL, 2004.
H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Mathematische Annalen, 71(4):441–479, 1912.
D. Widdows and S. Peters. Word vectors and quantum logic: Experiments with negation and disjunction. Mathematics of Language, 8:141–154, 2003.
