Proceedings
10th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation
May 26, 2014
Reykjavik, Iceland
Harry Bunt, editor

isa-10: 10th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

Workshop Programme

08:30 -- 08:50  Registration
08:50 -- 09:00  Opening by Workshop Chair

09:00 -- 10:30  Session A
09:00 -- 09:30  Hans-Ulrich Krieger, A Detailed Comparison of Seven Approaches for the Annotation of Time-Dependent Factual Knowledge in RDF and OWL
09:30 -- 10:00  Antske Fokkens, Aitor Soroa, Zuhaitz Beloki, Niels Ockeloen, Piek Vossen, German Rigau and Willem-Robert van Hage, NAF and GAF: Linking Linguistic Annotations
10:00 -- 10:15  Johan Bos, Semantic Annotation Issues in Parallel Meaning Banking
10:15 -- 10:30  Assaf Toledo, Stavroula Alexandropoulou, Sophie Chesney, Robert Grimm, Pepijn Kokke, Benno Kruit, Kyriaki Neophytou, Antony Nguyen and Yoad Winter, A Proof-Based Annotation Platform of Textual Entailment

10:30 -- 11:00  Coffee break

11:00 -- 13:00  Session B
11:00 -- 11:15  Bolette Pedersen, Sanni Nimb, Sussi Olsen, Anders Søgaard and Nicolai Sørensen, Semantic Annotation of the Danish CLARIN Reference Corpus
11:15 -- 11:45  Kiyong Lee, Semantic Annotation of Anaphoric Links in Language
11:45 -- 12:00  Laurette Pretorius and Sonja Bosch, Towards extending the ISOcat Data Category Registry with Zulu Morphosyntax
12:00 -- 13:00  Harry Bunt, Kiyong Lee, Martha Palmer, Rashmi Prasad, James Pustejovsky and Annie Zaenen, ISO Projects on the development of international standards for the annotation of various types of semantic information

13:00 -- 14:00  Lunch break

14:00 -- 16:00  Session C
14:00 -- 14:30  Volha Petukhova, Understanding Questions and Finding Answers: Semantic Relation Annotation to Compute the Expected Answer Type
14:30 -- 14:45  Susan Windisch Brown, From Visual Prototypes of Action to Metaphors: Extending the IMAGACT Ontology of Action to Secondary Meanings
14:45 -- 15:15  Ekaterina Lapshinova-Koltunski and Kerstin Anna Kunz, Annotating Cohesion for Multilingual Analysis
15:15 -- 16:00  Poster session: elevator pitches followed by poster visits
    Leon Derczynski and Kalina Bontcheva: Spatio-Temporal Grounding of Claims Made on the Web, in Pheme
    Mathieu Roche: How to Exploit Paralinguistic Features to Identify Acronyms in Texts
    Sungho Shin, Hanmin Jung, Inga Hannemann and Mun Yong Yi: Lessons Learned from Manual Evaluation of NER Results by Domain Experts
    Milan Tofiloski, Fred Popowich and Evan Zhang: Annotating Discourse Zones in Medical Encounters
    Yu Jie Seah and Francis Bond: Annotation of Pronouns in a Multilingual Corpus of Mandarin Chinese, English and Japanese

16:00 -- 16:30  Coffee break

16:30 -- 18:00  Session D
16:30 -- 17:00  Elisabetta Jezek, Laure Vieu, Fabio Massimo Zanzotto, Guido Vetere, Alessandro Oltramari, Aldo Gangemi and Rossella Vanvara, Extending `Senso Comune' with Semantic Role Sets
17:00 -- 17:30  Paulo Quaresma, Amália Mendes, Iris Hendrickx and Teresa Gonçalves, Automatic Tagging of Modality: Identifying Triggers and Modal Values
17:30 -- 18:00  Rui Correia, Nuno Mamede, Jorge Baptista and Maxina Eskenazi, Using the Crowd to Annotate Metadiscursive Acts

18:00  Workshop Closing

Editor

Harry Bunt, Tilburg University

Workshop Organizers/Organizing Committee

Harry Bunt, Tilburg University
Nancy Ide, Vassar College, Poughkeepsie, NY
Kiyong Lee, Korea University, Seoul
James Pustejovsky, Brandeis University, Waltham, MA
Laurent Romary, INRIA/Humboldt Universität Berlin

Workshop Programme Committee

Jan Alexandersson, DFKI, Saarbrücken
Paul Buitelaar, National University of Ireland, Galway
Harry Bunt, Tilburg University
Thierry Declerck, DFKI, Saarbrücken
Liesbeth Degand, Université Catholique de Louvain
Alex Chengyu Fang, City University Hong Kong
Anette Frank, Universität Heidelberg
Robert Gaizauskas, University of Sheffield
Koiti Hasida, Tokyo University
Nancy Ide, Vassar College
Elisabetta Jezek, Università degli Studi di Pavia
Michael Kipp, University of Applied Sciences, Augsburg
Inderjeet Mani, Yahoo, Sunnyvale
Martha Palmer, University of Colorado, Boulder
Volha Petukhova, Universität des Saarlandes, Saarbrücken
Andrei Popescu-Belis, Idiap, Martigny, Switzerland
Rashmi Prasad, University of Wisconsin, Milwaukee
James Pustejovsky, Brandeis University
Laurent Romary, INRIA/Humboldt Universität Berlin
Ted Sanders, Universiteit Utrecht
Thorsten Trippel, University of Bielefeld
Zdenka Uresova, Charles University, Prague
Piek Vossen, Vrije Universiteit Amsterdam
Annie Zaenen, Stanford University

Table of contents

Hans-Ulrich Krieger
    A Detailed Comparison of Seven Approaches for the Annotation of Time-Dependent Factual Knowledge in RDF and OWL
Antske Fokkens, Aitor Soroa, Zuhaitz Beloki, Niels Ockeloen, Piek Vossen, German Rigau and Willem-Robert van Hage
    NAF and GAF: Linking Linguistic Annotations
Johan Bos
    Semantic Annotation Issues in Parallel Meaning Banking
Assaf Toledo, Stavroula Alexandropoulou, Sophie Chesney, Robert Grimm, Pepijn Kokke, Benno Kruit, Kyriaki Neophytou, Antony Nguyen and Yoad Winter
    A Proof-Based Annotation Platform of Textual Entailment
Bolette Pedersen, Sanni Nimb, Sussi Olsen, Anders Søgaard and Nicolai Sørensen
    Semantic Annotation of the Danish CLARIN Reference Corpus
Kiyong Lee
    Semantic Annotation of Anaphoric Links in Language
Laurette Pretorius and Sonja Bosch
    Towards extending the ISOcat Data Category Registry with Zulu Morphosyntax
Volha Petukhova
    Understanding Questions and Finding Answers: Semantic Relation Annotation to Compute the Expected Answer Type
Susan Windisch Brown
    From Visual Prototypes of Action to Metaphors: Extending the IMAGACT Ontology of Action to Secondary Meanings
Ekaterina Lapshinova-Koltunski and Kerstin Anna Kunz
    Annotating Cohesion for Multilingual Analysis
Leon Derczynski and Kalina Bontcheva
    Spatio-Temporal Grounding of Claims Made on the Web, in Pheme
Mathieu Roche
    How to Exploit Paralinguistic Features to Identify Acronyms in Texts
Sungho Shin, Hanmin Jung, Inga Hannemann and Mun Yong Yi
    Lessons Learned from Manual Evaluation of NER Results by Domain Experts
Milan Tofiloski, Fred Popowich and Evan Zhang
    Annotating Discourse Zones in Medical Encounters
Yu Jie Seah and Francis Bond
    Annotation of Pronouns in a Multilingual Corpus of Mandarin Chinese, English and Japanese
Elisabetta Jezek, Laure Vieu, Fabio Massimo Zanzotto, Guido Vetere, Alessandro Oltramari, Aldo Gangemi and Rossella Vanvara
    Extending `Senso Comune' with Semantic Role Sets
Paulo Quaresma, Amália Mendes, Iris Hendrickx and Teresa Gonçalves
    Automatic Tagging of Modality: Identifying Triggers and Modal Values
Rui Correia, Nuno Mamede, Jorge Baptista and Maxina Eskenazi
    Using the Crowd to Annotate Metadiscursive Acts


A Detailed Comparison of Seven Approaches for the Annotation of Time-Dependent Factual Knowledge in RDF and OWL
Hans-Ulrich Krieger
German Research Center for AI (DFKI GmbH)
Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany
[email protected]

Abstract

Representing time-dependent factual knowledge in RDF and OWL has become increasingly important in recent times. Extending OWL relation instances or RDF triples with further temporal arguments is usually realized through new individuals that hide the range arguments of the extended relation. As a result, reasoning and querying with such representations is extremely complex, expensive, and error-prone. In this paper, we discuss several well-known approaches to this problem and present their pros and cons. Three of them are compared in more detail, both on a theoretical and on a practical level. We also present schemata for translating triple-based encodings into general tuples, and vice versa. Concerning query time, our preliminary measurements have shown that a general tuple-based approach can easily outperform triple-based encodings by several orders of magnitude.

Keywords: temporal annotation; synchronic & diachronic relations; binary vs. N-ary representation schemata for factual statements.

1. Introduction

Representing temporally-changing information becomes increasingly important for reasoning and query services defined on top of RDF and OWL, for practical applications such as business intelligence in particular, and for the Semantic Web/Web 2.0 in general. Extending binary OWL ABox relation instances or RDF triples with further temporal arguments translates into a massive proliferation of useless "container" objects. Reasoning and querying with such representations is extremely complex, expensive, and error-prone.

In this paper, we critically discuss several well-known approaches to the encoding of time-dependent information in RDF and OWL. We present seven approaches and explain their pros and cons. Three of them are then compared in more detail, both theoretically and practically w.r.t. space consumption and answer time for simple queries. Two of the three approaches stay within the existing RDF paradigm, whereas the third proposal argues for replacing the RDF triple by a more general tuple in order to ease reasoning and querying, but also to come up with ontologies that have a smaller memory footprint when compared to semantically equivalent triple-based encodings.

In order to make the measurements for the three approaches comparable, we have used the rule-based semantic repository HFC (Krieger, 2013) that we have developed over the last years and which is comparable to popular engines, such as Jena, OWLIM, or Virtuoso. We also present schemata for translating temporal triple-based encodings into general tuples, and vice versa. Concerning query time, our preliminary measurements have shown that a general tuple-based approach can easily outperform a triple-based encoding by 1 to 5 orders of magnitude.

2. Synchronic and Diachronic Relations

Linguistics and philosophy make a distinction between synchronic and diachronic relations in order to characterize statements whose truth values do (or do not) change over time. Synchronic relations, such as dateOfBirth, are relations whose instances do not change over time, so there is no direct need to attach a temporal extent to them. Consider, e.g., the natural language sentence

    Tony Blair was born on May 6, 1953.

Assuming an RDF-based N-triple representation (Grant and Beckett, 2004), an information extraction (IE) system might yield the following set of triples:

    tb rdf:type Person
    tb hasName "Tony Blair"
    tb dateOfBirth "1953-05-06"^^xsd:date

Since there is only one unique date of birth, this works perfectly well and properly captures the intended meaning.

Diachronic relationships, however, vary with time, i.e., their truth values do change over time. Representation frameworks such as OWL that are geared towards unary and binary relations cannot directly be extended by a further (temporal) argument. Consider the following sentence:

    Christopher Gent was Vodafone's chairman until July 2003. Later, Chris became the chairman of GlaxoSmithKline with effect from 1st January 2005.

Given this, an IE system might discover the following time-dependent facts:

    [????-??-??,2003-07-??]: cg isChairman vf
    [2005-01-01,????-??-??]: cg isChairman gsk

Applying the representation schema used above for synchronic relations, i.e., attaching the temporal extent directly to the subject, gives us

    cg isChairman vf
    cg hasTime [????-??-??,2003-07-??]
    cg isChairman gsk
    cg hasTime [2005-01-01,????-??-??]

However, the resulting RDF graph mixes up the association between the original statements and their temporal extents:

    [????-??-??,2003-07-??]: cg isChairman vf
    *[2005-01-01,????-??-??]: cg isChairman vf
    *[????-??-??,2003-07-??]: cg isChairman gsk
    [2005-01-01,????-??-??]: cg isChairman gsk

since the second and third associations (marked with *) are not supported by the natural language quotation above.

3. Approaches to Diachronic Representation

Several well-known techniques of extending binary relations with additional arguments have been proposed in the literature.

3.1. Equip Relation With Temporal Arguments

This approach has been pursued in temporal databases (where it is called valid time) and in the logic programming community. For instance, a binary relation, such as worksFor between a person p of type Person and a company c of type Company, becomes a quaternary relation with two further temporal arguments s and e, expressing the temporal interval [s, e] in which the atemporal statement worksFor(p, c) is true (instants are represented by stating that s = e):

$$\mathit{worksFor}(p, c) \longmapsto \mathit{worksFor}(p, c, s, e)$$

Unfortunately, OWL and description logic (DL) in general only support unary (classes) and binary (properties) relations in order to guarantee decidability of the usual inference problems. Thus forward chaining engines (such as OWLIM and Jena) as well as tableaux-based reasoners (e.g., Racer or Pellet) are unable to handle such descriptions.

We note here that this approach is clearly the silver bullet of representing binary factual statements, since it is the easiest and most natural one, although a direct interpretation is incompatible with RDF and almost all currently available reasoners. We will favor this kind of representation in the second part of the paper when presenting the measurements, using HFC (Krieger, 2013).

3.2. Apply a Meta-Logical Predicate

McCarthy & Hayes' situation calculus, James Allen's interval logic, and the knowledge representation formalism KIF use variants of the meta-logical predicate holds. Hence, our worksFor(p, c) relation instance becomes holds(worksFor(p, c), t). McCarthy & Hayes call a statement whose truth value changes over time a fluent (McCarthy and Hayes, 1969). The extended quaternary relation from the previous subsection can be seen as a relational fluent, whereas the holds expression here embodies a functional fluent, meaning that worksFor(p, c) is assumed to yield a situation-dependent value.

Such kinds of relations are not possible in OWL, since description logics limit themselves to subsets of function-free first-order logic and because only a weak form of relation composition is possible in OWL. However, we can reify the atemporal fact worksFor(p, c) in RDF, so that the above holds relation instance can at least be encoded by introducing a new individual o, represented as an RDF blank node. We note that in the original calculus, situations were defined at an instant of time, thus we use only a single temporal argument there.

$$\begin{aligned}
\mathit{holds}(\mathit{worksFor}(p, c), t) \longmapsto \exists o.\; & \mathit{holds}(o, t) \wedge \mathit{type}(o, \mathit{AtemporalFact}) \wedge \mathit{subject}(o, p) \,\wedge \\
& \mathit{predicate}(o, \mathit{worksFor}) \wedge \mathit{object}(o, c)
\end{aligned}$$

As an alternative, we might turn the worksFor relation into a class:

$$\begin{aligned}
\mathit{holds}(\mathit{worksFor}(p, c), t) \longmapsto \exists o.\; & \mathit{holds}(o, t) \wedge \mathit{type}(o, \mathit{WorksFor}) \,\wedge \\
& \mathit{subject}(o, p) \wedge \mathit{object}(o, c)
\end{aligned}$$

However, this would require introducing a new class for each diachronic relation to be represented.

3.3. Reify the Original Relation

Reifying a relation instance again leads to the introduction of a new object and five additional new relationships. In addition, a new class needs to be introduced for each reified relation, plus accessors to the original arguments, very similar to the approach directly above. Furthermore, and very importantly, relation reification loses the original relation name, thus requiring a massive modification of the original ontology.

Coming back to our worksFor example, we obtain (WorksFor is the newly introduced class)

$$\begin{aligned}
\mathit{worksFor}(p, c, s, e) \longmapsto \exists o.\; & \mathit{type}(o, \mathit{WorksFor}) \wedge \mathit{person}(o, p) \wedge \mathit{company}(o, c) \,\wedge \\
& \mathit{starts}(o, s) \wedge \mathit{ends}(o, e)
\end{aligned}$$

It is worth noting that this encoding can be seen as a kind of "owlfication" of Neo-Davidsonian semantics (Parsons, 1990), as the original relation is turned into an event.

3.4. YAGO's Fact Identifier

The approach that YAGO (Hoffart et al., 2011) takes is related to Approaches 2 and 3 directly above, as it is a kind of external reification. YAGO uses its own extension of the N3 plain triple format, called N4, which associates a unique identifier i with each time-dependent fact.

The above quaternary relation instance is then represented as follows:

$$\mathit{worksFor}(p, c, s, e) \longmapsto \exists i.\; i : \mathit{worksFor}(p, c) \wedge \mathit{occursSince}(i, s) \wedge \mathit{occursUntil}(i, e)$$

Note that the association i : worksFor(p, c) has the disadvantage of not being part of the triple repository (as it is technically a quadruple; we guess that there exists a separate extendable mapping table). Thus, entailment rules and queries will never have access to these quadruples, unless some custom functionality has been implemented in the semantic repository. Nevertheless, this is a valid and proper annotation schema, although it is not expressible in OWL.

Rather, such a kind of association can be seen as an extension of the idea behind annotation properties in OWL in