Proceedings 10th Joint ISO - ACL SIGSEM Workshop
on Interoperable Semantic Annotation
May 26, 2014
Reykjavik, Iceland
Harry Bunt, editor
isa-10:
10th Joint ACL – ISO Workshop on Interoperable Semantic
Annotation
Workshop Programme
08:30 – 08:50 Registration
08:50 -- 09:00 Opening by Workshop Chair
09:00 -- 10:30 Session A
09:00 -- 09:30 Hans-Ulrich Krieger, A Detailed Comparison of Seven Approaches for the
Annotation of Time-Dependent Factual Knowledge in RDF and OWL
09:30 -- 10:00 Antske Fokkens, Aitor Soroa, Zuhaitz Beloki, Niels Ockeloen, Piek Vossen, German
Rigau and Willem-Robert van Hage, NAF and GAF: Linking Linguistic Annotations
10:00 -- 10:15 Johan Bos, Semantic Annotation Issues in Parallel Meaning Banking
10:15 -- 10:30 Assaf Toledo, Stavroula Alexandropoulou, Sophie Chesney, Robert Grimm, Pepijn
Kokke, Benno Kruit, Kyriaki Neophytou, Antony Nguyen and Yoad Winter,
A Proof-Based Annotation Platform of Textual Entailment
10:30 – 11:00 Coffee break
11:00 -- 13:00 Session B
11:00 -- 11:15 Bolette Pedersen, Sanni Nimb, Sussi Olsen, Anders Søgaard and Nicolai
Sørensen, Semantic Annotation of the Danish CLARIN Reference Corpus
11:15 -- 11:45 Kiyong Lee, Semantic Annotation of Anaphoric Links in Language
11:45 -- 12:00 Laurette Pretorius and Sonja Bosch, Towards extending the ISOcat Data Category
Registry with Zulu Morphosyntax
12:00 -- 13:00 Harry Bunt, Kiyong Lee, Martha Palmer, Rashmi Prasad, James Pustejovsky and
Annie Zaenen, ISO Projects on the development of international standards for the
annotation of various types of semantic information
13:00 – 14:00 Lunch break
14:00 -- 16:00 Session C
14:00 -- 14:30 Volha Petukhova, Understanding Questions and Finding Answers: Semantic
Relation Annotation to Compute the Expected Answer Type
14:30 -- 14:45 Susan Windisch Brown, From Visual Prototypes of Action to Metaphors: Extending
the IMAGACT Ontology of Action to Secondary Meanings
14:45 -- 15:15 Ekaterina Lapshinova-Koltunski and Kerstin Anna Kunz, Annotating Cohesion for
Multilingual Analysis
15:15 -- 16:00 Poster session: elevator pitches followed by poster visits
Leon Derczynski and Kalina Bontcheva: Spatio-Temporal Grounding of Claims
Made on the Web in Pheme
Mathieu Roche: How to Exploit Paralinguistic Features to Identify Acronyms
in Texts
Sungho Shin, Hanmin Jung, Inga Hannemann and Mun Yong Yi: Lessons
Learned from Manual Evaluation of NER Results by Domain Experts
Milan Tofiloski, Fred Popowich and Evan Zhang: Annotating Discourse Zones in
Medical Encounters
Yu Jie Seah and Francis Bond: Annotation of Pronouns in a Multilingual Corpus of
Mandarin Chinese, English and Japanese
16:00 – 16:30 Coffee break
16:30 -- 18:00 Session D
16:30 -- 17:00 Elisabetta Jezek, Laure Vieu, Fabio Massimo Zanzotto, Guido Vetere, Alessandro
Oltramari, Aldo Gangemi and Rossella Vanvara, Extending `Senso Comune' with
Semantic Role Sets
17:00 -- 17:30 Paulo Quaresma, Amália Mendes, Iris Hendrickx and Teresa Gonçalves,
Automatic Tagging of Modality: Identifying Triggers and Modal Values
17:30 -- 18:00 Rui Correia, Nuno Mamede, Jorge Baptista and Maxina Eskenazi, Using the Crowd
to Annotate Metadiscursive Acts
18:00 Workshop Closing
Editor
Harry Bunt Tilburg University
Workshop Organizers/Organizing Committee
Harry Bunt Tilburg University
Nancy Ide Vassar College, Poughkeepsie, NY
Kiyong Lee Korea University, Seoul
James Pustejovsky Brandeis University, Waltham, MA
Laurent Romary INRIA/Humboldt Universität Berlin
Workshop Programme Committee
Jan Alexandersson DFKI, Saarbrücken
Paul Buitelaar National University of Ireland, Galway
Harry Bunt Tilburg University
Thierry Declerck DFKI, Saarbrücken
Liesbeth Degand Université Catholique de Louvain
Alex Chengyu Fang City University Hong Kong
Anette Frank Universität Heidelberg
Robert Gaizauskas University of Sheffield
Koiti Hasida Tokyo University
Nancy Ide Vassar College
Elisabetta Jezek Università degli Studi di Pavia
Michael Kipp University of Applied Sciences, Augsburg
Inderjeet Mani Yahoo, Sunnyvale
Martha Palmer University of Colorado, Boulder
Volha Petukhova Universität des Saarlandes, Saarbrücken
Andrei Popescu-Belis Idiap, Martigny, Switzerland
Rashmi Prasad University of Wisconsin, Milwaukee
James Pustejovsky Brandeis University
Laurent Romary INRIA/Humboldt Universität Berlin
Ted Sanders Universiteit Utrecht
Thorsten Trippel University of Bielefeld
Zdenka Uresova Charles University, Prague
Piek Vossen Vrije Universiteit Amsterdam
Annie Zaenen Stanford University
Table of contents
Hans-Ulrich Krieger
A Detailed Comparison of Seven Approaches for the Annotation of Time-Dependent Factual
Knowledge in RDF and OWL 1
Antske Fokkens, Aitor Soroa, Zuhaitz Beloki, Niels Ockeloen, Piek Vossen,
German Rigau and Willem-Robert van Hage
NAF and GAF: Linking Linguistic Annotations 9
Johan Bos
Semantic Annotation Issues in Parallel Meaning Banking 17
Assaf Toledo, Stavroula Alexandropoulou, Sophie Chesney, Robert Grimm,
Pepijn Kokke, Benno Kruit, Kyriaki Neophytou, Antony Nguyen and Yoad Winter
A Proof-Based Annotation Platform of Textual Entailment 21
Bolette Pedersen, Sanni Nimb, Sussi Olsen, Anders Søgaard and Nicolai Sørensen
Semantic Annotation of the Danish CLARIN Reference Corpus 25
Kiyong Lee
Semantic Annotation of Anaphoric Links in Language 29
Laurette Pretorius and Sonja Bosch
Towards extending the ISOcat Data Category Registry with Zulu Morphosyntax 39
Volha Petukhova
Understanding Questions and Finding Answers: Semantic Relation Annotation
to Compute the Expected Answer Type 44
Susan Windisch Brown
From Visual Prototypes of Action to Metaphors: Extending the IMAGACT Ontology
of Action to Secondary Meanings 53
Ekaterina Lapshinova-Koltunski and Kerstin Anna Kunz
Annotating Cohesion for Multilingual Analysis 57
Leon Derczynski and Kalina Bontcheva
Spatio-Temporal Grounding of Claims Made on the Web, in Pheme 65
Mathieu Roche
How to Exploit Paralinguistic Features to Identify Acronyms in Texts 69
Sungho Shin, Hanmin Jung, Inga Hannemann and Mun Yong Yi
Lessons Learned from Manual Evaluation of NER Results by Domain Experts 73
Milan Tofiloski, Fred Popowich and Evan Zhang
Annotating Discourse Zones in Medical Encounters 79
Yu Jie Seah and Francis Bond
Annotation of Pronouns in a Multilingual Corpus of Mandarin Chinese, English
and Japanese 82
Elisabetta Jezek, Laure Vieu, Fabio Massimo Zanzotto, Guido Vetere, Alessandro
Oltramari, Aldo Gangemi and Rossella Vanvara
Extending `Senso Comune' with Semantic Role Sets 88
Paulo Quaresma, Amália Mendes, Iris Hendrickx and Teresa Gonçalves
Automatic Tagging of Modality: Identifying Triggers and Modal Values 95
Rui Correia, Nuno Mamede, Jorge Baptista and Maxina Eskenazi
Using the Crowd to Annotate Metadiscursive Acts 102
Author Index
Alexandropoulou, Stavroula 21
Baptista, Jorge 102
Beloki, Zuhaitz 9
Bond, Francis 82
Bontcheva, Kalina 65
Bos, Johan 17
Bosch, Sonja 39
Brown, Susan Windisch 53
Chesney, Sophie 21
Correia, Rui 102
Derczynski, Leon 65
Eskenazi, Maxina 102
Fokkens, Antske 9
Gangemi, Aldo 88
Gonçalves, Teresa 95
Grimm, Robert 21
Hage, Willem-Robert van 9
Hannemann, Inga 73
Hendrickx, Iris 95
Jezek, Elisabetta 88
Jung, Hanmin 73
Kokke, Pepijn 21
Krieger, Hans-Ulrich 1
Kruit, Benno 21
Kunz, Kerstin Anna 57
Lapshinova-Koltunski, Ekaterina 57
Lee, Kiyong 29
Mamede, Nuno 102
Mendes, Amália 95
Neophytou, Kyriaki 21
Nguyen, Antony 21
Nimb, Sanni 25
Ockeloen, Niels 9
Olsen, Sussi 25
Oltramari, Alessandro 88
Pedersen, Bolette 25
Petukhova, Volha 44
Popowich, Fred 79
Pretorius, Laurette 39
Quaresma, Paulo 95
Rigau, German 9
Roche, Mathieu 69
Seah, Yu Jie 82
Shin, Sungho 73
Soroa, Aitor 9
Søgaard, Anders 25
Sørensen, Nicolai 25
Tofiloski, Milan 79
Vanvara, Rossella 88
Vetere, Guido 88
Vieu, Laure 88
Vossen, Piek 9
Winter, Yoad 21
Yi, Mun Yong 73
Zanzotto, Fabio Massimo 88
Zhang, Evan 79
A Detailed Comparison of Seven Approaches for the Annotation of
Time-Dependent Factual Knowledge in RDF and OWL
Hans-Ulrich Krieger
German Research Center for AI (DFKI GmbH)
Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany
krieger@dfki.de
Abstract
Representing time-dependent factual knowledge in RDF and OWL has become increasingly important in recent times. Extending OWL relation instances or RDF triples with further temporal arguments is usually realized through new individuals that hide the range arguments of the extended relation. As a result, reasoning and querying with such representations is extremely complex, expensive, and error-prone. In this paper, we discuss several well-known approaches to this problem and present their pros and cons. Three of them are compared in more detail, both on a theoretical and on a practical level. We also present schemata for translating triple-based encodings into general tuples, and vice versa. Concerning query time, our preliminary measurements have shown that a general tuple-based approach can easily outperform triple-based encodings by several orders of magnitude.
Keywords: temporal annotation; synchronic & diachronic relations; binary vs. N-ary representation schemata for factual statements.
1. Introduction
Representing temporally changing information is becoming increasingly important for reasoning and query services defined on top of RDF and OWL, for practical applications such as business intelligence in particular, and for the Semantic Web/Web 2.0 in general. Extending binary OWL ABox relation instances or RDF triples with further temporal arguments translates into a massive proliferation of useless “container” objects. Reasoning and querying with such representations is extremely complex, expensive, and error-prone.
In this paper, we critically discuss several well-known approaches to the encoding of time-dependent information in RDF and OWL. We present seven approaches and explain their pros and cons. Three of them are then compared in more detail, both theoretically and practically w.r.t. space consumption and answer time for simple queries. Two of the three approaches stay within the existing RDF paradigm, whereas the third proposal argues for replacing the RDF triple by a more general tuple in order to ease reasoning and querying, but also to come up with ontologies that have a smaller memory footprint when compared to semantically equivalent triple-based encodings.
In order to make the measurements for the three approaches comparable, we have used the rule-based semantic repository HFC (Krieger, 2013) that we have developed over the last years and which is comparable to popular engines, such as Jena, OWLIM, or Virtuoso. We also present schemata for translating temporal triple-based encodings into general tuples, and vice versa. Concerning query time, our preliminary measurements have shown that a general tuple-based approach can easily outperform a triple-based encoding by 1 to 5 orders of magnitude.
2. Synchronic and Diachronic Relations
Linguistics and philosophy make a distinction between synchronic and diachronic relations in order to characterize statements whose truth value does (or does not) change over time. Synchronic relations, such as dateOfBirth, are relations whose instances do not change over time, thus there is no direct need to attach a temporal extent to them. Consider, e.g., the natural language sentence

    Tony Blair was born on May 6, 1953.

Assuming an RDF-based N-triple representation (Grant and Beckett, 2004), an information extraction (IE) system might yield the following set of triples:

    tb rdf:type Person
    tb hasName "Tony Blair"
    tb dateOfBirth "1953-05-06"^^xsd:date

Since there is only one unique date of birth, this works perfectly well and properly captures the intended meaning.
Diachronic relationships, however, vary with time, i.e., their truth value does change over time. Representation frameworks such as OWL that are geared towards unary and binary relations cannot directly be extended by a further (temporal) argument. Consider the following sentence:

    Christopher Gent was Vodafone’s chairman until July 2003. Later, Chris became the chairman of GlaxoSmithKline with effect from 1st January 2005.

Given this, an IE system might discover the following time-dependent facts:

    [????-??-??,2003-07-??]: cg isChairman vf
    [2005-01-01,????-??-??]: cg isChairman gsk

Applying the synchronic temporal representation schema from above gives us

    cg isChairman vf
    cg hasTime [????-??-??,2003-07-??]
    cg isChairman gsk
    cg hasTime [2005-01-01,????-??-??]
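The four triples above can be loaded into a small in-memory store to see what a query for "statement plus temporal extent" returns. The following is a minimal Python sketch, assuming a plain list of tuples rather than a real RDF repository; the helper names objects and timed_statements are hypothetical and not part of the paper or of any library.

```python
# Flat encoding from the text: both temporal extents hang off the subject cg,
# not off the individual isChairman statements.
triples = [
    ("cg", "isChairman", "vf"),
    ("cg", "hasTime", ("????-??-??", "2003-07-??")),
    ("cg", "isChairman", "gsk"),
    ("cg", "hasTime", ("2005-01-01", "????-??-??")),
]


def objects(store, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in store if s == subject and p == predicate]


def timed_statements(store, subject, predicate):
    """Join each (subject, predicate, o) statement with each hasTime value
    of the same subject; this is all the flat encoding lets a query do."""
    return [
        (interval, (subject, predicate, o))
        for o in objects(store, subject, predicate)
        for interval in objects(store, subject, "hasTime")
    ]


pairings = timed_statements(triples, "cg", "isChairman")
for interval, statement in pairings:
    print(interval, statement)
```

The join yields four pairings for two statements, because nothing in the store records which interval belongs to which isChairman triple.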
However, the resulting RDF graph mixes up the association between the original statements and their temporal extent

    [????-??-??,2003-07-??]: cg isChairman vf
    *[2005-01-01,????-??-??]: cg isChairman vf
    *[????-??-??,2003-07-??]: cg isChairman gsk
    [2005-01-01,????-??-??]: cg isChairman gsk

as the second and third associations are not supported by the above natural language quotation.
3. Approaches to Diachronic Representation
Several well-known techniques of extending binary relations with additional arguments have been proposed in the literature.
3.1. Equip Relation With Temporal Arguments
This approach has been pursued in temporal databases (where it is called valid time) and in the logic programming community. For instance, a binary relation, such as worksFor between a person p of type Person and a company c of type Company, becomes a quaternary relation with two further temporal arguments s and e, expressing the temporal interval [s,e] in which the atemporal statement worksFor(p,c) is true (instants are represented by stating that s = e):

    worksFor(p,c) ↦ worksFor(p,c,s,e)

Unfortunately, OWL and description logic (DL) in general only support unary (classes) and binary (properties) relations in order to guarantee decidability of the usual inference problems. Thus forward chaining engines (such as OWLIM and Jena) as well as tableaux-based reasoners (e.g., Racer or Pellet) are unable to handle such descriptions.
We note here that this approach is clearly the silver bullet of representing binary factual statements, since it is the easiest and most natural one, although a direct interpretation is incompatible with RDF and almost all currently available reasoners. We will favor this kind of representation in the second part of the paper when presenting the measurements, using HFC (Krieger, 2013).
3.2. Apply a Meta-Logical Predicate
McCarthy & Hayes’ situation calculus, James Allen’s interval logic, and the knowledge representation formalism KIF use variants of the meta-logical predicate holds. Hence, our worksFor(p,c) relation instance becomes holds(worksFor(p,c),t). McCarthy & Hayes call a statement whose truth value changes over time a fluent (McCarthy and Hayes, 1969). The extended quaternary relation from the previous subsection can be seen as a relational fluent, whereas the holds expression here embodies a functional fluent, meaning that worksFor(p,c) is assumed to yield a situation-dependent value.
Such kinds of relations are not possible in OWL, since description logics limit themselves to subsets of function-free first-order logic and because only a weak form of relation composition is possible in OWL. However, we can reify the atemporal fact worksFor(p,c) in RDF, so that the above holds relation instance can at least be encoded by introducing a new individual o, represented as an RDF blank node. We note that in the original calculus, situations were defined at an instant of time, thus we use only a single temporal argument there.

    holds(worksFor(p,c),t) ↦ ∃o. holds(o,t) ∧ type(o,AtemporalFact) ∧ subject(o,p)
                               ∧ predicate(o,worksFor) ∧ object(o,c)

As an alternative, we might turn the worksFor relation into a class:

    holds(worksFor(p,c),t) ↦ ∃o. holds(o,t) ∧ type(o,WorksFor) ∧ subject(o,p) ∧ object(o,c)

However, this would require always introducing a new class for the representation of each diachronic relation.
3.3. Reify the Original Relation
Reifying a relation instance again leads to the introduction of a new object and five additional new relationships. In addition, a new class needs to be introduced for each reified relation, plus accessors to the original arguments, very similar to the approach directly above. Furthermore, and very importantly, relation reification loses the original relation name, thus requiring a massive modification of the original ontology.
Coming back to our worksFor example, we obtain (WorksFor is the newly introduced class)

    worksFor(p,c,s,e) ↦ ∃o. type(o,WorksFor) ∧ person(o,p) ∧ company(o,c)
                          ∧ starts(o,s) ∧ ends(o,e)

It is worth noting that this encoding can be seen as a kind of “owlfication” of Neo-Davidsonian semantics (Parsons, 1990), as the original relation is turned into an event.
3.4. YAGO’s Fact Identifier
The approach that YAGO (Hoffart et al., 2011) takes is related to Approaches 2 and 3 directly above, as it is a kind of external reification. YAGO uses its own extension of the N3 plain triple format, called N4, which associates a unique identifier i with each time-dependent fact.
The above quaternary relation instance then is represented as follows:

    worksFor(p,c,s,e) ↦ ∃i. i:worksFor(p,c) ∧ occursSince(i,s) ∧ occursUntil(i,e)

Note that the association i : worksFor(p,c) has the disadvantage of not being part of the triple repository (as it is a quadruple technically; we guess that there exists a separate extendable mapping table). Thus, entailment rules and queries will never have access to these quadruples, unless some custom functionality has been implemented in the semantic repository. Nevertheless, this is a valid and proper annotation schema, although it is not expressible in OWL.
Rather, such a kind of association can be seen as an extension of the idea behind annotation properties in OWL in
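To make the difference between the quaternary encoding of Section 3.1 and the reified encoding of Section 3.3 more tangible, here is a minimal Python sketch of a translation in both directions. It is an illustration under stated assumptions, not the schemata or the HFC implementation from the paper: the function names reify, unreify and employers_at, the blank-node labels, and the sample facts (with plain integers standing in for time points) are all hypothetical.

```python
from itertools import count

_fresh = count(1)  # assumption: a simple source of fresh blank-node labels


def reify(fact):
    """Section 3.3 direction: worksFor(p, c, s, e) becomes one new object o
    plus five triples about o."""
    p, c, s, e = fact
    o = f"_:o{next(_fresh)}"
    return [
        (o, "rdf:type", "WorksFor"),
        (o, "person", p),
        (o, "company", c),
        (o, "starts", s),
        (o, "ends", e),
    ]


def unreify(triples):
    """Inverse direction: gather the triples of each WorksFor object
    back into a single 4-tuple."""
    slots = {}
    for subj, pred, obj in triples:
        slots.setdefault(subj, {})[pred] = obj
    return [
        (d["person"], d["company"], d["starts"], d["ends"])
        for d in slots.values()
        if d.get("rdf:type") == "WorksFor"
    ]


def employers_at(facts, person, t):
    """Query on the quaternary form (Section 3.1): one scan, no joins."""
    return [c for p, c, s, e in facts if p == person and s <= t <= e]


# Made-up sample data, for illustration only.
facts = [("p1", "c1", 2001, 2003), ("p1", "c2", 2005, 2009)]
triples = [t for fact in facts for t in reify(fact)]

print(len(triples))                     # 10
print(unreify(triples) == facts)        # True
print(employers_at(facts, "p1", 2002))  # ['c1']
```

Two facts turn into ten triples, and answering the same question over the triple form requires regrouping the triples by their object first, which hints at why the tuple-based representation is easier to query.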